The significance level produced by a standard LR test isn't correct (that is, the test statistic isn't asymptotically chi-squared) if some series in each block have unit roots. The result in Sims, Stock and Watson can be understood as meaning that hypotheses which don't somehow restrict the unit root behavior of the variables have standard asymptotics, while those that do have non-standard asymptotics. Thus, the lag length tests are OK (at least as long as you aren't trying to test 1 lag vs 0), since as long as you have 1 lag, the unit root properties can still be expressed. An exogeneity hypothesis isn't OK, since the series can't be cointegrated if one is in an exogenous block: the zero restrictions affect possible unit root behavior.
The true asymptotics would have to be evaluated on a case by case basis, so bootstrapping is a better approach. I've attached an example of bootstrapping a two variable causality test, which can be extended fairly easily to a block test.
(This uses the same data set and the same hypothesis as the causal.prg example from the manual.)
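For readers who want to see the mechanics outside of RATS, here is a rough Python sketch of one way to bootstrap a two-variable causality test. It is not the attached program: the function names (`lagmat`, `ols`, `causality_stat`, `bootstrap_pvalue`), the LR-type statistic, the i.i.d. residual resampling scheme and the simulated random-walk data are all assumptions made for illustration. The idea is to estimate the system with the null ("x does not cause y") imposed, rebuild artificial data from resampled residuals, and see where the observed statistic falls among the bootstrapped ones.

```python
import numpy as np

def lagmat(data, p):
    """Constant plus lags 1..p of every column of data (T x k), for t = p..T-1."""
    T = data.shape[0]
    cols = [np.ones((T - p, 1))]
    for lag in range(1, p + 1):
        cols.append(data[p - lag:T - lag])
    return np.hstack(cols)

def ols(yy, X):
    """Least squares coefficients and residuals."""
    beta = np.linalg.lstsq(X, yy, rcond=None)[0]
    return beta, yy - X @ beta

def causality_stat(y, x, p):
    """LR-type statistic for 'lags of x do not enter the y equation'."""
    yy = y[p:]
    _, e_u = ols(yy, lagmat(np.column_stack([y, x]), p))  # unrestricted
    _, e_r = ols(yy, lagmat(y[:, None], p))               # own lags only
    return len(yy) * np.log((e_r @ e_r) / (e_u @ e_u))

def bootstrap_pvalue(y, x, p, ndraws=500, seed=0):
    """Bootstrap p-value with the null imposed on the data-generating model."""
    rng = np.random.default_rng(seed)
    data = np.column_stack([y, x]).astype(float)
    stat = causality_stat(data[:, 0], data[:, 1], p)
    # Null model: y depends on its own lags only; x equation is unrestricted.
    by, ey = ols(data[p:, 0], lagmat(data[:, :1], p))
    bx, ex = ols(data[p:, 1], lagmat(data, p))
    resid = np.column_stack([ey, ex])
    n = len(resid)
    exceed = 0
    for _ in range(ndraws):
        # Draw residual pairs jointly to keep their contemporaneous correlation.
        draw = resid[rng.integers(0, n, size=n)]
        sim = np.empty_like(data)
        sim[:p] = data[:p]  # keep the actual pre-sample values
        for t in range(p, len(data)):
            past = sim[t - p:t][::-1]  # rows t-1, t-2, ..., t-p
            sim[t, 0] = by[0] + by[1:] @ past[:, 0] + draw[t - p, 0]
            sim[t, 1] = bx[0] + bx[1:] @ past.ravel() + draw[t - p, 1]
        if causality_stat(sim[:, 0], sim[:, 1], p) >= stat:
            exceed += 1
    return stat, exceed / ndraws

# Illustrative data only: two independent random walks.
rng = np.random.default_rng(1)
y = np.cumsum(rng.standard_normal(120))
x = np.cumsum(rng.standard_normal(120))
stat, pval = bootstrap_pvalue(y, x, p=2, ndraws=200)
print(f"statistic = {stat:.3f}, bootstrap p-value = {pval:.3f}")
```

Extending this to a block test would mean replacing the single y equation with the full block and using the log determinant ratio of the restricted and unrestricted residual covariance matrices as the statistic.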
Note that there is a "non-fix" to the problem in
Toda, H.Y., Yamamoto, T., 1995. "Statistical inference in vector autoregression with possibly integrated processes." Journal of Econometrics 66, 225-250.
In short, it tests for causality by adding lags to a VAR (to allow for possible unit roots/cointegration) and then testing zero restrictions which don't include those added lags. Rather than fixing the Sims-Stock-Watson problem, it actually confirms their result that most coefficients and linear combinations thereof are asymptotically normal, but that tests of certain restrictions which eliminate channels of influence for unit roots don't have standard asymptotics.
By adding extra lags and then not testing them, you shift the "bad" behavior onto the untested lags. In effect, you're no longer testing lack of causality, since that requires testing all the lags. While their test statistic has the correct distribution under the null (which the SSW results would predict), it will suffer badly from lack of power: if the block of coefficients isn't, in fact, zero, the "causality" will get shifted fairly easily to the untested lag(s), since an integrated process is so highly autocorrelated.
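To make the lag-augmentation procedure described above concrete, here is a minimal Python sketch of it for the y equation of a two-variable VAR. The function name `toda_yamamoto_wald`, the homoskedastic OLS covariance matrix and the default of one extra lag (`d=1`) are assumptions for illustration, not anything taken from the paper's own code. It fits p + d lags but puts only the first p lags of x into the Wald statistic, which is then compared with a chi-squared with p degrees of freedom.

```python
import numpy as np

def toda_yamamoto_wald(y, x, p, d=1):
    """Wald statistic on lags 1..p of x in the y equation of a VAR(p + d).

    The extra d lags are estimated but deliberately left out of the test.
    Returns (wald, degrees_of_freedom).
    """
    k = p + d
    data = np.column_stack([y, x]).astype(float)
    T = len(data)
    # Regressors: constant, then (y, x) at lag 1, (y, x) at lag 2, ..., lag k.
    X = np.hstack([np.ones((T - k, 1))] +
                  [data[k - lag:T - lag] for lag in range(1, k + 1)])
    yy = data[k:, 0]
    beta = np.linalg.lstsq(X, yy, rcond=None)[0]
    resid = yy - X @ beta
    sigma2 = resid @ resid / (len(yy) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    # x at lag j sits in column 2*j; test only j = 1..p, not the added lags.
    idx = [2 * j for j in range(1, p + 1)]
    b = beta[idx]
    wald = b @ np.linalg.solve(cov[np.ix_(idx, idx)], b)
    return wald, p

# Illustrative data only: x is a random walk that feeds into y with one lag.
rng = np.random.default_rng(0)
T = 300
x = np.cumsum(rng.standard_normal(T))
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.5 * y[t - 1] + x[t - 1] + rng.standard_normal()

wald, df = toda_yamamoto_wald(y, x, p=1, d=1)
print(f"Wald = {wald:.2f} on {df} degree(s) of freedom")
```

Note that the coefficient on the untested lag p + 1 of x is free to absorb part of the effect, which is the channel through which the power loss described above arises.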