You have only 98 usable data points (after allowing for lags), which isn't much for estimating a model of that size. Also, the GARCH effects are fairly weak. If you do a two-step model (taking the residuals from the VAR and feeding them into the GARCH), you get estimates of the GARCH process which are the same for any ordering:
MV-GARCH, BEKK - Estimation by BFGS
Convergence in 109 Iterations. Final criterion was 0.0000000 <= 0.0000100
Weekly Data From 2007:11:06 To 2009:09:15
Usable Observations 98
Log Likelihood -621.7316
Variable Coeff Std Error T-Stat Signif
*************************************************************************************
1. Mean(1) 0.835531639 0.403860566 2.06886 0.03855907
2. Mean(2) 0.059098946 0.073204691 0.80731 0.41948738
3. Mean(3) -0.142325189 0.165264568 -0.86120 0.38913009
4. C(1,1) 0.766341717 0.682495052 1.12285 0.26149987
5. C(2,1) 0.344659200 0.169706041 2.03092 0.04226323
6. C(2,2) -0.000030286 0.376768379 -8.03833e-005 0.99993586
7. C(3,1) 1.409773511 0.355715978 3.96320 0.00007395
8. C(3,2) -0.000317812 3.958575791 -8.02844e-005 0.99993594
9. C(3,3) -0.000016607 1.388019400 -1.19648e-005 0.99999045
10. A(1,1) -0.079537148 0.099145659 -0.80223 0.42242269
11. A(1,2) -0.020380544 0.017248781 -1.18156 0.23737863
12. A(1,3) -0.255017109 0.051558823 -4.94614 0.00000076
13. A(2,1) 1.441254401 0.522705545 2.75730 0.00582814
14. A(2,2) 0.340559037 0.100593507 3.38550 0.00071049
15. A(2,3) 0.558354833 0.399864593 1.39636 0.16260618
16. A(3,1) 0.543279489 0.272889715 1.99084 0.04649859
17. A(3,2) -0.009838266 0.064052362 -0.15360 0.87792731
18. A(3,3) 0.982834823 0.163665017 6.00516 0.00000000
19. B(1,1) 0.957194988 0.041770509 22.91557 0.00000000
20. B(1,2) 0.025273016 0.013543218 1.86610 0.06202720
21. B(1,3) 0.030861707 0.096013411 0.32143 0.74788363
22. B(2,1) -0.411697972 0.448809555 -0.91731 0.35897966
23. B(2,2) 0.720456425 0.121676243 5.92109 0.00000000
24. B(2,3) -0.954708769 0.541083822 -1.76444 0.07765829
25. B(3,1) -0.064537177 0.307847881 -0.20964 0.83394879
26. B(3,2) -0.123061212 0.049982294 -2.46210 0.01381276
27. B(3,3) -0.050014359 0.154369909 -0.32399 0.74594535
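To put the sample size in perspective, here is a rough parameter count in Python. It is only a sketch: the functions are made up for illustration, and the VAR lag lengths in the loop are hypothetical, since the actual lag length isn't stated here. The BEKK count assumes a lower-triangular C and full A and B matrices, which matches the 27 coefficients listed above once the three means are added.

```python
def bekk_param_count(n):
    """Free parameters in an n-variable BEKK(1,1) variance model:
    lower-triangular C plus full A and B matrices."""
    return n * (n + 1) // 2 + 2 * n * n


def var_mean_param_count(n, lags):
    """Coefficients in an n-variable VAR with a constant and `lags` lags."""
    return n * (n * lags + 1)


n, nobs = 3, 98

# Two-step model: 3 constant means + 24 variance terms, as in the output above.
two_step = n + bekk_param_count(n)
print("two-step parameters:", two_step)

# Full model: VAR mean and BEKK variance estimated jointly.
for lags in (1, 2):  # hypothetical lag lengths, not taken from the post
    total = var_mean_param_count(n, lags) + bekk_param_count(n)
    per_param = n * nobs / total  # scalar observations per free parameter
    print("lags:", lags, "parameters:", total, "obs per parameter:", round(per_param, 1))
```

Even with a single lag, the joint model leaves only a handful of scalar observations per free parameter.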
However, this is on a boundary for the constant matrix: C(2,2), C(3,2) and C(3,3) are all effectively zero, so it's basically just a rank one matrix. Presumably, the global maximum would have a non-positive definite constant in the variance equation. With a small data set, it's quite possible for all the data points to generate a p.d. covariance matrix without a p.s.d. constant. When you estimate the full model, with both the mean and the variance terms together, there are multiple modes, and the progression generated by the simplex iterations (which are sensitive to the order of parameters) manages to locate different modes for different orders.
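The rank one claim can be checked directly from the reported C coefficients. The Python sketch below assumes the variance constant enters as C C' with C lower triangular (which is how the C(i,j) entries in the output are laid out); the helper functions are just for illustration. A rank one matrix has all its 2x2 principal minors equal to zero, so those are compared against the trace.

```python
# Lower-triangular C taken from coefficients 4-9 of the two-step output.
C = [
    [0.766341717, 0.0, 0.0],
    [0.344659200, -0.000030286, 0.0],
    [1.409773511, -0.000317812, -0.000016607],
]


def times_transpose(m):
    """Return m * m' for a square matrix stored as a list of rows."""
    n = len(m)
    return [[sum(m[i][k] * m[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


def principal_minors_2x2(s):
    """All 2x2 principal minors of a symmetric matrix s."""
    n = len(s)
    return [s[i][i] * s[j][j] - s[i][j] * s[j][i]
            for i in range(n) for j in range(i + 1, n)]


omega = times_transpose(C)  # assumed form of the variance constant: C C'
trace = sum(omega[i][i] for i in range(3))
minors = principal_minors_2x2(omega)

print("trace of C C':", trace)
print("largest 2x2 principal minor:", max(abs(m) for m in minors))
```

The minors come out many orders of magnitude smaller than the trace, which is what you'd expect if only the first column of C is doing any work.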