How to deal with zeros in data?
Posted: Mon Jan 05, 2015 2:42 pm
Hi Tom,
I've posted before, but I still have questions about how to deal with zeros in the data set.
My research aims to measure the effect of the introduction of short selling on feedback trading and volatility in the Chinese stock market. As can be seen in the Excel document, there are many zeros in each stock's log-return series. Zeros that run across a row for all the stocks in the sample arise because China sometimes has short national holidays, while successive zeros down the column of an individual stock reflect a trading suspension caused by the company's own operational problems. Both types of zeros therefore represent non-trading days.
As you suggested, I've tried deleting the first type of zeros (the runs of 4 or 5 zeros in a row), but the results still look strange: the estimation never converges. Could you please take a look?
The input code is as follows:
OPEN DATA "C:\Users\aixia\Desktop\51 Return_xls.xls"
CALENDAR(D) 2006:04:03
DATA(FORMAT=XLS,ORG=COLUMNS,RIGHT=2) 2006:04:03 2014:03:31 PUDONG
*
set r1 = PUDONG
*
nonlin b0 b1 b2 b3 a0 a1 a2 a3 d
compute d=1.0
stat(NOPRINT) r1
compute start = 2
compute end = 1943
**
set v = %variance
set u = 0.0
frml et = r1-b0-b1*v-(b2+b3*v)*r1{1}
frml ht = a0+a1*u{1}**2+a2*v{1}+%if(u{1}<0.0, a3*u{1}**2, 0.0)
***Using GED density
frml Lt = (v(t)=ht(t)), (u(t)=et(t)), $
log(.5)+log(d)+.5*%lngamma(3/d)-1.5*%lngamma(1/d)-.5*log(v)- $
exp((d/2.0)*(%lngamma(3/d)-%lngamma(1/d)))*(abs(u/sqrt(v)))**d
linreg(noprint) r1; # constant r1{1}
compute b0=%beta(1), b1=0.0, b2=%beta(2), b3=0.0
compute a0=%seesq, a1=.09781, a2=.83756, a3=0.0
nlpar(subiter=250)
maximize(method=simplex,recursive,iterations=6) Lt 2 *
maximize(method=bfgs,robust,recursive,iter=500) Lt 2 *
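For reference, the formulas in the code above appear to correspond to the model below, written out term by term from the ET, HT and LT definitions. The symbols r_t, u_t, v_t and \ell_t are my own shorthand for R1, U, V and the log-likelihood contribution; they are not taken from any reference.

```latex
\begin{aligned}
u_t    &= r_t - b_0 - b_1 v_t - (b_2 + b_3 v_t)\, r_{t-1} \\
v_t    &= a_0 + a_1 u_{t-1}^2 + a_2 v_{t-1}
          + a_3\, \mathbf{1}[u_{t-1} < 0]\, u_{t-1}^2 \\
\ell_t &= \log\frac{d}{2} + \tfrac{1}{2}\log\Gamma(3/d)
          - \tfrac{3}{2}\log\Gamma(1/d) - \tfrac{1}{2}\log v_t
          - \left(\frac{\Gamma(3/d)}{\Gamma(1/d)}\right)^{d/2}
            \left|\frac{u_t}{\sqrt{v_t}}\right|^{d}
\end{aligned}
```

With d = 2 the last line reduces to the Gaussian log density, -\tfrac{1}{2}\log(2\pi) - \tfrac{1}{2}\log v_t - u_t^2/(2 v_t), so d is the GED shape parameter and a value close to 0.1 implies extremely fat tails.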
The output is:
MAXIMIZE - Estimation by Simplex
Daily(5) Data From 2006:04:04 To 2014:03:31
Usable Observations 19
Skipped/Missing (from 2085) 2066
Function Value 172.1642
Variable Coeff
**********************************************
1. B0 0.000057147
2. B1 -0.301938042
3. B2 0.014385197
4. B3 -0.055167436
5. A0 0.000193743
6. A1 0.077369570
7. A2 -0.064979929
8. A3 -0.070342733
9. D 0.151616428
MAXIMIZE - Estimation by BFGS
NO CONVERGENCE IN 18 ITERATIONS
LAST CRITERION WAS 0.0000010
ESTIMATION POSSIBLY HAS STALLED OR MACHINE ROUNDOFF IS MAKING FURTHER PROGRESS DIFFICULT
TRY HIGHER SUBITERATIONS LIMIT, TIGHTER CVCRIT, DIFFERENT SETTING FOR EXACTLINE OR ALPHA ON NLPAR
RESTARTING ESTIMATION FROM LAST ESTIMATES OR DIFFERENT INITIAL GUESSES MIGHT ALSO WORK
With Heteroscedasticity/Misspecification Adjusted Standard Errors
Daily(5) Data From 2006:04:04 To 2014:03:31
Usable Observations 19
Skipped/Missing (from 2085) 2066
Function Value 268.2201
Variable Coeff Std Error T-Stat Signif
************************************************************************************
1. B0 0.000057 0.000001 40.36660 0.00000000
2. B1 -0.304113 0.009320 -32.62939 0.00000000
3. B2 0.014385 0.000000 0.00000 0.00000000
4. B3 -0.055167 0.000000 0.00000 0.00000000
5. A0 0.000195 0.000016 11.87531 0.00000000
6. A1 -5514.806599 0.170650 -32316.40693 0.00000000
7. A2 -0.043771 0.025546 -1.71343 0.08663372
8. A3 58066.352495 31.382881 1850.25561 0.00000000
9. D 0.108551 0.038620 2.81076 0.00494253
Many thanks,
Aixia