Reading Panel Data Set

For questions and discussion related to reading in and working with data.
TomDoan
Posts: 7814
Joined: Wed Nov 01, 2006 4:36 pm

Re: Reading Panel Data Set

Unread post by TomDoan »

Use a SMPL option on the values. For instance, the grn7p211.rpf example (Greene textbook) has a scatter which doesn't include the most extreme values

scatter(overlay=line,smpl=(output<25000)) 2
# output c
# output chat
scatter(smpl=(output<25000))
# output fq
bok1234
Posts: 86
Joined: Tue Jan 11, 2011 8:14 pm

Re: Reading Panel Data Set

Unread post by bok1234 »

I see. By the way,' smpl=(output<25000)' is the only way to truncate some extreme data value?
I mean, is there another methods to truncate extreme data by the standands of specific percentiles of upper or lower bounds, such as 'smpl=(output<95%.and.output>5%)''?
'smpl=(output<95%.and.output>5%)' means truncating data more than upper 5% and lower 5% of whole data set distribution, I just imagined.
TomDoan
Posts: 7814
Joined: Wed Nov 01, 2006 4:36 pm

Re: Reading Panel Data Set

Unread post by TomDoan »

No. You need to figure out what the actual limits are. (STATS(FRACTILES) can give you the 5% and 95% values). Given the really odd values that you had, an arbitrary 5% or 95% might not really help.
bok1234
Posts: 86
Joined: Tue Jan 11, 2011 8:14 pm

Re: Reading Panel Data Set

Unread post by bok1234 »

Thank you, Mr.Doan. Let me ask you one more 'panel data reading problem'.
I am trying to read 2 variables as below - CVR_MUL and IPC.
The former begins in year 2011 and ends in year 2017 and the latter begins in year 2010 and ends in year 2016. I confirmed data reading with the command of 'print(dates)', and found that two series begins and ends in same years - this means data reading error. Please let me know how to fix it except inputing 'blank data' of year 2010 in CVR_MUL sheet and of year 2017 in IPC sheet.

OPEN DATA "F:\CVR_MUL.xls"
CALENDAR(PANELOBS=7,A) 2011:1
DATA(FORMAT=XLS,NOLABELS,ORG=MULTIPLEROWS,TOP=2,LEFT=3) $
1//2011:01 16//2017:01 CVR_MUL

OPEN DATA "F:\IPC.xls"
CALENDAR(PANELOBS=7,A) 2010:1
DATA(FORMAT=XLS,NOLABELS,ORG=MULTIPLEROWS,TOP=2,LEFT=2) $
1//2010:01 16//2016:01 IPC


print(dates) 1//2010:01 16//2017:01 $
CVR_MUL IPC
Attachments
IPC.xls
(24.5 KiB) Downloaded 1406 times
CVR_MUL.xls
(25 KiB) Downloaded 1393 times
TomDoan
Posts: 7814
Joined: Wed Nov 01, 2006 4:36 pm

Re: Reading Panel Data Set

Unread post by TomDoan »

Changing the CALENDAR's won't do what you want---that will simply change the interpretation of the original data. Because the data files don't have usable dates as you have them formatted, you need to use the new CALENDAR option on DATA to get them interpreted correctly.

CALENDAR(PANELOBS=7,A,save=cvr_cal) 2011:1
CALENDAR(PANELOBS=7,A,save=ipc_cal) 2010:1
*
cal(panelobs=8) 2010:1
OPEN DATA "F:\CVR_MUL.xls"
DATA(FORMAT=XLS,NOLABELS,ORG=MULTIPLEROWS,TOP=2,LEFT=3,cal=cvr_cal) / CVR_MUL
OPEN DATA "F:\IPC.xls"
DATA(FORMAT=XLS,NOLABELS,ORG=MULTIPLEROWS,TOP=2,LEFT=3,cal=ipc_cal) / IPC

Alternatively, you can read them in separately and save into a RATS format file.

If you only need the overlapping range, you can use the LEFT and RIGHT options to exclude the columns on the file that aren't available in the other file.
bok1234
Posts: 86
Joined: Tue Jan 11, 2011 8:14 pm

Re: Reading Panel Data Set

Unread post by bok1234 »

Thank you, Mr.Doan. I also referred to a section of 'Saving/Recalling Calendars' in calendar option.
Your last explanation begins with the code lines as below.

CALENDAR(PANELOBS=7,A,save=cvr_cal) 2011:1
CALENDAR(PANELOBS=7,A,save=ipc_cal) 2010:1

So, you meant that there is no problem to declare new or derived data series names such as 'cvr_cal' and 'ipc_cal' before their original series, 'cvr_mul' and 'ipc' come out?
TomDoan
Posts: 7814
Joined: Wed Nov 01, 2006 4:36 pm

Re: Reading Panel Data Set

Unread post by TomDoan »

It's not a series name. It's the name of an array which is saving the calendar information.
akk
Posts: 4
Joined: Mon Aug 30, 2021 7:46 am

Re: Reading Panel Data Set

Unread post by akk »

I am trying to read in panel data set that is monthly for each of the 50 states. It is organized by time and I realize I need to use the pform command to flip things around, but I can't get the program to read all the data to get started.
Attachments
Copyofstatedata.xlsx
here are the data, trying to use the first sheet
(5.09 MiB) Downloaded 1105 times
TomDoan
Posts: 7814
Joined: Wed Nov 01, 2006 4:36 pm

Re: Reading Panel Data Set

Unread post by TomDoan »

It looks like you need to skip the first two columns (dates and state labels). The rest should work OK:
OPEN DATA "Copyofstatedata.xlsx"
DATA(FORMAT=XLSX,ORG=COLUMNS,LEFT=3) 1 26150 statefip mm yyyy yyyymm emppop lfpr ur
dofor i = statefip mm yyyy yyyymm emppop lfpr ur
   pform(block=50,transpose) i
   # i
end dofor i
cal(panelobs=26150/50,m) 1978:1
akk
Posts: 4
Joined: Mon Aug 30, 2021 7:46 am

Re: Reading Panel Data Set

Unread post by akk »

thanks I would not have understood that the loop was needed.

is there an easy way to make the statefip be associated with labels showing the states?
Post Reply