
Often, data for an ICPSR study will be available only as a large ASCII text file. These files, which we will often refer to as "raw" data files, are in a "State of Nature" and are, in their present form, unusable by a statistics package such as Stata. Thankfully, an increasing number of ICPSR studies have a "setup" file (also referred to here as a .do file) and a "dictionary" that together read the raw data into something Stata can make sense of and use for analysis - a Leviathan, if you will. This page walks you through how to use and edit Stata setup files from the ICPSR to extract yourself from the "State of Nature" and create usable Stata data files.
Start by downloading the data file, Stata .do file, and Stata dictionary file from the ICPSR website for the study you want. Here, we are using ICPSR #4262:

Once you access the dataset you want to download, click on the "Download Data" tab. You will be taken to the ICPSR MyData Login page. All ICPSR users are required to have accounts if they wish to download data, so you will need to set one up if you do not have one already. Enter your email address and password and, once you have been authenticated, you will see something like the following screen:

The ICPSR presents users with multiple options for downloading data. Generally speaking, the ICPSR will have files available for
different statistical packages. Here, for instance, there are ASCII data and "setup" files available for SAS, SPSS, and Stata as
well as a SAS "transport" file, an SPSS portable file, and a Stata "system" file. [You can get additional detail about the files
available for a study by reading the file manifest that is available on the "Description" page.] You can download just those files
for a particular program (e.g. "ASCII Data File and Stata Setup Files" or "Stata System") or download all the available files for
a particular study. Whichever files you select will then be added into your "data cart" for download. If you go the data-cart
route, you will be downloading a zipped archive of whatever files you chose. Alternatively, you can click on the "download individual
files" link at the right and download files one at a time. For our purposes, we only want the raw ASCII data file and the Stata
setup and dictionary files, so we will choose this last route. Click on the "download individual files" link and you will be taken
to this page:

If you scroll down you will see that an ASCII data file, a Stata dictionary, and a Stata setup file are available
and ready to download:

Now we can save the ASCII data file, the Stata dictionary, and the Stata setup file. First, click on the "Data" link and choose
"Save Target As...":

Save the data in the desired location on your computer. For our example here, we're saving all our files to "C:\patrons\oelgun\STATA_and_SPSS_tutorials\ICPSR_4262_data\4262". Be sure to add ".txt" to the end of the filename - the "Save as type" field should be "Text Document". Here, we'll save the data file as "da4262-Data.txt":

[Note that in this tutorial, we are not using the default file names (e.g. 04262-0001-Data.txt) that the ICPSR assigns. Instead, we are using an older ICPSR convention for naming files, mainly because the resulting file names are shorter. How you wish to name the files is up to you - just be careful about what file extension you specify.]
Next, we need to save the "Stata setup" file, which is the Stata .do file that we will use to move the raw ICPSR data into Stata. Save the .do file by right-clicking on the "Stata setup" link and saving it to your hard drive. Here, we will name the file "st4262-Setup.do". Be sure to add ".do" to the end of the filename; otherwise, Stata will not recognize the file as a Stata program file. The "Save as type" field should therefore be either "All Files" or "Stata Do-file" (either option will work - just make sure you attach the .do extension).

Finally, we need to save the "Stata dictionary" file. We will need this file, in addition to the .do file we have already downloaded, to convert the raw ICPSR data into a Stata file. Save the Stata dictionary by right-clicking on the "Stata dictionary" link and saving it to your hard drive. Here, we will name the file "st4262-dictionary.dct". Be sure to add ".dct" to the end of the filename; otherwise, Stata may not recognize the file as a Stata dictionary. The "Save as type" field should therefore be either "All Files" or "Stata Dictionary" (either option will work - just make sure you attach the .dct extension):
[Note - for reference, Stata program files for executing commands have a .do extension, while Stata data files have a .dta extension. Stata dictionary files have a .dct extension. Stata log files, which contain tabular output from Stata commands and procedures and records of commands executed during a Stata session, can have either a .log extension or a .smcl (Stata Markup and Control Language) extension.]
It is probably worth spending a bit of time here discussing what a Stata data dictionary is. As stated above, dictionary files contain information about variables in raw data files that Stata uses when reading in those files. The dictionary provides Stata with information about the structure of the data file and the variables within it - e.g. the logical record length of the file, the location of each variable in the file and that variable's name, and so on.
Dictionary files can be created, modified, or viewed with a text editor. To view a dictionary, you can simply right-click on
its icon, choose "Open With" from the menu that appears, and choose a program with which to open the file. In our example here,
WordPad is listed as one of the options for viewing the file, and it's the option we'll choose:

[Note that your computer may prompt you to search for a particular program to view the dictionary if it does not recognize the file type.]
Once we open up our dictionary, we see something like the following:

We will not get into exacting technical detail about the contents of this dictionary, or of Stata data dictionaries in general - that information is available in Stata's reference manuals, such as the Data Management manual for Stata 9 in the sections for -infix- and -infile-. In summary, the dictionary tells Stata the following: the number of rows in the data file for each respondent; the location of the first line of actual data in the file; the column(s) in the file in which each variable is located; the format of each variable; the name of each variable; the value label associated with each variable; the display format for each variable; and the label for each variable. Stata uses all of this information when it digests the raw data file and converts it into Stata format.
[If you want more information about the contents of Stata dictionaries, you can also read Stata's help pages for the -infile- command and the -infix- command. UCLA's Academic Technology Services help site for Stata also includes a page reviewing the format and layout of Stata dictionaries.]
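To make the summary above a little more concrete, here is a minimal sketch of what a very small dictionary might contain. The variable names, column positions, and labels are invented for illustration and are not taken from the actual dictionary for ICPSR #4262; only the directives (_lines, _line, _column) and the general layout follow Stata's documentation for -infile-.

```stata
dictionary {
    * Hypothetical layout for illustration only - not the real 4262 dictionary
    * Each respondent occupies one row of the raw file
    _lines(1)
    * The variables below are read from row 1 of each respondent's record
    _line(1)
    * start column, storage type, name[:value label], input format, variable label
    _column(1)  long  CASEID    %6f  "Respondent ID (hypothetical)"
    _column(7)  byte  Q1:Q1     %1f  "Satisfaction item (hypothetical)"
}
```

In this sketch, CASEID is read from columns 1-6 and Q1 from column 7, and the "Q1:Q1" entry asks Stata to attach a value label named Q1 to the variable Q1.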
Stata .do files, meanwhile, are Stata program files that you can create or modify to execute Stata commands for processing and
analyzing data and are similar to .sps files in SPSS and .sas files in SAS. The .do files that the ICPSR provides contain Stata
commands for functions such as actually reading in the raw data, processing it using the information in the data dictionary, defining
value labels, and identifying codes for missing values. If a .do file is under 130K in size, you can view it via Stata's internal
"Do-File Editor," which you can open up by typing "doedit" in Stata's command window:

The Do-File Editor will then pop up as a new window, and you can click on the "File" menu and choose "Open" to view a .do file:
The .do file that we are working with here, however, is 231K in size and cannot be opened by the Do-File Editor. Thus, we will have to use a text editor instead. To open up our .do file, we will right-click on it, choose "Open With" from the menu that appears, and choose a program with which to open the file. In our example here, WordPad is listed as one of the options for viewing the file, and it's the option we'll choose:

[Note that your computer may prompt you to search for a particular program to view the .do file if it does not recognize the file type.]
The first thing we'll do with our .do file is edit it so that it creates a Stata log recording the commands and output of our Stata session. While this step is not strictly necessary, we still strongly recommend it, if for no other reason than to help you track down any syntax errors in your .do file when you run it. You can create the log with the -log- command, which tells Stata what your log's name will be, where it will be created and saved, and what format it will be in (e.g. as a text file with a .log extension or as a .smcl file). Generally, log commands look something like this:
log using "name-and-location-and-type-of-logfile"
EXAMPLE: log using "C:\patrons\oelgun\STATA_and_SPSS_tutorials\ICPSR_4262_data\4262\Statalog.smcl"
In our example here, we're creating a log called "Statalog.smcl" and saving it in the same location as all our other files
(C:\patrons\oelgun\STATA_and_SPSS_tutorials\ICPSR_4262_data\4262). We'll edit our .do file to start logging our commands and output
right from the very beginning, so the file should look something like this:

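As a rough sketch, the logging lines at the top and bottom of a .do file might look like the following. The path and log name are the ones used in this tutorial; the "capture log close" line, the "replace" option, and the closing "log close" are our own additions rather than anything the ICPSR file is known to contain.

```stata
* Close any log left open by an earlier run (capture suppresses the error if none is open)
capture log close

* Start a new SMCL log, overwriting an earlier log of the same name
log using "C:\patrons\oelgun\STATA_and_SPSS_tutorials\ICPSR_4262_data\4262\Statalog.smcl", replace

* ... the rest of the .do file goes here ...

* Stop logging once the work is done
log close
```

Giving the log a .log extension instead of .smcl would produce a plain-text log.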
Just as you need to edit an SPSS syntax file or SAS program file from the ICPSR to get it to read in raw data and
create a new file, you need to edit a Stata .do file to tell it the name and location of both your raw data file and
the new Stata file you'll be creating. You will also need to tell it the name and location of the data dictionary
Stata uses to make heads or tails of the raw data. If we examine the .do file closely, we'll actually see
instructions about these very matters:

Note that the .do file tells us to "change directories" to the location where all our files are located. By default, Stata will generally look for files in a directory called "DATA" on your hard drive (where this directory is located will vary with how Stata was installed on your system). Since our files are not located in this directory, we'll need to tell Stata where our files are located, which we can accomplish via the -cd- command. This command, which should be familiar to Linux and Unix fans, generally looks like the following:
cd "directory-location-and-name"
EXAMPLE: cd "C:\patrons\oelgun\STATA_and_SPSS_tutorials\ICPSR_4262_data\4262"
You can execute this command by either typing it into your .do file or into Stata's command window - either means will
produce the same result, which should be something like the following:

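If you want to confirm that the change of directory worked, a short sketch like the one below may help. It assumes the folder used in this tutorial exists on your machine; -pwd- and -dir- are standard Stata commands, but they are not part of the ICPSR .do file.

```stata
* Move to the folder that holds the data, dictionary, and setup files
cd "C:\patrons\oelgun\STATA_and_SPSS_tutorials\ICPSR_4262_data\4262"

* Show the current working directory
pwd

* List the files Stata can now see without a full path
dir
```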
If you are a veteran Stata user, you will probably know that (1) Stata allocates 1.00 MB of memory for data by default and that (2) this is often not sufficient for data files. The ICPSR has already taken this into account, however; as we can see here, the .do file has -set- Stata's -memory- to 9 MB in order to process the data at hand:

You may also note that the ICPSR has written the .do file so that the -more- setting is "off." This option concerns Stata's output viewer. By default, the output viewer will pause when the output is long so that you can view it in sections and browse through it at your own pace via the space key instead of having it all race by in a blur. Here, we have created a log file so that we can see each command and the associated output. In order to allow the .do file to run in one step without any pauses, we'll stick with the ICPSR's default setting and keep the more option "off":

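Taken together, the two settings discussed above might appear in a .do file roughly as follows. This is a sketch rather than a copy of the ICPSR file. Note also that recent Stata releases manage memory automatically, so the memory line matters mainly for the older versions this tutorial describes.

```stata
* Older Stata releases need memory reserved before the data are read in
set memory 9m

* Do not pause long output with "more" prompts
set more off
```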
Now the real fun begins - we can edit the .do file to tell it where our raw data are, where our dictionary
is, and where Stata should save the data that it produces. If we look at our .do file, we'll see that the ICPSR has
already provided us with some instructions about what we need to do here:

We won't spend much time here discussing "macros" in Stata; interested parties are advised to consult Stata's technical documentation (e.g. the Programming manual for Stata 9) or proceedings from the various Stata users' groups such as this article on programming and macros. In this context, you can think of a macro as being like an alias that refers to a particular file (similar in some ways to SPSS file handles or SAS libnames and filenames). Once you've created a macro and told Stata what file the macro refers to, Stata will look for that file whenever it encounters a reference to that macro (again, we are simplifying quite a bit here). Once the .do file has finished running, that macro will then be cleared from Stata's memory and cast upon the ash heap of Stata macro history.
For our file here, there are three macros we need to assign - raw_data, dict, and outfile, which will respectively refer to the raw data file, the data dictionary, and the actual Stata data file we'll be creating. So, we need to edit our .do file so that the macros refer to the correct files. Note the instructions that the ICPSR has provided us - if all our files are in the same location and we have told Stata to change directories to that location, then all we need to do is refer to the names of the files. If, however, our files are in different locations and/or we haven't told Stata to change directories, then we also need to tell Stata where those files are located by including their full path names in the macros.
An example will hopefully clarify this somewhat abstract concept. In our .do file, the syntax for assigning the
macro for the raw data file reads as follows:
local raw_data "data-filename"
To assign the macro correctly, we have two choices. If we used the -cd- command to change directories to where all our files are located, we would edit the syntax so that it reads as follows:
local raw_data "da4262-Data.txt"
We would not need to specify the full file path, because we've changed directories to where our files are located
("C:\patrons\oelgun\STATA_and_SPSS_tutorials\ICPSR_4262_data\4262") and Stata will look for a file called
"da4262-Data.txt" in that directory. If, however, we haven't changed directories to where our raw data file is
located, the syntax for assigning the macro has to read as follows:
local raw_data "C:\patrons\oelgun\STATA_and_SPSS_tutorials\ICPSR_4262_data\4262\da4262-Data.txt"
Then, Stata will know where to look for our raw data file. Editing the syntax for the dict
macro and the outfile macro is the same as above - we need to edit the syntax to point to the correct
file and, if necessary, to that file's location on our computer. In our file here, we'll edit the syntax for each
macro so that it includes the full file path. When we're done editing, our .do file will look as follows:

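For reference, the three macro assignments with full file paths might look something like the sketch below. The file names and folder are the ones used in this tutorial; the -confirm file- checks are our own addition (they stop the .do file with an error if a file is not where the macro says it is) and are not part of the ICPSR setup file.

```stata
* Raw ASCII data file
local raw_data "C:\patrons\oelgun\STATA_and_SPSS_tutorials\ICPSR_4262_data\4262\da4262-Data.txt"

* Stata dictionary describing the layout of the raw file
local dict "C:\patrons\oelgun\STATA_and_SPSS_tutorials\ICPSR_4262_data\4262\st4262-dictionary.dct"

* Stata data file to be created
local outfile "C:\patrons\oelgun\STATA_and_SPSS_tutorials\ICPSR_4262_data\4262\st4262-Data.dta"

* Optional check that the two input files exist
confirm file "`raw_data'"
confirm file "`dict'"
```

Because these are local macros, they exist only while the .do file that defines them is running, so the commands that use them must be in the same .do file.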
As we scroll down the .do file, we'll encounter some text and syntax referring to Stata's -infile- command, which is the command that will actually read in our raw data file, using the data dictionary to identify and format the variables. We can see here that the syntax uses macros we've just assigned, one for the dictionary and one for the raw data file:

So, when Stata is processing and executing this command, it will look for a file called "st4262-dictionary.dct" located in "C:\patrons\oelgun\STATA_and_SPSS_tutorials\ICPSR_4262_data\4262" when it sees the macro dict (note the characters that enclose the macro name in the .do file - they tell Stata that dict is a macro specifically). Likewise, Stata will look for a file called "da4262-Data.txt" located at "C:\patrons\oelgun\STATA_and_SPSS_tutorials\ICPSR_4262_data\4262" when it sees the macro raw_data. This section of our syntax file is thus fine as is - it does not need to be edited.
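A command of this kind can be sketched as follows. It assumes that the dict and raw_data macros were assigned earlier in the same .do file, and it is our approximation of the command rather than a verbatim copy of the ICPSR file.

```stata
* Read the raw data using the layout in the dictionary file.
* A macro name is enclosed by a left single quote (`) and a right single quote (').
* "clear" drops any data already in memory before reading.
infile using "`dict'", using("`raw_data'") clear
```

When Stata runs this line, it substitutes the stored file names for `dict' and `raw_data' before executing the command.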
As we continue to scroll down our .do file, we will encounter a section that uses Stata's -label- command to define descriptive value labels for some of the variables in our data file and to provide a descriptive label for the data file itself:

label data attaches the label "Afrobarometer: Round I Survey of Malawi, November-December 1999, Dataset 0001" to the Stata file we're creating, and that label will be displayed whenever we -describe- our file. label define, meanwhile, creates value labels that will provide descriptive information about the values for particular variables in our file. For instance, the .do file creates a collection of labels called Q1 that associates text with particular values - e.g. it attaches the labels "Very dissatisfied", "Dissatisfied", "Neither satisfied nor dissatisfied", "Satisfied", "Very satisfied", and "Dont Know" to the values 1.00, 2.00, 3.00, 4.00, 5.00, and 6.00, respectively. Our data dictionary will then tell Stata to attach this collection of labels to variable Q1 in our data file. Labeling a data file is not necessary, but it can be useful in distinguishing files from each other if you're working with multiple datasets. Value labels aren't necessary either, but they can make frequency distributions and crosstabulations easier to understand. So, we'll go ahead and leave this section in our file. We do not, as the ICPSR points out to us, need to edit this portion of the .do file - it is fine as is.
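The labeling commands described above can be tried on a tiny made-up dataset, as in the sketch below. The six-row dataset is hypothetical, and the -label values- line stands in for the job that the dictionary does in the real setup files; the label text is taken from the description above.

```stata
* Hypothetical six-row dataset with one variable, Q1, coded 1 through 6
clear
set obs 6
generate Q1 = _n

* Attach a descriptive label to the dataset as a whole
label data "Afrobarometer: Round I Survey of Malawi, November-December 1999, Dataset 0001"

* Define a collection of value labels named Q1
label define Q1 1 "Very dissatisfied" 2 "Dissatisfied" 3 "Neither satisfied nor dissatisfied" 4 "Satisfied" 5 "Very satisfied" 6 "Dont Know"

* Attach the collection to the variable Q1 (in the ICPSR files, the dictionary does this)
label values Q1 Q1

* The data label appears in -describe-; the value labels appear in -tabulate-
describe
tabulate Q1
```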
You may be wondering about the -#delimit- syntax that appears in the .do file:

By default, Stata uses a carriage return to figure out whether it has reached the end of one command and can move on to another. -#delimit- changes this default setting and lets you choose between a carriage return and a semi-colon as the delimiter that Stata uses to figure out when a command has ended. A very long line of syntax can make your .do files difficult to read and understand. With -#delimit-, however, you can change the delimiter from a carriage return to a semi-colon via the syntax "#delimit ;". You can then break a command across multiple lines and end it with a ";" so that Stata will not get confused as to where the command ends and your syntax will be more intelligible:

Note that you can toggle back and forth between delimiters in your .do file. Our ICPSR .do file, for instance,
switches back to the default carriage-return delimiter once the section defining value labels is over via the
syntax "#delimit cr":

From here on, Stata will go back to using carriage returns to determine when a command ends.
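The switch between delimiters can be sketched as follows. The label definition reuses the Q1 labels described earlier; the "capture label drop" line is our own addition so that the sketch can be rerun without an "already defined" error. -#delimit- works inside .do files, not in commands typed one at a time into the command window.

```stata
* Remove any earlier definition of the Q1 labels so this sketch can be rerun
capture label drop Q1

* Switch to the semi-colon delimiter so one command can span several lines
#delimit ;

label define Q1
    1 "Very dissatisfied"
    2 "Dissatisfied"
    3 "Neither satisfied nor dissatisfied"
    4 "Satisfied"
    5 "Very satisfied"
    6 "Dont Know" ;

#delimit cr
* Back to the default: each line is again treated as a complete command
label list Q1
```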
[Note - with many .do files for ICPSR datasets, there will also be a section setting codes for missing values within the data. By default, this section is often commented out so that Stata will not execute the relevant commands when you run the .do file, and you will usually see this section enclosed by /* and */. If you want Stata to execute this set of commands, simply remove the /* and */ and the commands will run along with the rest of the contents of your .do file.]
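The effect of the comment markers can be seen in a small sketch. The three-row dataset and the use of 9 as a missing-value code for Q1 are hypothetical and are not taken from ICPSR #4262.

```stata
* Hypothetical dataset in which 9 is a code for "missing" on Q1
clear
input Q1
1
9
3
end

/* As shipped, a block like this is skipped because it sits inside a comment:
replace Q1 = . if Q1 == 9
*/

* With the comment markers removed, the same command runs:
replace Q1 = . if Q1 == 9
list
```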
As we continue to scroll down our .do file and our Long March nears an end, we encounter a section for actually saving the Stata file we've created:

Here, the -save- command will save our newly-created Stata file with a name and at a location that we choose. Note that the syntax here refers to the outfile macro we assigned earlier:
save `outfile', replace
Thus, Stata will save a Stata data file named "st4262-Data.dta" at the location
"C:\patrons\oelgun\STATA_and_SPSS_tutorials\ICPSR_4262_data\4262" and all will be well.
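To see the save step in isolation, here is a small self-contained sketch. The three-row dataset and the file name "save_demo.dta" are hypothetical stand-ins so that the example does not touch the tutorial's real output file.

```stata
* Hypothetical one-variable dataset so the sketch can run by itself
clear
set obs 3
generate id = _n

* Hypothetical output name; in the tutorial the macro points to st4262-Data.dta
local outfile "save_demo.dta"

* Write the data in memory to disk, overwriting any earlier copy of the file
save "`outfile'", replace

* Load the saved file back in and inspect it
use "`outfile'", clear
describe
```

Enclosing the macro in double quotes, as done here, is a precaution for file paths that contain spaces. Run the sketch as a single .do file so that the outfile macro is still defined when -save- and -use- are reached.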
Now that we have finished editing the .do file, we can run it to create and save our data in a Stata format. If
your file is small enough that you can open and edit it in Stata's Do-File Editor, you can simply run it from the
Editor itself by going to the "Tools" menu and clicking on the "Run" option:

In our case, however, our .do file is too large for the Do-File Editor (remember that the Do-File Editor cannot read a .do file larger than 130K in size and that our .do file is over 230K). In situations like this, you can run a .do file as follows: (1) click on the "File" menu; (2) choose the "Do..." option; (3) locate and highlight your .do file; and (4) click on the "Open" button:
Once we click the "Open" button, the .do file will be executed. At the end of this process, we will have the data file
in Stata format, saved in the location that we have specified in the .do file:
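The same thing can be done from Stata's command window with the -do- command, which runs a .do file of any size. The sketch below assumes the setup file was saved under the name and in the folder used in this tutorial.

```stata
* Run the edited setup file from the command window
do "C:\patrons\oelgun\STATA_and_SPSS_tutorials\ICPSR_4262_data\4262\st4262-Setup.do"
```

If you have already used -cd- to move to that folder, the file name alone (do "st4262-Setup.do") should be enough.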