Converting SAS Transport/XPORT Files into SAS Data Files

For most of the studies in the ICPSR's archive, the data are distributed in the form of large ASCII files with SAS (or SPSS) syntax files ("setup files," in the ICPSR's terms) that take the ASCII data and turn them into SAS (or SPSS) data files. For a growing number of studies, however, the ICPSR is providing a SAS "transport" file for download. SAS transport files are SAS files that can be used across operating systems (e.g. Windows, UNIX/Linux) - in other words, they are "transportable" across platforms and environments. There are two types of SAS transport files - transport files created via SAS' XPORT engine and transport files created via SAS' CPORT procedure. SAS XPORT files are usable across different platforms and different versions of SAS and are backwards-compatible, but they can contain data only. SAS CPORT files are usable across different platforms and can also contain catalog files for variable formats (i.e. "value labels," in SPSS/Stata terminology), but are not backwards-compatible. [For a general overview of different types of SAS files, see this guide from UNC's Carolina Population Center.] The ICPSR favors the XPORT file format, so that is the format we will focus on here.

To use a SAS transport file from the ICPSR in your version of SAS, you need to "restore" it into a SAS data file appropriate for the OS you are using. The ICPSR provides a supplementary SAS syntax file (with a .sas extension) that will create a catalog file for variable formats. With some simple editing and modification, you can also use this supplementary file to "restore" the transport file into a usable SAS data file complete with value labels and ready for analysis. This guide will walk you through how to edit the supplementary file for these purposes. [Note - If you need help working with a SAS syntax file written for use with an ASCII data file, please see this help guide for getting ICPSR data files into SAS.]

(1) Start by downloading the SAS transport file and supplementary syntax file from the ICPSR website for the study you want to download. Here, we'll use ICPSR #4293 as an example:




Click on the "Download Data" tab. You will be taken to the ICPSR MyData Login page. All ICPSR users are required to have accounts if they wish to download data, so you will need to set one up if you do not have one already. Enter your email address and password and, once you have been authenticated, you will see something like the following screen:




The ICPSR presents users with multiple options for downloading data. Generally speaking, the ICPSR will have files available for different statistical packages. Here, for instance, there are ASCII data and setup files available for SAS, Stata and SPSS as well as an SPSS "portable" file and a SAS transport file and supplementary syntax file. [You can get additional detail about the files available for a study by reading the file manifest that is available on the "Description" page.] You can download just those files for a particular program (e.g. "SAS Transport") or download all the available files for a particular study. Whichever files you select will then be added into your "data cart" for download. If you go the data-cart route, you will be downloading a zipped archive of whatever files you chose. Alternately, you can click on the "download individual files" link at the right and download files one at a time. For our purposes, we only want the SAS transport file and its supplementary syntax file, so we will choose this last route. Click on the "download individual files" link and you will be taken to this page:




If you scroll down, you will see a full list of the files available for your study. And, in our study here, we can see that there is a SAS transport file available and ready to download.




From this page, you can click a link and save the relevant files. First, save the SAS transport file (e.g. right-click on the link to the file, choose "Save Target As..." or "Save Link As..." and choose where you want to save the file). Here, we're saving all our files to "C:\patrons\ICPSR\4293". Be sure that the file has an .xpt extension at the end:




[Note that when we are saving files here, we are not using the default file name (e.g. 04293-0001-Data.xpt) that the ICPSR assigns. Instead, we are using an older ICPSR convention for naming files, mainly because the resulting file names are shorter. How you wish to name the files is up to you - just be careful about what file extension you specify.]

Next, we need to save the supplementary SAS syntax file/setup file. We'll save the syntax file in the same folder as the transport file. Make sure that the syntax file has a .sas extension:




(2) Next, open the SAS program and open the SAS supplementary syntax file you just saved in your working directory.




The file will open in SAS' Program Editor window in the bottom right of the SAS interface. In its "raw," unmodified state, the syntax file should look something like this:




You may notice that some of the text in the file is green. Such text is "commented out," which means that it will be ignored by SAS when it is processing the command file. Generally, such text contains descriptions of the study and/or some instructions for the user. Sometimes, however, the ICPSR will also comment out commands that you may want to run when reading the data into SAS. Text and commands that are commented out are usually enclosed by a /* at the start of the commented-out text and a */ at the end. If you remove these characters, the text should no longer be green.
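As a small illustration of how commenting works, the sketch below shows the same command first commented out and then active. It uses sashelp.class, a sample data set that normally ships with SAS, purely as a stand-in; it is not part of the ICPSR file.

```sas
/* Text between these delimiters is a comment and is ignored by SAS. */

/* A command that has been commented out will not run:
PROC CONTENTS data=sashelp.class;
RUN;
*/

/* Removing the delimiters makes the same command active again.
   sashelp.class is only an illustrative sample data set. */
PROC CONTENTS data=sashelp.class;
RUN;
```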

(3) Now we will start editing the syntax file to "restore" the transport file. Once we make a few small changes to the syntax file, we can run that file and SAS will read the contents of the transport file into a data file complete with value labels. As we scroll down our syntax file, the first set of commands that we encounter should be for SAS' FORMAT procedure, the syntax for which should be in dark blue. So, you should see something like this:




(A) We will start by adding syntax to create libraries in SAS. In SAS, a "library" is essentially a pointer or shortcut to a particular location on your computer. If you do not create a library, the data will be stored in SAS's "work" library and will be purged when your session is over. Creating the library as part of the command file ensures that (a) the resulting dataset will be stored in a place specified by you, the user, and (b) your hard work will not be deleted when you end your SAS session. To create a library, insert the following syntax in the command file just above "PROC FORMAT":

LIBNAME libraryname "file path location";

Example: LIBNAME da4293 "C:\patrons\ICPSR\4293";

The file path in the double quotes should be the location where you want to save your data.

(IMPORTANT - Don't forget to end command lines with a semi-colon. Otherwise, SAS will be very upset with you.)

Next, type LIBNAME transportfilelibraryname XPORT "location and name of transport file";

Example: LIBNAME xpt4293 XPORT "C:\patrons\ICPSR\4293\da4293.xpt";

The file path and file name in the double quotes should be the location and name of your SAS transport file.

These two commands set up two separate libraries. The first is the library into which your finished data will go. In our example here, we will be saving our SAS data file in a library named "da4293" that points to "C:\patrons\ICPSR\4293" on our hard drive. The second library is called "xpt4293" and points to the transport file that you downloaded, which is at "C:\patrons\ICPSR\4293\da4293.xpt". When SAS sees "xpt4293" in our syntax file, it will thus know where to look for the transport file. It is very important to include the XPORT keyword in the LIBNAME statement for the transport file, by the way, as this keyword names the engine and tells SAS that we are working with a transport file created via SAS' XPORT engine rather than one created via the CPORT procedure. [Once again, SAS transport files that the ICPSR distributes are generally XPORT files rather than CPORT files.]

(B) Next you will use the COPY procedure to copy the contents of the transport file into the library where you want to keep your data file. So, you should edit the syntax file to include the following commands:

PROC COPY in=transportfilelibraryname out=libraryname;
RUN;

Example:
PROC COPY in=xpt4293 out=da4293;
RUN;

With our example here, SAS will copy the contents of the "xpt4293" library (that is, the contents of the SAS XPORT file that we downloaded from the ICPSR) into the "da4293" library that we defined above. This procedure will "restore" our data from an XPORT format into a format that we can analyze in SAS.

(C) When SAS copies the contents of the transport file into a different library to create a new data file, the new file keeps the name stored inside the transport file ("d4293p1" in our example), and that name will be recorded in SAS' log. If you wish to assign a name of your choosing to the new data file, however, you can do so via SAS' DATASETS procedure. Just edit the syntax file to include something like the following commands:

PROC DATASETS library=libraryname;
change old-file-name=new-file-name;
quit;

Example:
PROC DATASETS library=da4293;
change d4293p1=da4293tr;
quit;

So, with our example here, SAS will rename the "restored" data file from "d4293p1" to "da4293tr" (the latter is a name we chose solely for illustrative purposes).
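If the original file name is hard to spot in the log, one option is to list the members of the transport library directly. The following is a minimal sketch, assuming the "xpt4293" library and file path from the example above; the names you see will depend on your study.

```sas
/* Assumes the transport-file library from the example above. */
LIBNAME xpt4293 XPORT "C:\patrons\ICPSR\4293\da4293.xpt";

/* List the data sets stored in the transport file.
   NODS suppresses the variable-level detail for each one. */
PROC CONTENTS data=xpt4293._all_ nods;
RUN;
```

The member name shown in the listing is the one to put on the left-hand side of the "change" statement.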

(D) The SAS "setup" file that the ICPSR provided already includes PROC FORMAT syntax to create a catalog file that will contain the value labels for our data file. To create that catalog file in the same location as our data file so that SAS can match value labels and variables, we must tell SAS which library to save the catalog file in and what to call the catalog file. We can do this via syntax something like the following:

PROC FORMAT library=libraryname.formatfilename;

Example: PROC FORMAT library=da4293.for4293;

So, SAS will now create a catalog file called "for4293" and save it in our "da4293" library.

These are the only changes you need to make to the top of the syntax file. When you are finished editing it, the file should look something like this:
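Putting steps (A) through (D) together, the top of the edited file might read roughly as follows. This is only a sketch built from the example names used in this guide; the VALUE statement shown is hypothetical and stands in for the many VALUE statements that the ICPSR file already contains.

```sas
/* (A) Library where the restored data file and format catalog are saved */
LIBNAME da4293 "C:\patrons\ICPSR\4293";

/* (A) Library pointing at the downloaded transport file (XPORT engine) */
LIBNAME xpt4293 XPORT "C:\patrons\ICPSR\4293\da4293.xpt";

/* (B) Copy the contents of the transport file into the data library */
PROC COPY in=xpt4293 out=da4293;
RUN;

/* (C) Rename the restored data file */
PROC DATASETS library=da4293;
   change d4293p1=da4293tr;
QUIT;

/* (D) Write the value labels to a catalog named for4293 in da4293 */
PROC FORMAT library=da4293.for4293;
   /* Hypothetical format for illustration only; the real file
      continues with the VALUE statements supplied by the ICPSR. */
   value yesnofmt 1='Yes' 2='No';
RUN;
```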




(4) The next step is to scroll to the end of the PROC FORMAT section. You are looking for a "RUN;" command followed by a "DATA" step and a "SET" statement. [NOTE - do not scroll down to the end of the file - if you do, you will have passed the section we are discussing here.] You should see something like the following:




In this section of the SAS program file, SAS will modify the newly-restored data file that we created and renamed above so that it will have value labels associated with it. So we need to edit the program file to tell SAS the name and location of our data file, where to look to find the catalog file with the value labels, and so forth.

(A) First, we will add an options statement to tell SAS to look for the catalog file that was created above by the FORMAT procedure. So, insert something like the following right above the syntax for the DATA step:

options fmtsearch=(libraryname.formatfilename);

Example: options fmtsearch=(da4293.for4293);

(B) Next, we will modify the syntax for the DATA step to identify the location and name of the data file we wish to format with value labels. So, change the DATA step syntax to read something like the following:

DATA libraryname.datafilename;

Example: DATA da4293.da4293tr (label="ICPSR #4293 - SAS XPORT File");

[NOTE: Here, we have also included the "label" option in the DATA step to attach a descriptive label to the data file that we are creating. However, this option is not necessary to restore the data properly.]

We also need to modify the syntax for the SET statement below the DATA step. The SET statement can be used with a DATA step for purposes such as creating a new data file from a pre-existing file or over-writing a file with a newer version (e.g. a newer version with value labels). Basically, the SET statement tells SAS the location and name of the original file that we will be modifying or over-writing, while the DATA step tells SAS the location and name of the resulting file. Here, we will use the SET statement to identify the data file that we restored above with the COPY procedure. The syntax here should be something like the following:

SET datalibraryname.datafilename;

Example: SET da4293.da4293tr;

So, the SET statement will tell SAS to get the "da4293tr" file from the "da4293" library, while the DATA step will tell SAS to overwrite that file using the same name and location. The FORMAT statement that we see below will then tell SAS which format in the catalog file to associate with a given variable (remember that the options statement that we inserted above the DATA step tells SAS where to look for the catalog file).

So, when we are done modifying this section of the program file, it should look like this:
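As a rough sketch, the edited section might read as follows, using the example names from this guide. The variable name "v1" and the format name "yesnofmt" are hypothetical placeholders; the ICPSR file supplies the real FORMAT statement, which you should leave as it is.

```sas
/* Tell SAS where to find the format catalog created by PROC FORMAT */
options fmtsearch=(da4293.for4293);

/* Overwrite the restored file with a copy that has formats attached */
DATA da4293.da4293tr (label="ICPSR #4293 - SAS XPORT File");
   SET da4293.da4293tr;
   /* Hypothetical variable and format names, for illustration only */
   FORMAT v1 yesnofmt.;
RUN;
```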





(5) Finally, at the end of our syntax file, we will add some syntax to execute SAS' CONTENTS procedure, which we will use to check to see that our restored data file has the correct number of observations and variables (i.e. the numbers are the same as those provided in the documentation for our data) and to check whether SAS did assign value labels to variables. The syntax for the CONTENTS procedure should be something like the following:

PROC CONTENTS data=libraryname.datafilename varnum;
RUN;

Example: PROC CONTENTS data=da4293.da4293tr varnum;
RUN;

In our example here, then, SAS will summarize the contents of our new SAS data file, and the varnum option will tell SAS to list the variables in the data file in the order that they are located within the data, rather than in alphabetical order.




(6) After saving the command (.sas) file, click on the icon of the Running Man in the toolbar, or click on "Run" in the upper menu and choose "Submit". The data should be read in and successfully formatted as a SAS file:





(7) You can view the output from PROC CONTENTS in SAS' "Output" window and look over the information provided about the data file to be sure that the numbers of variables and observations are correct, that value labels are associated with the correct variables, and so on:
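If you also want to inspect the value labels themselves, one possible check is to print the contents of the format catalog with the FMTLIB option of PROC FORMAT. This is an optional extra rather than part of the ICPSR file, and it assumes the "da4293" library and "for4293" catalog from the example above.

```sas
/* Assumes the data library from the example above. */
LIBNAME da4293 "C:\patrons\ICPSR\4293";

/* Print every format stored in the for4293 catalog,
   showing each value and the label assigned to it. */
PROC FORMAT library=da4293.for4293 fmtlib;
RUN;
```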




(8) To view the data file, click on the file drawer icon labeled 'Libraries' and then click on the library that you created above (e.g. Da4293 in this tutorial):




The SAS datafiles in our library are represented with icons that look like spreadsheets and have a red dot in the bottom-right corner. The icons should have names but no suffixes for file extensions. Click on the icon with the name that you gave the data file, and you will see the contents of our restored SAS data file:




(9) In the directory where you saved all your files, meanwhile, you should also see icons for our newly-restored SAS data file (with a .sas7bdat extension) and its associated SAS catalog file (with a .sas7bcat extension) containing "formats"/value labels:



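Library assignments and options last only for the current SAS session, so in a later session you will need to point SAS at these two files again before analyzing the data. A minimal sketch, assuming the same folder and file names used in this guide:

```sas
/* Point SAS at the folder holding the .sas7bdat and .sas7bcat files */
LIBNAME da4293 "C:\patrons\ICPSR\4293";

/* Make the value labels in the for4293 catalog available again */
options fmtsearch=(da4293.for4293);

/* Confirm that the data file opens with its formats attached */
PROC CONTENTS data=da4293.da4293tr varnum;
RUN;
```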





Page adapted from Electronic Data Center, Emory University Libraries
Original text by Amy Yuen