
When we have a data file that is in "wide" format and has multiple variables in it, the reshaping process is
more complicated, and we essentially have to reshape the data twice to get it into our desired, "long" format.
Say that we have a data file that looks like the following:
|
Country
|
Variable
|
1981
|
1982
|
1983
|
|
Austria
|
XYZ
|
25
|
10
|
8
|
|
Belgium
|
XYZ
|
14
|
30
|
17
|
|
Denmark
|
XYZ
|
2
|
18
|
4
|
|
Austria
|
EFG
|
0
|
0
|
1
|
|
Belgium
|
EFG
|
1
|
1
|
1
|
|
Denmark
|
EFG
|
0
|
1
|
0
|
Here, there are several variables stacked vertically and indexed by country,
and each year column has values for three different variables for each
country. Now, it will require a few extra steps to reshape the data into
a time-series format. We need to use the -reshape
wide- command to separate out the stacked variables into columns
by year and then use the -reshape
long- command to sort out the country-year values and return to
long format. Essentially, to obtain time-series data, we have to switch
out the year columns for variable columns so the data are indexed by country-year
instead of by country-variable, with the end result looking something
like this:
|
Country
|
Year
|
XYZ
|
EFG
|
|
Austria
|
1981
|
25
|
0
|
|
Austria
|
1982
|
10
|
0
|
|
Austria
|
1983
|
8
|
1
|
|
Belgium
|
1981
|
14
|
1
|
|
Belgium
|
1982
|
30
|
1
|
|
Belgium
|
1983
|
17
|
1
|
|
Denmark
|
1981
|
2
|
0
|
|
Denmark
|
1982
|
18
|
1
|
|
Denmark
|
1983
|
14
|
0
|
Say that we have imported a text file with data into Stata and that the file contains multiple variables, such
as this example:

Click here for full-size image
Here, we will first need to rename the column names so that they more clearly identify the variables and the
years. Thus, we will need to enter commands along the lines of the following:
rename v1 variable
rename v2 country
rename v3 value1989, etc.
Note that column names must include a string component for Stata to accept them as "legal" variable name for use
with the reshape command.
Next, if the first line of data contains column labels, we must delete
it as was shown in the previous section (See -drop
if _n==1- above).
In our data file here, the names of the variables we want to create come
from values of the "variable" column in our current file. So, to get the
data into our desired format, we also need to use the -replace-
command with "if" statements so as to change the values listed in the
"variable" column to the names you want the variables to be when they
become the column names in our new, reshaped file. For each variable,
you must use an "if" statement that refers to the relevant value(s) to
be changed. Remember the following: (1) there should not be any spaces
in the variable names; (2) for older versions of Stata (6 or below) the
program only allows 8 characters for variable names. In the following
example, 'variable' is the name of the column that identifies which variable's
value is given in a particular row:
replace variable="gdp" if variable=="Gross national income (constant LCU)"
In this example, Stata will look for any case in which the value of "variable" equals "Gross national income
(constant LCU)" and will replace that value with "gdp". We will need to repeat this command for each variable
name listed in the "variable" column.
Once you have completed the renaming and replacing, the dataset should look like this:

Click here for full-size image
At this point, the -reshape wide- command is issued. This command takes the 'variable' variable and matches
it up with each 'valueyear' column, leaving only the country as the identifying variable. With the example above,
this will result in 12 observations (one for each country) and 3 times the number of columns in the original
dataset.
To reshape the file, you must specify what shape (long or wide) you want the resultant file to be. You must also
list the variables to be reshaped. With our example data file, these variables will be the 'valueyear' columns
because those are the columns that will be changed. Then, after a "," we must specify the i() and j() variables.
Here, we are reshaping from long to wide; thus, the "i" variable is the variable that will uniquely identify cases
in the wide datafile, while the "j" variable will provide the suffixes for the reshaped variables. For our example
datafile, the command will look like this:
reshape wide value1989 value1990 value1991 value1992 value1993 value1994 value1995 value1996 value1997 value1998
value1999 value2000, i(country) j(variable) string
[Note - in this example, we have typed "string" at the end of the command because the j() variable has string
(alpha-numeric) values; this option is necessary when the variable is non-numeric.]
This command will create output that looks like the following:

Click here for full-size image
And a dataset that looks like this:

Click here for full-size image
The next step is to rename all of the variables from 'valueyearvariable' to 'variableyear'
rename value1989gdp gdp1989
rename value1990gdp gdp1990, etc
so that the data file looks like this:

Click here for full-size image
These commands rename the column names so that we can separate out the years from the variables when we reshape the data from wide to long format. Then, we will have created the long, time-series format that we want for analysis.
The final command is the -reshape long- command that changes the dataset into a country-year format. The
command syntax will be something like this generic example:
reshape long [variable list], i(identifying variable) j(subobservation)
When executing the -reshape long-
command, the variable-list portion lists the new variables that will be
created from the old columns. For this list, we need to list the non-numeric
portion of each column name (i.e. we do not include the years in the new
variable names). The "i" variable will identify the "cluster" variable
in the time-series ('country'), and the "j" variable will identify unique
observations within each cluster ('year'). This "j" variable is not actually
in our wide file; it will instead be created by the reshape process as
part of the new long file. Its values will come from the numeric suffixes
that are part of the variable names in the file that we are reshaping.
For our specific datafile, we will type the following:
reshape long gdp growth pop, i(country) j(year)
There are three variables listed in the example above: gdp, growth and pop (which are for GDP, percent population
growth and total population, respectively). These variables will be created by the reshape process. The values
for these variables will come from the columns with prefixes that equal the new variable names. For instance, all
the values for the gdp variable will come from those columns whose names being with "gdp" - gdp1989, gdp1990, and
so on. The "year" variable, meanwhile, will be created from the numeric suffixes of variable names in the wide
file that we are reshaping. The values for a variable in a given year (e.g. gdp in 1989) in the new file will thus
come from the appropriate column in the old file (e.g. gdp1989). Note that including numbers as part of the
variable name that are not related to the year can be problematic, so avoid labeling your variables with numbers.
If need be, you can change the names after the reshape for clarity.
With our specific example, the command creates output that looks like this:

Click here for full-size image
and a dataset that looks like this:

Click here for full-size image
Be sure to save the dataset when you finish. The Data Gods are mercurial and not prone to forgiveness.
Stata's FAQ for using the -reshape- command offers additional insights and tips.