
If the dataset contains only one variable of interest recorded over time (for instance, you want data on GDP from
the WDI and plan to merge those data into a bigger dataset), there are a few simple commands in Stata you can use
to change from the 'wide' to 'long' format.
Say that we have gotten some data from the WDI on GDP for several years for multiple countries and have moved the
data into Stata. When you import datafiles into Stata, Stata often assigns generic names to variables -
v1, v2, v3, etc.:

Click here for full-size image
Consequently, it is often necessary to rename those variables before
reshaping the file. First, each year in the 'wide' format must have a
string component as part of its variable name for the reshape command
to work. So, if you import data from Excel into Stata, you need to use
the -rename- command to
rename each year with some variable name and the year of that column.
Note that you will need to assign the correct year carefully. Generally,
the syntax will be along the lines of this generic example:
rename v1 country
rename v2 xyzyear1
rename v3 xyzyear2
rename v4 xyzyear3
(Note that the string portion must be the same for every year, because Stata uses that as part of the variable
name when it reshapes the data into 'long' format; thus the string portion of the variable name should be the name
you want the variable to have in the reshaped file.)
And, after renaming the variables, you should see something like this when browsing the dataset:
|
Country
|
xyz1981
|
xyz1982
|
xyz1983
|
|
Austria
|
25
|
10
|
8
|
|
Belgium
|
14
|
30
|
17
|
|
Denmark
|
2
|
18
|
4
|
To refer to our specific example of the datafile containing data on GDP, we would enter the following
commands:
rename v1 country
rename v2 gdp1981
rename v3 gdp1982
rename v4 gdp1983
And see the following when we browse our dataset:

Click here for full-size image
Once you have finished renaming the variables, check to see if if the first row of observations contains column
labels. In our example, this is in fact the case:

Click here for full-size image
In this situation, you need to delete that row using this command:*
drop if _n==1
*The _n is an internal indexing mechanism that Stata automatically does for each row of data. _n==1 refers to the
first row of data in the dataset. The _n values do not "stick" with their rows if the data are sorted differently,
so make sure you are deleting the correct row if you have sorted the data. Also notice that the command calls for
a double equals sign (==) not just a single sign.
This will get rid of the column labels as a row of data:

Click here for full-size image
In Stata's command window, we will now type the commands to reshape the data. To do so, we must type "reshape"
(i.e. the command name); then, we must specify which shape (long or wide) we want to reshaped data to be in; then
we must list the variable(s) to be reshaped; then, after a "," we must specify the "i" and "j" variables, which are
used to uniquely identify cases. To reshape data into a 'long' format for time-series analysis, the syntax will be
something like this generic example:
reshape long xyz, i(country) j(year)
This command tells Stata to create the column variable xyz indexed by country ("i") and year ("j"). Note that xyz
is the stem of the original xyzyear variables and is the name that we want the variable to have in our reshaped
file. "i" is the original row variable; with time-series data, this is what is called the "cluster" variable.
"j" is a new variable that is created by the reshaping process; it is taken from the suffix of the xyzyear
variables in our original, "wide" datafile and it uniquely identifies cases within each cluster. Stata will
then report all of the changes made, the new observation and variable counts, and any errors it encountered.
To go back to our specific example, we would type
reshape long gdp, i(country) j(year)
This command will create a variable called gdp. The "i(country)" option will tell Stata that the "country"
variable will identify each group of observations within the datafile. The "j(year)" option, meanwhile, will
create a new variable ("year") that will uniquely identify observations within each cluster of observations.
When browsing the dataset, we then see that the data have in fact been changed into a 'long' format:

Click here for full-size image
Be sure to save the dataset when you finish. The Data Gods are mercurial and not prone to forgiveness.
Stata's FAQ for using the -reshape- command offers additional insights and tips.