Liz's Stata Guide

Favorite Commands and Tricks

This page is dedicated to commands and Stata tricks that I find particularly nifty. Have suggestions of other overlooked jewels of Stata? Email me at djiboliz@gmail.com.

string with format

The string() function with a format argument can be used to add leading zeros when creating a composite (string) id. For example, use
gen ea_id = string(district, "%03.0f") + "-" + string(county, "%02.0f") + "-" + string(ea_num, "%02.0f")
if district is a numeric variable that can be up to three digits long, and county and ea_num are numeric variables that can each be up to two digits long.
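
To see the padding at work, here is a minimal sketch on a tiny made-up dataset. The values are hypothetical and chosen only for illustration; the variable names are the ones used above.

```stata
* Hypothetical values, for illustration only
clear
input district county ea_num
  7  3 12
123 45  6
end

* Pad each component with leading zeros and join with dashes
gen ea_id = string(district, "%03.0f") + "-" + string(county, "%02.0f") + "-" + string(ea_num, "%02.0f")
list ea_id

* Check that the padded ids came out as intended
assert ea_id == "007-03-12" in 1
assert ea_id == "123-45-06" in 2
```

Because every component has a fixed width, the resulting ids also sort in the same order as the underlying numbers.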

total

This is a function that goes along with the egen command and is very useful, especially in combination with by.
bys hhid: egen hh_adult_equiv = total(adult_equiv)
With a logical expression as the argument (which evaluates to 1 if true and 0 if false), it can also be used to count things. To count the number of spouses of the household head in a household, you could use
bys hhid: egen num_spouses = total(relat == 2)
where relat takes on the value of 2 if the individual is a spouse of the household head.
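
Both uses can be tried on a small made-up dataset. This is only a sketch: the values are invented, and hhid is numeric here just to keep the example short.

```stata
* Made-up data: two households with three members each
clear
input hhid relat adult_equiv
1 1 1
1 2 0.75
1 3 0.5
2 1 1
2 2 0.75
2 2 0.75
end

* Household-level sum, repeated on every row of the household
bys hhid: egen hh_adult_equiv = total(adult_equiv)

* Count of rows in the household where the condition is true
bys hhid: egen num_spouses = total(relat == 2)
list, sepby(hhid)

* household 1: 2.25 adult equivalents, 1 spouse
* household 2: 2.5 adult equivalents, 2 spouses
assert hh_adult_equiv == 2.25 & num_spouses == 1 if hhid == 1
assert hh_adult_equiv == 2.5  & num_spouses == 2 if hhid == 2
```

One thing to keep in mind: total() treats missing values as 0, so a household where adult_equiv is missing for everyone gets a total of 0 rather than missing, unless you add the missing option.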

recode

recode can save you a lot of lines of code in many situations.
  • If you want to create a categorical variable, instead of
    gen x = 1 if y >= 0 & y <= 5
    replace x = 2 if y >= 6 & y <= 10
    replace x = . if y == . | y > 10
    
    use
    recode y (0/5 = 1) (6/10 = 2) (else = .), gen(x)
    
  • If variables q2, q3 and q4 should all be missing if q1 is equal to 2, instead of
    replace q2 = . if q1 == 2
    replace q3 = . if q1 == 2
    replace q4 = . if q1 == 2
    
    use
    recode q2 q3 q4 (nonmiss = .) if q1 == 2
    
  • If the individual ids within one household need to be switched around, instead of
    replace indiv = 99 if indiv == 1 & hhid == "1234-5678"
    replace indiv = 1 if indiv == 3 & hhid == "1234-5678"
    replace indiv = 3 if indiv == 2 & hhid == "1234-5678"
    replace indiv = 2 if indiv == 99 & hhid == "1234-5678"
    replace indiv = 99 if indiv == 4 & hhid == "1234-5678"
    replace indiv = 4 if indiv == 5 & hhid == "1234-5678"
    replace indiv = 5 if indiv == 99 & hhid == "1234-5678"
    
    use
    recode indiv (1 = 2) (2 = 3) (3 = 1) (4 = 5) (5 = 4) if hhid == "1234-5678"
    
    This is not only much shorter, but also much clearer.
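
The id swap works without a placeholder value like 99 because recode matches each rule against the original value of the variable, so a value is never recoded twice. A minimal sketch on made-up data (the gen() option and the name indiv_new are used here only so the old and new ids can be compared side by side):

```stata
* Made-up data: one household with individuals 1 through 5
clear
input indiv
1
2
3
4
5
end

* Each rule is checked against the original value of indiv
recode indiv (1 = 2) (2 = 3) (3 = 1) (4 = 5) (5 = 4), gen(indiv_new)
list

* Old ids 1 2 3 4 5 should map to 2 3 1 5 4
assert indiv_new == 2 in 1
assert indiv_new == 3 in 2
assert indiv_new == 1 in 3
assert indiv_new == 5 in 4
assert indiv_new == 4 in 5
```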

numlabel _all, add

My favorite way to deal with categorical variables (such as district) is to store them as numeric variables (usually bytes or ints) with the values labeled. This creates nice tables, etc., and avoids having to manipulate strings. Sometimes, though, you need to know the numeric value that corresponds to a label. I used to use
    tab district
    tab district, nolabel
    
and compare the two tables to find the code for Bonthe. It's much easier to use numlabel, which adds the numeric code to the value labels so we can easily see which goes with which.
    . numlabel _all, add;
    
    . tab district;
    
                  district |      Freq.     Percent        Cum.
    -----------------------+-----------------------------------
              11. Kailahun |        110       10.00       10.00
                12. Kenema |        140       12.73       22.73
                  13. Kono |         65        5.91       28.64
               21. Bombali |         80        7.27       35.91
                22. Kambia |         45        4.09       40.00
             23. Koinadugu |         60        5.45       45.45
             24. Port Loko |         80        7.27       52.73
             25. Tonkolili |         60        5.45       58.18
                    31. Bo |        110       10.00       68.18
                32. Bonthe |         20        1.82       70.00
               33. Moyamba |         55        5.00       75.00
               34. Pujehun |         40        3.64       78.64
    41. Western Area Rural |         60        5.45       84.09
    42. Western Area Urban |        175       15.91      100.00
    -----------------------+-----------------------------------
                     Total |      1,100      100.00
    
Note: this adds the numeric codes to all value labels (hence _all). If we want to do it for just one variable, we need to know the name of the value label for that variable -- describe will tell us.
    . describe district;
    
                  storage  display     value
    variable name   type   format      label      variable label
    -------------------------------------------------------------------------------
    district        byte   %18.0g      districts
                                                  district
    
    . numlabel districts, add;
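
If the prefixed labels get in the way later (in a graph legend, say), the codes can be stripped again with the remove option. The sketch below rebuilds a three-district version of the label from the table above so it can be run on its own; the tiny dataset is made up for illustration.

```stata
* Made-up data using three of the district codes shown above
clear
input byte district
11
12
32
end
label define districts 11 "Kailahun" 12 "Kenema" 32 "Bonthe"
label values district districts

* Prefix the codes: labels now read 11. Kailahun, 12. Kenema, 32. Bonthe
numlabel districts, add
tab district

* Strip the prefixes again to get back the plain labels
numlabel districts, remove
label list districts
```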
    
contact: djiboliz@gmail.com
last modified: 30 July 2014