Liz's Stata Guide

Mistakes I Have Made

This page is dedicated to the various mistakes that I have made using Stata. Learn from them! Have you made a mistake using Stata that you think others could learn from? Email me at djiboliz@gmail.com.

expecting decimal operations to be exact

Stata, like any computer program, stores numbers in binary, and many decimal fractions, such as 0.1, have no exact binary representation. By default, Stata stores new variables as floats, which are accurate to about 7 decimal digits. However, it does all calculations and comparisons in double precision, which is accurate to about 16 digits. The float stored in x is therefore not equal to the double 0.1 it is compared against, which means you can get the following:
. gen x = 0.1

. count if x == 0.1
    0
To avoid this issue, you could store variables that hold decimal values as doubles:
. gen double n = 0.1

. count if n == 0.1
    1
But doubles take twice as much storage as floats (8 bytes instead of 4), and this method doesn't help with variables that someone else has already entered or created as floats. Instead, when making comparisons, use the float() function, which rounds a number to float precision:
. count if float(x) == float(0.1)
    1
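The session above can be reproduced as a short do-file. This is a minimal sketch that assumes a fresh, empty dataset. It declares x as float explicitly, because that is Stata's default type, so the example behaves the same even if the default has been changed with set type. It also shows why converting an existing float variable to double does not help: the value was already rounded when it was first stored as a float.

```stata
* Minimal sketch: float storage vs. double-precision comparisons
clear
set obs 1

generate float x = 0.1     // float (Stata's default type): stores roughly 0.10000000149
generate double n = 0.1    // double: stores 0.1 to about 16 digits

count if x == 0.1          // float value compared with the double 0.1: no match
count if n == 0.1          // double compared with double: match
count if float(x) == float(0.1)   // both sides rounded to float precision: match

* Recasting does not recover the lost digits: x keeps its float-rounded value
recast double x
count if x == 0.1          // still no match
```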

using if varx instead of if varx == 1

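The condition if varx is true whenever varx is nonzero, and Stata treats missing values as nonzero, so it is also true when varx is missing. If varx is a binary (0/1) variable, your command will therefore run both for observations where varx equals 1 and for observations where it is missing. This is probably not what you want.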
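A minimal sketch of the difference, using a small hypothetical dataset in which varx takes the values 1, 0, and missing:

```stata
* Minimal sketch: if varx vs. if varx == 1 for a 0/1 variable with a missing value
clear
input varx
1
0
.
end

count if varx          // true for 1 and for missing: counts 2
count if varx == 1     // true only for 1: counts 1
```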

using if varx != . instead of if varx < .

Later versions of Stata support extended missing values such as .a and .b. Some datasets use these extended missing values to distinguish between different reasons why a value is missing (for example, .a might mean "don't know" while .b means "not applicable"). Stata treats every missing value, including the extended ones, as greater than any number, and orders them as . < .a < .b < ... < .z. This means that varx != . is true for .a and .b, so if you want to restrict your command to observations where varx is not missing, use if varx < . to be safe. Similarly, if you want to drop all observations where varx is missing, use drop if varx >= . instead of drop if varx == .
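A minimal sketch of how these conditions behave, using a hypothetical four-observation dataset that mixes a real value, system missing (.), and the extended missing values .a and .b. The !missing(varx) line uses Stata's missing() function, which is true for both system and extended missing values, as an equivalent alternative to varx < .

```stata
* Minimal sketch: system missing (.) vs. extended missing values (.a, .b)
clear
set obs 4
generate varx = 1 in 1     // observations 2-4 start out as system missing (.)
replace varx = .a in 3     // e.g. "don't know"
replace varx = .b in 4     // e.g. "not applicable"

count if varx != .         // counts 3: 1, .a and .b are all different from .
count if varx < .          // counts 1: only the nonmissing value
count if !missing(varx)    // same result as varx < .

drop if varx >= .          // drops all three missing observations
count                      // 1 observation left
```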

contact: djiboliz@gmail.com
last modified: 31 July 2014