(Note: this page assumes that you know a little basic statistics.)
. use "C:\Documents and Settings\EFoster\My Documents\stata guide\nps_example.dta"
. describe
Contains data from C:\Documents and Settings\EFoster\nps_example.dta
  obs:         1,095                          Sierra Leone 2005 National
                                                Public Services Survey
 vars:            30                          23 Nov 2007 14:43
 size:        59,130 (99.9% of memory free)   (_dta has notes)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
province        byte   %9.0g       provinces
                                              province
district        byte   %18.0g      districts
                                              district
localcouncil    int    %21.0g      localcouncils
                                              Local Council Area
ea_code         long   %12.0f                 enumeration area code
hh_no           byte   %9.0g                  household number within EA
stratum         byte   %8.0g       rural_urban
                                              urban or rural
...
srno            float  %9.0g
-------------------------------------------------------------------------------
Sorted by: ea_code hh_no
. tab religion
   religion |      Freq.     Percent        Cum.
------------+-----------------------------------
  Christian |        252       23.01       23.01
     Muslim |        837       76.44       99.45
      Other |          6        0.55      100.00
------------+-----------------------------------
      Total |      1,095      100.00
. sum age
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         age |      1088     41.3557     15.4491         18         90
Note that we only have 1088 observations for age, so age is missing for 7 of the 1,095 respondents.
The age of respondents in our dataset ranges from 18 to 90, with a mean of 41.4.
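To look at the missing values directly, something like the following could be run in the same session. This is only a sketch: it assumes the dataset loaded at the top of this page is still in memory, and the choice of commands is ours rather than part of the original walkthrough.

```stata
* Count the observations where age is missing
* (the output above implies 1,095 - 1,088 = 7)
count if missing(age)

* Percentiles, skewness and the smallest/largest values,
* in addition to the mean and standard deviation
summarize age, detail
```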
. tab religion, sum(age)
            |           Summary of age
   religion |        Mean   Std. Dev.       Freq.
------------+------------------------------------
  Christian |   39.458167   15.367018         251
     Muslim |          42   15.440129         831
      Other |        31.5    11.84483           6
------------+------------------------------------
      Total |   41.355699   15.449099        1088
. ttest age, by(gender)
Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    male |     559    43.65653     .663978    15.69855    42.35233    44.96073
  female |     529    38.92439    .6439895    14.81176    37.65929    40.18948
---------+--------------------------------------------------------------------
combined |    1088     41.3557    .4683696     15.4491    40.43669    42.27471
---------+--------------------------------------------------------------------
    diff |            4.732144    .9264646                2.914281    6.550007
------------------------------------------------------------------------------
    diff = mean(male) - mean(female)                              t =   5.1077
Ho: diff = 0                                     degrees of freedom =     1086
    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 1.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 0.0000
This command produces a lot of output, but the most important parts are the two group means
and the two-sided p-value: the average age of men is 43.7 and of women is 38.9. The p-value for
the two-sided test, Pr(|T| > |t|), is essentially 0, so we reject the hypothesis that male and
female respondents have the same average age.
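As a check on the arithmetic, the test can be rebuilt from the summary statistics in the table. The sketch below uses Stata's immediate command ttesti, which takes the number of observations, mean and standard deviation for each group, so the first two commands need no dataset. The last command does assume the survey data is still loaded; the unequal option is our addition, not something the original analysis used.

```stata
* Immediate form: obs, mean, sd for males, then obs, mean, sd for females
ttesti 559 43.65653 15.69855 529 38.92439 14.81176

* The t statistic is the difference in means divided by its standard error
display (43.65653 - 38.92439) / .9264646

* Same comparison on the data, without assuming equal variances
ttest age, by(gender) unequal
```

Up to rounding, the first two commands should reproduce the t = 5.1077 reported above.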
. reg hhsize age, r
Linear regression                                      Number of obs =    1076
                                                       F(  1,  1074) =   10.31
                                                       Prob > F      =  0.0014
                                                       R-squared     =  0.0140
                                                       Root MSE      =  4.6486
------------------------------------------------------------------------------
             |               Robust
      hhsize |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0358748    .011174     3.21   0.001     .0139494    .0578001
       _cons |   6.013781   .4596593    13.08   0.000     5.111849    6.915714
------------------------------------------------------------------------------
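(The option , r specifies that we want robust standard errors.) This command estimates
that hhsize = 6.01 + 0.036 * age. The coefficient on age is positive (older respondents
have bigger households on average), as we expected, and statistically
significant (p-value of 0.001). The R-squared of 0.014 tells us that age explains only
about 1.4% of the variation in household size. The regression uses 1076 observations
because Stata drops any observation where hhsize or age is missing.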
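To make the fitted equation concrete, it can be evaluated at a particular age. The sketch below uses age 40, which is an arbitrary illustrative value of ours. The first command is plain arithmetic on the coefficients printed above and works on its own; the last two assume the dataset is still loaded.

```stata
* Predicted household size for a 40-year-old respondent,
* using the coefficients from the output above (about 7.4 people)
display 6.013781 + .0358748*40

* With the regression results in memory, lincom gives the same
* prediction together with a standard error and confidence interval
regress hhsize age, robust
lincom _cons + 40*age
```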
. gen age2 = age*age
(7 missing values generated)
. reg hhsize age age2, r
Linear regression                                      Number of obs =    1076
                                                       F(  2,  1073) =    5.37
                                                       Prob > F      =  0.0048
                                                       R-squared     =  0.0143
                                                       Root MSE      =  4.6501
------------------------------------------------------------------------------
             |               Robust
      hhsize |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0111948   .0614485     0.18   0.855    -.1093781    .1317677
        age2 |   .0002617   .0006779     0.39   0.700    -.0010685    .0015919
       _cons |   6.524329    1.28781     5.07   0.000     3.997417    9.051241
------------------------------------------------------------------------------
(Note that the coefficient on age squared is not significant and the R-squared barely
changes, from 0.0140 to 0.0143, so the quadratic model does not fit the data better.)
. histogram hhsize
(bin=30, start=0, width=1.1666667)
. scatter hhsize age
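The two graph commands above use Stata's defaults. As a possible refinement (our suggestion, not part of the original session, and assuming the dataset is still loaded), the options below give the histogram one bar per household size and overlay the fitted regression line on the scatter plot.

```stata
* Household size is a whole number, so use one bar per value
* and show counts rather than densities on the vertical axis
histogram hhsize, discrete frequency

* Scatter plot of household size against age with the
* linear fit from the first regression drawn on top
twoway (scatter hhsize age) (lfit hhsize age)
```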

contact: djiboliz@gmail.com
last modified: 17 Sept 2008