(Note: this page assumes that you know a little basic statistics.)
. use "C:\Documents and Settings\EFoster\My Documents\stata guide\nps_example.dta"
. describe

Contains data from C:\Documents and Settings\EFoster\nps_example.dta
  obs:         1,095                          Sierra Leone 2005 National Public Services Survey
 vars:            30                          23 Nov 2007 14:43
 size:        59,130 (99.9% of memory free)   (_dta has notes)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
province        byte   %9.0g       provinces  province
district        byte   %18.0g      districts  district
localcouncil    int    %21.0g      localcouncils
                                              Local Council Area
ea_code         long   %12.0f                 enumeration area code
hh_no           byte   %9.0g                  household number within EA
stratum         byte   %8.0g       rural_urban
                                              urban or rural
...
srno            float  %9.0g
-------------------------------------------------------------------------------
Sorted by:  ea_code  hh_no
. tab religion

   religion |      Freq.     Percent        Cum.
------------+-----------------------------------
  Christian |        252       23.01       23.01
     Muslim |        837       76.44       99.45
      Other |          6        0.55      100.00
------------+-----------------------------------
      Total |      1,095      100.00
. sum age

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         age |      1088     41.3557     15.4491         18         90

Note that we have only 1,088 observations for age, so age is missing for 7 of the 1,095 respondents. The age of respondents in our dataset ranges from 18 to 90, with a mean of 41.4.
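The missing-value count can also be checked directly rather than inferred from the Obs column. This is a minimal sketch, assuming nps_example.dta is already in memory; it uses only the age variable shown above.

```stata
* Assumes nps_example.dta is already in memory (loaded with the use command).
* Count the observations where age is missing; per the output above this should be 7.
count if missing(age)

* The same figure by hand: total observations minus non-missing ages.
display 1095 - 1088

* A fuller summary of age: percentiles, skewness, smallest and largest values.
summarize age, detail
```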
. tab religion, sum(age)

            |           Summary of age
   religion |        Mean   Std. Dev.       Freq.
------------+------------------------------------
  Christian |   39.458167   15.367018         251
     Muslim |          42   15.440129         831
      Other |        31.5    11.84483           6
------------+------------------------------------
      Total |   41.355699   15.449099        1088
. ttest age, by(gender)

Two-sample t test with equal variances
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
    male |     559    43.65653     .663978    15.69855    42.35233    44.96073
  female |     529    38.92439    .6439895    14.81176    37.65929    40.18948
---------+--------------------------------------------------------------------
combined |    1088     41.3557    .4683696     15.4491    40.43669    42.27471
---------+--------------------------------------------------------------------
    diff |            4.732144    .9264646                2.914281    6.550007
------------------------------------------------------------------------------
    diff = mean(male) - mean(female)                              t =   5.1077
Ho: diff = 0                                     degrees of freedom =     1086

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 1.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 0.0000

This command produces a lot of output, but the most important parts are the two group means and the two-sided p-value. The average age of men is 43.7 and of women is 38.9. The two-sided p-value, Pr(|T| > |t|), is essentially 0, so we reject the hypothesis that male and female respondents have the same average age.
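The t statistic in this output is the difference in means divided by its standard error, and the two-sided p-value is twice the upper tail of a t distribution with 1086 degrees of freedom. The sketch below reproduces both from the reported numbers; the display lines need no data, while the final command assumes nps_example.dta is in memory.

```stata
* t statistic by hand: difference in means divided by its standard error.
* Should agree with the t = 5.1077 reported above, up to rounding.
display 4.732144 / .9264646

* Two-sided p-value: twice the upper-tail probability of a t distribution
* with 1086 degrees of freedom. It is tiny, which is why Stata prints 0.0000.
display 2*ttail(1086, 5.1077)

* Optional check (assumes the dataset is loaded): drop the equal-variance
* assumption that the default two-sample test makes.
ttest age, by(gender) unequal
```

The unequal option is worth trying when the two groups' standard deviations differ noticeably; here they are close (15.7 and 14.8).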
. reg hhsize age, r

Linear regression                                      Number of obs =    1076
                                                       F(  1,  1074) =   10.31
                                                       Prob > F      =  0.0014
                                                       R-squared     =  0.0140
                                                       Root MSE      =  4.6486

------------------------------------------------------------------------------
             |               Robust
      hhsize |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0358748     .011174     3.21   0.001     .0139494    .0578001
       _cons |   6.013781    .4596593    13.08   0.000     5.111849    6.915714
------------------------------------------------------------------------------

(The ", r" option requests robust standard errors.) This command estimates that hhsize = 6.01 + 0.036 * age. The coefficient on age is positive, as we expected (older respondents have bigger households on average), and statistically significant (p-value of 0.001).
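To see what the fitted equation implies, it can help to plug in a specific age. The following is a minimal sketch, assuming nps_example.dta is in memory; the age of 40 and the variable name hhsize_hat are chosen purely for illustration.

```stata
* Assumes nps_example.dta is in memory.
reg hhsize age, r

* Predicted household size for a 40-year-old respondent, computed by hand
* from the coefficients reported above.
display 6.013781 + .0358748*40

* The same prediction from the stored estimates, with a standard error
* and confidence interval.
lincom _cons + 40*age

* Fitted values for every observation (hhsize_hat is a made-up name).
predict hhsize_hat, xb
```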
. gen age2 = age*age
(7 missing values generated)

. reg hhsize age age2, r

Linear regression                                      Number of obs =    1076
                                                       F(  2,  1073) =    5.37
                                                       Prob > F      =  0.0048
                                                       R-squared     =  0.0143
                                                       Root MSE      =  4.6501

------------------------------------------------------------------------------
             |               Robust
      hhsize |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0111948    .0614485     0.18   0.855    -.1093781    .1317677
        age2 |   .0002617    .0006779     0.39   0.700    -.0010685    .0015919
       _cons |   6.524329     1.28781     5.07   0.000     3.997417    9.051241
------------------------------------------------------------------------------

(Note that the coefficient on age squared is not significant (p-value of 0.700) and the R-squared barely changes, from 0.0140 to 0.0143, so there is no evidence that the quadratic model fits the data better than the linear one.)
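Neither age term is individually significant in the quadratic model, yet the F statistic in the header is. A joint test makes that explicit. This is a hedged sketch, assuming nps_example.dta is in memory; the last command uses factor-variable notation, which is an assumption about your Stata version (11 or later).

```stata
* Assumes nps_example.dta is in memory.
* capture lets this run whether or not age2 was already generated.
capture gen age2 = age*age
reg hhsize age age2, r

* Joint test that the coefficients on age and age2 are both zero.
* This should match the F statistic in the regression header.
test age age2

* Alternative specification without creating age2 by hand
* (assumes Stata 11 or later, which supports factor variables).
reg hhsize c.age##c.age, r
```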
. histogram hhsize
(bin=30, start=0, width=1.1666667)
. scatter hhsize age
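Two small variations on these graph commands may be useful. The default histogram picks 30 bins of width 1.17, which do not line up with whole numbers; if hhsize takes only integer values (an assumption, though likely for a household count), the discrete option gives one bar per value. The scatter plot can also be overlaid with the fitted line from the regression of hhsize on age. Both commands assume nps_example.dta is in memory.

```stata
* Assumes nps_example.dta is in memory.
* One bar per distinct household size, with counts on the vertical axis.
histogram hhsize, discrete frequency

* Scatter plot of household size against age with the linear fit overlaid.
* lfit draws the least-squares line of hhsize on age.
twoway (scatter hhsize age) (lfit hhsize age)
```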
last modified: 17 Sept 2008