set mem 20m
use "C:\kate\manuscripts\bosall.dta", clear

* summarize all variables (descriptives)
summarize

* frequencies for partisanship by female
tabulate party female

* recode missing-data codes to Stata missing (.)
* note that missing data codes can be found in codebook
replace female=. if female==9
replace black=. if black==9
replace latino=. if latino==9
replace educprof=. if educprof==9
replace healprof=. if healprof==9
replace welfprof=. if welfprof==9
replace chilprof=. if chilprof==9
replace vote1=. if vote1==999999
replace vote2=. if vote2==999999
replace vote3=. if vote3==999999
replace first_year=. if first_year==9999
replace last_year=. if last_year==9999
replace last_year=2008 if last_year==8888
replace yob=. if yob==9999
replace prior_exp=. if prior_exp==9
replace leg_exp=. if leg_exp==9
replace school_board=. if school_board==9
replace education=. if education==9
replace lawyer=. if lawyer==9
replace income=. if income==999999
replace college=. if college==999
replace perblack=. if perblack==999
replace perlatin=. if perlatin==999

* frequencies for female, to check that missing data were recoded
tabulate female

* create a variable called margin, that equals the number of votes
* the legislator won in the last election, as a proportion of the votes won
* by the top two candidates
generate margin=vote1 / (vote1+vote2)

* descriptives for margin
* means, standard deviations, ranges
summarize margin

* OLS regression of margin on income
regress margin income

* note that the b is very tiny
* this is because of the scale of income
* units are in dollars
* and a one dollar change in average household income in the
* district is not going to produce a large change in vote margin
* so, create a new variable for income, in thousands of dollars
generate income2=income/1000

* then regress margin on the new variable
* the b will be 1000 times larger, but since the standard error
* scales by the same factor, the t and the p-value will be the same
regress margin income2
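
* optional check of the rescaling claim above (a sketch, not part of the
* original analysis). It assumes margin, income, and income2 exist as
* created above; the scalar names b_dollars and t_dollars are made up
* for illustration.
quietly regress margin income
scalar b_dollars = _b[income]
scalar t_dollars = _b[income] / _se[income]

quietly regress margin income2
* if income2 = income/1000, the coefficient ratio should be about 1000
display "ratio of coefficients: " _b[income2] / scalar(b_dollars)
* and the two t statistics should match, so this ratio should be about 1
display "ratio of t statistics: " (_b[income2] / _se[income2]) / scalar(t_dollars)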