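* allocate memory (only needed in Stata 11 and earlier; Stata 12 and
* later manage memory automatically and ignore this setting)

set mem 20m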

use "C:\kate\manuscripts\bosall.dta", clear

* summarize all variables (descriptives)

summarize

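* frequencies: two-way table of partisanship by female
* (run before the recoding below, so any missing-data codes
* still appear as categories)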

tabulate party female

* recode missing-data codes to Stata's system missing value (.)
* note that the missing-data codes can be found in the codebook

replace female=. if female==9
replace black=. if black==9
replace latino=. if latino==9

replace educprof=. if educprof==9
replace healprof=. if healprof==9
replace welfprof=. if welfprof==9
replace chilprof=. if chilprof==9

replace vote1=. if vote1==999999
replace vote2=. if vote2==999999
replace vote3=. if vote3==999999

replace first_year=. if first_year==9999
replace last_year=. if last_year==9999
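* 8888 is a separate code in last_year; it is set to 2008 rather
* than to missing (see the codebook for its meaning)
replace last_year=2008 if last_year==8888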

replace yob=. if yob==9999
replace prior_exp=. if prior_exp==9
replace leg_exp=. if leg_exp==9
replace school_board=. if school_board==9

replace education=. if education==9
replace lawyer=. if lawyer==9
replace income=. if income==999999
replace college=. if college==999
replace perblack=. if perblack==999
replace perlatin=. if perlatin==999
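
* a more compact alternative, offered as a sketch rather than as the
* author's method: mvdecode recodes one numeric code to missing across
* a whole variable list. Shown only for the variables coded 9; it
* changes nothing if the replace commands above have already run.

mvdecode female black latino educprof healprof welfprof chilprof ///
    prior_exp leg_exp school_board education lawyer, mv(9)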

* frequencies for female, to check that the missing-data code has been replaced

tabulate female
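
* tabulate leaves missing values out of the table by default, so the
* recode can also be checked by asking for them explicitly
* (a suggested check, not part of the original analysis):
* the first command adds a row for missing, the second should report 0

tabulate female, missing
count if female==9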

* create a variable called margin, equal to the number of votes the
* legislator won in the last election as a proportion of the votes won
* by the top two candidates

generate margin=vote1 / (vote1+vote2)
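
* a quick sanity check, assuming vote1 holds the winner's votes and
* vote2 the runner-up's (an assumption; confirm against the codebook):
* margin should then fall between 0.5 and 1, so this count should be 0.
* missing values are excluded because Stata treats them as larger
* than any number

count if (margin < 0.5 | margin > 1) & !missing(margin)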

* descriptives for margin
* means, standard deviations, ranges

summarize margin

* OLS regression of margin on income

regress margin income

* note that the coefficient (b) is very small
* this is because of the scale of income:
* the units are dollars,
* and a one-dollar change in average household income in the
* district is not going to produce a large change in vote margin

* so, create a new variable that measures income in thousands of dollars

generate income2=income/1000

* then regress margin on the new variable
* the b will be 1000 times larger, but the standard error will be
* 1000 times larger as well, so the t and the p-value will be the same

regress margin income2
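
* the rescaling claim can be checked directly; a minimal sketch, where
* the scalar names b_inc and se_inc are illustrative and not part of
* the original analysis. The two ratios should both be 1000 (up to
* rounding) and the two t statistics should match.

quietly regress margin income
scalar b_inc  = _b[income]
scalar se_inc = _se[income]

quietly regress margin income2
display "ratio of b:   " _b[income2]/b_inc
display "ratio of se:  " _se[income2]/se_inc
display "t (income):   " b_inc/se_inc
display "t (income2):  " _b[income2]/_se[income2]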