The US presidential election

The US presidential election is held every four years on the Tuesday after the first Monday in November. The 2008 and 2012 elections were held on Nov 4, 2008 and Nov 6, 2012, respectively. The President of the US is not elected directly by popular vote. Instead, the President is elected by electors who are selected by popular vote on a state-by-state basis. These electors cast direct votes for the President. In all states except Maine and Nebraska, electors are selected on a “winner-take-all” basis. That is, all of a state’s electoral votes go to the presidential candidate who wins the most popular votes in that state. For simplicity, we will assume that all states use the “winner-take-all” principle in this lab. The number of electors in each state equals the number of its members of Congress (senators plus House representatives). Currently, there are a total of 538 electors, corresponding to 435 House representatives, 100 senators, and 3 electors from the District of Columbia. A presidential candidate who receives an absolute majority of electoral votes (at least 270) is elected President.
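The date rule and the electoral-vote arithmetic above can be checked with a few lines of R. This is only an illustrative sketch: the helper name `election_day` is an assumption introduced here and is not part of the lab code.

```r
# Sketch: Election Day is the Tuesday after the first Monday in November.
# election_day() is a hypothetical helper, not part of the lab's code.
election_day <- function(year) {
  nov1 <- as.Date(sprintf("%d-11-01", year))
  # Day of week as a number: 0 = Sunday, 1 = Monday, ..., 6 = Saturday
  wday <- as.POSIXlt(nov1)$wday
  # Days from November 1 to the first Monday (0 if November 1 is a Monday)
  days_to_monday <- (1 - wday) %% 7
  # The Tuesday that follows that Monday
  nov1 + days_to_monday + 1
}

election_day(2008)
election_day(2012)

# Electoral votes: 435 House seats + 100 Senate seats + 3 for D.C.
total_electors <- 435 + 100 + 3
# Smallest whole number of votes that exceeds half of the total
majority <- floor(total_electors / 2) + 1
```

The two calls should reproduce the election dates quoted above, and `majority` gives the threshold a candidate must reach.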

For simplicity, our data analysis only considers the two major political parties: Democratic (Dem) and Republican (Rep). The goal is to predict which party (Dem or Rep) will win the most votes in each state. Because the chance that a third party (any party other than Dem or Rep) receives an electoral vote is very small, this simplification is reasonable. Our prediction will be based on election polls. An election poll is a survey that asks a small sample of voters about their voting plans. If the survey is conducted appropriately, the sample should be representative of the voting population at large. However, it is very challenging to obtain a representative sample because a good sampling strategy needs to consider many factors (e.g., sampling time, locations, methods). Therefore, a poll’s prediction could be biased, and the prediction accuracy could be improved by combining multiple polls.

There exist many possible factors affecting the prediction accuracy of election polls. Based on the available data sets, we consider the following three factors.

1. Sampling time. If a poll is conducted long before the election date, its accuracy is likely to be worse than that of polls conducted closer to the election date. Many events can change voters’ opinions about the presidential candidates, so the longer the time until the election, the more likely voters are to change their voting plans.

2. Pollsters. Systematic biases can occur if a flawed sampling method is used. For example, if a pollster only collects samples through the Internet, the sample is biased because it only includes those who have access to the Internet. Each pollster uses different methods for sampling voters, and some sampling schemes could be better than others. Therefore, it is very likely that some pollsters’ predictions are more reliable than others’, and we should not give equal weight to every poll.

3. State edges. The state edge is the difference between the Democratic and Republican popular vote percentages (based on the polls) in that state. For instance, if the Democratic candidate receives 55% of the vote and the Republican candidate receives 45% of the vote, then the Democratic edge is 10 percentage points. If the state edge is small, sampling error alone can flip the predicted winner, so the prediction accuracy of a poll is more likely to be affected by sampling errors. If the state edge is big, the prediction accuracy is less likely to be affected by sampling errors.
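The state edge and the winner implied by a poll can be illustrated on a toy example. The data frame below is made up for illustration; its column names (`State`, `Dem`, `Rep`) and the sample size of 600 are assumptions, not taken from the lab’s data files. The last lines give a rough 95% margin of error for one candidate’s share under simple random sampling, which shows why a 1-point edge is fragile while a 10-point edge is not.

```r
# Hypothetical poll rows; values and column names are illustrative assumptions
polls <- data.frame(
  State = c("FL", "FL", "OH"),
  Dem   = c(55, 48, 50),
  Rep   = c(45, 49, 47)
)

# State edge: absolute Dem-Rep difference in percentage points
polls$edge <- abs(polls$Dem - polls$Rep)

# Winner implied by each poll: 1 = Democratic, 0 = Republican
polls$pred_dem <- as.integer(polls$Dem > polls$Rep)

# Rough 95% margin of error (in percentage points) for a single share
# near 50%, assuming simple random sampling and a sample of 600 voters
n_voters <- 600
moe <- 100 * 1.96 * sqrt(0.5 * 0.5 / n_voters)
```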

Available data sets

The following data sets are available for our data analysis:

1) Polling data from the 2008 US presidential election (2008-polls.csv);

2) Election results from the 2008 US presidential election (2008-results.csv);

3) Polling data from the 2012 US presidential election (2012-polls.csv);

4) Election results from the 2012 US presidential election (2012-results.csv).

Data sets 1) and 2) will be used for training. Data set 3) will be used for prediction. Data set 4) is provided for validation, which helps us check whether our predictions are correct.

We will first pre-process these data sets for the purpose of performing logistic regression. As a first step, use the following commands to read the data sets “2008-polls.csv”, “2012-polls.csv” and “2008-results.csv” into R.

setwd("…") ## Change the directory where you saved the data sets
polls2008<-read.csv(file="2008-polls.csv",header=TRUE)
polls2012<-read.csv(file="2012-polls.csv",header=TRUE)
results2008<-read.csv(file="2008-results.csv",header=TRUE)

To simplify our data analysis, let us focus on subsets of the available data sets. We select the subsets based on pollsters because not all pollsters conducted polls in every state. We keep the pollsters that conducted at least five polls in each of the 2008 and 2012 polling data sets, 1) and 3), using the following R code.

## Pollsters with at least five polls in 2008 and in 2012, respectively
pollsters20085<-table(polls2008$Pollster)[table(polls2008$Pollster)>=5]
pollsters20125<-table(polls2012$Pollster)[table(polls2012$Pollster)>=5]
## Keep only the pollsters that appear in both lists
subset1<-names(pollsters20085)[names(pollsters20085)%in%names(pollsters20125)]
pollers<-names(pollsters20125)[names(pollsters20125)%in%subset1]

Then, we create the subsets of the 2008 and 2012 data sets that were collected by the selected pollsters using the following R code:

subsamplesID2008<-polls2008[,5]%in%pollers
polls2008sub<-polls2008[subsamplesID2008,]
subsamplesID2012<-polls2012[,5]%in%pollers
polls2012sub<-polls2012[subsamplesID2012,]

To build a predictive model using logistic regression, we create the response variable and the predictors. First, we define a binary response variable (Resp), which indicates whether the prediction given by a poll is correct. If the prediction is correct, Resp is 1; otherwise it is 0. To check whether the prediction given by each poll is correct, first find the predicted winner for each state, and then compare it with the actual winner in the data set “2008-results.csv”. Second, compute the state edges according to the definition given above. Finally, compute the number of days between the sampling time (polling date) and the 2008 presidential election date (the lag time). The 2008 presidential election date is Nov 4, 2008. The following R code carries out these steps.

## Actual winner per state (this code treats column 2 as the Democratic share and column 3 as the Republican share)
winers2008<-(results2008[,2]-results2008[,3]>0)+0
StateID2008<-results2008[,1]
Allresponses<-NULL
## Allresponses is built state by state, so it lines up with the other columns below
## only if the rows of polls2008sub are grouped by state in the same order as results2008
for (sid in 1:51)
{
  polls2008substate<-polls2008sub[polls2008sub$State==StateID2008[sid],]
  pollwiners2008state<-(polls2008substate[,2]-polls2008substate[,3]>0)+0
  pollwinersIND<-(pollwiners2008state==winers2008[sid])+0 ## 1 if the poll picked the actual winner
  Allresponses<-c(Allresponses,pollwinersIND)
}
margins<-abs(polls2008sub[,2]-polls2008sub[,3])
lagtime<-rep(0,dim(polls2008sub)[1])
electiondate2008<-c("Nov 04 2008")
for (i in 1:dim(polls2008sub)[1])
{
  lagtime[i]<-as.Date(electiondate2008, format="%b %d %Y")-as.Date(as.character(polls2008sub[i,4]), format="%b %d %Y")
}
dataset2008<-cbind(Allresponses,as.character(polls2008sub[,1]),margins,lagtime,as.character(polls2008sub[,5]))
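One point worth noting about the last step: `cbind()` on a mix of numeric and character vectors returns a character matrix, so the numeric columns have to be converted back before modeling. A possible alternative, sketched below on made-up vectors, is to assemble a data frame and compute the lag time in one vectorized call. The column names `Resp`, `State`, `Margin`, `Lag` and `Pollster`, the pollster label "PollsterB", and the ISO-formatted dates are assumptions for illustration only.

```r
# Made-up stand-ins for the vectors built by the lab code
Allresponses <- c(1, 0, 1)
states       <- c("FL", "FL", "OH")
margins      <- c(10, 1, 3)
poll_dates   <- c("2008-10-30", "2008-10-05", "2008-10-23")
pollsters    <- c("SurveyUSA", "PollsterB", "SurveyUSA")

# cbind() coerces everything to character when any column is character
as_matrix <- cbind(Allresponses, states, margins, pollsters)
is.character(as_matrix)

# Vectorized lag time in days (no loop needed)
lagtime <- as.numeric(as.Date("2008-11-04") - as.Date(poll_dates))

# A data frame keeps numeric columns numeric and pollsters as a factor
train <- data.frame(
  Resp     = Allresponses,
  State    = states,
  Margin   = margins,
  Lag      = lagtime,
  Pollster = factor(pollsters)
)
str(train)
```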

Q1. Fit a logistic regression model using the data set “2008-polls-subset.csv”. In the model, use Resp as the binary response variable (target variable), pollsters as a categorical predictor, and lag time, the square of lag time and state edges as continuous predictors. Based on the fitted model, what is the probability of making a correct prediction for a poll conducted by SurveyUSA exactly 5 days before the election with a state edge of 10%?
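The general shape of such a fit can be sketched with `glm()` on synthetic data. Everything below is a stand-in: the data are simulated, the column names (`Resp`, `Pollster`, `Lag`, `Margin`) and the pollster labels "PollsterB" and "PollsterC" are hypothetical, and the state edge is assumed to be measured in percentage points. The numbers it produces say nothing about the real polls.

```r
set.seed(1)
n <- 300

# Synthetic training data (illustration only, not the lab data)
toy <- data.frame(
  Pollster = factor(sample(c("SurveyUSA", "PollsterB", "PollsterC"),
                           n, replace = TRUE)),
  Lag      = sample(1:120, n, replace = TRUE),
  Margin   = runif(n, 0, 20)
)
# Simulated truth: accuracy rises with the margin and falls with the lag
eta <- -0.5 + 0.15 * toy$Margin - 0.01 * toy$Lag
toy$Resp <- rbinom(n, 1, plogis(eta))

# Logistic regression with a categorical pollster effect,
# lag time, squared lag time and state edge
fit <- glm(Resp ~ Pollster + Lag + I(Lag^2) + Margin,
           data = toy, family = binomial)

# Predicted probability of a correct call for one new poll
new_poll <- data.frame(
  Pollster = factor("SurveyUSA", levels = levels(toy$Pollster)),
  Lag      = 5,
  Margin   = 10
)
p_correct <- predict(fit, newdata = new_poll, type = "response")
p_correct
```

Using `I(Lag^2)` inside the formula means `predict()` squares the lag automatically, so the new data only needs the raw `Lag` column.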

Q2. Is the model in Q1 reasonably good (or acceptable)? Please justify your answer using the deviance and its corresponding p-value. Is the lag time significantly associated with the probability that an election poll predicts the result correctly?
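One common way to attach a p-value to a deviance is a likelihood-ratio (chi-square) test on a difference of deviances. The sketch below shows the mechanics on simulated data with hypothetical column names (`Resp`, `Lag`, `Margin`); it is one possible reading of the question, not necessarily the intended solution. The overall test compares the fitted model with the intercept-only model, and the second test compares models with and without the two lag terms.

```r
set.seed(2)
n <- 300

# Simulated data (illustration only)
toy <- data.frame(
  Lag    = sample(1:120, n, replace = TRUE),
  Margin = runif(n, 0, 20)
)
toy$Resp <- rbinom(n, 1, plogis(-0.5 + 0.15 * toy$Margin - 0.01 * toy$Lag))

full    <- glm(Resp ~ Lag + I(Lag^2) + Margin, data = toy, family = binomial)
reduced <- glm(Resp ~ Margin, data = toy, family = binomial)

# Overall test: drop in deviance relative to the intercept-only model
lr_stat   <- full$null.deviance - full$deviance
lr_df     <- full$df.null - full$df.residual
p_overall <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

# Joint test of Lag and Lag^2: compare nested models
lag_test <- anova(reduced, full, test = "Chisq")
p_lag    <- lag_test[2, "Pr(>Chi)"]
```

Testing the two lag terms jointly avoids reading too much into their individual Wald tests, which can be unstable because `Lag` and `Lag^2` are strongly correlated.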

Q3. Consider a logistic regression with Resp as the binary response variable (target variable) and lag time, the square of lag time and state edges as continuous predictors. Write down the separating hyperplane for classifying the correct and wrong predictions (defined by the target variable Resp) using the feature vector containing lag time, square of lag time and state edges. If we use the state edge as the y-axis and the lag time as the x-axis, please draw the separation curve for the classification.

For prediction/classification, we need to define new variables for the 2012 election poll data set: state edges and lag time. These variables are defined in the same way as described above. For computing the lag time, note that the 2012 presidential election date is Nov 6, 2012. The following R code preprocesses the 2012 data set for prediction:

pollwiners2012<-(polls2012sub[,2]-polls2012sub[,3]>0)+0
margins2012<-abs(polls2012sub[,2]-polls2012sub[,3])
lagtime2012<-rep(0,dim(polls2012sub)[1])
electiondate2012<-c("Nov 06 2012")
for (i in 1:dim(polls2012sub)[1])
{
  lagtime2012[i]<-as.Date(electiondate2012, format="%b %d %Y")-as.Date(as.character(polls2012sub[i,4]), format="%b %d %Y")
}
dataset2012<-cbind(pollwiners2012,as.character(polls2012sub[,1]),margins2012,lagtime2012,as.character(polls2012sub[,5]))

Q4. Based on the logistic regression model fitted in Q3, predict the probability of making a correct prediction using the 2012 election poll data set. Please predict the probabilities for all the 2012 election polls from Florida (FL).

Q5. In this question, we will predict the winner of Florida using the predictions given in Q4. To this end, define the winner indicator as 1 (WIND=1) if the Democratic candidate is the winner, and as 0 otherwise. Based on Q4, we obtained the predicted probability that a poll made a correct prediction of the winner.

The election of the president and the vice president of the United States is an indirect election in which citizens of the United States who are registered to vote in one of the fifty U.S. states or in Washington, D.C., cast ballots not directly for those offices, but instead for members of the Electoral College.[note 1] These electors then cast direct votes, known as electoral votes, for president and for vice president. The candidate who receives an absolute majority of electoral votes (at least 270 out of 538, since the Twenty-Third Amendment granted voting rights to citizens of D.C.) is then elected to that office. If no candidate receives an absolute majority of the votes for president, the House of Representatives elects the president; likewise, if no one receives an absolute majority of the votes for vice president, then the Senate elects the vice president.

The Electoral College and its procedure are established in the U.S. Constitution by Article II, Section 1, Clauses 2 and 4, and by the Twelfth Amendment (which replaced Clause 3 after its ratification in 1804). Under Clause 2, each state casts as many electoral votes as the total number of its Senators and Representatives in Congress, while (per the Twenty-Third Amendment, ratified in 1961) Washington, D.C., casts the same number of electoral votes as the least-represented state, which is three. Also under Clause 2, the manner for choosing electors is determined by each state legislature, not directly by the federal government. Many state legislatures previously selected their electors directly, but over time all switched to using the popular vote to choose electors. Once chosen, electors generally cast their electoral votes for the candidate who won the plurality in their state, but 18 states do not have provisions that specifically address this behavior; those who vote in opposition to the plurality are known as “faithless” or “unpledged” electors.[1] In modern times, faithless and unpledged electors have not affected the ultimate outcome of an election, so the results can generally be determined based on the state-by-state popular vote. In addition, most of the time, the winner of a US presidential election also wins the national popular vote. There have been four exceptions since all states adopted the electoral system we know today. They occurred in 1876, 1888, 2000, and 2016, and in each case the Electoral College winner lost the popular vote by three percentage points or less.

Presidential elections occur quadrennially, in years divisible by four, with registered voters casting their ballots on Election Day, which since 1845 has been the first Tuesday after November 1.[2][3][4] This date coincides with the general elections of various other federal, state, and local races; since local governments are responsible for managing elections, these races typically all appear on one ballot. The Electoral College electors then formally cast their electoral votes on the first Monday after December 12 at their state’s capital. Congress then certifies the results in early January, and the presidential term begins on Inauguration Day, which since the passage of the Twentieth Amendment has been set at January 20.

The nomination process, consisting of the primary elections and caucuses and the nominating conventions, was not specified in the Constitution, but was developed over time by the states and political parties. These primary elections are generally held between January and June before the general election in November, while the nominating conventions are held in the summer. Though not codified by law, political parties also follow an indirect election process, in which voters in the fifty states, Washington, D.C., and U.S. territories cast ballots for a slate of delegates to a political party’s nominating convention, who then elect their party’s presidential nominee. Each party may then choose a vice presidential running mate to join the ticket, which is determined either by choice of the nominee or by a second round of voting. Because of changes to national campaign finance laws since the 1970s regarding the disclosure of contributions for federal campaigns, presidential candidates from the major political parties usually declare their intentions to run as early as the spring of the calendar year before the election (almost 21 months before Inauguration Day).

Constitutionally, the legislature of each state determines how its electors are chosen; Article II, Section 1, Clause 2 states that each state shall appoint electors “in such Manner as the Legislature Thereof May Direct”.[7] During the first presidential election in 1789, only 6 of the 13 original states chose electors by any form of popular vote.[note 2]

Gradually over the years, the states began conducting popular elections to choose their slate of electors. In 1800, only five of the 16 states chose electors by a popular vote; by 1824, after the rise of Jacksonian democracy, the proportion of states that chose electors by popular vote had sharply risen to 18 out of 24 states.[8] This gradual movement toward greater democratization coincided with a gradual decrease in property restrictions for the franchise.[8] By 1840, only one of the 26 states (South Carolina) still selected electors through the state legislature.[9]

Vice presidents

Under the original system established by Article Two, electors cast votes for two different candidates for president. The candidate with the highest number of votes (provided it was a majority of the electoral votes) became the president, and the second-place candidate became the vice president. This presented a problem during the presidential election of 1800, when Aaron Burr received the same number of electoral votes as Thomas Jefferson and challenged Jefferson’s election to the office. In the end, Jefferson was chosen as the president because of Alexander Hamilton’s influence in the House.

In response to the 1800 election, the Twelfth Amendment was passed, requiring electors to cast two distinct votes: one for president and another for vice president. While this solved the problem at hand, it reduced the prestige of the vice presidency, as the office was no longer held by the leading challenger for the presidency. The separate ballots for president and vice president became something of a moot issue later in the 19th century, when it became the norm for popular elections to determine a state’s Electoral College delegation. Electors chosen this way are pledged to vote for a particular presidential and vice presidential candidate (offered by the same political party). Although the president and vice president are legally elected separately, in practice they are chosen together.

Tie votes

The Twelfth Amendment also established rules for when no candidate wins a majority vote in the Electoral College. In the presidential election of 1824, Andrew Jackson received a plurality, but not a majority, of the electoral votes cast. The election was thrown to the House, and John Quincy Adams was elected president. A deep rivalry resulted between Andrew Jackson and House Speaker Henry Clay, who had also been a candidate in the election.

Popular vote

Since 1824, aside from the occasional “faithless elector”, the popular vote indirectly determines the winner of a presidential election by determining the electoral vote, as each state or district’s popular vote determines its Electoral College vote. Although the nationwide popular vote does not directly determine the winner of a presidential election, it does strongly correlate with who is the victor. In 54 of the 59 total elections held so far (about 91 percent), the winner of the national popular vote has also carried the Electoral College vote. The winners of the nationwide popular vote and the Electoral College vote have differed only in close elections. In highly competitive elections, candidates focus on turning out their vote in the contested swing states critical to winning an Electoral College majority, so they do not try to maximize their popular vote through real or fraudulent vote increases in one-party areas.