An Investigation into "The Statistical Case Against Biden's Win" by Steve Cortes

Ben Namovicz

On November 9, 2020, Trump campaign advisor Steve Cortes published an article titled The Statistical Case Against Biden's Win. Cortes later adapted this article into a series of Twitter videos, one of which was retweeted by President Trump. The article claims to demonstrate the "intense improbability of the accuracy of the present Biden lead" using statistics. [0]

According to Cortes, there were unusual results in specific swing states that favored Biden. Cortes thinks it would be highly unlikely to see results like these in a fair election. The central thesis revolves around the idea that the results in swing states were significantly different from the rest of the country, but Cortes only ever compares these swing states to one or two other non-swing states. I will compare the results in swing states to the entire rest of the country to see if they were really unusual after all.

In this notebook I will go through the entire process from data collection to data cleaning to data analysis. First I will collect data on the 2020 presidential election, past presidential elections, the 2020 senate election, and geographical information on US counties. Then I will tidy the data so that different datasets can be combined. Finally I will produce data visualizations to investigate the statistical claims Cortes makes.

Steps

  1. Data Collection
  2. Data Cleaning
  3. Data Analysis

Data Collection

Before collecting any data I need to set up the environment by importing the modules I use, and defining some helper functions. Then I collect data from a variety of sources.

If you don't want to read through data collection and cleaning, you can skip ahead to the analysis.

2020 Presidential results

I used a web scraper to get county level results for the 2020 election from Politico. These results should be finalized by now.

Historical Presidential Results

I found complete county level data for all presidential elections from 2000 to 2016 at the Harvard Dataverse. I have edited this file slightly:

2020 Senate Results

Conveniently, Politico has Senate results in the same format as their presidential results.

Geographic data

I got the geographic boundaries for counties from census.gov. The census also has population and citizen voting age population (CVAP) for every county in election years. As of this writing, 2020 data is not yet available, so I used 2019 data in its place. For 2012 I edited the file to fix Oglala Lakota County, SD, as in the historical data. I also renamed the CVAP files, which were all originally called 'County.csv' or 'county.sas7bdat'.

Data Cleaning

First I define a few useful functions for data cleaning. Then I get the data ready for analysis. I will have to combine different datasets so I can compare data between them. Data from Politico doesn't have FIPS identifiers for counties, so I have to match them by name to counties in the historical data. Making sure county names matched exactly proved complicated. [1]
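The name-matching problem described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual cleaning code: the function name `classify_jurisdiction`, the `COUNTIES_NAMED_CITY` set, and the rule that a name with no suffix defaults to a county are all assumptions made for the example.

```python
import re
from typing import Tuple

# Hypothetical lookup for the Virginia edge cases: counties whose names end in "City".
COUNTIES_NAMED_CITY = {("VA", "charles city"), ("VA", "james city")}


def classify_jurisdiction(state: str, raw_name: str) -> Tuple[str, str]:
    """Return (base_name, kind), where kind is county, city, parish, or district."""
    name = raw_name.strip().lower()
    name = re.sub(r"[.']", "", name)    # "St. Louis" -> "st louis"
    name = re.sub(r"\s+", " ", name)    # collapse repeated whitespace

    if state == "DC":
        return "district of columbia", "district"
    if name.endswith(" parish"):
        return name[: -len(" parish")], "parish"
    if name.endswith(" county"):
        return name[: -len(" county")], "county"
    if (state, name) in COUNTIES_NAMED_CITY:
        return name, "county"           # "Charles City" is a county, not a city
    if name.endswith(" city"):
        return name[: -len(" city")], "city"
    # Assumption: a bare name with no suffix refers to a county.
    return name, "county"


# Baltimore City and Baltimore County share a base name but differ in kind,
# so joining on (state, base_name, kind) avoids a false match.
baltimore_city = classify_jurisdiction("MD", "Baltimore City")
baltimore_county = classify_jurisdiction("MD", "Baltimore County")
```

Joining two datasets on the triple of state, base name, and kind, rather than on the raw name alone, is one way to keep same-named cities and counties apart. Bedford City, VA would still need special handling for the years before and after 2013.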

2020 Presidential Results

Historical Presidential Results

2020 Senate Results

Geographic Data

Combined Geographic Data

Combined Presidential Data

2020 presidential data combined with historical presidential data

Combined 2020 President and Senate Data

2020 presidential and senate election results combined by county and state

Data Analysis

The Four Claims Made

Cortes provides four pieces of evidence to support his thesis of election improbability. I will investigate each of these claims separately, and evaluate them on their merits. They are:

  1. Turnout
  2. Overperformance vs Obama
  3. Biden-Only Ballots
  4. Absence of Mail-In Vote Vetting

Claim 1: Turnout

The first claim Cortes makes is that Wisconsin saw implausibly high turnout. Wisconsin had a turnout rate of 89% of registered voters. This number can be misleading: Wisconsin usually calculates turnout as a share of eligible voters because of its same day voter registration. The turnout rate as a share of eligible voters was 72.3%, only a little higher than the 2016 turnout of 67.3%. Cortes misleadingly compares the registered voter turnout, exaggerated as "over 90%", to Australia's eligible voter turnout of 92%.

Cortes calls the 84% turnout in Milwaukee, WI "suspect" compared with only 51% turnout in the similar city of Cleveland, OH. It appears this 84% was calculated as a share of eligible voters, unlike his calculation for Wisconsin overall. Turnout in Cleveland was 51%, calculated as a share of registered voters because Ohio does not have same day voter registration. The comparison here is misleading. Both Milwaukee and Cleveland are cities within counties. The surrounding suburbs in Milwaukee County and Cuyahoga County have much higher turnout than the cities of Milwaukee and Cleveland themselves. The 84% turnout Cortes cites is for the entire Milwaukee County, while the 51% turnout in Cleveland is just the city. Cuyahoga County had an overall turnout of 71%.

Any comparison between state turnouts is going to be flawed. Different states have different rules for registration, different rules for voting, and different methods of calculating turnout. I will compare states and counties using two more meaningful measures:

  1. Change in total votes cast from 2016 to 2020
  2. Turnout as a percent of voting age citizens

Calculating Turnout Data
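As a rough sketch of the two turnout measures used in this section, the helpers below compute the percent change in votes cast and turnout as a share of a chosen denominator. The function names are hypothetical, and the county figures in the second example are made up purely to show how the choice of denominator changes the rate; only the national totals (136 million and 157 million) come from this notebook.

```python
def percent_change(old_votes: float, new_votes: float) -> float:
    """Percent change in total votes cast between two elections."""
    return (new_votes - old_votes) / old_votes * 100.0


def turnout_rate(votes: float, denominator: float) -> float:
    """Turnout as a percent of a denominator such as CVAP or registered voters."""
    return votes / denominator * 100.0


# National totals quoted in the analysis: 136 million (2016) and 157 million (2020).
national_change = percent_change(136e6, 157e6)   # about 15.4

# Hypothetical jurisdiction: the same vote count gives very different "turnout"
# depending on whether it is divided by registered or eligible voters.
votes, registered, eligible = 720, 800, 1000
rate_registered = turnout_rate(votes, registered)  # 90.0
rate_eligible = turnout_rate(votes, eligible)      # 72.0
```

The gap between the last two numbers is the reason a registered-voter rate in one state should not be compared directly with an eligible-voter rate somewhere else.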

Visualizations of Turnout

To start, I plot national turnout over time for context.

National turnout rose from 136 million in 2016 to 157 million in 2020, a 15% increase. We can see how this varies between states.

As you can see, Wisconsin hardly stands out. Its 10% increase in turnout is actually less than the national average. It appears Ohio is the more unusual case, with one of the lower increases in turnout. Cortes compared two states with relatively small increases in turnout in order to argue that turnout in one of them grew improbably quickly.

Let's move down to the county level to see if the counties Cortes highlighted in Wisconsin and Ohio were unusual.

Milwaukee County, which Cortes calls "suspect", saw a modest increase of 4% since 2016. This is higher than Cuyahoga County at 3%, but lower than the Wisconsin state average of 10% and lower than any other county in the state. Between the two states, three counties stand out for large increases in turnout: Menominee County in Wisconsin is a small, mostly Native American county of just 4,556 people. Delaware and Union Counties in Ohio were both won by Trump. No other county in either state saw turnout rise by more than 20%.

We can compare these with the rest of the counties in the US to see that there is wide variation in how much turnout changed between regions. Cortes' assertion that turnout was abnormal "only in the key swing states that Biden allegedly won" doesn't hold water.

Now let's turn to the other measure: turnout as a percent of voting age citizens.

Contrary to Cortes' claim, Wisconsin had a normal turnout. In 2020 turnout rose around the country, and Wisconsin saw a pretty typical increase. Milwaukee County usually has slightly lower turnout than Wisconsin overall, and it saw its turnout increase the least of any Wisconsin county. The claim of improbable turnout relies on misleading statistics and cherry-picked comparisons.

Claim 2: Overperformance vs Obama

The main claim in this section is that Biden's improvements over Obama in certain counties are unrealistic. This isn't really a statistical claim: Cortes simply doesn't think that a "doddering and lazy" Biden could do better than Obama's "rockstar appeal". He once again says that these gains came in "just the right places". This time the "just right" places are suburban counties surrounding Philadelphia. Once again I will compare these counties to the rest of the country to see if they really stand out.

Visualizations of Margins

First we see the context of popular vote margin over time.

To his credit, Cortes chose a good comparison. Biden's 2020 margin of 4.5% is closer to Obama's 2012 margin of 3.9% than to that of any other election this century. These elections were very similar on the national level, so let's see how they compare in Pennsylvania.

This is an interesting map. Most of the counties in Pennsylvania have become much more Republican. Some have even shifted red by over 40%. Democrats have made gains in the counties surrounding Philadelphia, Pittsburgh, Harrisburg, and Penn State University. The main takeaway of this map is that Republicans are improving in rural areas while Democrats are improving in suburban areas. The only county on this map that includes a city but not its suburbs is Philadelphia County, which moved towards Republicans. To see if this is unusual like Cortes claims, we can compare Pennsylvania to neighboring states.
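For readers who want the arithmetic behind a map like this, here is a minimal sketch of how a margin and a shift between two elections might be computed. The helper names are hypothetical, and dividing by total votes (rather than by two-party votes) is an assumption about the method, not something stated in the notebook. The national margins of 3.9% and 4.5% are the ones quoted in this section.

```python
def margin(dem_votes: float, rep_votes: float, total_votes: float) -> float:
    """Democratic margin in percentage points; negative means a Republican lead."""
    return (dem_votes - rep_votes) / total_votes * 100.0


def shift(margin_then: float, margin_now: float) -> float:
    """Change in margin between two elections; positive means a shift toward Democrats."""
    return margin_now - margin_then


# National popular vote margins quoted above: 3.9 in 2012 and 4.5 in 2020.
national_shift = shift(3.9, 4.5)   # roughly 0.6 points toward Democrats

# Hypothetical county: 4,000 Democratic and 5,500 Republican votes out of 10,000.
county_margin = margin(4000, 5500, 10000)   # -15.0
```

A county that went from a margin of -15 to -55 would show a shift of -40, which is the kind of value behind the darkest red counties on the map.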

In the states bordering Pennsylvania the same trends hold. Rural counties are shifting Republican, some dramatically so. Suburban counties are shifting Democratic. We can take another step back and compare this to the whole country.

The pattern holds again at the national level, and there appears to be a regional trend where the Midwest is getting redder and the Southwest is getting bluer.

The regional trend is once again visible on a state level map. Utah really stands out here with a 27% swing in just 8 years. Perhaps this is because Mitt Romney was the Republican candidate in 2012, and is now a senator from Utah.

I decided to look a bit closer into the hypothesis that shift is related to whether a county is rural, suburban, or urban. I calculated population density of each county as a measure of urbanization. I plotted population density against change in margin from 2012 to 2020. There is a lot of noise, but a trend is visible. Counties with a population density of around 10 people per square mile swung more red on average than counties with a population density of around 100 people per square mile. There aren't enough counties with very high population densities to make out a clear trend.
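One way to pull a trend out of a noisy scatter like this is to group counties by order of magnitude of population density and average the shift in each group. The sketch below does that with the standard library only. It is a swapped-in summary technique rather than the plot described above, the function name is hypothetical, and the three sample counties are invented for illustration.

```python
import math
from collections import defaultdict
from statistics import mean


def mean_shift_by_density(counties):
    """Average margin shift per density decade.

    counties: iterable of (population, land_area_sq_mi, shift_points).
    Returns {lower_bound_of_decade: mean_shift}, e.g. key 10 covers 10 to 100 people/sq mi.
    """
    bins = defaultdict(list)
    for population, area, shift_points in counties:
        density = population / area
        decade = math.floor(math.log10(density))   # 1 -> 10-100, 2 -> 100-1000, ...
        bins[decade].append(shift_points)
    return {10 ** decade: mean(values) for decade, values in sorted(bins.items())}


# Invented sample: two sparse counties that swung red, one denser county that swung blue.
sample = [
    (1000, 100, -12.0),    # 10 people per square mile
    (2000, 100, -8.0),     # 20 people per square mile
    (50000, 100, 3.0),     # 500 people per square mile
]
summary = mean_shift_by_density(sample)   # {10: -10.0, 100: 3.0}
```

Binning on a log scale matters here because county densities span several orders of magnitude, and the few very dense counties would otherwise be squeezed into a corner of the plot.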

Claim 3: Biden Only Ballots

Cortes claims that over 450,000 ballots nationwide were cast for Biden with the rest of the ballot left blank. He says there were 95,801 Biden only ballots in Georgia compared to just 818 Trump only ballots. These numbers are calculated by taking Biden's or Trump's vote totals and subtracting that state's Democratic or Republican senate candidates' total. This exact claim was addressed in an Associated Press article. There is simply no way of knowing for certain whether this difference is the result of more Biden ballots being left blank or split ticket voters. It is not unusual for senate races to get fewer votes than presidential races.

While there is no way to know exactly how many 'Biden only ballots' were cast, we can get a rough estimate of how many ballots were cast for Biden (or Trump) that left the senate race blank by looking at two measures: presidential overperformance of senate candidates and presidential turnout compared to senate turnout.

Calculating Turnout and Margin for President and Senate Data
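The two measures can be illustrated with a small sketch. The function `compare_races` and both sets of vote counts are hypothetical; they are chosen only to show why a gap between Biden and the Democratic senate candidate cannot, on its own, distinguish blank senate ballots from split tickets.

```python
def compare_races(pres, senate):
    """Compare presidential and senate votes for one county or state.

    pres and senate map a party label to its vote count.
    """
    return {
        # Presidential votes beyond the same party's senate candidate.
        "dem_gap": pres["dem"] - senate["dem"],
        "rep_gap": pres["rep"] - senate["rep"],
        # Ballots cast for president beyond those cast for senate.
        "turnout_gap": sum(pres.values()) - sum(senate.values()),
    }


# Scenario A (invented): 50 Biden voters leave the senate race blank.
blank_senate = compare_races({"dem": 1000, "rep": 900}, {"dem": 950, "rep": 900})

# Scenario B (invented): 50 Biden voters pick the Republican senate candidate.
split_ticket = compare_races({"dem": 1000, "rep": 900}, {"dem": 950, "rep": 950})
```

Both scenarios give the same `dem_gap` of 50, which is the number Cortes' method would report as "Biden only ballots". Only the turnout gap (50 in the first case, 0 in the second) hints at which explanation is more likely, and even that is a rough estimate.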

President vs Senate Visualization

To start, let's compare the turnout for president and senate in the states that had both elections in 2020.

There were only about 1 million more votes cast for president overall in the states that had both elections. We can also compare how each party did in these elections combined.

Even though Biden won the overall popular vote convincingly, Trump won the popular vote in the states that had a senate election in 2020. The Republican senate candidates in those states won by a much larger margin. Now let's look at Georgia specifically.

Biden did slightly better than the Democratic senate candidate in Georgia, Jon Ossoff, in most counties. To see whether this is unusual we can compare this to other states.

In about half of the states with a senate election Biden did better than the Democratic senate candidate. In the other half, Trump did better than the Republican senate candidate. Georgia had a relatively small difference. The two states that stand out are Nebraska and Maine.

Nebraska and Maine are the only two states that award electoral college votes based on how each congressional district votes in addition to the whole state's vote. As we can see in the county level map, Nebraskans around Omaha voted for Biden despite also electing Republican Ben Sasse. These voters gave Biden one of Nebraska's electoral votes even though Trump won the whole state.

Maine's tilt towards Biden doesn't seem to change much by county. This is probably because moderate voters around the state supported famously moderate Republican Susan Collins while also voting for Biden.

Southern and Eastern Arkansas show another interesting trend. The Arkansas senate race had no Democratic candidate, only a Republican and a Libertarian. Biden convincingly overperformed the Libertarian in the rural, majority black counties near the Mississippi Delta. The Democratic senate candidates in nearby Mississippi and Alabama ran ahead of Biden.

Now we will look at how turnout in senate and presidential elections compared.

Throughout the entire state of Georgia, the presidential race saw slightly higher turnout than the senate race. We can compare this to the rest of the country.

Georgia is entirely unremarkable here, so let's return to the states we were looking at earlier.

Nebraska saw much higher presidential turnout than senate turnout, concentrated around Omaha, where Biden outperformed the senate Democrat. This suggests that there really were a lot of Biden only ballots in Omaha.[3]

Despite a big difference in results between Biden and Democratic senate candidate Sara Gideon, the turnouts in Maine's senate and presidential races were very close across the whole state. This suggests that there were a lot of Biden-Collins split ticket votes dispersed across the whole state.

In the same Arkansas counties where Biden outran the Libertarian senate candidate, there were a lot more presidential voters than senate voters. This once again suggests Biden only voters.

Claim 4: Absence of Mail-In Vote Vetting

Cortes claims that mail-in ballots were not properly vetted. He specifically mentions Pennsylvania as a state he thinks did not properly vet its mail-in ballots. His evidence for this is a mail-in ballot rejection rate of 0.03%, which is unusually low. Cortes does not cite a source for this number, and I was unable to find any source to confirm this. In fact, complete data on the number of mail-in ballots rejected in Pennsylvania is not yet available. Ballotpedia lists mail-in rejection rate by state, but does not yet have a number listed for Pennsylvania, or a majority of states. The states that have data listed do not seem unusual compared to past years. A USA Today story states:

Philadelphia's final mail ballot rejection numbers are still being sorted out, along with other counties and states: It could be spring before the full number of rejected ballots is known.

So where could Cortes have gotten this number? I suspect he got it from a justthenews.com article first published on November 6, 2020. This article provided a 0.03% rejection rate, and even suggested that the number raised "potential questions." Cortes published his article 3 days later. The justthenews.com article was later updated with the following correction:

Correction: An earlier version of this story incorrectly reported a premature, overall mail-in ballot rejection rate in Pennsylvania on the basis of a partial, early count of rejected absentee ballots current as of Nov. 5. A final mail ballot rejection rate "is typically not available until some weeks after the election, once all ballots have been canvassed by counties," a spokeswoman for the Pennsylvania Department of State has clarified. In this cycle, she added, "The canvassing of mail ballots in some counties continued through the week of November 9."

Cortes repeated a false rejection rate as evidence of election irregularities. He has not corrected his article.

Conclusion

I have analyzed election data to test the four claims of election irregularities that Steve Cortes makes in his article. None of these claims holds up under closer inspection. Much of his analysis is based on flawed calculations or outright misinformation. He cherry-picks states to make his points while ignoring nationwide trends. He cites improbable-sounding but statistically meaningless numbers, for example that Biden doubled Obama's margin in a Pennsylvania county that Obama won by a small margin 8 years earlier.

The areas that Cortes called unusual were not unusual, but there were some places that actually were unusual. For example, the rural, majority Hispanic Starr County, Texas saw an increase in turnout of 49% from 2016. The county swung towards Trump by 68% relative to 2012. Trump surpassed Republican senator John Cornyn's margin by 10%, on 18% higher turnout in the presidential election. This suggests the presence of Trump only voters. Every single one of these numbers stands out. These are some of the most extreme swings of any county in the US. Is this evidence of fraud favoring Trump in Starr County?

No. This was simply a county where Trump did extraordinarily well compared to past elections. In a country with over 3,000 counties, some counties are going to be extraordinary. There will also always be trends between elections. In 2020 suburbs shifted blue and rural areas shifted red. If votes didn't change between elections there would be no need to have them every 4 years. Cortes takes a different view. He treats every unexpected outcome as inconceivable. He portrays broad trends as inexplicable anomalies.

All of this serves his article's central thesis: that Biden's victory was statistically improbable. This conclusion is truly absurd. Biden got more votes than Trump in a high turnout election where he overperformed Obama in suburbs and underperformed Obama in cities. He outran some Democratic senate candidates, and ran behind others. There were fewer senate votes than presidential votes. None of this is irregular. None of this is improbable. There is simply no statistical case against Biden's win.

Footnotes

[0]There is a very real and important field that uses statistics to look for election fraud. Cortes does not use these statistical methods, or any other statistical methods. He simply uses individual examples out of context. There are also studies that examine other claims of election irregularities in the 2020 election.

[1]The vast majority of counties in the US are just counties, but not all of them. All the counties in Louisiana are called Parishes. Washington DC is a federal district. Then there are some cities that are their own jurisdictions, and not in any county. Baltimore, MD; Carson City, NV; and St. Louis, MO are all independent cities. The remaining 38 independent cities are in Virginia. In Virginia every town of at least 10,000 becomes an independent city (in theory). Some of these cities share a name with a county in Virginia. Baltimore and St. Louis also share names with a county in Maryland and Missouri respectively. This all makes it hard to avoid false matches (Baltimore City vs Baltimore County), so I decided to classify every jurisdiction in my database as a county, city, parish or district. This process was further complicated by three edge cases, all in Virginia. Charles City County is a county in Virginia. It is not an independent city despite its name. This is also true of James City County, Virginia. Bedford City, Virginia lost its independent status in 2013, and was absorbed into Bedford County. This happened right in the middle of the time period I am analyzing.

[2]It can be very apparent when states do change their rules. For example, Georgia implemented automatic voter registration between 2016 and 2020 and saw turnout increase to record levels.

[3] In this context "Biden only" means a lot of voters probably voted for Biden and not the senate. We don't know whether these voters voted in other races down ballot. It is also possible there were a lot of Trump voters who didn't vote for senate, and a lot of Biden voters who also voted for Senator Sasse. I suspect it was mostly "Biden only" voters, but there is no way to know for certain.