Weight, weight, don’t tell me: how to bias a poll

1 Oct

Last week I noted that prominent polling analyst Nate Silver, of the New York Times’ FiveThirtyEight blog, tweeted that national polls on the Presidential race are no longer making any sense.

But Republicans want to believe (a) that the media is biased against them, and (b) that Romney is winning no matter what the polls show.

Into this environment enters Unskewedpolls.com, which attempts to “correct” publicly-released polls for perceived media bias against Republicans.

Our goal is to NOT be skewed or confirm pre-conceived results. This election season has seen so many polls heavily skewed and poorly weighted, that show skewed and inaccurate results. The purpose of this poll is to present the most accurate results possible based on the latest of what is known about the electorate and voter behavior and other statistical information available.

The QStarNews poll works with the premise that the partisan makeup of the electorate 34.8 percent Republicans, 35.2 percent Democrats and 30.0 percent independent voters. Additionally, our model is based on the electorate including approximately 41.0 percent conservatives, 20.0 percent moderates and 39.0 percent liberals.

Republicans are 89 percent conservative, 9 percent moderate and 2 percent liberal. Among Democrats, 3 percent are conservative, 23 percent are moderate and 74 percent are liberal. Independents include 33 percent conservatives, 49 percent moderates and 18 percent liberals.

Our polls about doubly-weighted, to doubly insure the results are most accurate and not skewed, by both party identification and self-identified ideology. For instance, no matter how many Republicans answer our survey, they are weighted at 34.8 percent. If conservatives are over-represented among Republicans in the raw sample, they are still weighted at 89 percent of Republicans regardless. This system of double weighting should insure our survey produces very accurate results, not skewed either way for the Democrats or for the Republicans.

Dick Morris has jumped on the bandwagon, claiming that “the polls that are out there are all misleading.” Morris does make a couple of good points that should be considered.

1. All of the polling out there uses some variant of the 2008 election turnout as its model for weighting respondents and this overstates the Democratic vote by a huge margin.

But 2008 was no ordinary election.

Almost all pollsters are using the 2008 turnout models in weighting their samples. Rasmussen, more accurately, uses a mixture of 2008 and 2004 turnouts in determining his sample. That’s why his data usually is better for Romney.

But polling indicates a widespread lack of enthusiasm among Obama’s core demographic support due to high unemployment, disappointment with his policies and performance, and the lack of novelty in voting for a black candidate now that he has already served as president.

If you adjust virtually any of the published polls to reflect the 2004 vote, not the 2008 vote, they show the race either tied or Romney ahead, a view much closer to reality.

2. Almost all of the published polls show Obama getting less than 50% of the vote and less than 50% job approval. A majority of the voters either support Romney or are undecided in almost every poll.

But the fact is that the undecided vote always goes against the incumbent.

Let me state first that I am wary of people making categorical statements about polling, and Morris contradicts himself when he says in the same article that “All of the polling out there uses some variant of the 2008 election turnout as its model for weighting respondents” and “Almost all pollsters are using the 2008 turnout models in weighting.”

What is weighting and why is it important?

Let’s talk first about what weighting is, before we discuss why and when it is appropriate. Weighting is a statistical process used to compensate for differences between the makeup of a survey sample and known qualities of the electorate.

For example, if our survey yields 65% female respondents and 35% male respondents, while historical turnout figures peg women at around 53-55% of most iterations of the electorate, we know that our sample is gender-biased. We might weight each male respondent by a factor of about 1.29 (45 ÷ 35) so that after weighting, males account for about 45% of the weighted sample. We would likewise weight female respondents by a factor of about .846 (55 ÷ 65) so that they represent about 55% of the weighted sample.
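Here is a minimal Python sketch of that arithmetic. The 65/35 sample split and the 55/45 target come from the example above; the function and variable names are made up for illustration, and a real survey would weight on several variables at once rather than one.

```python
# Illustrative only: one-variable weighting using the gender example above.
sample_share = {"female": 0.65, "male": 0.35}  # raw sample from the example
target_share = {"female": 0.55, "male": 0.45}  # assumed electorate target


def poststrat_weights(sample, target):
    """Weight for each group = target share / sample share."""
    return {group: target[group] / sample[group] for group in sample}


weights = poststrat_weights(sample_share, target_share)

# After weighting, each group's share is raw share * weight,
# which should land on the target.
weighted_share = {g: sample_share[g] * weights[g] for g in sample_share}

for g in sample_share:
    print(f"{g}: weight {weights[g]:.3f}, weighted share {weighted_share[g]:.0%}")
# female: weight 0.846, weighted share 55%
# male: weight 1.286, weighted share 45%
```

Every male respondent counts for a bit more than one person and every female respondent for a bit less, but nobody's answers are changed.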

When we weight a survey, we should include a methodology statement that discusses the factors for which we weighted, among other practices we follow. As a member of the American Association for Public Opinion Research, I release data in accordance with the Association’s Code of Professional Practices. This is important to me both as a professional and as a consumer of polling data. Be skeptical of any pollster who does not make the requisite disclosures.

My belief and general practice is that it is appropriate to weight survey results when they are out of line with known historical facts and trends. The example above involving gender would clearly qualify.

But what is going on out there with amateurs like UnskewedPolls “re-weighting” polls by partisan identification would not qualify, because partisan ID is not an unchanging fact like gender or age, but an opinion or self-identification that is subject to change over time.

The editor in chief of Gallup has a great discussion of this.

Party identification is basically an attitudinal variable, not a stable population parameter. It is designed to vary. This is distinct from demographic variables such as age, gender, race/ethnicity, and education, which are, generally speaking, stable indicators measured by the U.S. Census Bureau. The only issues relating to demographic variables are measurement concerns — e.g., how the census, which creates the targets, measures ethnicity versus how individual pollsters measure it. But, generally speaking, these are fairly stable targets.

Party identification is not measured by the U.S. Census Bureau, nor are there any other official state or national standards for what party identification “should be” in terms of the percent per party as it relates to the general population.

Many people use the exit polls as a standard. But exit polls use a distinct question wording, a different methodology (in person interviews at the polling place as opposed to telephone interviews), a different environment (people are asked their party identification just after having voted, which could affect how they answer), and different sampling techniques to develop who it is that is asked the question. So party identification figures as measured by a specific poll aren’t easily compared to party identification as measured by an exit poll because of these and other potential issues.

Party identification changes as political tides change. General shifts in the political environment can affect party identification just as they can affect presidential job approval and results of the “Who are you going to vote for?” question.

We know that party identification moves over time — sometimes in very short periods of time, just like other political variables.  Generally, if there is a political tide toward either of the two major parties, all questions we ask that are of a political nature will move in that direction. This includes the ballot, job approval, party identification, among others.

But it’s not necessarily the case that pollsters are weighting the samples so that the party identification matches. Mark Blumenthal writes about this:

Claims that media polls “assume” a specific partisan or demographic composition of the electorate are mostly false. The pollsters behind most of the national media surveys, including those who conduct the CBS/New York Times/Quinnipiac, NBC/Wall Street Journal/Marist and Washington Post polls, all use the same general approach: They do not directly set the partisan or demographic composition of their likely voter samples. They first sample adults in each state, weighting the demographics of the full adult sample (for characteristics such as gender, age, race and education) to match U.S. Census estimates for the full population. They then select “likely voters” based on questions answered by the respondents, without making any further adjustments to the sample’s demographics or partisanship.

There are pollsters that weight the subset of “likely voters” by party or to match very specific assumptions about the demographics of those they expect to vote. However, such practices are generally shunned by the national media surveys whose recent results have drawn most of the “skewed poll” criticism.
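To make the distinction concrete, here is a rough Python sketch of the approach Blumenthal describes: weight the full adult sample on demographics, screen for likely voters, and let party ID fall out as a result. The respondents, the age targets, and the single yes/no likely-voter flag are all invented for illustration; actual pollsters weight on more variables and use multi-question screens.

```python
from collections import defaultdict

# Made-up respondents: (age_group, party_id, passed_likely_voter_screen)
respondents = [
    ("18-34", "D", True), ("18-34", "I", False),
    ("35-64", "R", True), ("35-64", "D", True),
    ("35-64", "I", True), ("35-64", "R", True),
    ("65+", "R", True), ("65+", "D", True),
    ("65+", "R", True), ("65+", "I", False),
]

# Made-up census-style targets for ALL adults (not for likely voters).
adult_targets = {"18-34": 0.30, "35-64": 0.50, "65+": 0.20}

# Step 1: weight the full adult sample so age groups match the targets.
counts = defaultdict(int)
for age, _party, _likely in respondents:
    counts[age] += 1
n = len(respondents)
weights = {age: adult_targets[age] / (counts[age] / n) for age in counts}

# Step 2: keep likely voters, carrying their weights along unchanged.
likely_voters = [(party, weights[age]) for age, party, likely in respondents if likely]

# Step 3: party ID is measured, not set. It is whatever the weighted
# likely-voter sample happens to contain.
totals = defaultdict(float)
for party, w in likely_voters:
    totals[party] += w
total_weight = sum(totals.values())
party_share = {party: totals[party] / total_weight for party in totals}

for party in sorted(party_share):
    print(f"{party}: {party_share[party]:.1%}")
```

Nothing in this sketch tells the sample how many Democrats or Republicans it should contain. Change the likely-voter answers and the party mix changes with them.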

Let me give you a concrete example of how a pollster might weight his sample for known demographics within a certain electorate and how it would act to change the party ID numbers.

According to the Secretary of State’s website, the age distributions of the 2008 and 2010 General elections were:

Note that there is a 10.71 point difference in the percentage of 18-34 year olds between the 2008 General Election and the 2010 contest.

If you conducted a poll and your results mirrored the 2010 election, but you believe that the 2012 electorate will be more like 2008, you will weight the respondents in that age group to effectively add ten points to their representation in the sample.

Now let’s see how simply weighting for age could affect the results of our survey. The age range matchup isn’t exact, but we’ll pretend it is: on Friday, Gallup released a survey in which voters under 30 favored Obama over Romney by 59% to 33%. If we apply those percentages to the 10.71 points we’re adding to the young-voter share by weighting to reflect 2008 turnout by age, Obama gains 6.32 points, while Romney gains 3.53 points, for a net Obama gain of about 2.8 points. The math isn’t exact here, since it ignores the older voters whose share shrinks to make room, but it illustrates what the effect will be of raising the percentage of young voters.
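The same back-of-the-envelope calculation, written as a short Python sketch. The 10.71, 59% and 33% figures come from the text; the `net_margin_shift` helper and its `rest_margin` parameter are my own additions, included only to show how the estimate would move if you also accounted for the voters being displaced.

```python
# Figures from the text. Treating Gallup's "under 30" group as the 18-34
# group is the same simplification made above.
added_points = 10.71   # points added to the young-voter share
obama_young = 0.59     # Obama support among young respondents
romney_young = 0.33    # Romney support among young respondents

obama_gain = added_points * obama_young
romney_gain = added_points * romney_young
net_gain = obama_gain - romney_gain

print(f"Obama +{obama_gain:.2f}, Romney +{romney_gain:.2f}, net Obama +{net_gain:.2f}")
# Obama +6.32, Romney +3.53, net Obama +2.78


def net_margin_shift(added_points, young_margin, rest_margin=0.0):
    """Shift in the overall margin when young voters gain `added_points`
    of share at the expense of everyone else.

    rest_margin is the Obama-minus-Romney margin among the displaced
    voters. The default of 0.0 (an even split) reproduces the rough
    estimate in the text; any other value is an assumption you supply.
    """
    return added_points * (young_margin - rest_margin)


# With the default, this matches net_gain above.
rough = net_margin_shift(added_points, obama_young - romney_young)
```

If the displaced older voters lean toward Romney, the real shift is larger than the rough figure; if they lean toward Obama, it is smaller.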

If you repeat that process for race and any other factors you may wish, you can soon end up with weighted “results” that are a far cry from your actual results. And here’s the salient point: once you start weighting for demographics, you introduce your own biases.

This bias creep doesn’t have to be part of a conspiracy to suppress Republican votes. I’ve talked to Democratic pollsters and political operatives who sincerely believe, and can point to evidence to support their assertion, that 2012 will be like 2008. Likewise, I know Republicans who plausibly and sincerely believe the opposite.

I can’t tell you “scientifically,” whether the 2012 electorate will be more like four years ago or two years ago. That’s where experience and expertise can hopefully guide a pollster to an accurate assessment of the electorate. That’s where the art comes into polling.

Here’s the money quote from Dick Morris:

Any professional pollster (those consultants hired by candidates not by media outlets) would publish two findings for each poll — one using 2004 turnout modeling and the other using 2008 modeling. This would indicate just how dependent on an unusually high turnout of his base the Obama camp really is.

In my case, for paying clients this year, I’m modeling my analysis on both 2010 and 2008 and presenting both where it’s appropriate. But generally, I think weighting in polls is like hot sauce in cooking: a little bit goes a long way.
