Collective Choice: Polling, Voting, and the '06 Elections
by Shannon Appelcline
When we wrote our original article on Collective Choice, Christopher and I talked about polls, and we defined them as an area closely related to "selection" (or voting) systems, because the purpose of an "opinion" (or polling) system is to determine how a vote will come out. We also outlined two major types: pre-voting systems ("how are you going to vote?") and post-voting systems ("how did you vote?").
Now it's two days after a mid-term US election, and thus after a year (or more) of polling we've finally got some votes too. Together they offer an interesting and informative comparison of how these two collective choice systems hang together, and what the strengths of each are.
Post-vote polling has been largely on the outs since the 2004 election, so this time I'm just going to look at its pre-vote cousin.
Most polls are pre-vote. Candidates and news organizations are trying to figure out how a vote is going to turn out. Candidates want to know how they're doing, if they should dump more money into a race, if they should change tactics, or if they should give up. Conversely news organizations are trying to provide a public service by reporting on the current state of things.
In a typical poll a small percentage of the total voting class is contacted, and they're asked how they're going to vote. From there the pollster then tries to figure out how those answers correspond to actual votes.
The process of polling (at least, as a predictor of voting) implicitly has a number of flaws in it.
First up you hit margin of error (which we've briefly discussed before). In short, the fewer people you poll, the more likely it is that your polling is inaccurate. There are different formulas for this depending on how high you want your confidence level to be, but the short answer is: if you poll thousands of people you can start getting your margin of error down to a few percent.
Still, that's pretty bad if you're trying to poll a close vote. Say you have a 2% margin of error with 90% confidence and you poll two candidates at 47% and 45%. That means you're 90% confident that the first candidate will get 45-49% of the vote and the second candidate will get 43-47%. In other words, you have no idea who'll win.
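The arithmetic above can be sketched in a few lines. This uses the standard normal-approximation formula for a poll proportion's margin of error; at 95% confidence and the worst-case proportion of 50%, it reduces to the 0.98/SQRT(n) rule of thumb used later in this article. The sample sizes here are illustrative, not drawn from any actual poll.

```python
import math

def margin_of_error(n, z=1.96, p=0.5):
    """Half-width of a confidence interval for a polled proportion.

    n: sample size
    z: z-score for the confidence level (1.96 for ~95%)
    p: assumed proportion (0.5 is the worst case, giving
       the widest interval: 1.96 * 0.5 = 0.98/sqrt(n))
    """
    return z * math.sqrt(p * (1 - p) / n)

# Margin of error shrinks slowly as the sample grows:
moe_1000 = margin_of_error(1000)   # ~3.1%
moe_2500 = margin_of_error(2500)   # ~2.0%

# With a 2% margin, candidates at 47% and 45% have overlapping
# intervals (45-49% and 43-47%), so the race can't be called.
```

Note the square root: to cut the margin of error in half, you have to poll four times as many people, which is why even large polls rarely get below a couple of percent.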
Second, you have to poll a group of people who actually represent the people who will vote. In politics pollsters thus usually differentiate between "registered voters" and "likely voters". Polling registered voters might give a larger sample size, and thus a smaller margin of error, but it's probably going to be less accurate than polling likely voters, especially when certain categories of people are regularly registered voters but not likely voters. (In US politics, Democrats make up a larger percentage of registered voters than of likely voters, for example, so polls of registered voters generally make it look like Democrats are doing better than polls of likely voters do.)
And even if you're actually polling people who are likely to vote, you still have to make sure that your sample conforms to what the general population looks like. Polling firms generally have arcane formulas that they use to turn their polling answers into actual results. For example, they might find that Democrats are considerably more likely to answer their questions than Republicans. As a result they have to give every self-identified Democratic response in their poll less weight and every self-identified Republican response in the poll more weight.
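The reweighting described above can be sketched simply: scale each response so that every group's total weight matches its assumed share of the electorate. The sample data below is entirely hypothetical (the function name, the 50/50 electorate, and the response counts are all illustrative assumptions, not anyone's actual formula).

```python
def weight_responses(responses, targets):
    """Reweight raw poll responses so each group's total weight
    matches its target share of the electorate.

    responses: list of (group, answer) tuples
    targets: dict mapping group -> assumed share (sums to 1.0)
    """
    counts = {}
    for group, _ in responses:
        counts[group] = counts.get(group, 0) + 1
    n = len(responses)
    # weight = target share / observed share in the raw sample
    weights = {g: targets[g] / (counts[g] / n) for g in counts}
    return [(group, answer, weights[group]) for group, answer in responses]

# Hypothetical raw sample: Democrats over-answered (6 of 10
# responses) relative to an assumed 50/50 electorate.
raw = [("D", "Lamont")] * 6 + [("R", "Lieberman")] * 4
weighted = weight_responses(raw, {"D": 0.5, "R": 0.5})
# Each Democratic response now counts ~0.83; each Republican, 1.25.
```

Real polling firms layer many more adjustments on top of this (age, region, past turnout), which is why the article calls their formulas "arcane," but the core move is this ratio of target share to observed share.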
Third, and here's somewhere that polling entirely falls down, you have to make the polling experience similar visually and aesthetically to the voting experience. For example, this year in Connecticut Joe Lieberman was running as an Independent candidate against Democrat Ned Lamont and Republican Alan Schlesinger. Plenty of polls asked people which of the three they'd vote for, but as far as I know none of them represented the visceral fact that the Republican and Democrat would be presented first on the ballot, while Lieberman's space on the ballot was seventh, under an independent title.
Pre-Vote Polling Remarks
Remarkably, the pre-vote polling in 2006 did a pretty good job on the major races.
As noted, the Lieberman-Lamont race was a hard one to poll. Some folks even said that polling a three-way race was beyond modern-day pollsters. However, their answers came out pretty well. Here's what the polls looked like in the week before the election, with 95% confidence intervals used as the margin of error (calculated as .98/SQRT(n), where n is the size of the sample):
Overall the polls were relatively accurate, though it looks like the Republican was generally underpolled, which was probably the result of the lack of visuals in polling, and which would have made a difference had he been closer in the race.
The site these numbers were drawn from is pollster.com, which did its best to provide useful results by averaging multiple polls from multiple sources. Their final call on the Lieberman race using this methodology was Lieberman winning by 10%, at 49%-39% over Lamont, which turned out to be a dead-on prediction.
On the whole pollster's results were good. Their final call for the Senate overall was 49 Republicans, 47 Democrats or Independents, and 4 Tossups. As of this writing, it looks like all four tossups fell in the direction their polls had (marginally) predicted--though two were tossups to the end.
Overall Pollster.com's poll consolidation proved a highly accurate methodology. They ultimately couldn't remove the margin of error even by averaging multiple polls, so their percentages were sometimes off, and sometimes the marginal difference in a toss-up didn't predict the actual winner. But, from what I've seen thus far, the polling categories (either choosing a winner or listing a race as a toss-up) were very close to 100% right. The problem with using this type of methodology is, of course, feeling good about the data you're putting in. Are the polls accurate? And, even if they are, do you have enough polling in a less popular race to really offer the benefits of poll averaging that you purport?
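One simple way to consolidate polls, sketched below, is a sample-size-weighted average. To be clear, pollster.com's actual averaging method isn't described here, and the poll numbers in the example are hypothetical; this just illustrates the general idea of letting larger samples count for more.

```python
def average_polls(polls):
    """Combine several polls by weighting each candidate's
    percentage by that poll's sample size.

    polls: list of (sample_size, {candidate: pct}) tuples
    """
    totals, weight_sum = {}, 0
    for n, results in polls:
        weight_sum += n
        for cand, pct in results.items():
            totals[cand] = totals.get(cand, 0) + n * pct
    return {cand: total / weight_sum for cand, total in totals.items()}

# Hypothetical final-week polls of a three-way race:
polls = [
    (1000, {"Lieberman": 50, "Lamont": 38, "Schlesinger": 8}),
    (600,  {"Lieberman": 48, "Lamont": 40, "Schlesinger": 9}),
]
avg = average_polls(polls)   # Lieberman 49.25, Lamont 38.75
```

Averaging helps wash out individual polls' sampling noise, but as noted above it can't remove systematic error: if every input poll shares the same bias (say, underpolling one candidate), the average inherits it.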
Pre-Vote Polling & The Internet
When Chris & I wrote our previous articles on Collective Choice, one of the main purposes was to look at the topics in relation to the Internet.
Some companies have already started using the Internet for polls. In the chart above, the 11/05/06 poll conducted by Polimetrix was Internet-driven. It was also the only poll in the final weeks which suggested that the vote was still in contention at the 95% level. One other polling company (Zogby) regularly ran Internet polls on this race, and their polls also regularly showed Lamont higher than anyone else's did ... and like that final 11/05/06 poll they turned out to be a ways off.
This outlines the core problem of using the Internet to poll on real-world votes. You're self-selecting for a more affluent, more technologically-experienced group, and it looks like polling companies don't have their formulas right yet to turn that into the numbers for an actual vote (if it's even possible at this stage in the penetration of the Internet into American society).
It would take some much more extensive research into the polling data leading up to this election to see how general the issue is, or if it was just a quirk in the Connecticut data.
Clearly Internet polling for Internet voting will be much more reliable than Internet polling for real-world voting. In many ways it will resolve many of the issues with polls, because you can present polls that look just like the votes, and you can do your best to isolate the exact same clientele. If anything, the question that will ultimately arise is "Why poll rather than vote?" It makes sense for something like an election, where someone is elected to serve for a set amount of time, but if a group is trying to make a decision, a poll would be pretty superfluous if a vote could be accomplished just as easily.
One other issue that comes up with Internet polling is figuring out who gets polled. In real-life polls, participant selection is entirely driven by the pollster. This allows for random selection, which is required for a poll to be accurate. This type of query-based polling is totally at odds with Internet technology, where instead people tend to self-select for answering poll questions, which not only skews the base from the start, but also allows for massive fraud. Multiple special interest groups now regularly rally their base when a poll comes up anywhere on the 'net related to their particular area of influence.
Until a method comes to either push out polls to random individuals or else randomize self-selected input, polling driven by the Internet will be largely useless.
Pre-Vote Polling & Gaming
And I want to close out with my own bailiwick: game design.
There are definitely games where voting is an interesting and relevant part of gameplay. Among tabletop board games we find Democrazy by Bruno Faidutti, where you try to stage votes which will increase the value of your personal items, but not those of your opponents. Quo Vadis? is another, this time with some opportunity for negotiation and trading items of value. I've also written about Survivor and Big Brother, which are essentially voting-based reality TV shows.
Could you introduce polling into a game too?
One method has actually been used in Survivor a few times. Everyone secretly votes how they think the group as a whole will answer a question. "Who is the most annoying Survivor?", "Who is the prettiest?", etc. It's all the fun of Senior Year at High School without the Prom at the end: you get a poll and a vote at the same time and you get rewarded for voting with the majority rather than voting for something that benefits you.
Other methods could distance the poll and the vote further from each other, creating a sort of futures market.
The area largely hasn't been explored yet, and I'd be eager to see some serious adaptations of polling to gaming.
The recent US midterm election offers a good opportunity to look at polling and its various deficiencies. As a collective choice method it's a sort of half-hearted one, because it depends totally upon its successor, voting, but as long as it stays relevant (until every vote is conducted instantly on the Internet without a need to poll), it remains a valid area of study.