The Bee Blog

Open Data is Pollsters' Next Big Step
17 April 2016, 9:03pm

Opinion polling is nothing new. The first voting intention opinion poll in the UK was conducted by Gallup in February 1939, and regular polling was undertaken before the 1945 general election and for every election since. But in the last decade, with widespread Internet access and a big increase in online polling, the amount of data publicly available has skyrocketed. Unfortunately, the way that data is presented has hardly changed. Although it is now published online, it comes as static documents that are hard to work with unless the tables are manually transferred into a format suitable for computer analysis, and it is not standardised in a way that makes the data easy for lay analysts to understand.

There are one or two exceptions: the first is Mark Pack's excellently maintained Excel spreadsheet of Westminster voting intention polls, and the other, even if I say so myself, is our API for the polls that appear on our website. It is one of the reasons I built this website; after looking for APIs from pollsters and finding none, I decided to make my own in the hope that others would find it useful. Neither our data nor Mark Pack's, however, includes more than the headline voting intention numbers for particular elections and referendums. This is basically because manually transferring all the information from the PDF tables produced by polling companies into a database would take far more time than any individual has at their disposal (not to mention it being a pretty unforgiving task, especially in the run-up to a big political event).

It is, of course, the next big step for pollsters to provide the public data from their polling in a format that not only can be easily processed by computers but is also standardised, so that any application using the data doesn't have to handle each pollster's output differently in order to compare results across pollsters.

I am therefore calling on the British Polling Council to develop a standard for polling companies to release public polling information through pollsters' own APIs - I suggest in an XML or JSON format. By 2020, open data will be how information (particularly quantitative data) is distributed around the Internet, and it will probably be Google, rather than me or anyone else, that the public go to in order to get that data. Now is the time to future-proof public polling data.
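
To give a rough idea of what a shared format could make possible, here is a minimal sketch in Python, assuming each poll is published as one JSON record. Everything in it is made up for illustration: the field names (pollster, fieldwork_end, sample_size, question, results), the pollsters, the parties and the figures. It is not an existing standard, nor any pollster's real API.

```python
import json

# Two records in the SAME hypothetical schema, as two different pollsters
# might publish them. All names and numbers are invented placeholders.
POLL_A = """
{
  "pollster": "Pollster A",
  "fieldwork_end": "2016-04-10",
  "sample_size": 2000,
  "question": "Westminster voting intention",
  "results": [
    {"option": "Party X", "percent": 36.0},
    {"option": "Party Y", "percent": 33.0}
  ]
}
"""

POLL_B = """
{
  "pollster": "Pollster B",
  "fieldwork_end": "2016-04-12",
  "sample_size": 1500,
  "question": "Westminster voting intention",
  "results": [
    {"option": "Party X", "percent": 34.0},
    {"option": "Party Y", "percent": 35.0},
    {"option": "Party Z", "percent": 10.0}
  ]
}
"""


def headline(record_json):
    """Return {option: percent} for one record in the assumed shared schema."""
    record = json.loads(record_json)
    return {row["option"]: row["percent"] for row in record["results"]}


def compare(a_json, b_json):
    """Percentage-point difference (a minus b) for options both polls report."""
    a = headline(a_json)
    b = headline(b_json)
    return {opt: round(a[opt] - b[opt], 1) for opt in a if opt in b}


if __name__ == "__main__":
    print(compare(POLL_A, POLL_B))
```

The point is that headline() and compare() never need to know which pollster produced a record. With today's PDF tables, each pollster's layout would need its own hand-written extraction step before any comparison could begin.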