For the hundred days or so leading up to the election, I was scraping the betting odds listed on oddschecker for most of the UK parliamentary constituencies (I didn't check that I was scraping them all; this was just a side-side-project...).
I didn't look at the data at all in the run-up to the election (my original plan was to look at time series within each constituency to try to detect sudden changes in odds that might indicate some sort of major shift in sentiment there). But with things all settled now, I thought I'd look to see what tales - if any - the betting data might tell, at least at a high level. For example, what did the betting odds have to say about the likely number of seats taken by each party...?
This notebook was developed as part of an exploration into possible forms a student-produced notebook could take as part of an assessment process for a course on data management and analysis using a dataset selected by the student. Intended student worktime for the assessment: <10 hours. Comments appreciated...
#I'm going to use the pandas library to analyse the data
import pandas as pd
The data can be pulled directly from the scraper - https://morph.io/psychemedia/electionodds. Using the API, we can write a SQLite query to grab the odds for a particular day:
select * from 'Constituency2015GE' constituency where strftime('%d-%m-%Y',time)="06-05-2015"
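That query needs to be URL-encoded when it is passed to the morph.io API. As a sketch of how the request URL could be built with the standard library rather than by hand (the key value here is just a placeholder, not a real API key):

```python
from urllib.parse import urlencode

SECRETKEY = 'YOUR_MORPH_IO_API_KEY'  # placeholder - substitute your own morph.io key
query = ("select * from 'Constituency2015GE' constituency "
         "where strftime('%d-%m-%Y',time)=\"06-05-2015\"")

# urlencode percent-encodes the quotes, spaces and % signs in the query string
url = ('https://api.morph.io/psychemedia/electionodds/data.csv?'
       + urlencode({'key': SECRETKEY, 'query': query}))
print(url)
```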
The data can be pulled directly into a pandas dataframe from the morph.io scraper API:
#df=pd.read_csv('https://api.morph.io/psychemedia/electionodds/data.csv?key='+SECRETKEY+'&query=select%20*%20from%20%27Constituency2015GE%27%20constituency%20where%20strftime(%27%25d-%25m-%25Y%27%2Ctime)%3D%2206-05-2015%22')
#Or you can download the electionodds data and pop it into a file...
#Don't believe the filename - the data was actually collected at some point on May 5th
df=pd.read_csv('electionodds_thurs.csv')
Let's just get a quick view over the data to familiarise ourselves with what it contains.
#Preview the data
df.head()
  | time | bookie | party | odds | oddsraw | constituency
---|---|---|---|---|---|---
0 | 2015-05-05T09:54:19+00:00 | LD | green | 100.00 | 100 | aberavon |
1 | 2015-05-05T09:54:19+00:00 | FB | pc | 50.00 | 50 | aberavon |
2 | 2015-05-05T09:54:19+00:00 | LD | pc | 50.00 | 50 | aberavon |
3 | 2015-05-05T09:54:19+00:00 | WH | pc | 50.00 | 50 | aberavon |
4 | 2015-05-05T09:54:19+00:00 | FB | labour | 0.01 | 1/100 | aberavon |
The `odds` column is simply a decimalised version of the `oddsraw` value. Note that this is not strictly the decimal odds, which would be 1 greater.
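To make the relationship between the columns concrete, here's a minimal sketch (the helper names are my own, not part of the original scraper) that parses an `oddsraw` value such as `1/100`, and derives the decimal odds and the bookmaker's implied probability from it:

```python
from fractions import Fraction

def parse_oddsraw(oddsraw):
    """Parse a raw odds string such as '1/100' or '150' into a float."""
    if '/' in oddsraw:
        return float(Fraction(oddsraw))
    return float(oddsraw)

def decimal_odds(odds):
    """Decimal odds include the returned stake, hence the +1."""
    return odds + 1

def implied_probability(odds):
    """The bookmaker's implied chance of winning (ignoring the overround)."""
    return 1 / (odds + 1)

print(parse_oddsraw('1/100'))   # 0.01
print(decimal_odds(0.01))       # 1.01
print(implied_probability(0.01))
```

So a party priced at 1/100 is being given roughly a 99% implied chance of taking the seat.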
#What bookies did we collect data for?
df['bookie'].unique()
array(['LD', 'FB', 'WH', 'B3'], dtype=object)
#How many constituencies did we grab data for?
len(df['constituency'].unique())
650
#What is the range of odds offered?
(df['odds'].min(), df['odds'].max())
(0.001, 9999.0)
#What parties did we collect data for and in what numbers?
#Note that we are likely to count the same party several times in each constituency,
# once for each bookmaker offering odds on that party in that constituency
df['party'].value_counts().head(15)
labour                          2103
liberal democrats               2014
ukip                            1977
conservatives                   1945
any other party or candidate     600
greens                           580
green                            273
snp                              224
conservative                     159
tusc                             130
green party                      105
pc                                91
liberal democrat                  85
alliance                          57
sinn fein                         57
dtype: int64
The count of party appearances in the dataset is not necessarily very useful, because the same party may be counted several times in the same constituency, once for each bookmaker offering odds on it. However, the count does clearly show that there are multiple representations of what are presumably the same party name.
Note that there are several opportunities, in the `party` column at least, for cleaning the data - for example, green, greens and green party are likely the same party, as are conservative and conservatives. A quick and dirty cleaning approach would be to strip a trailing "s", remove occurrences of "party" at the end of the string, and then `strip()` away any remaining whitespace.
df['party_clean']=df['party'].str.rstrip('s').str.replace(r'party$','',regex=True).str.strip()
df['party_clean'].value_counts()
conservative                    2104
labour                          2103
liberal democrat                2099
ukip                            1977
green                            958
any other party or candidate     600
snp                              224
tusc                             130
pc                                91
alliance                          57
sinn fein                        57
sdlp                             56
dup                              53
uup                              46
any other                        45
...
lorraine morgan-brinkhurst       1
alfred okam                      1
chaka artwell                    1
john neville hobb                1
any other independant            1
les tallon-morri                 1
the whig                         1
christopher tompson              1
robin lambert                    1
residents for uttlesford         1
james kirkcaldy                  1
europeans                        1
roy ivinson                      1
christopher gray                 1
criag pond                       1
Length: 309, dtype: int64
#Inspect the full range of parties, if required, perhaps as basis for further cleaning
#df['party_clean'].unique()
#A more advanced approach might be to run the names through a clustering algorithm,
#or partial string matcher, to see if there are any near collisions that should be combined
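As a sketch of that partial string matching idea, using only the standard library (the similarity threshold of 0.85 is chosen by eye here, not tuned):

```python
from difflib import SequenceMatcher

# A handful of party names from the dataset
names = ['green', 'greens', 'green party', 'conservative', 'conservatives', 'labour']

# Flag pairs of names whose similarity ratio exceeds a threshold,
# as candidates for merging during cleaning
near_matches = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
                if SequenceMatcher(None, a, b).ratio() > 0.85]
print(near_matches)
```

Note that this catches `green`/`greens` and `conservative`/`conservatives`, but not `green`/`green party`, which is why the "strip a trailing 'party'" step above is still useful.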
If we wanted a strict decimal odds column, we could simply add 1 to the `odds` column:
df['decimal_odds'] = df['odds']+1
df.head()
  | time | bookie | party | odds | oddsraw | constituency | party_clean | decimal_odds
---|---|---|---|---|---|---|---|---
0 | 2015-05-05T09:54:19+00:00 | LD | green | 100.00 | 100 | aberavon | green | 101.00 |
1 | 2015-05-05T09:54:19+00:00 | FB | pc | 50.00 | 50 | aberavon | pc | 51.00 |
2 | 2015-05-05T09:54:19+00:00 | LD | pc | 50.00 | 50 | aberavon | pc | 51.00 |
3 | 2015-05-05T09:54:19+00:00 | WH | pc | 50.00 | 50 | aberavon | pc | 51.00 |
4 | 2015-05-05T09:54:19+00:00 | FB | labour | 0.01 | 1/100 | aberavon | labour | 1.01 |
To start with, let's focus on the odds offered by a single bookmaker. To choose which bookie, let's see how many constituencies each of them have prices for:
for bookie in df['bookie'].unique():
print('{}: {}'.format(bookie,len(df[df['bookie']==bookie]['constituency'].unique())))
LD: 627
FB: 648
WH: 648
B3: 239
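The same counts can be had more compactly with a grouped `nunique()`, which returns a Series indexed by bookie. A sketch on a toy stand-in frame with the same columns (the real notebook would just call this on `df`):

```python
import pandas as pd

# Toy stand-in for the scraped frame - not the real scrape
demo = pd.DataFrame({'bookie': ['LD', 'WH', 'WH', 'FB', 'FB'],
                     'constituency': ['aberavon', 'aberavon', 'pudsey',
                                      'aberavon', 'pudsey']})

# Count the distinct constituencies each bookie offers prices for,
# equivalent to the loop above
print(demo.groupby('bookie')['constituency'].nunique())
```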
So a good candidate would be WH (William Hill) or LD (Ladbrokes). Let's go with the former...
#Grab the data for a particular bookie into a separate dataframe
df_wh=df[df['bookie']=='WH']
One thing we might want to do over the full dataset is see what odds a particular bookmaker offered in a particular constituency. We can write a simple convenience function to help us do that.
#Write a convenience function to look up odds by constituency
def oddsForConstituency(df,bookie,constituency):
''' Function to find rows associated with a particular bookie in a particular constituency '''
filterView= df[(df['bookie'].str.upper()==bookie.upper()) &
(df['constituency'].str.lower()==constituency.lower())]
return filterView
oddsForConstituency(df,'WH','berwickshire-roxburgh-and-selkirk')
  | time | bookie | party | odds | oddsraw | constituency | party_clean | decimal_odds
---|---|---|---|---|---|---|---|---
919 | 2015-05-05T09:56:50+00:00 | WH | snp | 2.25 | 9/4 | berwickshire-roxburgh-and-selkirk | snp | 3.25 |
923 | 2015-05-05T09:56:50+00:00 | WH | labour | 150.00 | 150 | berwickshire-roxburgh-and-selkirk | labour | 151.00 |
927 | 2015-05-05T09:56:50+00:00 | WH | liberal democrats | 2.00 | 2 | berwickshire-roxburgh-and-selkirk | liberal democrat | 3.00 |
935 | 2015-05-05T09:56:50+00:00 | WH | conservatives | 1.20 | 6/5 | berwickshire-roxburgh-and-selkirk | conservative | 2.20 |
The data as it stands is in a relatively tidy (Third Normal Form) long format. Each row contains a single observation that associates the odds for a single party with a particular bookmaker in each constituency.
If we wanted to compare the odds for particular parties within each constituency, it might be more convenient to put the data into a wide format, with a separate column for the odds offered for each party, indexed by constituency:
dfp=df_wh.pivot(index='constituency', columns='party_clean', values='odds')
dfp.head()
party_clean | al murray - the pub landlord | alliance | bez | bnp | claire wright | conservative | dup | green | ian steven | john bercow | ... | respect | robin scott | sdlp | sinn fein | snp | stephen picton | sylvia hermon | tuv | ukip | uup |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
constituency | |||||||||||||||||||||
aberavon | NaN | NaN | NaN | NaN | NaN | 100.0 | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 16 | NaN |
aberconwy | NaN | NaN | NaN | NaN | NaN | 0.4 | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 80 | NaN |
aberdeen-north | NaN | NaN | NaN | NaN | NaN | 150.0 | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | 0.100000 | NaN | NaN | NaN | 150 | NaN |
aberdeen-south | NaN | NaN | NaN | NaN | NaN | 50.0 | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | 0.142857 | NaN | NaN | NaN | 250 | NaN |
aberdeenshire-west-and-kincardine | NaN | NaN | NaN | NaN | NaN | 5.5 | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | 0.142857 | NaN | NaN | NaN | NaN | NaN |
5 rows × 30 columns
#Preview all the columns
dfp.columns
Index([u'al murray - the pub landlord', u'alliance', u'bez', u'bnp', u'claire wright', u'conservative', u'dup', u'green', u'ian steven', u'john bercow', u'john blackie', u'kerry smith', u'labour', u'lib dem', u'liberal democrat', u'national health action', u'ni conservative', u'pc', u'pirate', u'plaid cymru', u'respect', u'robin scott', u'sdlp', u'sinn fein', u'snp', u'stephen picton', u'sylvia hermon', u'tuv', u'ukip', u'uup'], dtype='object')
If we wanted to limit the dataset to just major parties - that is, parties contesting a large number of seats - we could reduce the dataset as follows:
party_subset=['green','labour','liberal democrat','conservative','snp','ukip']
#Limit rows in long dataset to selected parties
example=df_wh[df_wh['party_clean'].isin(party_subset)]
example['party_clean'].unique()
array(['labour', 'liberal democrat', 'ukip', 'conservative', 'snp', 'green'], dtype=object)
#Limit columns in wide dataset to selected parties
example=dfp[party_subset]
example.columns
Index([u'green', u'labour', u'liberal democrat', u'conservative', u'snp', u'ukip'], dtype='object')
However, there are significant risks in taking this sort of approach. For example, where an individual with a good chance of winning is standing in a particular constituency, perhaps under their own name, their candidacy would not be captured in the reduced dataset.
Perhaps a better dataset would be one that includes all major parties and any candidates who appear to have a reasonable chance of winning (say, 10 to 1 or better).
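A sketch of that combined filter - keep a row if its party is in the major-party list *or* its odds are 10/1 or shorter - on toy stand-in data (the column names match the notebook's frame; the rows are invented for illustration):

```python
import pandas as pd

party_subset = ['green', 'labour', 'liberal democrat', 'conservative', 'snp', 'ukip']

# Toy rows: an independent on short odds should survive the filter,
# while a minor party on long odds should not
demo = pd.DataFrame({'party_clean': ['labour', 'sylvia hermon', 'tusc'],
                     'odds': [0.5, 2.0, 200.0]})

# Major parties, plus anyone priced at 10/1 or better
keep = demo[demo['party_clean'].isin(party_subset) | (demo['odds'] <= 10)]
print(keep['party_clean'].tolist())
```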
One reduced dataset we might be interested in working with is the set of Scottish constituencies. We can use the fact that the SNP fielded a candidate in a seat as a proxy for which constituencies should be included in this set.
One way of identifying those seats is to filter the wide dataset (which uses constituencies as the index values) to exclude rows with a null (`NaN`) value in the SNP column:
dfp['snp'].dropna().index.values
array(['aberdeen-north', 'aberdeen-south', 'aberdeenshire-west-and-kincardine', 'airdrie-and-shotts', 'angus', 'argyll-and-bute', 'ayr-carrick-and-cumnock', 'ayrshire-central', 'ayrshire-north-and-arran', 'banff-and-buchan', 'berwickshire-roxburgh-and-selkirk', 'caithness-sutherland-and-easter-ross', 'coatbridge-chryston-and-bellshill', 'cumbernauld-kilsyth-and-kirkintill', 'dumfries-and-galloway', 'dumfriesshire-clydesdale-and-tweeddale', 'dunbartonshire-east', 'dunbartonshire-west', 'dundee-east', 'dundee-west', 'dunfermline-and-west-fife', 'east-kilbride-strathaven-and-lesma', 'east-lothian', 'edinburgh-east', 'edinburgh-north-and-leith', 'edinburgh-south', 'edinburgh-south-west', 'edinburgh-west', 'falkirk', 'fife-north-east', 'glasgow-central', 'glasgow-east', 'glasgow-north', 'glasgow-north-east', 'glasgow-north-west', 'glasgow-south', 'glasgow-south-west', 'glenrothes', 'gordon', 'inverclyde', 'inverness-nairn-badenoch-and-strathspey', 'kilmarnock-and-loudoun', 'kirkcaldy-and-cowdenbeath', 'lanark-and-hamilton-east', 'linlithgow-and-falkirk-ea', 'livingston', 'midlothian', 'moray', 'motherwell-and-wishaw', 'na-h-eileanan-an-iar', 'ochill-and-sth-perthshire', 'orkney-and-shetland', 'paisley-and-renf-north', 'paisley-and-renf-south', 'perth-and-n-perthshire', 'renfrewshire-east', 'ross-skye-and-lochaber', 'rutherglen-and-hamilton-west', 'stirling'], dtype=object)
The safest seats are the seats with the shortest (smallest) odds. If we sort the table by increasing odds and select the top few, that should give us the most secure seats.
df_wh.sort_values('odds', ascending=True).head(10)
  | time | bookie | party | odds | oddsraw | constituency | party_clean | decimal_odds
---|---|---|---|---|---|---|---|---
8438 | 2015-05-05T10:14:59+00:00 | WH | conservatives | 0.005 | 1/200 | runnymede-and-weybridge | conservative | 1.005 |
5376 | 2015-05-05T10:07:32+00:00 | WH | conservatives | 0.005 | 1/200 | hertfordshire-ne | conservative | 1.005 |
5362 | 2015-05-05T10:07:27+00:00 | WH | conservatives | 0.005 | 1/200 | hertford-and-stortford | conservative | 1.005 |
5319 | 2015-05-05T10:07:20+00:00 | WH | conservatives | 0.005 | 1/200 | henley | conservative | 1.005 |
1628 | 2015-05-05T09:58:35+00:00 | WH | conservatives | 0.005 | 1/200 | brentwood-and-ongar | conservative | 1.005 |
10097 | 2015-05-05T10:19:14+00:00 | WH | labour | 0.005 | 1/200 | tyneside-north | labour | 1.005 |
8941 | 2015-05-05T10:16:17+00:00 | WH | labour | 0.005 | 1/200 | south-shields | labour | 1.005 |
5029 | 2015-05-05T10:06:43+00:00 | WH | conservatives | 0.005 | 1/200 | hampshire-east | conservative | 1.005 |
4989 | 2015-05-05T10:06:39+00:00 | WH | labour | 0.005 | 1/200 | halton | labour | 1.005 |
4916 | 2015-05-05T10:06:27+00:00 | WH | labour | 0.005 | 1/200 | hackney-south-and-shoreditch | labour | 1.005 |
What about the least safe seats? This is not simply a question of finding the constituency with the longest odds, but of finding the constituency with the longest (largest) favourite's odds.
If we find the minimum value across each row in the wide dataset, ignoring the missing values, we get the odds of the favourite. We can then sort on that value in a descending fashion to find the constituencies with the longest favourite odds.
dfp.min(axis=1).sort_values(ascending=False).head(10)
constituency
berwickshire-roxburgh-and-selkirk    1.200000
dumfries-and-galloway                0.909091
edinburgh-south                      0.909091
northampton-north                    0.909091
torbay                               0.909091
st-ives                              0.833333
halesowen-and-rowley-regis           0.833333
pudsey                               0.833333
finchley-and-golders-green           0.800000
cornwall-north                       0.800000
dtype: float64
Another question we might ask is where a particular party - the Greens, say - was given a reasonable chance (odds of 10/1 or shorter). We can ask this question by filtering the long dataset using two criteria combined with a Boolean operator:
df_wh[ (df_wh['party_clean']=='green') & (df_wh['odds']<=10) ]
  | time | bookie | party | odds | oddsraw | constituency | party_clean | decimal_odds
---|---|---|---|---|---|---|---|---
1713 | 2015-05-05T09:58:49+00:00 | WH | greens | 0.222222 | 2/9 | brighton-pavilion | green | 1.222222 |
1788 | 2015-05-05T09:58:56+00:00 | WH | greens | 4.500000 | 9/2 | bristol-west | green | 5.500000 |
7454 | 2015-05-05T10:12:36+00:00 | WH | greens | 5.000000 | 5 | norwich-south | green | 6.000000 |
Trivially, we might think to sort the parties within each constituency in terms of increasing odds, then pick the one with the lowest odds.
If we sort the data frame in order of increasing odds, and then group by constituency, the order of the rows within each group will be in increasing order of odds.
df_wh.sort_values('odds', ascending=True).groupby('constituency', as_index=False).get_group("aberavon")
  | time | bookie | party | odds | oddsraw | constituency | party_clean | decimal_odds
---|---|---|---|---|---|---|---|---
6 | 2015-05-05T09:54:19+00:00 | WH | labour | 0.01 | 1/100 | aberavon | labour | 1.01 |
13 | 2015-05-05T09:54:19+00:00 | WH | ukip | 16.00 | 16 | aberavon | ukip | 17.00 |
3 | 2015-05-05T09:54:19+00:00 | WH | pc | 50.00 | 50 | aberavon | pc | 51.00 |
10 | 2015-05-05T09:54:19+00:00 | WH | liberal democrats | 100.00 | 100 | aberavon | liberal democrat | 101.00 |
16 | 2015-05-05T09:54:19+00:00 | WH | conservatives | 100.00 | 100 | aberavon | conservative | 101.00 |
If we pick the `first()` row in each group, we can generate a dataframe that contains a single row for each constituency, identifying the party with the best odds.
We can then group these rows by the cleaned party name and count how many rows correspond to each party, ordering the result to show the most heavily favoured party first.
likelyparty=df_wh.sort_values('odds', ascending=True).groupby('constituency', as_index=False).first()
likelyparty.groupby('party_clean').size().sort_values(ascending=False)
party_clean
conservative        278
labour              263
snp                  55
liberal democrat     26
dup                   9
sinn fein             4
ukip                  3
sdlp                  3
pc                    2
sylvia hermon         1
respect               1
plaid cymru           1
john bercow           1
green                 1
dtype: int64
Interpreting this naively, we see there are 278 seats with the Conservatives as favourite, 263 with Labour as favourite, and so on.
However, this approach would incorrectly predict seats where there are joint favourites, if there are any. Let's see if we can identify constituency seats where low odds are tied...
#Start by considering short odds
#then group by odds in each constituency
#count the rows in each group
#order the result
#and show the top few results
df_wh[df_wh['odds']<=2] \
.groupby(['odds','constituency']) \
.size() \
.sort_values(ascending=False) \
.head()
odds      constituency
0.833333  pudsey                        2
0.909091  northampton-north             2
          torbay                        2
0.833333  halesowen-and-rowley-regis    2
0.015152  workington                    1
dtype: int64
My reading of this is that there are two parties tied on odds of 0.8333 in Pudsey. Let's check:
df_wh[df_wh['constituency']=='pudsey']
  | time | bookie | party | odds | oddsraw | constituency | party_clean | decimal_odds
---|---|---|---|---|---|---|---|---
8003 | 2015-05-05T10:13:53+00:00 | WH | labour | 0.833333 | 5/6 | pudsey | labour | 1.833333 |
8009 | 2015-05-05T10:13:53+00:00 | WH | liberal democrats | 100.000000 | 100 | pudsey | liberal democrat | 101.000000 |
8013 | 2015-05-05T10:13:53+00:00 | WH | ukip | 40.000000 | 40 | pudsey | ukip | 41.000000 |
8017 | 2015-05-05T10:13:53+00:00 | WH | conservatives | 0.833333 | 5/6 | pudsey | conservative | 1.833333 |
If we wanted a long dataset containing rows where the odds are tied within a constituency, we could use the following filter command:
#Limit rows to rows in constituencies where there are parties with the same odds
#That is, where there is more than one member in groups of odds by constituency
df_sameodds=df_wh.groupby(['odds','constituency']).filter(lambda x: len(x) > 1)
df_sameodds.sort_values(['odds','constituency']).head()
  | time | bookie | party | odds | oddsraw | constituency | party_clean | decimal_odds
---|---|---|---|---|---|---|---|---
4933 | 2015-05-05T10:06:29+00:00 | WH | labour | 0.833333 | 5/6 | halesowen-and-rowley-regis | labour | 1.833333 |
4948 | 2015-05-05T10:06:29+00:00 | WH | conservatives | 0.833333 | 5/6 | halesowen-and-rowley-regis | conservative | 1.833333 |
8003 | 2015-05-05T10:13:53+00:00 | WH | labour | 0.833333 | 5/6 | pudsey | labour | 1.833333 |
8017 | 2015-05-05T10:13:53+00:00 | WH | conservatives | 0.833333 | 5/6 | pudsey | conservative | 1.833333 |
7387 | 2015-05-05T10:12:26+00:00 | WH | labour | 0.909091 | 10/11 | northampton-north | labour | 1.909091 |
If we want to get a feel for which party is second favourite (or tied on joint odds with the "first" favourite), we can use the `nth()` method rather than `first()` on the odds-sorted, constituency-grouped form of the dataset, noting that `nth()` starts counting from index 0, so `nth(1)` corresponds to the second item in the ordered list:
secondparty=df_wh.sort_values('odds', ascending=True).groupby('constituency', as_index=False).nth(1)
secondparty.head()
  | time | bookie | party | odds | oddsraw | constituency | party_clean | decimal_odds
---|---|---|---|---|---|---|---|---
4933 | 2015-05-05T10:06:29+00:00 | WH | labour | 0.833333 | 5/6 | halesowen-and-rowley-regis | labour | 1.833333 |
8003 | 2015-05-05T10:13:53+00:00 | WH | labour | 0.833333 | 5/6 | pudsey | labour | 1.833333 |
4364 | 2015-05-05T10:05:10+00:00 | WH | labour | 0.909091 | 10/11 | finchley-and-golders-green | labour | 1.909091 |
9960 | 2015-05-05T10:18:59+00:00 | WH | conservative | 0.909091 | 10/11 | torbay | conservative | 1.909091 |
2871 | 2015-05-05T10:01:50+00:00 | WH | liberal democrats | 0.909091 | 10/11 | cornwall-north | liberal democrat | 1.909091 |
That result for pudsey looks a little odd. In the data frame above, the conservative entry had a higher index value, so why is the labour party listed as the second item? Let's check:
df_wh.sort_values('odds', ascending=True).groupby('constituency', as_index=False).get_group("pudsey")
  | time | bookie | party | odds | oddsraw | constituency | party_clean | decimal_odds
---|---|---|---|---|---|---|---|---
8017 | 2015-05-05T10:13:53+00:00 | WH | conservatives | 0.833333 | 5/6 | pudsey | conservative | 1.833333 |
8003 | 2015-05-05T10:13:53+00:00 | WH | labour | 0.833333 | 5/6 | pudsey | labour | 1.833333 |
8013 | 2015-05-05T10:13:53+00:00 | WH | ukip | 40.000000 | 40 | pudsey | ukip | 41.000000 |
8009 | 2015-05-05T10:13:53+00:00 | WH | liberal democrats | 100.000000 | 100 | pudsey | liberal democrat | 101.000000 |
Hmm, the ordering here does seem to put the conservative row first, but I'm not sure why.
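One plausible explanation - an assumption on my part, worth checking against the pandas documentation for the version in use - is that the default sort algorithm (quicksort) is not stable, so rows with tied odds can come out in either order. Requesting a stable algorithm preserves the original relative order of tied rows, as this toy sketch shows:

```python
import pandas as pd

# Toy rows with a tie in the odds column, in a known original order
demo = pd.DataFrame({'party': ['labour', 'ukip', 'conservative'],
                     'odds': [0.833333, 40.0, 0.833333]})

# mergesort is stable: the two tied rows keep their original relative order
stable = demo.sort_values('odds', kind='mergesort')
print(stable['party'].tolist())
```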
What happens if there is a swing from first to second favourite in constituencies with short odds across the joint first, or first and second, favourites? Are there particular swings likely from one party to another?
Let's start by creating a dataframe where each row is an observation for a constituency, showing the joint, or first and second, favourites along with their odds:
col_subset=['constituency','party_clean','odds']
m=pd.merge(secondparty[col_subset],likelyparty[col_subset],on='constituency')
m.columns = ['constituency','secondparty','odds_second','firstparty','odds_first']
m.head()
  | constituency | secondparty | odds_second | firstparty | odds_first
---|---|---|---|---|---
0 | halesowen-and-rowley-regis | labour | 0.833333 | conservative | 0.833333 |
1 | pudsey | labour | 0.833333 | conservative | 0.833333 |
2 | finchley-and-golders-green | labour | 0.909091 | conservative | 0.800000 |
3 | torbay | conservative | 0.909091 | liberal democrat | 0.909091 |
4 | cornwall-north | liberal democrat | 0.909091 | conservative | 0.800000 |
We can now look to see what the possible swings are, by party, away from the favourite to a joint or close second favourite.
#Start by finding the rows where the odds are short for the second favourite
#then count group sizes swinging from first to second party
m[m['odds_second']<=1.5].groupby(['firstparty','secondparty']).size()
firstparty        secondparty
conservative      labour              13
                  liberal democrat     2
labour            conservative        10
                  liberal democrat     1
                  snp                  2
liberal democrat  conservative         6
                  labour               1
                  plaid cymru          1
plaid cymru       labour               1
respect           labour               1
snp               conservative         1
                  labour               4
ukip              conservative         1
dtype: int64
This shows, for example, that there are good chances that up to 13 Conservative seats could go to Labour, and up to 6 seats could go from the Liberal Democrats to the Conservatives.
In this notebook, I have investigated a data set containing election odds for the majority of UK constituencies on a single day prior to the UK General Election, 2015.
The data predicted that the Conservatives would win the largest number of seats, though not a majority. It also predicted that the SNP would win a large number of Scottish constituency seats, and that Pudsey was too close to call between Labour and the Conservatives.
Analysis of constituencies with low-odds joint, or close first and second, favourites suggested more possible "swings" from Conservative to Labour than vice versa, and most swings away from the Liberal Democrats going to the Conservatives.