Taken from both companies' Wikipedia pages:
Plotly is an online analytics and data visualization tool. Plotly provides online graphing, analytics, a Python command line, and stats tools for individuals and collaboration, as well as scientific graphing libraries for Python, R, MATLAB, Perl, Julia, Arduino, and REST.
Socrata is a company that provides social data discovery services for opening government data. Socrata targets non-technical Internet users who want to view and share government, healthcare, energy, education, or environment data. Its products are issued under a proprietary, closed, exclusive license.
Simply put, the two are meant to work together, and this IPython notebook will show you how to turn a dataset like this one into a plot like that one.
You need an application token to communicate with Socrata through the Socrata Open Data API (SODA for short).
Register with Socrata and get your application token here.
Unfortunately, there is no SODA Python wrapper available at this moment in time. Fortunately, IPython allows us to use multiple programming languages inside the same environment (called an IPython notebook). So, here we will use Ruby and the soda-ruby gem to communicate with Socrata.
With Ruby and gem installed on your machine, run in a terminal/command prompt:
$ gem install soda-ruby
Add sudo in front of the above for a system-wide install on Unix-like machines. Information about local gem installs can be found here.
Then, add the line:
gem 'soda-ruby', :require => 'soda'
to a file named Gemfile placed either in the current directory or in a folder that is part of the gem path on your machine (more here).
Head to opendata.socrata.com, browse or search for a dataset that you like and click on its link. I chose a list of the Guardian's "Top 1,000 Songs to Hear Before You Die" which can be viewed here. Here is a screenshot of the web page in question:
Then,
Click on Export, a blue button on the upper right side of the page.
Click on Soda API, the upper-most tab under Export.
Copy the API Access Endpoint, under the Soda API tab.
In our case the API Access Endpoint is:
http://opendata.socrata.com/resource/ed74-c6ni.json
The API Access Endpoint represents the link between the dataset hosted on Socrata and the API, in our case soda-ruby. It contains two important pieces of information: the domain name and the dataset identifier. From the official Socrata docs, take note that the API Access Endpoint corresponds to:
http://$domain/resource/$dataset_identifier
So, in our case the domain name is opendata.socrata.com and the dataset identifier is ed74-c6ni. (Note that .json is just the file extension; it is not needed to access the dataset.)
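The endpoint anatomy described above can be checked with a few lines of Python. This is a minimal sketch, assuming Python 3's standard library (on Python 2, urlparse lives in the urlparse module); split_endpoint is a hypothetical helper name, not part of any Socrata library.

```python
from urllib.parse import urlparse
import posixpath

def split_endpoint(endpoint):
    """Split a SODA API Access Endpoint into (domain, dataset identifier)."""
    parsed = urlparse(endpoint)
    # Last path component, e.g. 'ed74-c6ni.json'
    last = posixpath.basename(parsed.path)
    # Drop the optional file extension such as '.json'
    dataset_id, _ext = posixpath.splitext(last)
    return parsed.netloc, dataset_id

domain, dataset_id = split_endpoint(
    "http://opendata.socrata.com/resource/ed74-c6ni.json")
print(domain, dataset_id)
```

The two returned values are the ones that the client needs: the domain when it is set up, and the dataset identifier when the data is requested.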
Now, call the %%ruby IPython cell magic to turn on Ruby inside the cell below:
%%ruby --out socrata_data
# with --out, data written to the stdout in this ruby cell
# will be mapped to a Python variable (socrata_data) after execution.
require 'soda/client'
require 'json'
# Set up client object with domain and application token
client = SODA::Client.new({:domain => "opendata.socrata.com",
:app_token => "eqZC5q2iEmFXdIu2qEbtZkWgP"})
# Get data with dataset identifier
response = client.get("ed74-c6ni")
# Print dataset to stdout as a JSON
puts response.to_json
And there you go, the Socrata dataset is now inside our IPython namespace!
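Because --out captures whatever the Ruby cell printed, socrata_data is a plain string of JSON text. A minimal sketch of inspecting such a string with the standard json module follows. The sample string is a hand-written stand-in holding two rows of the dataset (the spotify_url field is left out, and the field types are assumptions), not the real API response.

```python
import json

# Hand-written stand-in for the JSON text captured in socrata_data
sample = '''[
  {"artist": "ABC", "theme": "Love", "title": "The Look of Love", "year": 1982},
  {"artist": "Badly Drawn Boy", "theme": "Love", "title": "The Shining", "year": 2000}
]'''

# Decode the JSON text into a list of dictionaries, one per song
records = json.loads(sample)
print(len(records), sorted(records[0].keys()))
```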
Next, we will handle the dataset inside IPython using the popular pandas module:
import pandas as pd
# Read the retrieved JSON dataset (df stands for dataframe)
df = pd.read_json(socrata_data)
df.head() # print the first 5 lines of the dataframe
| | artist | spotify_url | theme | title | year |
---|---|---|---|---|---|
0 | ABC | {u'url': u'http://open.spotify.com/track/78j3q... | Love | The Look of Love | 1982 |
1 | Badly Drawn Boy | {u'url': u'http://open.spotify.com/track/2PojS... | Love | The Shining | 2000 |
2 | The Beach Boys | {u'url': u'http://open.spotify.com/track/0ObrX... | Love | God Only Knows | 1966 |
3 | The Beach Boys | {u'url': u'http://open.spotify.com/track/2oF7F... | Love | Good Vibrations | 1966 |
4 | The Beach Boys | {u'url': u'http://open.spotify.com/track/0cx32... | Love | Wouldn’t It Be Nice | 1966 |
5 rows × 5 columns
df.shape # print the dataframe's size
(994, 5)
Let's make a Plotly bar chart with the following features:
So, let's first make a dictionary pairing each artist's name with their number of tracks in the 1,000 songs list:
song_by_artist = df.groupby('artist').size().to_dict()
song_by_artist
{u'!!!': 1, u'(Don\u2019t Fear) The Reaper': 1, u'808 State': 1, u'A R Rahman': 1, u'ABC': 2, u'AC/DC': 2, u'ATB': 1, u'Aaliyah': 1, u'Abba': 3, u'Abyssinians': 1, u'Aerosmith': 2, u'Afroman': 1, u'Al Green': 5, u'Alice Cooper': 1, u'Alicia Keys': 2, u'Aliotta Haynes Jeremiah': 1, u'All Saints': 1, u'Althea and Donna': 1, u'Amy Winehouse': 2, u'Andy Williams': 1, u'Ann Peebles': 1, u'Anne Briggs': 1, u'Anthony Johnson': 1, u'Antony and the Johnsons': 1, u'Aphex Twin': 1, u'Arcade Fire': 1, u'Archie Bell and the Drells': 1, u'Archie Bleyer': 1, u'Arctic Monkeys': 4, u'Aretha Franklin': 3, u'Arthur Conley': 1, u'Arthur Russell': 1, u'Artie Shaw': 1, u'Ashford and Simpson': 1, u'Astrud Gilberto': 1, u'Au Pairs': 1, u'BB King': 1, u'Baccara': 1, u'Badfinger': 1, u'Badly Drawn Boy': 1, u'Baggy Trousers': 1, u'Bappi Lahiri/Parvati Khan': 1, u'Barbra Streisand and Barry\xa0Gibb': 1, u'Barrington Levy': 1, u'Barry McGuire': 1, u'Beastie Boys': 1, u'Bee Gees': 1, u'Belle and Sebastian': 1, u'Ben E King': 2, u'Benga and Coki': 1, u'Bessie Smith': 1, u'Bettye LaVette': 1, u'Bettye Swann': 1, u'Beyonc\xe9': 1, u'Big Joe Turner': 1, u'Big Star': 2, u'Bill Allen and the Backbeats': 1, u'Bill Withers': 4, u'Billie Holiday': 1, u'Billie Holliday': 1, u'Billy Bragg': 2, u'Billy Paul': 1, u'Bim Sherman': 1, u'Bing Crosby': 2, u'Bj\xf6rk': 1, u'Black Sabbath': 2, u'Blind Alfred Reed': 1, u'Blondie': 2, u'Blue Mink': 1, u'Blur': 3, u'Bo Diddley': 1, u'Bob Andy': 1, u'Bob Dylan': 24, u'Bob Lind': 1, u'Bob Marley': 2, u'Bob Marley and the Wailers': 2, u'Bobbie Gentry': 2, u'Bobby Bland': 1, u'Bobby Darin': 2, u'Bobby Fuller Four': 1, u'Bobby \u201cBlue\u201d Bland': 1, u'Body Count': 1, u'Bon Iver': 1, u'Bonnie \u2018Prince\u2019 Billy and Matt Sweeney': 1, u'Bonzo Dog Doo-Dah Band': 1, u'Boogie Down Productions': 1, u'Boys Town Gang': 1, u'Bright Eyes': 2, u'Britney Spears': 1, u'Bronski Beat': 2, u'Bruce Springsteen': 4, u'Bruce Springsteen and the E Street Band': 1, u'Bryan Ferry': 
1, u'Buffalo Springfield': 1, u'Buffy Sainte-Marie': 1, u'Burning Spear': 1, u'Buzzcocks': 2, u'CSS': 1, u'Cab Calloway': 1, u'Candi Staton': 1, u'Cannibal and the Headhunters': 1, u'Captain and Tennille ': 1, u'Carl Bean': 1, u'Carlton and the Shoes': 1, u'Carly Simon': 2, u'Carole King': 2, u'Cast of Grange Hill': 1, u'Cat Stevens': 1, u'Ce Ce Rogers': 1, u'Chairmen of the Board': 1, u'Chic': 1, u'Chris Difford': 1, u'Chris Wood': 1, u'Chuck Berry': 5, u'Class Action': 1, u'Coldplay': 2, u'Cole Porter': 1, u'Commodores': 1, u'Cornershop': 1, u'Country Joe and the Fish': 1, u'Crosby, Stills, Nash and Young': 1, u'Crystal Mansion': 1, u'Curtis Mayfield': 1, u'Cyndi Lauper': 1, u'Daft Punk': 1, u'Dan Le Sac vs Scroobius Pip': 1, u'Daniel Johnston': 1, u'David Bowie': 9, u'Dead Kennedys': 1, u'Deee-Lite': 1, u'Def Leppard': 1, u'Depeche Mode': 2, u'Derek and the Dominos': 1, u'Desmond Dekker': 1, u'Destiny\u2019s Child': 1, u'Devo': 1, u'Dexys Midnight Runners': 2, u'Diana Ross': 1, u'Diana Ross and the Supremes': 1, u'Dick Gaughan': 1, u'Dillinger': 1, u'Dinah Washington': 1, u'Dion': 2, u'Dion & the Belmonts': 1, u'Dionne Warwick': 2, u'Divinyls': 1, u'Dixie Chicks': 1, u'Dolly Parton': 3, u'Don McLean': 1, u'Donna Summer': 2, u'Donny Osmond': 1, u'Donovan': 2, u'Doobie Brothers': 1, u'Dory Previn': 1, u'Doves': 1, u'Dudley Moore and Peter Cook': 1, u'Duffy ': 1, u'Dusty Springfield': 2, u'D\u2019Angelo': 1, u'Earth, Wind and Fire': 1, u'Echo and the Bunnymen': 1, u'Eddie Cochran': 1, u'Eddie Jefferson': 1, u'Edwin Starr': 1, u'Elastica': 2, u'Elbow': 3, u'Electribe 101': 1, u'Electric Light Orchestra': 1, u'Ella Fitzgerald': 2, u'Elliott Smith': 1, u'Elton John': 2, u'Elvis Costello': 2, u'Elvis Costello ': 1, u'Elvis Costello and the Attractions': 5, u'Elvis Costello and the Imposters': 1, u'Elvis Presley': 6, u'Eminem': 1, u'Emmylou Harris ': 1, u'Eric Bogle': 1, u'Esther Williams': 1, u'Etta James': 2, u'Eurythmics': 1, u'Everything But the Girl': 1, u'Ewan 
MacColl': 1, u'Faith No More': 1, u'Fatman Scoop featuring the Crooklyn Clan': 1, u'Fats Domino': 1, u'Fela Kuti': 2, u'First Choice': 1, u'Flanders and Swann': 1, u'Fleetwood Mac': 1, u'Flight of the Conchords': 1, u'Frank Sinatra': 6, u'Frank Wilson': 1, u'Frankie Goes To Hollywood': 1, u'Frankie Goes to Hollywood': 2, u'Frankie Valli and the Four Seasons': 1, u'Funkadelic': 1, u'Gary Numan': 1, u'George Harrison': 1, u'George Kranz': 1, u'George McCrae': 1, u'George Michael': 2, u'Geto Boys': 1, u'Gil Scott-Heron': 2, u'Gilbert O\u2019Sullivan': 1, u'Girls Aloud': 1, u'Glasvegas': 1, u'Glen Campbell': 1, u'Gloria Gaynor': 1, u'Gloria Jones': 1, u'Gorillaz': 1, u'Grace Jones': 4, u'Gram Parsons': 2, u'Gram Parsons with Emmylou Harris': 1, u'Grandmaster Flash and the Furious Five': 1, u'Grandmaster Melle Mel': 1, u'Green Day': 1, u'Gregory Isaacs': 1, u'Grinderman': 1, u'Guns N\u2019 Roses': 1, u'Guy Clark': 1, u'Gwen McRae': 1, u'Half Man Half Biscuit': 1, u'Halls and Oates': 1, u'Hamilton Bohannon': 1, u'Hank Williams': 3, u'Happy Mondays': 2, u'Heaven 17': 1, u'Heinz': 1, u'Herbie Hancock': 1, u'Herman D\xfcne': 1, u'Herman\u2019s Hermits': 1, u'Hot Chip': 1, u'Hot Chocolate': 1, u'House of Pain': 1, u'H\xfcsker D\xfc': 1, u'Ian Campbell Folk Group': 1, u'Ian Dury': 1, u'Ian Dury and the Blockheads': 2, u'Ice Cube': 1, u'If I Could Turn Back Time': 1, u'Ike and Tina Turner': 1, u'Indeep': 1, u'Inner City': 1, u'Irene Cara': 1, u'Irma Thomas': 1, u'JC Lodge': 1, u'Jackie Brenston and His Delta Cats': 1, u'Jackie Wilson': 1, u'James Brown': 2, u'James Carr': 1, u'Jamie Principle': 1, u'Jan Bradley': 1, u'Jane Birkin and Serge Gainsbourg': 1, u'Janet Kay': 1, u'Janis Joplin': 1, u'Janis Joplin and Big Brother and the Holding Company': 1, u'Jarvis Cocker': 1, u'Jeannie C Riley ': 1, u'Jerry Lee Lewis': 1, u'Jimi Hendrix': 1, u'Jimmie Rodgers': 1, u'Jimmy Cliff': 1, u'Jimmy Reed': 1, u'Jimmy Ruffin': 1, u'Jimmy Webb': 1, u'Joan Baez': 1, u'Joan Jett and the 
Blackhearts': 1, u'Joe Jackson': 1, u'John Cale and Lou Reed': 1, u'John Coltrane': 1, u'John Coltrane Quartet': 1, u'John Lennon': 3, u'John Martyn': 1, u'John Prine': 1, u'Johnny Bristol': 1, u'Johnny Cash': 5, u'Johnny Mandel': 1, u'Jonathan Richman and the Modern Lovers': 1, u'Joni Mitchell': 3, u'Joy Division': 2, u'Judy Clay and William Bell': 1, u'Judy Garland': 2, u'Julian Cope': 1, u'Junior Murvin': 1, u'Justice v Simian': 1, u'KD Lang': 1, u'Kanye West': 2, u'Karen Dalton': 1, u'Kate Bush': 5, u'Katy Perry': 1, u'Keep Me in Your Heart': 1, u'Kelis': 2, u'Kellee Patterson': 1, u'Kelly Clarkson': 1, u'Kid Creole and the Coconuts': 1, u'Kings of Leon': 1, u'Kirsty MacColl': 1, u'Kiss': 1, u'Klaxons': 1, u'Kraftwerk': 2, u'Kris Kristofferson': 1, u'Kylie Minogue': 1, u'LCD Soundsystem': 2, u'LFO': 1, u'Labelle': 1, u'Labi Siffre': 1, u'Larry Young': 1, u'Laura Branigan': 1, u'Leadbelly': 1, u'Led Zeppelin': 2, u'Lee Dorsey': 1, u'Lemon Jelly': 1, u'Leona Lewis': 1, u'Leonard Cohen': 3, u'Leroy Hutson': 1, u'Lesley Gore': 1, u'Lethal Bizzle': 1, u'Lily Allen': 1, u'Lil\u2019 Louis': 1, u'Lionel Richie': 1, u'Little Richard': 1, u'Live Forever': 1, u'Liz Phair': 1, u'Lloyd Price': 1, u'Look Up': 1, u'Loose Joints': 1, u'Lord Kitchener': 1, u'Loretta Lynn': 1, u'Lou Reed': 2, u'Loudon Wainwright III': 1, u'Louis Armstrong': 3, u'Louis Jordan and His Tympany Five': 1, u'Louis Prima and Keely Smith ': 1, u'Love': 1, u'Lulu': 1, u'Luther Vandross': 1, u'Lynyrd Skynyrd': 1, u'M': 1, u'MC5': 1, u'MGMT': 2, u'MIA': 1, u'Maceo and the Macks': 1, u'Machine': 1, u'Madonna': 6, u'Mae West': 1, u'Malcolm McLaren': 1, u'Manic Street Preachers': 1, u'Manu Chao': 2, u'Manu Dibango': 1, u'Marianne Faithfull': 1, u'Marilyn Monroe': 1, u'Mark Dinning': 1, u'Mark Ronson featuring Amy Winehouse': 1, u'Martha and the Vandellas': 1, u'Martin Carthy': 2, u'Marvin Gaye': 6, u'Marvin Gaye And Tammi Terrell': 2, u'Mary Margaret O\u2019Hara': 1, u'Massive Attack': 1, u'Max Romeo': 1, 
u'Max Sedgley': 1, u'McAlmont and Butler': 1, u'Memphis Minnie': 1, u'Merle Haggard': 3, u'Merrilee and the Turnabouts': 1, u'Metallica': 1, u'Me\u2019Shell Ndegeocello': 1, u'Michael Jackson': 2, u'Mick Hanly with Christy Moore': 1, u'Mike Berry and the Outlaws': 1, u'Millie Jackson': 1, u'Minnie Riperton': 1, u'Mississippi John Hurt': 1, u'Missy Elliott': 3, u'Mitch Ryder and the Detroit Wheels': 1, u'Mohammed Rafi': 1, u'Morrissey': 1, u'Mr Vegas': 1, u'My Favourite Girl': 1, u'My Neck, My Back (Lick It)': 1, u'NWA': 1, u'Nancy Sinatra and Lee Hazlewood': 2, u'Nas': 1, u'Nat King Cole': 1, u'Naturally 7': 1, u'Naughty by Nature': 1, u'Needle of Death': 1, u'Neil Young': 5, u'Nelly': 1, u'New Order': 3, u'Nick Cave': 1, u'Nick Cave and the Bad Seeds': 1, u'Nick Cave and the Bad\xa0Seeds': 1, u'Nick Drake': 1, u'Nilsson': 1, u'Nina Simone': 3, u'Nine Inch Nails': 1, u'Nirvana': 1, u'Nitin Sawhney': 1, u'Norman Greenbaum': 1, u'Oasis': 1, u'Old Man': 1, u'Ol\u2019 Dirty Bastard': 1, u'Orbital': 1, u'Otis Redding': 1, u'OutKast': 2, u'Owen and Leon': 1, u'PJ Proby': 1, u'Patsy Cline': 1, u'Patsy Gallant': 1, u'Patti Smith': 2, u'Paul Hardcastle': 1, u'Paul McCartney and Wings': 1, u'Paul Simon': 4, u'Paul Weller': 3, u'Paul Westerberg': 1, u'Peaches': 1, u'Peggy Seeger': 1, u'Pentangle': 1, u'Percy Sledge': 1, u'Pet Shop Boys': 2, u'Pete Seeger': 1, u'Peter Gabriel': 2, u'Peter Tosh': 1, u'Phil Collins': 1, u'Phil Ochs': 1, u'Phuture': 1, u'Pigbag': 1, u'Pink Floyd': 2, u'Pitman': 1, u'Pixies': 2, u'Pluto Shervington': 1, u'Portishead': 2, u'Primal Scream': 2, u'Prince': 6, u'Prince and the Revolution': 2, u'Public Enemy': 2, u'Pulp': 5, u'Queen': 2, u'Queens of the Stone Age': 1, u'R D Burman': 1, u'R Kelly': 1, u'REM': 3, u'Radiohead': 1, u'Rage Against the Machine': 1, u'Ralph Stanley': 1, u'Randy Newman': 8, u'Ray Charles': 4, u'Rhythim Is Rhythim': 1, u'Richard Hawley': 1, u'Richard Thompson': 1, u'Richard and Linda Thompson': 1, u'Richie Havens': 1, u'Rick 
James': 1, u'Rihanna': 1, u'Robbie Williams': 1, u'Robert Johnson': 1, u'Robert Wyatt': 1, u'Roberta Flack': 1, u'Rod Stewart': 4, u'Roots Manuva': 1, u'Rose Royce': 1, u'Roxy Music': 2, u'Roy Ayers': 1, u'Roy Bailey': 1, u'Roy Davis Jr': 1, u'Roy Orbison': 4, u'Rufus featuring Chaka Khan': 1, u'Ry Cooder': 1, u'Ryan Adams': 2, u'Salsoul Orchestra': 1, u'Salt-n-Pepa': 1, u'Sam Cooke': 3, u'Sam Mayo': 1, u'Sam and Dave': 1, u'Sapan Chakraborty': 1, u'Scarface': 1, u'Scott Walker': 2, u'Screamin\u2019 Jay Hawkins': 1, u'Selector': 1, u'Sex Pistols': 2, u'Sham 69': 1, u'Shangri-Las': 1, u'Shannon': 1, u'Sheffield Socialist Choir': 1, u'Shirley Collins': 1, u'Shirley Ellis': 1, u'Simon and Garfunkel': 3, u'Sinead O\u2019Connor': 1, u'Sister Sledge': 2, u'Skee-Lo': 1, u'Skip James': 1, u'Sly and the Family Stone': 2, u'Small Faces': 1, u'Smog': 2, u'Smokey Robinson and the Miracles': 1, u'Snooks Eaglin': 1, u'Soft Cell': 1, u'Solomon Burke': 1, u'Sonic Youth': 2, u'Sonny and Cher': 1, u'Soul Brothers Six': 1, u'Soul Sisters': 1, u'Spacemen 3': 1, u'Sparks': 1, u'Spike Jones and His City Slickers': 1, u'Spiritualized': 1, u'Squeeze': 1, u'Stephen Fretwell': 1, u'Steppenwolf': 2, u'Steve Earle': 4, u'Steve Goodman': 1, u'Stevie Wonder': 4, u'Stiff Little Fingers': 1, u'Suede': 2, u'Sue\xf1o Latino': 1, u'Super Furry Animals': 1, u'Sylvester': 1, u'System Of A Down': 1, u'T Rex': 1, u'T-Connection': 1, u'TLC': 2, u'Take That': 1, u'Talib Kweli': 1, u'Talking Heads': 2, u'Tammy Wynette': 1, u'Tears for Fears': 1, u'Terry Jacks': 1, u'The Animals': 1, u'The Artful Dodger featuring Craig David': 1, u'The B-52\u2019s': 1, u'The Beach Boys': 6, u'The Beat': 1, u'The Beatles': 19, u'The Beautiful South': 1, u'The Bee Gees': 1, u'The Blue Nile': 1, u'The Byrds': 3, u'The Cars': 1, u'The Carter Family': 1, u'The Chi-Lites': 2, u'The Clash': 5, u'The Coasters': 1, u'The Communards': 1, u'The Congos': 1, u'The Contours': 1, u'The Cramps': 1, u'The Crickets': 1, u'The Crystals': 2, 
u'The Cure': 3, u'The Decemberists': 1, u'The Disposable Heroes of Hiphoprisy': 1, u'The Drifters': 1, u'The Eagles': 1, u'The Everly Brothers': 1, u'The Faces': 1, u'The Fall': 1, u'The Flirts': 1, u'The Flying Burrito Brothers': 1, u'The Good, the Bad and the Queen': 1, u'The Handsome Family': 1, u'The Hidden Cameras': 1, u'The Hold Steady': 1, u'The House of Love': 1, u'The Human League': 1, u'The Impressions': 3, u'The Isley Brothers': 2, u'The Jackson 5': 1, u'The Jam': 3, u'The Jimi Hendrix Experience': 1, u'The Killers': 1, u'The Kingsmen': 1, u'The Kinks': 6, u'The Knack': 1, u'The Libertines': 1, u'The Louvin Brothers': 2, u'The Lovin\u2019 Spoonful': 1, u'The Mamas and the Papas': 1, u'The Mothers of Invention': 1, u'The Mountain Goats': 1, u'The Nolans': 1, u'The Normal': 1, u'The Notorious BIG': 2, u'The Number of the Beast': 1, u'The Pogues': 4, u'The Pointer Sisters': 1, u'The Police': 4, u'The Pop Group': 1, u'The Pretenders': 2, u'The Proclaimers': 1, u'The Prodigy': 1, u'The Rapture': 1, u'The Righteous Brothers': 2, u'The Rolling Stones': 8, u'The Ronettes': 2, u'The Shangri-Las': 1, u'The Shirelles': 1, u'The Small Faces': 1, u'The Smiths': 5, u'The Special AKA': 1, u'The Specials': 2, u'The Spencer Davis Group': 1, u'The Spice Girls': 1, u'The Spinners': 1, u'The Stanley Brothers': 1, u'The Staple Singers': 1, u'The Stone Roses': 1, u'The Stooges': 1, u'The Streets': 1, u'The Strokes': 1, u'The Sugarhill Gang': 1, u'The Sundown Playboys': 1, u'The Supremes': 5, u'The Surfaris': 1, u'The Teenagers featuring Frankie Lymon': 1, u'The Temptations': 2, u'The The': 1, u'The Ting Tings': 1, u'The Trammps': 1, u'The Troggs': 1, u'The Undertones': 1, u'The Vapors': 1, u'The Velvet Underground': 2, u'The Velvet Underground and Nico': 1, u'The Wailers': 1, u'The Walker Brothers': 1, u'The Waterboys': 1, u'The Who': 4, u'Them': 1, u'Them/Van Morrison': 1, u'Thin Lizzy': 1, u'Thom Yorke': 1, u'Tim Buckley': 1, u'Tim Hardin': 1, u'Tito Puente': 1, u'Todd 
Rundgren': 1, u'Tom Jones': 1, u'Tom Robinson': 1, u'Tom Waits': 5, u'Tommy James and the Shondells': 1, u'Tone Loc': 1, u'Toni Braxton': 1, u'Tony Bennett': 1, u'Toots and the Maytals': 1, u'Townes Van Zandt': 1, u'Tullio De Piscopo': 1, u'Tupac': 2, u'Tupac - as Makaveli': 1, u'Tweet': 1, u'Twinkle': 1, u'U2': 5, u'USA for Africa': 1, u'Vampire Weekend': 1, u'Van Morrison': 3, u'Vic Chesnutt': 1, u'Wanda Jackson': 1, u'War': 1, u'Warren G and Nate Dogg': 1, u'Wayne Smith': 1, u'West Street Mob': 1, u'Wham!': 1, u'Whitney Houston': 2, u'William Blake, Charles Hubert and Hastings Parry': 1, u'Willie Nelson': 1, u'Wilson Pickett': 2, u'Woody Guthrie': 2, u'World Domination Enterprises': 1, u'X-Ray Spex': 1}
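The output above hints at a few near-duplicate artist names: 'Elvis Costello' and 'Elvis Costello ' differ only by a trailing space, and the two 'Nick Cave and the Bad Seeds' entries differ only by a non-breaking space (\xa0). The rest of this notebook keeps the names as they are. As an optional cleanup, a minimal sketch of whitespace normalization before counting could look like this, where normalize_artist is a hypothetical helper and the short names list is an illustrative sample.

```python
from collections import Counter

def normalize_artist(name):
    """Replace non-breaking spaces, then trim and collapse whitespace."""
    return " ".join(name.replace(u"\xa0", " ").split())

# Illustrative sample of names as they appear in the output above
names = [u"Elvis Costello", u"Elvis Costello ",
         u"Nick Cave and the Bad Seeds", u"Nick Cave and the Bad\xa0Seeds",
         u"Duffy "]

# Count songs per normalized artist name
counts = Counter(normalize_artist(n) for n in names)
print(dict(counts))
```

On the dataframe, the same helper could be applied with df['artist'].map(normalize_artist) before grouping. Spelling variants such as 'Billie Holiday' and 'Billie Holliday' would still need manual attention.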
Now, loop through that dictionary and select the key-value pairs corresponding to the artists with 4 or more songs in the 1,000 songs list:
song_by_artist_4plus = {k:v for k,v in song_by_artist.items() if v>=4}
song_by_artist_4plus
{u'Al Green': 5, u'Arctic Monkeys': 4, u'Bill Withers': 4, u'Bob Dylan': 24, u'Bruce Springsteen': 4, u'Chuck Berry': 5, u'David Bowie': 9, u'Elvis Costello and the Attractions': 5, u'Elvis Presley': 6, u'Frank Sinatra': 6, u'Grace Jones': 4, u'Johnny Cash': 5, u'Kate Bush': 5, u'Madonna': 6, u'Marvin Gaye': 6, u'Neil Young': 5, u'Paul Simon': 4, u'Prince': 6, u'Pulp': 5, u'Randy Newman': 8, u'Ray Charles': 4, u'Rod Stewart': 4, u'Roy Orbison': 4, u'Steve Earle': 4, u'Stevie Wonder': 4, u'The Beach Boys': 6, u'The Beatles': 19, u'The Clash': 5, u'The Kinks': 6, u'The Pogues': 4, u'The Police': 4, u'The Rolling Stones': 8, u'The Smiths': 5, u'The Supremes': 5, u'The Who': 4, u'Tom Waits': 5, u'U2': 5}
Next, as Python dictionaries do not keep their items in sorted order, make separate lists of keys and values from the song_by_artist_4plus dictionary and sort them in descending order:
import numpy as np
# Lists of keys and values (list() keeps them indexable on any Python version)
my_keys = list(song_by_artist_4plus.keys())
my_vals = list(song_by_artist_4plus.values())
# Find indices of sorted values (first converted to a numpy array)
i_sorted = np.argsort(np.array(my_vals))[::-1]
# Sort both the key and value lists
my_keys_sorted = [my_keys[i] for i in i_sorted]
my_vals_sorted = [my_vals[i] for i in i_sorted]
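For comparison, the same descending sort can be written without NumPy using the built-in sorted function. This is a sketch of an alternative, not the approach used in this notebook. The small counts dictionary is a sample taken from the output above, and artists with equal counts may come out in a different order than with np.argsort.

```python
# Sample of the artist -> number of songs pairs shown above
counts = {'Al Green': 5, 'Bob Dylan': 24, 'David Bowie': 9, 'The Beatles': 19}

# Sort the (artist, count) pairs by count, largest first
pairs = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)

# Split the sorted pairs back into two parallel lists
keys_sorted = [k for k, v in pairs]
vals_sorted = [v for k, v in pairs]
print(keys_sorted)
print(vals_sorted)
```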
If you have a Plotly account as well as a credentials file set up on your machine, signing in to Plotly's servers is done automatically while importing plotly.plotly:
import plotly.plotly as py
For more info on how to sign up or sign in to Plotly, see Plotly's Python API User Guide.
Next, import a few graph objects needed to make our Plotly plot:
from plotly.graph_objs import Figure, Data, Layout
from plotly.graph_objs import Bar
from plotly.graph_objs import XAxis, YAxis, Marker, Font, Margin
Make instances of the bar and data objects:
my_bar = Bar(x=my_keys_sorted, # labels of the x-axis
y=my_vals_sorted, # values of the y-axis
marker= Marker(color='#2ca02c')) # a nice green color
my_data = Data([my_bar]) # make data object (Data accepts only a list)
Make an instance of the layout object:
my_title = 'Number of songs listed in the Guardian\'s<br>\
<em>Top 1,000 Songs to Hear Before You Die</em> per artist with 4 or more songs'
my_ytitle = 'Number of songs per artist'
my_layout = Layout(title=my_title, # set plot title
showlegend=False, # remove legend
font= Font(family='Georgia, serif', # set global font family
color='#635F5D'), # and color
plot_bgcolor='#EFECEA', # set plot color to grey
xaxis= XAxis(title='', # no x-axis title
tickangle=45, # tick labels' angle
ticks='outside', # draw ticks outside axes
ticklen=8, # tick length
tickwidth=1.5,), # and width,
yaxis= YAxis(title=my_ytitle, # y-axis title
gridcolor='#FFFFFF', # white grid lines
ticks='outside',
ticklen=8,
tickwidth=1.5),
autosize=False, # manual figure size
width=700,
height=500,
margin= Margin(b=140) # increase bottom margin,
) # to fit long x-axis tick labels
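Plotly's graph objects behave like nested dictionaries, which is why the figure can later be updated with calls such as my_fig['data'][0].update(...). As a rough sketch of the structure being built here, the same keys can be written with plain Python dictionaries and a shortened sample of the data. Whether the Plotly version in use accepts plain dictionaries in place of graph objects is an assumption to verify.

```python
import json

# One bar trace: x labels, y values and a marker color
bar_trace = {
    'type': 'bar',
    'x': ['Bob Dylan', 'The Beatles', 'David Bowie'],  # sample of the artists
    'y': [24, 19, 9],
    'marker': {'color': '#2ca02c'},
}

# Layout options mirror a few of the keyword arguments used above
layout = {
    'title': 'Songs per artist',  # shortened title for this sketch
    'showlegend': False,
    'xaxis': {'tickangle': 45, 'ticks': 'outside'},
    'yaxis': {'title': 'Number of songs per artist'},
    'margin': {'b': 140},
}

# A figure is a list of traces plus a layout
figure = {'data': [bar_trace], 'layout': layout}
print(json.dumps(figure, indent=2, sort_keys=True))
```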
Make an instance of the figure object, send it to Plotly and get a plot in return inside this IPython notebook:
my_fig = Figure(data=my_data, layout=my_layout)
py.iplot(my_fig, filename='socrata1')
Not bad, but let's try to improve our plot by making use of Plotly's hover capabilities.
Next, we add hover text to each of the bars so that hovering over them with the cursor shows a chronological list of that artist's songs in the 1,000 songs list, with titles and years of release.
First, we need to trim the original dataframe so that it contains only the artists with 4 or more songs in the 1,000 songs list:
# Rows which have 'artist' name in song_by_artist_4plus
i_good = (df['artist'].isin(song_by_artist_4plus))
df_good = df[i_good] # a new dataframe
df_good.shape # a much smaller dataframe than the original
(222, 5)
Next, loop through the sorted artist names, building a text list to be linked to the 'text' key in the data object.
Unfortunately, the biggest lists will have to be truncated to fit inside the Plotly figure:
my_text = [] # init. the hover-text list
# Loop through the sorted artist names, so that my_text
# will have the same ordering as the values linked to 'x' and 'y' in my_data
for k in my_keys_sorted:
    # Slice dataframe to artist name and sort songs by year
    i_artist = (df_good['artist']==k)
    df_tmp = df_good[i_artist].sort(columns='year')
    my_text_tmp = '' # init. string
    cnt_song = 0 # song counter for given artist
    N_song = len(df_tmp['title']) # total number of songs for given artist
    # Loop through songs
    for i_song, song in df_tmp.iterrows():
        # Add to string and counter
        my_text_tmp += song['title']+' ('+str(song['year'])+')<br>'
        cnt_song += 1
        # Truncate if song list is too long to fit on figure
        if cnt_song>12:
            diff = N_song - cnt_song
            my_text_tmp += ' and '+str(diff)+' more ...'
            break
    # Append hover-text list
    my_text += [my_text_tmp]
# Update figure object
my_fig['data'][0].update(text=my_text)
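The truncation logic of the loop above can also be isolated in a small function that works on plain (title, year) tuples, which makes it easy to try out without a dataframe. This is a sketch of a slight variation, not the notebook's exact behavior: it shows at most max_songs entries, adds no trailing line break, and only appends the "more" line when songs were actually left out. build_hover_text is a hypothetical helper name.

```python
def build_hover_text(songs, max_songs=12):
    """Return '<br>'-separated 'title (year)' lines, truncated after max_songs."""
    # Chronological order; sorted() is stable, so ties keep their input order
    ordered = sorted(songs, key=lambda s: s[1])
    lines = ['%s (%s)' % (title, year) for title, year in ordered[:max_songs]]
    # Mention how many songs were left out, if any
    remaining = len(ordered) - max_songs
    if remaining > 0:
        lines.append('and %d more ...' % remaining)
    return '<br>'.join(lines)

beach_boys = [('Good Vibrations', 1966), ('God Only Knows', 1966)]
print(build_hover_text(beach_boys))
```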
Finally, add a text annotation citing our data source to our plot:
from plotly.graph_objs import Annotation
my_anno_text = '<em>Open Data by Socrata</em><br>\
Hover over the bars to see list of songs'
my_anno = Annotation(text=my_anno_text, # annotation text
x=0.95, # position's x-coord
y=0.95, # and y-coord
xref='paper', # use paper coords
yref='paper', # for both coordinates
font= Font(size=14), # increase font size (default is 12)
showarrow=False, # remove arrow
bgcolor='#FFFFFF', # white background
borderpad=4) # space bt. border and text (in px)
# Update figure object
my_fig['layout'].update(annotations=[my_anno])
And now all we have left to do is send the updated figure object to Plotly:
py.iplot(my_fig, filename='socrata1-hover')
Spend some time hovering over the bars and admire Plotly's interactivity!
Great data and beautiful visualization, at your fingertips.
# CSS styling within IPython notebook
from IPython.core.display import HTML
import urllib2
def css_styling():
    url = 'https://raw.githubusercontent.com/plotly/python-user-guide/master/custom.css'
    styles = urllib2.urlopen(url).read()
    return HTML(styles)
css_styling()