Global Hackathon: Can you predict who will love a song?


This is precisely the question over 45,000 data scientists were invited to answer at a recent Music Data Science 24-hour global Hackathon event, with first-time access to EMI Music’s million interview dataset. The dataset comprised a selection of results from a million interviews with music fans around the world, offering insight into their interests, attitudes, behaviours, familiarity, and appreciation of music.

This was a collaborative project with Data Science London, EMI Music, EMC, Lightspeed Research and Kaggle, to challenge data scientists to predict the rating someone would give a song based on their demographics, the artist and track ratings, their answers to questions about musical preferences and the words they use to describe EMI artists.

Over 1,300 entries were submitted by 138 different teams – a mix of those attending in person and those taking part from across the globe. The winning algorithm came from Shanda Innovations, a tech incubator based in Shanghai and Beijing. Other mined insights included the fact that women tended to be generally more positive than men, using words like “current”, “edgy” and “cool” to describe songs, as opposed to “cheap”, “unoriginal” and “superficial”. Retired people tended to rate songs higher, while students and unemployed people often gave lower ratings. You can view some of the visualisations from the event here:

The data science community tweeted throughout the event, with many in ‘heads down & focus’ mode!

Tweet: #musicdata #DSGhack Eerily quiet as coders are in ‘head down & focus’ mode @hubwestminster #committed

Tweet: Off to get the essentials as supplies are running low @hubwestminster. They’re a hungry bunch! #musicdata ##DSGhack

Tweet: #musicdata conclusions: the importance of individual models for each artist. No one size fits all solution in this game!

Tweet: #musicdata quotes: “Some users are more talkative – use the ratio of each word / how many total words used”
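
The tip in that last tweet can be sketched in a few lines of Python. This is only an illustration of the idea, not any team's actual code: the function name `word_ratios`, the input layout and the sample users are all assumptions made up for the example. Each word count is divided by the total number of words that user chose, so talkative and quiet respondents end up on the same scale.

```python
from collections import Counter


def word_ratios(words_by_user):
    """Map each user to {word: share of that user's total word picks}.

    `words_by_user` is a hypothetical layout: user id -> list of the
    descriptive words that user chose (repeats allowed).
    """
    ratios = {}
    for user, words in words_by_user.items():
        counts = Counter(words)
        total = sum(counts.values())
        # Users who chose no words get an empty profile rather than a
        # division by zero.
        ratios[user] = {w: c / total for w, c in counts.items()} if total else {}
    return ratios


# Made-up sample data, for illustration only.
sample = {
    "talkative_user": ["cool", "edgy", "current", "cool"],
    "quiet_user": ["cool"],
}
ratios = word_ratios(sample)
```

With raw counts, the talkative user would appear to like “cool” twice as much as the quiet one. With ratios, “cool” accounts for half of the talkative user's words but all of the quiet user's.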

EMC provided IT Infrastructure and analytical tools to the contestants, as well as operational support for the competition through its Greenplum division.

From an EMI perspective, this hackathon may well prove the importance of individual artist insight models. “One size fits all is no longer a valid model. The results and the data scientists’ comments in Kaggle’s forum also show that understanding music attitudes, behaviours and listeners’ words of music appreciation are more important than having insight on traditional demographic data,” said David Boyle, SVP, EMI.

“Community, learning and collaboration are at the heart of innovation. To succeed in the new world of Big Data, companies need to invest in innovation and experiment with datasets to mine their real, untapped value,” said Chris Roche, Regional Director for EMC Greenplum. “I see this series of crowdsourcing events as one of Greenplum’s investments in community, learning, collaboration and innovation. We are pleased to support the Data Science London community and EMI both with our technology and expertise.”
