Is modern music basic? Is lyricism dead? Are lyrics very repetitive? Which genres are the most repetitive? How has the trend changed over the years?
Have you ever had these questions and looked for a systematic way to answer them? In this post I attempt to use the "entropy" of popular song lyrics to do that. In information theory, entropy quantifies the average level of uncertainty or information associated with a random variable's possible outcomes. In our case, it quantifies the average level of uncertainty, or variability, in the words of a particular song.
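For reference, this is the standard Shannon entropy of a discrete random variable X with outcome probabilities p(x). The base-2 logarithm (giving a result in bits) is an assumption here; a different base only rescales the values.

$$H(X) = -\sum_{x} p(x) \log_2 p(x)$$

A variable with a single certain outcome has H(X) = 0, and a variable with n equally likely outcomes has H(X) = \log_2 n, the maximum for n outcomes.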
How did I model this problem? Essentially, I looked at each song's lyrics and computed the probability of occurrence of each word within that song. The most repetitive songs re-use the same words many times throughout the song, so their word distribution is concentrated on a few words and they have a lower "entropy".
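The idea above can be sketched in a few lines of Python. This is a minimal illustration, not the exact code behind the results: the function name `lyric_entropy`, the regex tokenizer, the lowercasing, and the base-2 logarithm are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def lyric_entropy(lyrics: str) -> float:
    """Shannon entropy, in bits per word, of the word distribution of one song."""
    # Assumed tokenizer: lowercase the text and treat runs of letters and
    # apostrophes as words. A different tokenizer would change the numbers.
    words = re.findall(r"[a-z']+", lyrics.lower())
    if not words:
        return 0.0

    counts = Counter(words)
    total = len(words)

    # H = sum over distinct words of p * log2(1 / p), where p = count / total.
    return sum((c / total) * math.log2(total / c) for c in counts.values())
```

With this sketch, a toy lyric that repeats a single word, such as `"la la la la"`, scores 0.0 bits, while four distinct words used once each, such as `"one two three four"`, score 2.0 bits. Note that a song's maximum possible score grows with the number of distinct words it contains, so longer songs with larger vocabularies can reach higher values.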
To gather these results, I analyzed the Billboard songs for each year from 1950 to 2015, using the data in the kevinschaich/billboard GitHub repo. Some songs, such as Harlem Shake, unsurprisingly have low entropy, while songs such as 6 Foot 7 Foot have high entropy. Genres such as Rap and Hip Hop have high entropy because they use a diverse vocabulary, and as their popularity has increased in recent years, overall entropy has increased with it.