Automated Lyric Analysis

A computer science major used computational tools to measure how the sophistication of song lyrics has changed over time.

About the Project


Have song lyrics become less sophisticated over time? Joanna Gormley’s honors capstone project used computational tools to analyze the lyrical sophistication of songs on the Billboard top 100 from 1959 to 2016.

The project considered both the reading level and the repetitiveness of the lyrics. Reading level was assessed with two commonly accepted measures, Flesch-Kincaid and Coleman-Liau, and the average reading level for each year was then compared across the period.
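As an illustration of the two measures, the sketch below implements their standard published formulas in Python. It is not the project's own code (the analysis was written in Java), and the function names and the choice to pass in precomputed counts are assumptions made for this example. How sentences were delimited within lyrics is not stated here, so the counting is left to the caller.

```python
def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid grade level from raw counts (hypothetical helper)."""
    # Standard formula: 0.39 * (words per sentence)
    #                 + 11.8 * (syllables per word) - 15.59
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def coleman_liau_index(letters, words, sentences):
    """Coleman-Liau index from raw counts (hypothetical helper)."""
    # L = average letters per 100 words, S = average sentences per 100 words
    letters_per_100 = letters / words * 100
    sentences_per_100 = sentences / words * 100
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8
```

For example, a text with 100 words, 10 sentences, and 150 syllables scores about 6.0 on Flesch-Kincaid. Only Flesch-Kincaid needs syllable counts; Coleman-Liau works from letter counts alone.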

Our results indicate that the reading level of lyrics has not changed significantly over time. In this respect, therefore, lyrics have not become less sophisticated.

To measure repetition, we computed the Shannon entropy of each song, first over its words and then over its verses. We then averaged the entropy of each year’s songs and compared the yearly averages. Our results indicate a slight increase in entropy over time. This suggests that songs are becoming less repetitive, which may indicate increased sophistication.
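A minimal sketch of the entropy calculation follows, assuming a song's lyrics arrive as a single string, that words are separated by whitespace, and that verses are separated by blank lines. These tokenization rules and the function names are assumptions for illustration, not the project's own Java code.

```python
import math
import re
from collections import Counter


def shannon_entropy(items):
    """Shannon entropy, in bits, of the frequency distribution of items."""
    counts = Counter(items)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # H = sum over distinct items of p * log2(1 / p), with p = count / total
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def word_entropy(lyrics):
    # Assumption: case-insensitive, whitespace-delimited words
    return shannon_entropy(lyrics.lower().split())


def verse_entropy(lyrics):
    # Assumption: verses are separated by one or more blank lines
    verses = [v.strip().lower() for v in re.split(r"\n\s*\n", lyrics) if v.strip()]
    return shannon_entropy(verses)
```

Lower entropy means more repetition: a song that repeats a single word has a word entropy of 0 bits, while a song of four distinct words, each used once, has a word entropy of 2 bits.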


Joanna Gormley ’18, Computer Science 


Zach Kissel ’05, Associate Professor, Computer Science

Assignment Type

Independent study project

Languages/Code/Programs/Technology Used
  • Data collection and data cleaning were explored (Python was used for automated lyric collection)
  • An extensive syllable database was constructed to look up the number of syllables per word (required for reading level calculations)
  • The analysis of reading level and repetition was coded in Java
