Our Methods
To mark up the important data, format, and
syntax of each song lyrics file in our large
collection, we used oXygen's "Find and Replace in Files" tool to
write regular expressions (regex) that captured the formatting patterns
recurring throughout the blues lyrics files, then wrapped the
corresponding XML tags around our capturing groups.
We created XML tags to mark up the following
information from the files:
- The "metadata" section, which contained the song title,
performing artist, recording date/year, and album
- The content within the "metadata" section, including all
those listed above
- The entirety of the song's lyrics, as well as each line
within the lyrics
- The "Notes" section that the original website sometimes included
beneath the song lyrics, which contained facts about the song,
explained what certain references or allegories within the
song's lyrics referred to, or gave fun facts about the artist who
sang the song.
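The same capture-and-wrap idea can be sketched outside oXygen. The following is a minimal Python illustration, not our actual regex: it assumes a hypothetical plain-text layout in which the title line begins with `Title:` and the artist line begins with `Artist:`, and the element names `<title>`, `<artist>`, and `<line>` as well as the sample text are assumptions made for the example.

```python
import re

# Hypothetical sample text; the real lyrics files may be laid out differently.
SAMPLE = """Title: Example Blues
Artist: Example Artist

Woke up this morning
Feeling so low"""


def tag_song(text: str) -> str:
    """Wrap metadata fields and lyric lines in XML-style tags via capturing groups."""
    # Capture whatever follows "Title: " on its line and wrap it in <title>.
    text = re.sub(r"^Title: (.+)$", r"<title>\1</title>", text, flags=re.MULTILINE)
    # Do the same for the (assumed) "Artist: " line.
    text = re.sub(r"^Artist: (.+)$", r"<artist>\1</artist>", text, flags=re.MULTILINE)
    # Every remaining non-empty line that is not already tagged is treated as a lyric line.
    # Note: real lyrics containing "&" or "<" would need XML escaping first.
    text = re.sub(r"^(?!<)(.+)$", r"<line>\1</line>", text, flags=re.MULTILINE)
    return text


tagged = tag_song(SAMPLE)
print(tagged)
```

Each `\1` in the replacement strings refers back to the first capturing group, which is the same mechanism a find-and-replace tool uses when it wraps tags around a captured pattern.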
XQuery
Once we had all relevant data and formatting
tagged in our collection, we used the eXist software to write
and run XQueries that pulled the information
we were after from our XML tags en masse using XPath functions.
Once we had functioning XQueries, we used XQuery's FLWOR expressions
to create SVG graphics, collections of raw text files
for Python, and TSV (Tab-Separated Values) files from the
queried data. From these, we created visualizations of our
research results from our blues collection! (Here's the GitHub view of our XQuery Files)
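To give a feel for the "loop over every song, pull a few values, return one row" shape of such a query, here is a rough Python analogue using the standard library instead of XQuery and eXist. It is only a sketch: the sample documents, the `<song>` and `<metadata>` wrappers, and the function name `songs_to_tsv` are hypothetical, and our real files may be structured differently.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical song documents; the project's real XML structure may differ.
SONGS = [
    "<song><metadata><title>Song A</title><artist>Artist One</artist>"
    "<songwriter>Writer X</songwriter></metadata></song>",
    "<song><metadata><title>Song B</title><artist>Artist Two</artist>"
    "<songwriter>Writer X</songwriter></metadata></song>",
]


def songs_to_tsv(xml_strings):
    """Return TSV text with one row per song: title, artist, songwriter."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["title", "artist", "songwriter"])
    for xml_text in xml_strings:  # plays the role of the FLWOR "for" clause
        root = ET.fromstring(xml_text)
        # These lookups play the role of "let" bindings built from path expressions.
        title = root.findtext(".//title", default="")
        artist = root.findtext(".//artist", default="")
        songwriter = root.findtext(".//songwriter", default="")
        # Writing the row plays the role of the "return" clause.
        writer.writerow([title, artist, songwriter])
    return out.getvalue()


tsv_text = songs_to_tsv(SONGS)
print(tsv_text)
```

A tab-separated table like this is one convenient input format for network tools such as Cytoscape.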
Networking Analysis Graphics
Network of Interactions between Songwriters and Performing Artists
This network, created in Cytoscape, uses all of the "artist" and "songwriter" tags in our
collection of 1,088 song XML files to visualize the
interactions between performing
artists (big blue circles/rectangles) and songwriters (large and small pink
squares) as they are connected through songs. Each connecting line
represents one or more songs. The direction of the line shows who
wrote the song (dotted end) and who performed it (arrow end).
The larger pink squares represent songwriters who connect two or more performing artists
together, meaning that those people wrote songs which were
performed by two or more performing artists in our collection.
The red lines indicate songs that were
originally written by a performing artist and were covered by one
or more other performing artists.
The green squares represent songs that are either traditional in origin, meaning the lyrics
were written a long time ago and have been passed down through
many generations to the performing artists, or unknown in origin, meaning the original writer of the
song could not be confirmed by our source.
Even within our limited collection, this network shows the large
number of blues songs that have been passed down and shared among
blues artists.
IMPORTANT NOTE: This network does not accurately represent ALL
of the interactions between blues artists and songwriters throughout
the whole genre. This network's dataset is limited to our source
files, which are far from a complete or fully
representative sample of the blues. This accounts
for why B.B. King's set of songwriter connections is so much bigger than
the others'. It's not that B.B. King ACTUALLY has more songs which
connect him to more songwriters; our collection simply has a
disproportionate number of his songs compared to the other artists
in our collection. In the future we hope to add more song files from
our artists, and from other blues artists we have no songs for yet, to create
a more representative network for the blues as a whole!
Python and spaCy
To compare aspects of the blues lyrics themselves
within our collection, we used Python and spaCy's natural language
processing (NLP) pipeline to automatically analyze any collection of lyrics we
gave it, find parts of speech, entities, and the like within each
line, and produce bar graphs using Pygal. (Take a
look at our Python code on GitHub)
For our purposes, we used the NLP to create
graphs of the most common nouns, verbs, and geopolitical entities
within songs that were written and performed in each decade. In doing
so, we created a visual of what sorts of things blues music
was singing about, what topics or subjects may have been a
popular/common hardship to have the (emotional) blues over at the time,
and, in the case of the geopolitical entities graphs, where these artists
were experiencing and musically sharing the blues from.
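The counting step behind graphs like these can be sketched in a few lines. This is a simplified illustration rather than our actual script: the function names, the hand-tagged sample, and the choice of the `en_core_web_sm` model are assumptions. The counting helper runs on its own; the spaCy helper additionally requires spaCy and that model to be installed.

```python
from collections import Counter


def top_terms(tagged_tokens, wanted_tag, n=10):
    """Return the n most common lowercased terms that carry the wanted tag."""
    counts = Counter(text.lower() for text, tag in tagged_tokens if tag == wanted_tag)
    return counts.most_common(n)


def tag_with_spacy(lyrics, model="en_core_web_sm"):
    """Tag lyrics with spaCy: (lemma, part of speech) pairs plus (text, "GPE") for places.

    The model name is an assumption; any installed English pipeline could be used.
    """
    import spacy  # imported lazily so top_terms works without spaCy installed

    nlp = spacy.load(model)
    doc = nlp(lyrics)
    tagged = [(token.lemma_, token.pos_) for token in doc if token.is_alpha]
    tagged += [(ent.text, ent.label_) for ent in doc.ents if ent.label_ == "GPE"]
    return tagged


# Hand-tagged sample so the example runs without a language model.
SAMPLE_TAGS = [
    ("woman", "NOUN"), ("leave", "VERB"), ("train", "NOUN"),
    ("Woman", "NOUN"), ("cry", "VERB"), ("Memphis", "GPE"),
]

top_nouns = top_terms(SAMPLE_TAGS, "NOUN")
top_places = top_terms(SAMPLE_TAGS, "GPE")
print(top_nouns)
print(top_places)
```

With spaCy available, `top_terms(tag_with_spacy(lyrics_text), "NOUN")` would produce the same kind of list from real lyrics, and running it once per decade's worth of lyrics gives the per-decade counts that a bar chart library can then plot.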