Subjects in the Jim Peppler Collection

Visualizations

With 199 subjects ranging in content from Black social life to disc jockeys and funerals, the Jim Peppler Southern Courier Collection provides unique insight into the joy and despair of the 1960s in America.

Click on any of the subject headings to view photos of that subject from the Jim Peppler Southern Courier Collection on the Alabama Department of Archives and History’s digital collections site.

The list below contains the names of all 199 subjects tagged in the collection, the number of photographs associated with each, and the percentage of photos in the whole collection in which the subject appears. Darker colors mean that the subject appears in the collection more frequently.
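As a rough illustration of how the count and percentage for each subject can be derived, here is a minimal Python sketch. The photo records, their field names, and the `subject_shares` helper are all hypothetical stand-ins, not the collection's actual metadata or the code used for this page.

```python
from collections import Counter

# Hypothetical records: each photo is listed with its subject headings.
photos = [
    {"id": "p1", "subjects": ["Funerals", "Churches"]},
    {"id": "p2", "subjects": ["Disc jockeys"]},
    {"id": "p3", "subjects": ["Funerals"]},
    {"id": "p4", "subjects": ["Churches", "Funerals"]},
]


def subject_shares(records):
    """Return {subject: (photo_count, percent_of_collection)}."""
    total = len(records)
    counts = Counter()
    for record in records:
        # set() so a subject repeated on one photo is only counted once
        counts.update(set(record["subjects"]))
    return {s: (n, round(100 * n / total, 1)) for s, n in counts.items()}


shares = subject_shares(photos)
# Print the most frequent subjects first
for subject, (count, percent) in sorted(shares.items(), key=lambda kv: -kv[1][0]):
    print(f"{subject}: {count} photos ({percent}%)")
```

Because one photo can carry several subject headings, the percentages are computed against the total number of photos and will usually add up to more than 100.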

Topic Modeling

While the standardized subject headings for the places, people, and subjects represented in the collection tell us a great deal about the materials being described, they do not fully represent every aspect of the available metadata. The free-text descriptions in the title and description fields provide additional information that is not captured in the subject headings.

I used MALLET to identify topics, and NLTK to identify the language that would need to be edited or removed to reveal those topics. But what do I mean by “edit or remove”? Isn’t it important to see the data as it exists in its original state? Well: yes and no! In many cases when analyzing a corpus of text, the data has been pre-processed to exclude what are called “stopwords”. Stopwords might include the usual suspects, such as “a”, “an”, “the”, “was” and similar terms. These are some of the most common words in the English language, so it makes sense to remove them, because their frequency would otherwise give them undue weight.
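To make the idea of stopword removal concrete, here is a minimal Python sketch. It uses a tiny hand-written stopword set and a made-up description rather than the default MALLET or NLTK lists or any real record from the collection, so treat every name in it as an assumption for illustration only.

```python
import re

# A tiny illustrative stopword set (an assumption for this sketch);
# the default MALLET and NLTK lists are much longer.
STOPWORDS = {"a", "an", "the", "was", "of", "in", "at", "and"}


def remove_stopwords(text, stopwords=STOPWORDS):
    """Lowercase the text, split it into word tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in stopwords]


# Hypothetical description text, not taken from the collection
description = "A disc jockey was at the radio station in Montgomery"
tokens = remove_stopwords(description)
print(tokens)
```

Only the content-bearing words survive the filter, which is what lets a topic model weigh them instead of the most common words in the language.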

Here is the list of default MALLET stopwords. You’ll note that some of them are less obvious than others, and it’s important to consider that the stopwords we choose will affect the final analysis of the metadata. To fully understand why I analyzed the metadata the way I did, it is critical to understand both the metadata in its original state and the additional stopwords I needed in order to focus on the information I found most relevant.

The metadata describing this collection includes language that is highly repetitive and can work similarly to stopwords, masking the data we actually want to look at. To provide transparency for this project, I chose to replace duplicative language with “stub” language, so that the duplicative language is still represented but its individual words do not overrun the resulting topic model. I edited the original data using the language I identified in the file available in the project folder on GitHub.
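The stub approach can be sketched in a few lines of Python. The phrases and stub tokens below are hypothetical examples that I made up for illustration; the replacements actually used for this project are the ones listed in the file on GitHub.

```python
import re

# Hypothetical phrase-to-stub mapping (an assumption for this sketch);
# the project's real replacements live in the file on GitHub.
STUBS = {
    "southern courier": "southerncourier_stub",
    "photograph taken by": "photocredit_stub",
}


def apply_stubs(text, stubs=STUBS):
    """Replace each repetitive phrase with a single stub token."""
    # Longest phrases first, so a short phrase never splits a longer one
    for phrase in sorted(stubs, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", flags=re.IGNORECASE)
        text = pattern.sub(stubs[phrase], text)
    return text


# Hypothetical description text, not taken from the collection
original = "Photograph taken by a Southern Courier staff member at a funeral."
stubbed = apply_stubs(original)
print(stubbed)
```

Each repeated phrase collapses into one whitespace-delimited token, so it remains visible in the data without its individual words being counted again and again.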
