It’s the big day for Love Data Week! Today I am featuring a few of my favorite tools using my own qualitative dataset.
I am working on my PhD in History and, although I am still early in my program, I began research for my dissertation this year. I am currently examining a set of petitions sent to the U.S. Congress calling for action during the Armenian massacres of the 1890s. Instead of just reading through them and taking notes in Word, I decided to collect the information systematically so that I can use the data for a potential digital humanities project.
As I read through, I collected the dates, locations, types of meetings, representatives, and notes on the language used in each petition. I originally entered the information in Excel because I needed to finish a short paper, but I am switching to a Qualtrics form for the final analysis. I still need to test it and make sure it will work for my questions. I mocked it up in half an hour at most, the day before I went to NARA. If you have suggestions, let me know.
Qualtrics is easy to use and a more reliable way to input data because you can constrain what can be entered in each field. For example, I am interested in changes over time, so I can control both the date fields and the congressional information and avoid input error. Moreover, the data can be exported easily and analyzed in any software. Because I have a large set of petitions, using a system like this is essential; it is far more reliable for counts and a broad overview than notes alone.
Once I had a starting set, I used OpenRefine to clean up some fields. OpenRefine is a much easier and more reliable tool for cleaning data than Excel. For example, in my spreadsheet I was collecting information about the specific representative to whom each petition was sent. Again, I want to know about this issue over time, so I'd like to see whether individuals received specific petitions multiple times and whether they got petitions on other issues. But as you can see from this picture, the handwriting is often not legible, and this is actually one of the easiest to read. Because I was inputting data under a deadline, some of the names of legislators are inconsistent (see the image below). OpenRefine lets me clean these fields quickly and easily: all the entries with Fletcher, Minn can become Fletcher, MN in one step. And if I have trouble making out someone's name on one petition but can see their state, OpenRefine can give me clues to the name from other petitions.
The tool can do much more than this. Definitely worth checking out for spreadsheet cleaning!
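The kind of merge OpenRefine's clustering performs can be sketched in a few lines of Python. This is a toy illustration only, not OpenRefine itself, and the sample rows and the variant-spelling lookup are hypothetical:

```python
# Hypothetical "Representative" entries with inconsistent state abbreviations.
rows = [
    "Fletcher, Minn",
    "Fletcher, MN",
    "Hoar, Mass",
]

# Hypothetical lookup from variant spellings to standard abbreviations.
STATE_FIXES = {"Minn": "MN", "Mass": "MA"}

def clean(entry):
    # Split "Name, State" and normalize the state portion.
    name, state = [part.strip() for part in entry.split(",")]
    return f"{name}, {STATE_FIXES.get(state, state)}"

cleaned = [clean(r) for r in rows]
print(cleaned)  # the two Fletcher rows now match exactly
```

OpenRefine does this interactively and at scale, suggesting clusters of near-duplicate values for you to confirm rather than requiring a hand-built lookup table.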
After I cleaned up my spreadsheet, I mapped the petition origination locations just for fun. I used Google Fusion Tables because we have GAFE at UNCG and I wanted to test it against ArcGIS Online. For simple mapping, it is quite helpful and easy to use. As long as it can recognize location fields (City, State, for example), it will try to map the data. I used this to get a sense of where the petitions were coming from. While I thought I knew because I had read each one, it was difficult to get a sense of geography after reading over 200 petitions. I assumed most would be from the Northeast, but I was surprised by the actual geographic spread. Of course, a lot more could be done with this. For example, bubble sizes could scale with the number of petitions from each location.
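The counts needed for that kind of sized-bubble map are easy to produce from the spreadsheet itself. A minimal sketch, using hypothetical origin fields in the City, State form Fusion Tables recognizes:

```python
from collections import Counter

# Hypothetical "City, State" origin fields from the petition spreadsheet.
origins = [
    "Worcester, MA", "Worcester, MA", "Chicago, IL",
    "Portland, OR", "Chicago, IL", "Worcester, MA",
]

# Tally petitions per location; these totals could drive bubble size.
counts = Counter(origins)
for place, n in counts.most_common():
    print(place, n)
```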
Next, I did some basic text visualization using Voyant to explore the themes. I uploaded parts of the petition that referred to the role of the state (my research question) just to explore. You can do several things with Voyant for basic text mining, but the word clouds are always fun. You can also see the most frequently used words. In this case they were government (24), Turkish (20), people (19), humanity (18), right (17). Again this was a subset of my petitions (only one column), but it gives you the idea of what you could do just starting out.
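The frequency lists Voyant produces come from a simple operation: tokenize, drop stopwords, count. A sketch of that idea, on a hypothetical snippet of petition-like language rather than the actual documents:

```python
import re
from collections import Counter

# Hypothetical sample text; the real analysis used one column of petition excerpts.
text = ("It is the duty of the government to act for humanity, "
        "and the right of the people to petition the government.")

# Lowercase, pull out alphabetic tokens, drop common stopwords.
words = re.findall(r"[a-z]+", text.lower())
STOPWORDS = {"it", "is", "the", "of", "to", "for", "and", "a", "in", "that"}
freq = Counter(w for w in words if w not in STOPWORDS)

print(freq.most_common(5))
```

Voyant layers word clouds, trend lines, and keyword-in-context views on top of counts like these.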
I’ve also played around some with Atlas.ti for my documents that are not petitions, especially my primary source newspapers. I haven’t done much with that yet, but I love the iPad version of Atlas.ti and can see it being useful later on when I have a larger set of primary documents to work with. It is mostly used for qualitative data work in the social sciences, but if you are a historian working with Atlas.ti, NVivo, or Dedoose, PLEASE get in touch with me. I would love to have some use case scenarios.
Eventually I would also like to do some basic network analysis using Gephi but my data is not ready for that. And of course I could never go anywhere without my Zotero library that provides immediate access to all of my secondary sources and some primary documents! Yes, that is data too!
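Getting data "ready" for Gephi mostly means shaping it into an edge list it can import, with Source and Target columns. A hedged sketch of what that preparation might look like, using hypothetical petition records linking origin locations to representatives:

```python
import csv
import io

# Hypothetical records: (origin location, representative petitioned).
records = [
    ("Worcester, MA", "Hoar"),
    ("Minneapolis, MN", "Fletcher"),
    ("Worcester, MA", "Hoar"),
]

# Gephi's spreadsheet importer reads a plain Source,Target edge list;
# repeated rows can later be merged into weighted edges.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["Source", "Target"])
for origin, rep in records:
    writer.writerow([origin, rep])

print(buf.getvalue())  # CSV text ready to save as edges.csv
```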
Historians wanting to do more systematic examinations of their “data” sets of primary documents should check out the Programming Historian. You can do a lot of analysis now that you couldn’t do even five years ago, but you need to start by collecting and managing the information systematically. If you would like tips on doing this, please contact me and we can talk through your project!