Using basic data tools for history research #mystory #lovedata18

It’s the big day for Love Data Week! Today I am featuring a few of my favorite tools using my own qualitative dataset.

I am working on my PhD in History and, although I am still early in my program, I began research for my dissertation this year. I am currently examining a set of petitions sent to the U.S. Congress that were calling for action during the Armenian massacres of the 1890s. Instead of just reading through and taking notes in Word, I decided to collect information systematically so that I can use the data for a potential digital humanities project.

As I read through, I collected the dates, locations, types of meetings, and representatives, along with notes on the language used in each petition. I originally entered the information in Excel because I needed to finish a short paper, but I am switching to a Qualtrics form for the final analysis. I still need to test it out and make sure it will work for my questions; I mocked it up in half an hour at most the day before I went to NARA. If you have suggestions, let me know.

Qualtrics is easy to use and a more reliable way to input data because you can control the types of values each field accepts. For example, I am interested in changes over time, so I can constrain both the date fields and the congressional information and avoid input errors. Moreover, the data can be exported easily and analyzed in any software. Because I have a large set of petitions, a system like this is essential and far more reliable for counts and a broad overview than simply taking notes on the petitions.
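To give a flavor of that export-and-analyze step, here is a minimal Python sketch. The file name and column names are hypothetical placeholders for whatever your own Qualtrics CSV export contains:

```python
import pandas as pd

# Load a Qualtrics CSV export (file and column names are hypothetical).
petitions = pd.read_csv("petitions_export.csv", parse_dates=["petition_date"])

# Count petitions per year for a quick change-over-time overview.
per_year = petitions["petition_date"].dt.year.value_counts().sort_index()
print(per_year)

# Break the counts down further, e.g. by type of meeting.
print(petitions.groupby([petitions["petition_date"].dt.year, "meeting_type"]).size())
```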


Petition from Boonton, NJ

Once I had a starting set, I used OpenRefine to clean up some fields. OpenRefine is a much easier and more reliable tool for cleaning data than Excel. For example, in my spreadsheet I was collecting information about the specific representative to whom each petition was sent. Again, I want to understand this issue over time, so I'd like to see whether individuals received specific petitions multiple times and whether they got petitions on other issues. But as you can see from this picture, the handwriting is often not legible, and this example is actually one of the easiest to read. Because I was inputting data under a deadline, some of the legislators' names are inconsistent (see the image below). OpenRefine lets me clean those fields quickly and easily: all the entries reading Fletcher, Minn can become Fletcher, MN in one pass (for a scripted analogue, see the sketch after the screenshot). And if I have trouble making out someone's name on one petition but can see their state, I can use OpenRefine to find clues to the name from other petitions.


My dataset in OpenRefine


The tool can do much more than this. Definitely worth checking out for spreadsheet cleaning!
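If you ever want to script the same kind of cleanup outside OpenRefine, a few lines of Python can handle the simple cases, like standardizing the state abbreviations. A minimal sketch, where the column name and the mapping entries are hypothetical:

```python
import pandas as pd

# Hypothetical mapping from inconsistent state spellings to postal codes.
state_fixes = {"Minn": "MN", "Minn.": "MN", "Penn": "PA", "Mass.": "MA"}

petitions = pd.read_csv("petitions_export.csv")

def normalize_rep(value: str) -> str:
    """Split a 'Name, State' entry and standardize the state spelling."""
    name, sep, state = value.partition(",")
    if not sep:                      # no state recorded; leave the entry as-is
        return value.strip()
    state = state.strip()
    return f"{name.strip()}, {state_fixes.get(state, state)}"

petitions["representative"] = petitions["representative"].map(normalize_rep)
```

OpenRefine's clustering does this kind of matching interactively, without you having to enumerate every variant spelling up front, which is why it is the better tool for a messy first pass.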

After I cleaned up my spreadsheet, I mapped the petitions' locations of origin just for fun. I used Google Fusion Tables because we have GAFE at UNCG and I wanted to test it against ArcGIS Online. For simple mapping it is quite helpful and easy to use: as long as it can recognize location fields ("City, State", for example), it will try to map the data. I used this to get a sense of where the petitions were coming from. While I thought I knew because I had read each one, it was difficult to keep a sense of the geography in my head after more than 200 petitions. I assumed most would be from the Northeast, but I was surprised by the actual geographic spread. Of course, a lot more could be done with this; for example, the size of the bubbles could be scaled by the number of petitions from each location (see the sketch after the map).

Map of locations of origin for petitions
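Fusion Tables handles the geocoding for you, but the bubble-scaling idea is easy to sketch in Python with geopy and folium. A rough sketch, assuming the spreadsheet has a "City, State" style location column (and being gentle with the free Nominatim geocoder):

```python
import pandas as pd
import folium
from geopy.geocoders import Nominatim

petitions = pd.read_csv("petitions_export.csv")      # hypothetical export
counts = petitions["location"].value_counts()        # petitions per place

geolocator = Nominatim(user_agent="petition-mapper")
m = folium.Map(location=[39.8, -98.6], zoom_start=4)  # centered on the US

for place, n in counts.items():
    hit = geolocator.geocode(place)                   # e.g. "Boonton, NJ"
    if hit is None:
        continue                                      # skip unresolvable places
    folium.CircleMarker(
        location=[hit.latitude, hit.longitude],
        radius=3 + 2 * n,            # bubble size scaled by petition count
        popup=f"{place}: {n}",
    ).add_to(m)

m.save("petition_map.html")
```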

Next, I did some basic text visualization using Voyant to explore the themes. I uploaded the parts of the petitions that referred to the role of the state (my research question) just to explore. You can do several things with Voyant for basic text mining, but the word clouds are always fun. You can also see the most frequently used words; in this case they were government (24), Turkish (20), people (19), humanity (18), and right (17). Again, this was a subset of my petitions (only one column), but it gives you an idea of what you can do just starting out.
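Voyant does this in the browser, but the frequency count itself is only a few lines of Python. A sketch, assuming the petition excerpts live in a plain text file (the file name and the stop-word list are just illustrative):

```python
import re
from collections import Counter

# Read the excerpted petition text (file name is a placeholder).
text = open("petition_excerpts.txt", encoding="utf-8").read().lower()

# Tokenize on word characters and drop a handful of common stop words.
stop = {"the", "of", "and", "to", "in", "a", "that", "for", "be", "we"}
words = [w for w in re.findall(r"[a-z']+", text) if w not in stop]

# Top terms, analogous to Voyant's frequency list.
for word, n in Counter(words).most_common(5):
    print(word, n)
```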

I’ve also played around some with Atlas.ti for my documents that are not petitions, especially my primary-source newspapers. I haven’t done much with it yet, but I love the iPad version of Atlas.ti and can see it being useful later on when I have a larger set of primary documents to work with. It is mostly used for qualitative data work in the social sciences, but if you are a historian working with Atlas.ti, NVivo, or Dedoose, PLEASE get in touch with me. I would love to have some use cases.

Eventually I would also like to do some basic network analysis using Gephi, but my data is not ready for that yet. And of course I could never go anywhere without my Zotero library, which provides immediate access to all of my secondary sources and some primary documents. Yes, that is data too!
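For anyone further along than I am, the usual first step is shaping the data into an edge list Gephi can import. A minimal sketch, with hypothetical columns and locations and representatives as the two node types:

```python
import pandas as pd

petitions = pd.read_csv("petitions_export.csv")   # hypothetical export

# One edge per location-representative pair, weighted by petition count.
edges = (
    petitions.groupby(["location", "representative"])
    .size()
    .reset_index(name="Weight")
    .rename(columns={"location": "Source", "representative": "Target"})
)

# Gephi's spreadsheet importer recognizes Source/Target/Weight headers.
edges.to_csv("petition_edges.csv", index=False)
```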

Historians wanting to do more systematic examinations of their “data” sets of primary documents should check out the Programming Historian. You can do a lot of analysis now that you couldn’t do even five years ago, but you need to start by collecting and managing the information systematically. If you would like tips on doing this, please contact me and we can talk through your project!


My favorite data organization @iassistdata #lovedata18 #alamw18

In honor of Love Data Week I am going to do a series of posts on my favorite data resources and tools. I am a data connector, meaning my primary job is to connect people with the data they need. With the proliferation of tools and resources, it can be difficult to find and choose among good sources. I also often work with newer data users, so I have to figure out ways to lower the barriers to using data of all kinds. I can't do it alone, so I rely on a network of professionals to help me learn about new tools and develop lesson plans.

Many professional organizations support data librarians and other data professionals. I wish I could be involved with all of them, but there are only so many hours in the day and bucks in my bank account. My favorite data organization is undoubtedly IASSIST, one of the first international data organizations. The group has been around since the 1970s and brings together data professionals of all types, from metadata specialists to programmers to librarians. Although its traditional focus is the social sciences, IASSIST has branched out lately, and its annual conference includes sessions on GIS, qualitative data, and much more. This year's conference is in Montreal, joining forces with the Association of Canadian Map Libraries and Archives. Registration will open soon, so I encourage you to consider attending if you love data!

In telling our data stories (one of the themes of #lovedata18), I always remember that I am not navigating my data work alone and that I can draw upon the knowledge of my colleagues. IASSIST provides a forum for immediate assistance through its listserv and a long-term network that connects me with colleagues from Australia to Nigeria, from the Federal Reserve banks to tiny colleges in the frozen Midwest. It is definitely a data resource worth considering!

How much do I love data? #lovedata18

Let me count the ways … Today kicks off Love Data Week, a campaign to raise awareness about the variety of issues and topics related to research data. This year the week’s themes revolve around telling our stories about, with, and connected to data of all types.

Because I am primarily a data connector (I connect people to data) rather than a collector (although I’ll show you my dataset of petitions anytime!), I’ll use the week to celebrate my favorite data resources, people, and tools. Data (of all kinds) are the heart of research and undergird the outputs that everyone needs, from scholarly articles to the demographic stats we use to target our patrons. We rely on the proper collection, protection, preservation, and archiving of data to help us understand the world around us.

Tell your data stories too through the Love Data blog or use the hashtag #lovedata18 on Instagram or Twitter. I'd love to see your favorite data visualizations, tools, resources, and more. Let's celebrate!

All promotional Love Data 2018 materials used under a Creative Commons Attribution 4.0 International License.

Citation: Nurnberger, A., Coates, H. L., Condon, P., Koshoffer, A. E., Doty, J., Zilinski, L., … Foster, E. D. (2017). Love Data Week [image files]. Retrieved from