Using basic data tools for history research #mystory #lovedata18

It’s the big day for Love Data Week! Today I am featuring a few of my favorite tools using my own qualitative dataset.

I am working on my PhD in History and, although I am still early in my program, I began research for my dissertation this year. I am currently examining a set of petitions sent to the U.S. Congress that were calling for action during the Armenian massacres of the 1890s. Instead of just reading through and taking notes in Word, I decided to collect information systematically so that I can use the data for a potential digital humanities project.

As I read through, I collected the dates, locations, types of meetings, representatives, and then notes on the language used in the petition. I originally entered the information in Excel because I needed to finish a short paper, but I am switching to a Qualtrics form for final analysis. I still need to test it out and make sure it will work for my questions. I mocked the form up in half an hour at most, one day before I went to NARA. If you have suggestions, let me know.

Qualtrics is easy to use and a more reliable way to input data because you can control the types of fields that can be entered. For example, I am interested in changes over time, so I can control both the date fields and the congressional information and avoid input error. Moreover, the data can be exported easily and analyzed in any software. Because I have a large set of petitions, using a system like this is absolutely necessary and more reliable for counts and a broad overview than just taking notes on the petitions.
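If you are not using a form tool, the same idea of constraining a date field can be sketched in a few lines of Python. This is only an illustration, not how Qualtrics works internally: the function name, the ISO date format, and the 1890 to 1899 range are my own assumptions for this example.

```python
from datetime import date, datetime


def parse_petition_date(text, earliest=date(1890, 1, 1), latest=date(1899, 12, 31)):
    """Parse a YYYY-MM-DD string and reject dates outside the expected decade.

    The format and the date range are assumptions made for this sketch.
    """
    # strptime raises ValueError if the text is not a valid YYYY-MM-DD date.
    parsed = datetime.strptime(text.strip(), "%Y-%m-%d").date()
    if not earliest <= parsed <= latest:
        raise ValueError(f"{parsed} is outside {earliest} to {latest}")
    return parsed
```

A typo such as 1996 for 1896 is then caught at entry time instead of surfacing later as an odd point on a timeline.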


Petition from Boonton, NJ

Once I had a starting set, I used OpenRefine to clean up some fields. OpenRefine is a much easier and more reliable tool for cleaning data than Excel. For example, in my spreadsheet I was collecting information about the specific representative to whom the petitions were sent. Again, I want to know about this issue over time, so I’d like to see if individuals were receiving specific petitions multiple times and if they got petitions on other issues. But as you can see from this picture, the handwriting is often not legible. This is actually one of the easiest to read. Because I was inputting data under a deadline, some of the names of legislators are inconsistent (see the image below). I can use OpenRefine to clean these fields quickly. So, for example, all the fields with Fletcher, Minn can quickly become Fletcher, MN. Also, if I have trouble making out someone’s name on one petition but can see their state, I can use OpenRefine to give me clues as to the name from other petitions.


My dataset in OpenRefine


The tool can do much more than this. Definitely worth checking out for spreadsheet cleaning!
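For anyone curious what this kind of cleanup looks like outside OpenRefine, here is a rough Python sketch of the Fletcher, Minn to Fletcher, MN fix. It is not OpenRefine's clustering algorithm, just a simple lookup-based normalization. The function names, the state lookup table, and the sample values are hypothetical.

```python
from collections import defaultdict

# Hypothetical lookup of state spellings; extend it with whatever appears in your notes.
STATE_VARIANTS = {"minn": "MN", "minnesota": "MN", "mn": "MN"}


def normalize_legislator(raw):
    """Turn a loosely typed 'surname, state' entry into 'Surname, ST'."""
    name, _, state = raw.partition(",")
    name = " ".join(name.split())          # collapse stray whitespace
    name = name[:1].upper() + name[1:]     # capitalize only the first letter
    state = state.strip()
    # Look the state up without a trailing period; keep it unchanged if unknown.
    state = STATE_VARIANTS.get(state.rstrip(".").lower(), state)
    return f"{name}, {state}" if state else name


def group_variants(entries):
    """Map each cleaned value to the sorted raw spellings that produced it."""
    groups = defaultdict(set)
    for raw in entries:
        groups[normalize_legislator(raw)].add(raw)
    return {clean: sorted(raws) for clean, raws in groups.items()}


if __name__ == "__main__":
    sample = ["Fletcher, Minn", "fletcher, Minn.", "Fletcher, MN"]
    for clean, raws in group_variants(sample).items():
        print(clean, "<-", raws)
```

Grouping the raw spellings under each cleaned value lets you review what was merged before you overwrite anything, which is the part of OpenRefine I find most reassuring.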

After I cleaned up my spreadsheet, I mapped the petition origination locations just for fun. I used Google Fusion Tables because we have GAFE at UNCG and I wanted to test it versus ArcGIS Online. For simple mapping, it is quite helpful and easy to use. As long as it can recognize location fields (City, State for example), it will try to map the data. I used this to get a sense of where the petitions were coming from. While I thought I knew because I read each one, it was difficult to get a sense of geography after reading over 200 petitions. I assumed most would be from the Northeast, but I was surprised by the actual geographic spread. Of course, a lot more could be done with this. For example, the size of bubbles could be changed depending on the number of petitions from a location.

Map of locations of origin for petitions
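The bubble-size idea boils down to counting petitions per location before mapping. Here is a minimal Python sketch, assuming each row is a dictionary with "city" and "state" keys (my own column names, not anything Fusion Tables requires). The sample rows are made up for illustration.

```python
import math
from collections import Counter


def petitions_per_location(rows):
    """Count petitions for each (city, state) pair."""
    return Counter((row["city"].strip(), row["state"].strip()) for row in rows)


def bubble_radius(count, base=4.0):
    """Scale the radius with the square root of the count so that the
    bubble's area, not its radius, grows in proportion to the count."""
    return base * math.sqrt(count)


if __name__ == "__main__":
    # Invented sample rows, not my actual data.
    rows = [
        {"city": "Boonton", "state": "NJ"},
        {"city": "Boonton ", "state": "NJ"},
        {"city": "Minneapolis", "state": "MN"},
    ]
    for place, count in petitions_per_location(rows).items():
        print(place, count, round(bubble_radius(count), 1))
```

Scaling by the square root is a common choice because readers tend to judge bubbles by area. A radius that grows linearly with the count would exaggerate the larger locations.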

Next, I did some basic text visualization using Voyant to explore the themes. I uploaded parts of the petition that referred to the role of the state (my research question) just to explore. You can do several things with Voyant for basic text mining, but the word clouds are always fun. You can also see the most frequently used words. In this case they were government (24), Turkish (20), people (19), humanity (18), right (17). Again this was a subset of my petitions (only one column), but it gives you the idea of what you could do just starting out.
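A frequency list like the one above can also be approximated in a few lines of Python. This is a bare-bones sketch, not what Voyant does under the hood: the tiny stopword list and the function name are my own assumptions, and a real project would want a fuller stopword list.

```python
import re
from collections import Counter

# A deliberately tiny stopword list, assumed for this sketch only.
STOPWORDS = {"the", "and", "of", "to", "a", "in", "that", "is", "be", "we", "our"}


def top_words(text, n=5):
    """Return the n most frequent words, ignoring case and stopwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(token for token in tokens if token not in STOPWORDS)
    return counts.most_common(n)


if __name__ == "__main__":
    # Invented sentence, not a quotation from any petition.
    sample = "The government and the people; the Government must act."
    print(top_words(sample, n=3))
```

Pasting one column of notes into a function like this is enough to check whether the themes you think you saw actually show up in the wording.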

I’ve also played around some with Atlas.ti for my documents that are not petitions, especially my primary source newspapers. I haven’t done much with that yet, but I love the iPad version of Atlas.ti and can see it as being useful later on when I have a larger set of primary documents to work with. It is mostly used for qualitative data work in the social sciences, but if you are a historian working with Atlas.ti, Nvivo, or Dedoose, PLEASE get in touch with me. I would love to have some use case scenarios.

Eventually I would also like to do some basic network analysis using Gephi but my data is not ready for that. And of course I could never go anywhere without my Zotero library that provides immediate access to all of my secondary sources and some primary documents! Yes, that is data too!

Historians wanting to do more systematic examinations of their “data” sets of primary documents should check out the Programming Historian. You can do a lot of analysis now that you couldn’t do even five years ago, but you need to start by collecting and managing the information systematically. If you would like tips on doing this, please contact me and we can talk through your project!


How much do I love data? #lovedata18

Let me count the ways … Today kicks off Love Data Week, a campaign to raise awareness about the variety of issues and topics related to research data. This year the week’s themes revolve around telling our stories about, with, and connected to data of all types.

Because I am primarily a data connector (I connect people to data) rather than a collector (although I’ll show you my dataset of petitions anytime!), I’ll use the week to celebrate my favorite data resources, people, and tools. Data (of all kinds) are the heart of research and undergird the outputs that everyone needs, from scholarly articles to the demographic stats we use to target our patrons. We rely on the proper collection, protection, preservation, and archiving of data to help us understand the world around us.

Tell your data stories too through the Love Data blog or use the hashtag #lovedata18 on Instagram or Twitter. I’d love to see your favorite data visualization, tools, resources and more. Let’s celebrate!

All promotional Love Data 2018 materials used under a Creative Commons Attribution 4.0 International License.

Citation: Nurnberger, A., Coates, H. L., Condon, P., Koshoffer, A. E., Doty, J., Zilinski, L., … Foster, E. D. (2017). Love Data Week [image files]. Retrieved from

More Info on Net Neutrality

This is from a recent email from the ALA Washington Office:

The Washington Office is following up last week’s net neutrality blog with an early analysis of the FCC’s draft order, as well as an action alert that ALA members can use to contact Congress. As we write in the blog, we believe FCC Chairman Pai likely has the three votes needed among FCC commissioners to pass the order. Contacting Members of Congress to pressure the Chairman is the most reasonable grassroots strategy as we prepare for the almost certain legal challenges to come.

Please share the blog post and action alert with your colleagues. We will continue our analysis and planning for how to best inform and engage ALA members as this issue continues to play out. The FCC vote is scheduled for December 14, so we are considering options for activities leading up to and during that day. Be assured we are watching this issue closely.

In case it comes up in any of your units, roundtables, or divisions: ALA has two net neutrality resolutions, from 2006 and 2014. The first is a resolution affirming network neutrality and the second is a resolution reaffirming support.

  • Resolution endorsed by ALA Council on June 28, 2006. Council Document 20.12 (CD#20.12):

Net Neutrality and ALA

The ALA Washington Office has released a resource for librarians concerned about recent actions on net neutrality. You can still take action to protest the move to roll back net neutrality, but there are some deadlines coming up in December. If you are concerned about this issue, take action soon.

If you are new to the idea of net neutrality, the most accessible overview is John Oliver’s slightly NSFW video from Last Week Tonight. It is also decidedly NSFI (not safe for instruction). I’ll find additional resources and post here. Please let me know if you have other resources that are particularly good.

Finally, I am the incoming ALA Councilor for North Carolina. Our orientation isn’t until February, but they have put us on the lists for Council. So, I will post information here and on social media if it seems critical. Let me know if you have questions or concerns so that I can better represent NC at Council!

Slides and video from Tips for Student Success with Data @SAGElibrarynews

I gave a presentation with Diana Aleman from SAGE Stats about helping students discover data. The principles are pretty straightforward, but hopefully you will find some of them helpful.

My slides are available for reuse/remix on figshare.

The recording is also available including Diana’s part (she’s awesome!).

IFLA, Wrocław, and #WLIC2017

I’m having a great conference at IFLA. I’ve met people from all over the world, including Croatia, Australia, Malaysia, Taiwan, South Africa, Nigeria, Germany, Norway, the UK, as well as our large US contingent. I’ve really enjoyed learning about the variety of projects people are working on from library support for refugees to reading habits of Iranian students to a new VR initiative at the Bergen public library. You can see the variety of topics from a selection of posters.

Today I had my first meeting with the Standing Committee for the Social Science Libraries section. The structure of IFLA is a bit convoluted, but I am slowly learning. Basically, my Standing Committee does great projects like tomorrow’s workshop on using ethnography in the library and more. Looking forward to sharing their work when I am officially a member of the committee after this conference.

Spencer Acadia and I also represented IASSIST at our poster session. We were able to talk to a few people and our poster will stay up all through the conference. We are still looking to build up our membership in South America and Africa!

Finally, I’ve been trying to get to know Wrocław again. The city has changed quite a bit and is much more vibrant now. It has developed a student-oriented character that wasn’t quite here in 1995.

Several times in my walks around town a place suddenly seemed familiar and I could see it again through my memories. The picture below is of a lovely square and roundabout called Plac Kościuszki near my hotel and the train station. I walked by it a few times before I remembered this evening that it was the place I used to go for coffee almost once a week. I think a post office was also on the corner. But it seems to be gone now.

Tonight I was also able to do a food tour of the city. When I was here last, I was a student and I couldn’t afford to eat out much. We cooked a lot of pierogi that semester. On the tour, one of the restaurants had a room decorated like a Polish living room during the Communist period. In one corner was a shrine to Solidarność, the trade union and opposition group, and its members from Wrocław.

So, those are just a few of the things going on. Tomorrow I have an all-day workshop and will post notes as I can. It has been a blast sharing and learning with everyone at IFLA. Although I am new to the organization and only know a few people, everybody has been friendly and welcoming.

And I will close out with a picture of the translation booth and the IFLA President on the stage. I’ve never seen a translation booth in action, so quite a fun moment for me!

Help! webinar for August

Help! I’m an Accidental Government Information Librarian presents … State Agency Databases Project, finding and sharing agency databases by subject

The Government Resources Section of the North Carolina Library Association welcomes you to a series of webinars designed to help us increase our familiarity with government information. All are welcome because government information wants to be free.

Join Daniel Cornwall for a short exploration of the State Agency Databases Project of ALA GODORT. Daniel will show the types of resources available through the project and the new auto-updating subject compilations the LibGuides platform has enabled. He will conclude with how to contribute new databases to the project and how to share content from the project on your own LibGuides and web pages.

Daniel Cornwall is the Internet and Technology Consultant for the Alaska State Library. He has over a decade of experience in federal and state government information. He has led the State Agency Databases Project for ALA GODORT since July 2017. When not doing library or government information type stuff he enjoys hiking, reading and working on citizen science projects at More professional information about Daniel can be found on his LinkedIn profile at

We will meet together online on August 16 from 12:00 – 1:00 p.m. (Eastern). Please RSVP for the session using this link:

We will use WebEx for the live session. Information on testing and accessing the session will be made available when you register.

The session will be recorded and available after the live session, linked from the NCLA GRS web page (