Using basic data tools for history research #mystory #lovedata18

It’s the big day for Love Data Week! Today I am featuring a few of my favorite tools using my own qualitative dataset.

I am working on my PhD in History and, although I am still early in my program, I began research for my dissertation this year. I am currently examining a set of petitions sent to the U.S. Congress calling for action during the Armenian massacres of the 1890s. Instead of just reading through them and taking notes in Word, I decided to collect the information systematically so that I can use the data for a potential digital humanities project.

As I read through, I collected the dates, locations, types of meetings, and representatives, along with notes on the language used in each petition. I originally entered the information in Excel because I needed to finish a short paper, but I am switching to a Qualtrics form for the final analysis. I still need to test it and make sure it will work for my questions. I mocked it up in half an hour at most, the day before I went to NARA. If you have suggestions, let me know.

Qualtrics is easy to use and a more reliable way to input data because you can control the type of data each field accepts. For example, I am interested in changes over time, so I can constrain both the date fields and the congressional information and avoid input errors. Moreover, the data can be exported easily and analyzed in any software. Because I have a large set of petitions, a system like this is essential, and it is more reliable for counts and a broad overview than simply taking notes on the petitions.
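As a minimal sketch of that last point, here is how petitions per year could be tallied from a CSV export in Python; the column names and sample rows are hypothetical, not Qualtrics's actual schema:

```python
import csv
import io
from collections import Counter

# Hypothetical CSV export; the columns are assumptions, not Qualtrics's real schema
export = io.StringIO("""date,location,meeting_type
1894-11-30,"Boonton, NJ",church
1895-01-12,"Boston, MA",public
1895-02-02,"Boston, MA",church
""")

# Count petitions per year for a quick change-over-time overview
years = Counter(row["date"][:4] for row in csv.DictReader(export))
print(dict(years))  # {'1894': 1, '1895': 2}
```

The same few lines work on any export with a date column, which is one reason controlled date fields matter so much at input time.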


Petition from Boonton, NJ

Once I had a starting set, I used OpenRefine to clean up some fields. OpenRefine is a much easier and more reliable tool for cleaning data than Excel. For example, in my spreadsheet I was collecting information about the specific representative to whom the petitions were sent. Again, I want to know about this issue over time, so I’d like to see if individuals were receiving specific petitions multiple times and if they got petitions on other issues. But as you can see from this picture, the handwriting is often not legible. This is actually one of the easiest to read. Because I was inputting data under a deadline, some of the names of legislators are inconsistent (see the image below). I can use OpenRefine to clean fields quickly and easily. So, for example, all the fields with Fletcher, Minn can quickly become Fletcher, MN. Also, if I have trouble making out someone’s name on one petition but can see their state, I can use OpenRefine to get clues about the name from other petitions.

OpenRefine dataset

My dataset in OpenRefine


The tool can do much more than this. Definitely worth checking out for spreadsheet cleaning!
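The kind of clean-up OpenRefine does can also be sketched in plain Python with a small normalization map; the entries and variants below are hypothetical, based on the Fletcher, Minn example above:

```python
# Hypothetical inconsistent legislator entries, as in the post's example
names = ["Fletcher, Minn", "Fletcher, MN", "Fletcher, Minn.", "Sherman, NY"]

# Assumed normalization map for state-abbreviation variants
state_fixes = {"Minn": "MN", "Minn.": "MN"}

def normalize(name):
    """Standardize 'Surname, State' entries to two-letter state codes."""
    surname, _, state = name.partition(", ")
    return f"{surname}, {state_fixes.get(state, state)}"

cleaned = sorted(set(normalize(n) for n in names))
print(cleaned)  # ['Fletcher, MN', 'Sherman, NY']
```

OpenRefine's clustering is far more capable (it suggests these groupings for you), but a script like this shows what the "make the variants agree" step actually does to the data.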

After I cleaned up my spreadsheet, I mapped the petition origination locations just for fun. I used Google Fusion Tables because we have GAFE at UNCG and I wanted to test it against ArcGIS Online. For simple mapping, it is quite helpful and easy to use. As long as it can recognize location fields (City, State, for example), it will try to map the data. I used this to get a sense of where the petitions were coming from. While I thought I knew because I read each one, it was difficult to keep a sense of the geography after reading over 200 petitions. I assumed most would be from the Northeast, but I was surprised by the actual geographic spread. Of course, a lot more could be done with this. For example, the size of the bubbles could be scaled by the number of petitions from each location.

Map of locations of origin for petitions
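Counting petitions per location, which could drive the bubble sizes mentioned above, is a simple tally. A minimal sketch in Python, with made-up sample locations standing in for the real dataset:

```python
from collections import Counter

# Made-up (City, State) origin records standing in for the real petition data
locations = [
    ("Boonton", "NJ"), ("Boston", "MA"), ("Boston", "MA"),
    ("Chicago", "IL"), ("Boston", "MA"),
]

# Petitions per location; these counts could scale the map's bubble sizes
counts = Counter(locations)
for (city, state), n in counts.most_common():
    print(f"{city}, {state}: {n}")
```

Joining a table like this back to the mapped points is all the bubble-scaling would take.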

Next, I did some basic text visualization using Voyant to explore the themes. I uploaded the parts of the petitions that referred to the role of the state (my research question) just to explore. You can do several things with Voyant for basic text mining, but the word clouds are always fun. You can also see the most frequently used words; in this case they were government (24), Turkish (20), people (19), humanity (18), and right (17). Again, this was a subset of my petitions (only one column), but it gives you an idea of what you can do just starting out.
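The word counts Voyant reports can be reproduced with a few lines of standard-library Python. A minimal sketch on a hypothetical excerpt (the text and stop-word list here are assumptions for illustration, not the actual petition language):

```python
import re
from collections import Counter

# Hypothetical excerpt standing in for the petition text uploaded to Voyant
text = """The government of the Turkish empire must answer to the
people, for humanity demands that every people has the right
to appeal to its government in the name of humanity."""

# Lowercase, split on non-letters, and drop common stop words (assumed list)
stop_words = {"the", "of", "to", "in", "that", "its", "for", "must", "a"}
words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in stop_words]

freq = Counter(words)
print(freq.most_common(5))
```

Voyant handles stop words, trends, and the visualizations for you, but this is essentially the tally underneath the word cloud.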

I’ve also played around some with Atlas.ti for my documents that are not petitions, especially my primary-source newspapers. I haven’t done much with it yet, but I love the iPad version of Atlas.ti and can see it being useful later on when I have a larger set of primary documents to work with. It is mostly used for qualitative data work in the social sciences, but if you are a historian working with Atlas.ti, NVivo, or Dedoose, PLEASE get in touch with me. I would love to have some use case scenarios.

Eventually I would also like to do some basic network analysis using Gephi but my data is not ready for that. And of course I could never go anywhere without my Zotero library that provides immediate access to all of my secondary sources and some primary documents! Yes, that is data too!

Historians wanting to do more systematic examinations of their “data” sets of primary documents should check out the Programming Historian. You can do a lot of analysis now that you couldn’t do even five years ago, but you need to start by collecting and managing the information systematically. If you would like tips on doing this, please contact me and we can talk through your project!


Valentine to @ICPSR #lovedata18 #ldw18

In honor of Love Data Week, here’s a shout out to one of my favorite data archives/resources.

Many social scientists are familiar with ICPSR, the Inter-university Consortium for Political and Social Research, but faculty may not always keep up with the new goodies at the archive. Since I became a data librarian in 2007, ICPSR has expanded its resources widely to include a wealth of training materials. Along with its Summer Program in Quantitative Methods of Social Research, ICPSR also has a variety of modules for teaching and learning about data, data concepts, social science concepts, and more. My favorite tool is the Social Science Variables Database that allows users to search for variables in the major data studies, about 76% of ICPSR’s holdings. In addition to isolating data studies with specific, required variables, the tool allows users to examine the questions being asked across data studies. ICPSR has much more in its expansive offerings, especially for members of its consortium. Definitely worth a look and some love!

My favorite data organization @iassistdata #lovedata18 #alamw18

In honor of Love Data Week I am going to do a series of posts on my favorite data resources/tools. I am a data connector, meaning my primary job is to connect people with the data they need. Because of the proliferation of tools and resources, it can be difficult to choose and find great sources. I also often work with newer data users, so I have to figure out ways to lower barriers to using data of all kinds. I can’t do it alone so I rely on a network of professionals to help me learn about new tools and think up lesson plans.

Many professional organizations out there support data librarians and other data professionals. I wish I could be involved with all of them, but only so much time in the day and bucks in my bank account. My favorite data organization is undoubtedly IASSIST, one of the first international data organizations. This group has been around since the 1970s and brings together data professionals of all types, from metadata specialists to programmers to librarians. Although its traditional focus is social sciences, IASSIST has branched out lately and its annual conference includes sessions on GIS, qualitative data, and much more. The conference this year is in Montreal, and we are joining forces with the Association of Canadian Map Libraries and Archives. Conference registration will open up soon, so I encourage you to consider attending if you love data!

In telling our data stories (one of the themes of #lovedata18), I always remember that I am not navigating my data work alone and that I can draw upon the knowledge of my colleagues. IASSIST provides a forum for immediate assistance through its listserv and a long term network that connects me with colleagues from Australia to Nigeria, from the Federal Reserve banks to tiny colleges in the frozen Midwest. It is definitely a data resource worth considering!

ALA Council Day 1 #alamw18 #alacouncil

My first day on Council was awesome! The proceedings went by quickly, which apparently is unusual. We had a few items of interest, however, and much more will be discussed today at Council II.

  • We approved a new ALA award for mid-career professionals called the Lois Ann Gregory-Wood Fellows Program. The award is named after Lois Ann, the ALA Council Secretariat, and honors her 50 years of service at ALA. The awards will provide funding for mid-career professionals who want to be involved in ALA governance but who do not have institutional funding. More information is forthcoming on applying and donating! I will post information as I have it.
  • The Librarian of Congress Carla Hayden was approved to be given honorary ALA membership. We were all happy about that one!
  • The ALA Membership Committee brought forward a resolution to adjust dues with the CPI. The resolution passed and will go before the membership on the spring 2018 ballot. Another good reason to VOTE this spring!
  • The ALA Executive Director Mary W. Ghikas gave figures for the Midwinter conference attendance. As of February 10, registration was 7,894, a decrease from 8,892 last year. This is certainly an ongoing issue for Midwinter, and an issue that we will discuss at Council II today!

If you have questions or concerns for ALA, please feel free to contact me! I hope to represent NC’s interests as fully as possible, but I’d love to do that with feedback from NC librarians.

How much do I love data? #lovedata18

Let me count the ways … Today kicks off Love Data Week, a campaign to raise awareness about the variety of issues and topics related to research data. This year the week’s themes revolve around telling our stories about, with, and connected to data of all types.

Because I am primarily a data connector (I connect people to data) rather than a collector (although I’ll show you my dataset of petitions anytime!), I’ll use the week to celebrate my favorite data resources, people, and tools. Data (of all kinds) are the heart of research and undergird the outputs that everyone needs, from scholarly articles to the demographic stats we use to target our patrons. We rely on the proper collection, protection, preservation, and archiving of data to help us understand the world around us.

Tell your data stories too through the Love Data blog or use the hashtag #lovedata18 on Instagram or Twitter. I’d love to see your favorite data visualization, tools, resources and more. Let’s celebrate!

All promotional Love Data 2018 materials used under a Creative Commons Attribution 4.0 International License.

Citation: Nurnberger, A., Coates, H. L., Condon, P., Koshoffer, A. E., Doty, J., Zilinski, L., … Foster, E. D. (2017). Love Data Week [image files]. Retrieved from

Get ready to support federal funding for libraries! #fundlibraries

Coming soon to a Congress near you …

We are still mired in the current-year budget process, but soon the President will release the budget proposal for FY 2019. As such, the ALA Washington Office is preparing its campaign in case we see an attempt to cut federal library funding. While the campaign is still in its early stages and the FY 2019 proposal has not yet been released, the ALA Washington Office held a conference call today for state chapter councilors and leaders on the general timeline and provided a few resources. The shutdown has pushed back the cycle, but the budget should be coming in the next couple of weeks. In March, ALA will send out its Dear Appropriator letters to representatives and senators, and the long process of outreach will begin. You can do the following in the meantime:

  • Sign up for
  • Check out the #fundlibraries website.
  • Start the process of inviting your representatives/senators to visit your library and show off your resources and services.
  • Start collecting stories that demonstrate the impact of library services, based on concrete examples in your state. The #fundlibraries website will have a form for submitting stories, but go ahead and begin gathering them now.
  • Once the budget proposal is released and the campaign begins, you can contact your reps. The #fundlibraries website will have real-time information on whether they have signed the Dear Appropriator letters (example below). Please hold off on this step until the budget has been released and the campaign has begun.

The ALA Washington Office will give regular updates, especially once the budget proposal is available, but we can/should start the legwork to get ready!




More Info on Net Neutrality

This is from a recent email from the ALA Washington Office:

The Washington Office is following up last week’s net neutrality blog with an early analysis of the FCC’s draft order, as well as an action alert that ALA members can use to contact Congress. As we write in the blog, we believe FCC Chairman Pai likely has the three votes needed among FCC commissioners to pass the order. Contacting Members of Congress to pressure the Chairman is the most reasonable grassroots strategy as we prepare for the almost certain legal challenges to come.

Please share the blog post and action alert with your colleagues. We will continue our analysis and planning for how to best inform and engage ALA members as this issue continues to play out. The FCC vote is scheduled for December 14, so we are considering options for activities leading up to and during that day. Be assured we are watching this issue closely.

In case it comes up in any of your units, roundtables, or divisions: ALA has two net neutrality resolutions, from 2006 and 2014. The first is a resolution affirming network neutrality and the second is a resolution reaffirming support.

  • Resolution endorsed by ALA Council on June 28, 2006. Council Document 20.12 (CD#20.12):