NCLA GRS webinar on Census data access

Took a small break from webinaring to survive the semester. This summer NCLA GRS Help! is back with a hankering for gov info. We have webinars in June, July, and August. Don’t miss out!
Help! I’m an Accidental Government Information Librarian presents … Census Bureau Data Access
The Government Resources Section of the North Carolina Library Association welcomes you to a series of webinars designed to help us increase our familiarity with government information. All are welcome because government information wants to be free.
Do you find American FactFinder challenging?  Well, get ready, because the Census Bureau is beta-testing a new data platform that will replace ALL of its data tools!  Michele Hayslett will give a brief overview of the process, the tools to be transferred, and the timeline; provide an orientation to the interface; and demonstrate a few searches.  You will receive the questions the Census Bureau would like testers of all skill levels to answer about their experiences with the new interface, and the email address to which to send feedback.
Michele Hayslett started learning about Census data when she worked as the Demographics Specialist at the State Library of North Carolina in the early 2000s, and hasn’t stopped learning.  She is currently the Librarian for Numeric Data Services and Data Management at UNC at Chapel Hill, working in the Davis Library Research Hub.
We will meet together online on June 7 from 12:00 – 1:00 p.m. (Eastern). Please RSVP for the session using this link: 
We will use WebEx for the live session. Information on testing and accessing the session will be made available when you register.
The session will be recorded and made available after the live session, linked from the NCLA GRS web page.



Ensuring access to government information

Interested in efforts to ensure access to gov info? Concerned about future access to our nation’s information heritage?

Check out the pre-prints from the special issue of Against the Grain that Shari Laster and I edited. The issue covers a wide range of topics, including the Data Refuge initiative, the End of Term Presidential Archive, the PEGI Project, and much more! We even have Canada!

Big thanks to Shari for agreeing to edit with me and to all the authors for being great colleagues!





Using basic data tools for history research #mystory #lovedata18

It’s the big day for Love Data Week! Today I am featuring a few of my favorite tools using my own qualitative dataset.

I am working on my PhD in History and, although I am still early in my program, I began research for my dissertation this year. I am currently examining a set of petitions sent to the U.S. Congress calling for action during the Armenian massacres of the 1890s. Instead of just reading through and taking notes in Word, I decided to collect the information systematically so that I can use the data for a potential digital humanities project.

As I read through, I collected the dates, locations, types of meetings, and representatives, along with notes on the language used in each petition. I originally entered the information in Excel because I needed to finish a short paper, but I am switching to a Qualtrics form for the final analysis. I still need to test it out and make sure it will work for my questions; I mocked it up in half an hour at most, the day before I went to NARA. If you have suggestions, let me know.

Qualtrics is easy to use and a more reliable way to input data because you can control the types of data entered in each field. For example, I am interested in changes over time, so I can constrain both the date fields and the congressional information and avoid input errors. Moreover, the data can be exported easily and analyzed in any software. Because I have a large set of petitions, using a system like this is absolutely necessary, and it is more reliable for counts and a broad overview than just taking notes on the petitions.
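To give a sense of what controlled fields buy you, here is a minimal Python sketch of the same idea: a record only passes if the date parses and the Congress number is plausible for the 1890s. The field names (date, congress) and the validation rules are my own illustrative assumptions, not my actual Qualtrics form.

```python
from datetime import datetime

def validate_petition_record(record):
    """Check a petition record the way a controlled form would:
    the date must parse, and the Congress number must fall in the
    1890s range (53rd through 56th Congresses)."""
    errors = []
    try:
        datetime.strptime(record["date"], "%Y-%m-%d")
    except (KeyError, ValueError):
        errors.append("date must be in YYYY-MM-DD format")
    if record.get("congress") not in range(53, 57):
        errors.append("congress must be between 53 and 56")
    return errors

# A well-formed record passes; a free-text date is caught at entry time.
print(validate_petition_record({"date": "1896-02-14", "congress": 54}))  # []
print(validate_petition_record({"date": "Feb. 1896", "congress": 54}))
```

Catching a stray "Feb. 1896" at entry time is much cheaper than finding it later during analysis.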


Petition from Boonton, NJ

Once I had a starting set, I used OpenRefine to clean up some fields. OpenRefine is a much easier and more reliable tool for cleaning data than Excel. For example, in my spreadsheet I was collecting information about the specific representative to whom each petition was sent. Again, I want to know about this issue over time, so I’d like to see whether individuals received specific petitions multiple times and whether they got petitions on other issues. But as you can see from this picture, the handwriting is often not legible, and this is actually one of the easiest to read. Because I was inputting data under a deadline, some of the names of legislators are inconsistent (see the image below). I can use OpenRefine to clean fields easily and quickly; for example, all the fields with “Fletcher, Minn” can quickly become “Fletcher, MN”. Also, if I have trouble making out someone’s name on one petition but can see their state, I can use OpenRefine to give me clues to the name from other petitions.

OpenRefine dataset

My dataset in OpenRefine


The tool can do much more than this. Definitely worth checking out for spreadsheet cleaning!
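If you are curious how OpenRefine groups those inconsistent entries, its default clustering method builds a normalized “fingerprint” of each value and clusters values that share one. Here is a rough Python sketch of that idea; the name variants are made up to resemble my messy legislator field, not copied from my actual data.

```python
import re
from collections import defaultdict

def fingerprint(value):
    """OpenRefine-style fingerprint key: lowercase, strip punctuation,
    split into tokens, deduplicate, and sort them."""
    tokens = re.sub(r"[^\w\s]", "", value.lower()).split()
    return " ".join(sorted(set(tokens)))

# Hypothetical variants of one legislator's name, entered under deadline.
names = ["Fletcher, Minn", "fletcher minn", "Fletcher,  Minn."]

clusters = defaultdict(list)
for name in names:
    clusters[fingerprint(name)].append(name)

for key, variants in clusters.items():
    print(key, "->", variants)  # all three variants land in one cluster
```

Once the variants are clustered, replacing them all with a canonical value like “Fletcher, MN” is a single step.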

After I cleaned up my spreadsheet, I mapped the petition origination locations just for fun. I used Google Fusion Tables because we have GAFE at UNCG and I wanted to test it against ArcGIS Online. For simple mapping, it is quite helpful and easy to use: as long as it can recognize location fields (“City, State,” for example), it will try to map the data. I used this to get a sense of where the petitions were coming from. While I thought I knew, because I had read each one, it was difficult to keep a sense of the geography after reading over 200 petitions. I assumed most would be from the Northeast, but I was surprised by the actual geographic spread. Of course, a lot more could be done with this; for example, the size of the bubbles could be scaled by the number of petitions from each location.

Map of locations of origin for petitions
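The scaled-bubbles idea above is just an aggregation step plus a sizing rule. A small Python sketch, with invented example locations (only Boonton, NJ actually appears in my post), might look like this:

```python
import math
from collections import Counter

# Invented origin locations standing in for the transcribed petition field.
locations = [
    "Boonton, NJ", "Boonton, NJ", "Worcester, MA",
    "Minneapolis, MN", "Worcester, MA", "Worcester, MA",
]

counts = Counter(locations)
for place, n in counts.most_common():
    # Scale bubble *area* with the count, so radius grows with sqrt(n).
    radius = 4 * math.sqrt(n)
    print(f"{place}: {n} petitions, bubble radius {radius:.1f}px")
```

Scaling the area rather than the radius keeps a location with twice the petitions from looking four times as big on the map.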

Next, I did some basic text visualization using Voyant to explore the themes. I uploaded the parts of the petitions that referred to the role of the state (my research question) just to explore. You can do several things with Voyant for basic text mining, but the word clouds are always fun. You can also see the most frequently used words; in this case they were government (24), Turkish (20), people (19), humanity (18), and right (17). Again, this was a subset of my petitions (only one column), but it gives you an idea of what you can do just starting out.
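Under the hood, that frequency list is a simple count of non-stopword tokens. Here is a minimal Python sketch of the same operation, run over a made-up snippet standing in for my “role of the state” column (the text and stopword list are illustrative, not my data):

```python
import re
from collections import Counter

# Invented snippet echoing the themes of the petitions.
text = """The government of the United States should urge the Turkish
government to respect the rights of the Armenian people, for humanity
demands that every people enjoy the right to live in safety."""

# A tiny stopword list; real tools like Voyant ship much longer ones.
STOPWORDS = {"the", "of", "to", "in", "for", "that", "should", "a", "every"}

words = re.findall(r"[a-z]+", text.lower())
freq = Counter(w for w in words if w not in STOPWORDS)
print(freq.most_common(5))
```

Voyant adds the visualization layer, but the counts it reports come from exactly this kind of tokenize-filter-count pipeline.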

I’ve also played around some with ATLAS.ti for my documents that are not petitions, especially my primary-source newspapers. I haven’t done much with it yet, but I love the iPad version of ATLAS.ti and can see it being useful later on when I have a larger set of primary documents to work with. It is mostly used for qualitative data work in the social sciences, but if you are a historian working with ATLAS.ti, NVivo, or Dedoose, PLEASE get in touch with me. I would love to have some use case scenarios.

Eventually I would also like to do some basic network analysis using Gephi, but my data is not ready for that. And of course I could never go anywhere without my Zotero library, which provides immediate access to all of my secondary sources and some primary documents. Yes, that is data too!

Historians wanting to do more systematic examinations of their “data” sets of primary documents should check out the Programming Historian. You can do a lot of analysis now that you couldn’t do even five years ago, but you need to start by collecting and managing the information systematically. If you would like tips on doing this, please contact me and we can talk through your project!

Valentine to @ICPSR #lovedata18 #ldw18

In honor of Love Data Week, here’s a shout out to one of my favorite data archives/resources.

Many social scientists are familiar with ICPSR, the Inter-university Consortium for Political and Social Research, but faculty may not always keep up with the new goodies at the archive. Since I became a data librarian in 2007, ICPSR has expanded its resources widely to include a wealth of training materials. Along with its Summer Program in Quantitative Methods of Social Research, ICPSR also has a variety of modules for teaching and learning about data, data concepts, social science concepts, and more. My favorite tool is the Social Science Variables Database that allows users to search for variables in the major data studies, about 76% of ICPSR’s holdings. In addition to isolating data studies with specific, required variables, the tool allows users to examine the questions being asked across data studies. ICPSR has much more in its expansive offerings, especially for members of its consortium. Definitely worth a look and some love!

My favorite data organization @iassistdata #lovedata18 #alamw18

In honor of Love Data Week I am going to do a series of posts on my favorite data resources/tools. I am a data connector, meaning my primary job is to connect people with the data they need. Because of the proliferation of tools and resources, it can be difficult to choose and find great sources. I also often work with newer data users, so I have to figure out ways to lower barriers to using data of all kinds. I can’t do it alone so I rely on a network of professionals to help me learn about new tools and think up lesson plans.

Many professional organizations out there support data librarians and other data professionals. I wish I could be involved with all of them, but there is only so much time in the day and so many bucks in my bank account. My favorite data organization is undoubtedly IASSIST, one of the first international data organizations. This group has been around since the 1970s and brings together data professionals of all types, from metadata specialists to programmers to librarians. Although its traditional focus is social sciences, IASSIST has branched out lately, and its annual conference includes sessions on GIS, qualitative data, and much more. The conference this year is in Montreal, where we are joining forces with the Association of Canadian Map Libraries and Archives. Conference registration will open up soon, so I encourage you to consider attending if you love data!

In telling our data stories (one of the themes of #lovedata18), I always remember that I am not navigating my data work alone and that I can draw upon the knowledge of my colleagues. IASSIST provides a forum for immediate assistance through its listserv and a long term network that connects me with colleagues from Australia to Nigeria, from the Federal Reserve banks to tiny colleges in the frozen Midwest. It is definitely a data resource worth considering!

ALA Council Day 1 #alamw18 #alacouncil

My first day on Council was awesome! The proceedings went by quickly, which apparently is unusual. We had a few items of interest, however, and much more will be discussed today at Council II.

  • We approved a new ALA award for mid-career professionals called the Lois Ann Gregory-Wood Fellows Program. The award is named after Lois Ann, the ALA Council Secretariat, and honors her 50 years of service at ALA. The program will provide funding for mid-career professionals who want to be involved in ALA governance but do not have institutional funding. More information is forthcoming on applying and donating! I will post information as I have it.
  • Librarian of Congress Carla Hayden was approved for honorary ALA membership. We were all happy about that one!
  • The ALA Membership Committee brought forward a resolution to adjust dues in line with the Consumer Price Index (CPI). The resolution passed and will go before the membership on the spring 2018 ballot. Another good reason to VOTE this spring!
  • The ALA Executive Director Mary W. Ghikas gave figures for the Midwinter conference attendance. As of February 10, registration was 7,894, a decrease from 8,892 last year. This is certainly an ongoing issue for Midwinter, and an issue that we will discuss at Council II today!

If you have questions or concerns for ALA, please feel free to contact me! I hope to represent NC’s interests as fully as possible, but I’d love to do that with feedback from NC librarians.