Exploring Slave Narratives with ICPSR #AdoptaDataset

The team at ICPSR is running some clever data promotions for Love Data Week, including Adopt a Dataset! I adopted the Quantitative Data Coded from the Federal Writers’ Project Slave Narratives, United States, 1936-1938. I’ve read so much about this project, and it seemed appropriate for February and Black History Month. You can read the actual interview transcripts on the Library of Congress website: Born in Slavery: Slave Narratives from the Federal Writers’ Project, 1936 to 1938. In the late 1970s, Paul Escott read and coded 2,358 of the slave narratives to create this dataset.

The narratives provide insight into both the interview process and the experiences of the formerly enslaved people. One of the most controversial questions was about attitudes toward the master, with some writers pointing to “favorable” attitudes toward masters as an indicator of slavery being a “less harsh” institution. But that ignores the fact that 771 people did not answer the question (or gave no indication of an answer in the narrative). In addition, around 1200 of the interviewers were white as opposed to 400 who were black. In the 1930s American South, it would have been difficult for a person of color to speak ill of a white person in front of another white person. The coder’s interpretation of favorability also needs to be taken into account.

attitude.png
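
To put the counts quoted above in proportion, here is a quick back-of-the-envelope sketch in Python. It uses only the figures from this post; the interviewer counts are rounded ("around 1200" and 400), so the second percentage is approximate and covers only the cases where a race is given.

```python
# Counts quoted in the post; the interviewer figures are rounded approximations.
TOTAL_NARRATIVES = 2358
NO_ANSWER = 771
WHITE_INTERVIEWERS = 1200
BLACK_INTERVIEWERS = 400


def share(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Share of narratives with no recorded attitude toward the master.
no_answer_pct = share(NO_ANSWER, TOTAL_NARRATIVES)

# Share of white interviewers among the cases where a race is given.
white_pct = share(WHITE_INTERVIEWERS, WHITE_INTERVIEWERS + BLACK_INTERVIEWERS)

print(f"No attitude recorded: {no_answer_pct}% of narratives")
print(f"White interviewers: {white_pct}% of cases with a recorded race")
```

Roughly a third of the narratives carry no answer at all, and about three in four of the interviewers with a recorded race were white, which is why the "favorable" counts deserve caution.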

ICPSR has made the dataset easy to use in R. The only trick is that the variables are mostly factors that need to be converted to numeric. ICPSR helpfully provides the R library and functions that can help with the conversion. Just remember to read the documentation closely before jumping in! Below are some of my explorations, including creating a subset of NC and another of NC women.

R-Narratives
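
For readers who don't use R, the two steps described above (turning labelled categories into numeric codes, then subsetting) can be sketched in plain Python. This is only an illustration, not the ICPSR-provided functions: it assumes each value looks like "(2) Female" (a code in parentheses followed by a label), and the field names, codes, and rows are all invented. Check the codebook for the real variable names and values.

```python
import re
from collections import Counter

# Hypothetical rows: field names, codes, and labels are invented for this sketch.
rows = [
    {"state": "(1) North Carolina", "sex": "(2) Female", "attitude": "(1) Favorable"},
    {"state": "(1) North Carolina", "sex": "(1) Male", "attitude": "(2) Unfavorable"},
    {"state": "(2) Georgia", "sex": "(2) Female", "attitude": "(9) No answer"},
    {"state": "(1) North Carolina", "sex": "(2) Female", "attitude": "(9) No answer"},
]

# Assumed label shape: a numeric code in parentheses, then the label text.
LABEL = re.compile(r"^\((\d+)\)\s*(.*)$")


def split_label(value):
    """Split '(2) Female' into (2, 'Female'); return (None, value) if no code is found."""
    match = LABEL.match(value)
    if match is None:
        return None, value
    return int(match.group(1)), match.group(2)


def to_numeric(row):
    """Return a copy of the row with every labelled value replaced by its numeric code."""
    return {field: split_label(value)[0] for field, value in row.items()}


# Subset on the label text first, then convert to codes for counting.
nc = [r for r in rows if split_label(r["state"])[1] == "North Carolina"]
nc_women = [r for r in nc if split_label(r["sex"])[1] == "Female"]

nc_women_codes = [to_numeric(r) for r in nc_women]
attitude_counts = Counter(r["attitude"] for r in nc_women_codes)

print(len(nc), len(nc_women))
print(attitude_counts)
```

Subsetting on the readable label before converting keeps the filter easy to check against the codebook; the numeric codes are only needed once you start counting or modeling.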

You should adopt a dataset and explore some data! You don’t need to know statistical software because the codebooks can provide some basic overviews of the dataset. In addition, many of their datasets have online analyses available.

Tomorrow you can join their tweetchat starting at 12:30 pm. Go and give some love to your data!

Promoting ICPSR @UNCG

Did you know UNC Greensboro staff, students, & faculty can access ICPSR social/behavioral research data?

The ICPSR Data Fair is always a great opportunity for learning more about their new data tools and services. They are creating lots of new tools for promotions, so I encourage you to check those out.

This year I participated by talking about how to promote ICPSR on campus, including social media outreach, graduate student promotions, and creating targeted messages.

How do you promote ICPSR on your campus? I would love to get some new ideas too!

Beyond the Numbers Day 2

Because I sometimes teach a semester-long course and have duties elsewhere, I haven’t been able to attend many smaller conferences lately. Even NCLA has been a struggle. Beyond the Numbers is the perfect small conference: it brings together people who are really interested in and knowledgeable about a concentrated topic (or related topics). Not only was I able to reconnect with librarians I hadn’t seen in a while, I was also able to put several faces to names and meet new people. Day 1 didn’t disappoint. Day 2 was a half-day, so there was not as much going on, but there were some interesting sessions.

The keynote was by Wendy Stephens, a professor at Jacksonville State University. Her presentation, titled All About You, Up For Sale: How Data Brokers Like Cambridge Analytica Construct Consumer Identities, looked at data as a commodity and at the ways organizations collect information about us. She made the case for controlling the data that we put out online or allow others to collect. She suggested a number of readings for more information, some of which are well known and others I had not heard of.

Next, Jennifer Boettcher, ALA Councilor and Business and Economic Liaison and Reference Librarian at Georgetown, talked about intellectual property governance for government data. Her slides were quite good and complete, so I will post a link when they are up on the BTN website. She talked about the difference between copyright and public domain, the open data movement, intellectual property, and more. She mentioned her article in Online Searcher, so definitely check that out for more info: Boettcher, J., & Dames, K. (2018). Government data as intellectual property: Is public domain the same as open access? Online Searcher, 42(4).

After Jennifer, Marie Concannon, Katrina Stierholz, and I presented on the PEGI project, which looks at the preservation of economic data and information. Marie is the Head of the Government Information and Data Archives at University of Missouri and Katrina is the Vice President and Director of Library and Research Information Services at the Federal Reserve Bank of St. Louis. I discussed the history of PEGI and the current focus of the project. Marie talked about issues she had come across in her work with economic data (and by the way, check out her awesome Prices and Wages by Decade libguide). She discussed several of the issues that we’ve encountered, including lack of data documentation, the move to cloud services that require a fee for extraction of government data, and the commercialization of government data. Finally, she mentioned the decreasing number of electronic documents available through the GPO, despite the move to electronic formats. For example, she searched the GPO’s Catalog of Government Publications for the “L” SuDoc, which includes the documents for the Department of Labor, and only 30 items came up. Keep in mind, these aren’t print documents; these are the electronic documents. Her presentation brought home the scale of the problem that we face regarding the loss of government information.

Katrina then talked about the process of revising economic data and the importance of capturing those revisions over time. She noted that the most current releases of economic data are less accurate than later revisions, yet they are the ones on which policy is often based. Therefore, we need to collect the past data so that we can better understand how policy was decided and what the errors were. Moreover, ALFRED, the historical economic data database, only captures series that are in FRED, but there are a lot of data series that aren’t in FRED. Furthermore, the Federal Reserve Banks aren’t government agencies and aren’t subject to the same rules for retention. So, the question becomes: how do we coordinate with these kinds of special non-governmental organizations that are producing information necessary to the functioning of our government? What become the highest priorities?

Lots to think about. PEGI will hold a national forum in December at the CNI meeting with the goal of bringing in stakeholders from the wider communities (librarians, community leaders, activists, archivists, journalists, and government employees). More to come on those discussions soon.

 

Finally, for our working lunch, representatives from Census, the Federal Reserve Board and Banks, the Bureau of Labor Statistics, the World Bank, the OECD, and FRED sat on a panel to discuss various issues. Ron Nakao, the Data and Economics Librarian at Stanford, asked an interesting question about the priorities for data in these organizations. He noted that there are three threads in data: data creation/collection, metadata creation/collection, and tool creation/collection, and that the metadata curation aspects often do not have enough infrastructural support. Several of the representatives agreed and noted the activities at their institutions for metadata creation. For example, the Census Bureau is requiring all surveys to use the same metadata and the BLS is working towards a glossary of terms. Hopefully those efforts will help to reduce the creation of metadata as an afterthought in the data collection/creation process.

Great conference. Really happy that I went, although it was a whirlwind! BTN is on the list for next year!

P.S. Thanks for the mug! It’s like they know me … and my coffee addiction.

Beyond the Numbers 2018 Day 1 #data #BTN2018

I had the opportunity to attend the Beyond the Numbers conference at the Federal Reserve Bank of St. Louis this week. This biennial event brings together librarians, archivists, and economists from all over the country to talk about the challenges in economic information access and use. They usually post the presentation materials after each conference, so check back for slides. I had never been to the conference before but have heard a lot about it from IASSIST members since it started in 2014. I arrived late because of teaching and plane malfunctions, but I was able to attend a few sessions on Thursday.

Data Play

The first was with Christine Murray from Bates College talking about using R for economics data. She did a great job showing both the basics of using R and how to use the pdfetch package to work with time series from economic data vendors like FRED, BLS, and others. I’ve imported data using APIs, but this package makes it much easier to work with these particular vendors. You can also visualize and layer time series within R. She created a great libguide showing how to use R for economics. Definitely going on my data play list for winter break.

The second was Kate McNamara’s Evidence-Based Research with the Census Bureau Data Linkage Infrastructure. Kate talked about the new efforts in the Census Bureau’s Data Linkage Infrastructure program. This is related to the Federal Statistical Research Data Centers (FSRDCs) located around the country (our closest is at Duke), which hold administrative data from a wide variety of government agencies that are linked together. Researchers must apply to access the data, and it has been a lengthy (and slightly cumbersome) process. One of their efforts is to promote evidence-building projects that are collaborations between Bureau researchers and academics. The difficulty for academics in the past has been that, while there is a data inventory, the CB hasn’t provided detailed metadata about the available datasets or information on what unique identifiers are available for linking datasets. Without that information it can be extremely difficult for researchers to know before they apply whether the data will be useful. The CB is preparing, though, to post that metadata on ICPSR and create a new inventory available to the public. That is REALLY exciting news for data users.

Finally, Kristin Fontichiaro and Wendy Stephens presented on From “Skip the Numbers” to “Great Stuff”: A Data Education Project. These LIS professors created a project geared to high school teachers and media center specialists to help them integrate statistical literacy into their curricula. Their project, Creating Data Literate Students, made the rounds a while back, and they have recordings from past virtual conferences if you are interested. For lower-level or data-averse students, the principles and teaching suggestions are very helpful. They also have two free books on teaching statistical and data literacy. I’m excited to read Lynette Hoelter’s chapter! She does some great work at ICPSR.

So, day 1 is a wrap. Today we learn more about data and I am presenting on the PEGI Project. Exciting stuff and more to come!

Ensuring access to government information

Interested in efforts to ensure access to gov info? Concerned about future access to our nation’s information heritage?

Check out the pre-prints from the special issue of Against the Grain that Shari Laster and I edited. The issue covers a wide range of topics, including the Data Refuge initiative, the End of Term Presidential Archive, the PEGI Project, and much more! We even have Canada!

Big thanks to Shari for agreeing to edit with me and to all the authors for being great colleagues!


My favorite data organization @iassistdata #lovedata18 #alamw18

In honor of Love Data Week I am going to do a series of posts on my favorite data resources/tools. I am a data connector, meaning my primary job is to connect people with the data they need. Because of the proliferation of tools and resources, it can be difficult to choose and find great sources. I also often work with newer data users, so I have to figure out ways to lower barriers to using data of all kinds. I can’t do it alone so I rely on a network of professionals to help me learn about new tools and think up lesson plans.

Many professional organizations out there support data librarians and other data professionals. I wish I could be involved with all of them, but there is only so much time in the day and only so many bucks in my bank account. My favorite data organization is undoubtedly IASSIST, one of the first international data organizations. This group has been around since the 1970s and brings together data professionals of all types, from metadata specialists to programmers to librarians. Although its traditional focus is social sciences, IASSIST has branched out lately and its annual conference includes sessions on GIS, qualitative data, and much more. The conference this year is in Montreal, and we are joining forces with the Association of Canadian Map Libraries and Archives. Conference registration will open up soon, so I encourage you to consider attending if you love data!

In telling our data stories (one of the themes of #lovedata18), I always remember that I am not navigating my data work alone and that I can draw upon the knowledge of my colleagues. IASSIST provides a forum for immediate assistance through its listserv and a long-term network that connects me with colleagues from Australia to Nigeria, from the Federal Reserve banks to tiny colleges in the frozen Midwest. It is definitely a data resource worth considering!