What follows is a lightly edited version of the talk I gave as part of my interview for the job of Assistant Director for Digital Scholarship on December 4, 2015. I had a lot of slides, most of which were screenshots, and I’ve mostly just converted them to links. Also, I had a bunch of jokes in the talk. It turns out I’m very corny. I couldn’t keep those in the written version either.

The prompt:

Discuss three ways in which the nature of academic scholarship is changing and identify key services for each that Penn Libraries should offer in support of researchers’ evolving needs.

Thank you so much for being here today. My talk today is going to address the question I was asked, but before I get started, I will spend a few moments talking about who I am and what I’m bringing to this question. Part of what I bring is my love for Philly. This is my home, and the city I love. As many of you know, I worked here at Penn in the Research and Instructional Services Department as Social Sciences Data Services Librarian from 2002 to 2008, and I’ve been at Haverford for the past seven and a half years. For the first few years at Haverford, I was the Coordinator for Research, Instruction, and Outreach. After we brought in an awesome new library director who led us through a big re-organization, I moved into my current role as Coordinator for Digital Scholarship and Services.

In that role, I’ve been working with my colleagues to build a Digital Scholarship program. At Haverford, we define digital scholarship as library partnerships with faculty and students to create scholarship in new forms. Building this program means we’ve built the human and technical infrastructure in the library to make digital scholarship as manageable and sustainable as we think it can reasonably be, understanding that learning and creating scholarship is a creative, messy, and iterative process. In practical terms, our work involves engagements with faculty and students in the curriculum, on larger-scale research projects, and as employers and mentors of students in co-curricular activities. Over the past few years, the digital scholarship group has been involved in dozens of courses, ranging from 20-minute drop-ins to courses where DS labs were built into the weekly schedule and integral to the course outline. We’re now in our second year running a fellowship program for students interested in participating in a sustained DS learning opportunity, and we recently hired a Drexel Library School intern to help run that program. We’re also partnering with faculty on some big and exciting projects, some of which I’ll mention in this talk, and others of which I won’t, but would be happy to talk about later.

The team in Digital Scholarship at Haverford includes me and my colleague Mike full-time, and then two people who don’t report to me but who work closely with us, a metadata specialist and our web coordinator (and Music Librarian). As part of the library, our group works regularly with our colleagues in special collections and with subject librarians. And truly, when describing the team, the key to our success is the 12 student workers we employ. Our students are about half CS majors, and half majoring in a host of other disciplines, and they tend to stay with us throughout their college career, some of them joining us in summers. They develop considerable expertise in the tools, approaches, and strategies we employ in our work, and we rely on them as collaborators.

So, that’s the work I’ve been doing, and I’ll return to aspects of it during the rest of this talk.

Now to the question. I was asked to:

Discuss three ways in which the nature of academic scholarship is changing and identify key services for each that Penn Libraries should offer in support of researchers’ evolving needs.

I think the most important lesson I’ve learned at Haverford is that the nature of our library is co-created by faculty, librarians, and students, as well as by administrators. So I think the best way to build library programs is by instituting strong processes to engage in that co-creation, and to accept that things will keep changing. This means building from where you are, moving through stages, and being flexible in working with collaborators. With that in mind, I’m going to talk about changes in the following areas.

  1. Data is everywhere
  2. Multimodal scholarship
  3. Interdisciplinary and community engaged scholarship

Truly, these three changes are all closely connected to one another and so some of the recommendations I’ll make will overlap across these three areas. But, I think these shifts in scholarship are different enough to imply a few distinct activities within libraries. So on to the first:

Data is everywhere

What do I mean by data? Data means a lot of things, and data was around before there were computers. But these days, data, broadly, ends up meaning stuff that can be processed by computers. And the range of things that can be processed by computers keeps growing. People have access to more and simpler computational power, and we’re learning lots of new ways to understand things when we approach them with computers.

When I say that data is everywhere, I mean that more and more of our objects of study are machine-readable. So, works of literature can be considered data when they’re processed through topic modeling. Computers can operate on temporal data, and on geospatial data, that is, they can be used to understand things in terms of time and space at various scales. Films and objects in space can be modeled with computers to create data as well. I don’t want to imply that this data is all raw or without meaning, or that it is not part of other systems and structures. But a data explosion in the academy and elsewhere is certainly happening, and it is worth contending with.

What does this rise in data imply about library activities? Here are a few things. First:

Libraries should: Approach library collections as data

There are a few ways libraries can respond to the rise of data. I’ll start with the notion of library collections as data.

This can mean a lot of things. First, it means we understand that there will be researchers who use our collections not just to learn from each item one at a time but who seek to find out what they can learn if they look across whole collections at once. Understanding this, when we buy or lease materials to add to our collections, we need to ensure that our community also has the rights to use the material as data.

Beyond our licensing and purchasing agreements, libraries should make it easy for people to engage with our collections as data. I love OPenn. It’s an awesome way to provide access to the amazing collection of digitized images of collection materials, and using OPenn allows researchers who want to address those images and their metadata in bulk to re-use them and re-imagine them in exciting ways.

There are other libraries that provide access to digital collections as data. Here I’ll call out the DocSouthData project at UNC, Indiana University Libraries’ Digital Collections on GitHub, and the recently released Open Collections API from UBC as three awesome examples. I’d actually love to spend a lot more time talking about these three different models, and the pros and cons of each, but I’ll keep going for now as there is a lot more to discuss.

It’s also important to point out that joining HathiTrust and DPLA, both of which provide bulk or API access to data or metadata, is another important step in approaching library collections as data.

Libraries should: Expose library metadata as data

The mention of HathiTrust and DPLA brings me a little deeper into the world of exposing library collections as data, and so I want to take a side trip to talk about making sure that we expose the metadata about library collections as data. We can learn a lot from our metadata, and opening it up for analysis using a whole host of tools and methods is really exciting to me. Our metadata provides a record of what we have valued over the years of our libraries. It can help us understand our own histories, what is present and what is absent in our collections, as well as how we’ve been describing the scholarship we provide access to. And once we’ve gotten into exposing our collections and metadata as data, it takes me to my next point, which is that…

Libraries should: Study our own collections as data

I mean this not just as a way of developing services or collections, though that’s important, but as a way of engaging in scholarship that reflects on our own institutions, on what it means to make these collections, and what it means to make them digitally available. There are some really good examples of schools that are doing this. At Haverford, we’ve been thinking a lot about what we can learn from our collections when we use them as data. We’ve also been talking a lot about what it means to make a collection meaningfully digital.

I want to pause here to talk about one project we worked on last summer, and which will continue for the next few summers, where we wanted students to learn from a collection and produce digital information, but we weren’t sure that creating images of the collection was going to be the most valuable way to do that. The Haverford Libraries recently received records from the Friends Hospital, one of the earliest mental health institutions in the US, opened in 1819.

Quakers and Mental Health homepage

With support from the Scattergood Foundation, we decided, rather than paying undergraduates to create images of these materials, that we would hire two students, Abby Corcoran, a history major, and Lindsay Silver, an English major with a CS minor, to help us learn what we could if we thought of the materials as data. As Abby spent the summer closely reading the Superintendent’s Day books, and researching and writing about the early days of the asylum, she was gathering lists of names of patients, of information about them, etc. Working separately, but informed by Abby’s research, Lindsay created a web portal where the data that Abby was gathering, and other views of the collection as data, could be explored.

Patient Data as Charts

You can see that Abby and Lindsay took the notes of the Hospital staff, and regularized the way they understood patients at the time, with columns for the length of insanity, their gender, the weekly rate they paid, etc., in keeping with the hospital staff’s concern for only taking patients who were, in their words, “curable.”

Patient Data as Table

Again, this way of providing “access” to collections is one I hope to keep exploring, as I think it challenges us and our students to think through how we can best use computational and non-computational ways of understanding collections.

To summarize, thinking of library collections as data can mean changing the ways we negotiate with vendors, providing access to digitized collections in formats that are useful for computational reuse, using our own metadata as data, and experimenting with the data we create to make new scholarship. I believe it is through this experimentation with our own data that we can create the expertise and infrastructure to continue being great partners to our faculty and research colleagues as they are also experimenting with data-first approaches to their own research, which leads to the next area.

Libraries should: Support researcher data

Library collections are not remotely the only kind of data that we should be contending with. As nearly every discipline engages with data in its various forms, researchers produce their own data, and we in libraries need to continue to expand the kinds of support we can provide for the production, storage, and publication of those data as well.

Penn is already beginning this process by providing support for the creation of Data Management Plans, and by storing data in the institutional repository. Obviously, Haverford is a whole lot smaller than Penn, which means that things can be done a little more personally. Our grants office tells every faculty member writing a proposal for funding to reach out to me if their proposal needs a data management plan. The ensuing conversations have really helped me and our faculty understand data management planning, what kinds of data there might be, what constitutes the objects of study in various disciplines, and what’s worth and not worth keeping over time. I’ve worked with faculty in chemistry, psychology, political science, linguistics and biology, and the conversations have changed my understanding of how our repositories might look in the future. The kind of reproducibility that these data management plans are designed to facilitate will change what we think of as a publication, and as scholarship. So I’m very excited about thinking through the changes to our services that are coming.

But before leaving behind the notion of researcher data, I want to point to a project that I haven’t really worked on, but that I think is so fantastic. This is Project TIER, which was developed by my colleague Norm Medeiros in the library and Richard Ball, a faculty member in the Haverford Economics Department, and funded by the Sloan Foundation and others.

Norm and Richard developed a protocol for empirical papers that aims for “complete reproducibility” and have been using the protocol beginning at Haverford, and now moving across undergrad institutions. I believe they’re expanding into grad schools, as well. Using the protocol, when a student writes a paper that makes claims of an empirical nature, they turn in the paper and the data and the transformations and analysis files, as well as some metadata. I think this project and ones like it are absolutely something to keep an eye on.
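As a rough illustration of what "turning in everything" could look like in practice, here is a minimal Python sketch that checks a submission folder for the components listed above: the paper, the data, the transformation files, the analysis files, and some metadata. The folder and file names are hypothetical placeholders chosen for this example; they are not taken from the TIER protocol itself, which defines its own layout.

```python
from pathlib import Path

# Hypothetical names for illustration only; the real protocol
# specifies its own folder structure and documentation.
REQUIRED_COMPONENTS = {
    "paper": "paper.pdf",
    "data": "data",
    "transformations": "transformations",
    "analysis": "analysis",
    "metadata": "metadata.txt",
}


def missing_components(submission_dir):
    """Return the sorted names of required components absent from a folder."""
    root = Path(submission_dir)
    return sorted(
        name
        for name, relative_path in REQUIRED_COMPONENTS.items()
        if not (root / relative_path).exists()
    )


if __name__ == "__main__":
    # Report what is missing from a submission in the current directory.
    print(missing_components("."))
```

An empty list means every component is present. A check like this only confirms that the pieces exist, not that the analysis actually reproduces the paper’s results.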

Of course, we’re about to turn to multimodal scholarship, where we’ll need to leave behind the notion of reproducibility for a while lest we go down a theoretical and methodological rabbit hole.

Multimodal Scholarship

By multimodal scholarship, I’m describing the whole range of activities that engage with research or presentation methods that cross formats and platforms. This might mean a book with an accompanying dataset and website. It might mean a network graph showing participants in a poetry group, or a representation of the topics that emerge from computational approaches to English literature. And it might be a website describing and encoding the underrepresented work of women writers. It might be a huge range of other things too.

Libraries should: Make sure we can accept multimodal scholarship into our collections, and support its creation

So, what should libraries do about the emergence of multimodal scholarship? I have in my notes: make a joke about how hard it is to store and preserve multimodal scholarship. But I could not come up with one. It is just very hard. Saving things in digital forms turns out to be extremely difficult. File formats change, hardware and software change, and all of these things matter in the preservation of digital data. But multimodal scholarship is not just digital data. Interfaces matter. They always matter, but especially when the work is explicitly multimodal. So this is a hard problem. And one we’re going to need to keep working on. It means a lot of conversations as we move through research about what needs to be saved, what can be saved, what the tradeoffs are, and what part of the meaning of our work can or should last into the future.

So yes, we need to make allowances in our repositories for multimodal scholarship. And we have to ask questions about the losses and gains in separating data from interface, as false as those choices might be in some cases.

[Dear Reader: please play the video while you read this.] I’m now playing a video that is currently projected on one of the gallery walls in the Magill Library at Haverford as part of an exhibit designed by Ashley Foster’s freshman writing courses (four courses taught over two semesters). The course takes on the art and social action of modernist artists during the Spanish Civil War, and connects them to Quaker peace testimonies, posters, and letters, all of which were advocating for a positive peace.

The students used Neatline to create a network of interconnected annotations across the poetry of Muriel Rukeyser and Langston Hughes, Picasso’s Guernica, and Virginia Woolf’s Three Guineas, with web assignments running throughout the course. They were asked to make connections between works as material links, and to engage Virginia Woolf’s scrapbooks and her Three Guineas as connected to the kinds of sources she drew from and influenced. Preserving these connections as a website will be difficult. And so we made these videos, in part, to capture the experience of using a site whose connections are not necessarily the ones one might expect of a webpage.

This was a freshman writing class that grew out of a lot of collaborations that extended across students as students, as workers, the professor as editor, as teacher, librarians from my group, from special collections, with all of us involved in making exhibits, in figuring out projections, in scanning books, and printing labels, and on and on.

Designing this involved a constant interplay between the technical work and the needs of the scholarship. The students in the course worked on the site, but so did my team of student workers, and we all spent tons of time working together and refining and designing. Which brings me to my next response to the shift towards multimodal scholarship.

Libraries should: Envision the library as a productive space

Libraries are, should be, and always have been involved in the entire life cycle of scholarly production, from inspiration to preservation. We should make sure that the ways we imagine our work honor that, by ensuring that our libraries can be sites for the creation of new knowledge, and not just for its storage and dissemination. In practice this means developing a flexible technical architecture that allows for scholarship to be dreamed up, created, and stored. And it means developing colleagues in the library whose disciplinary expertise extends across text, image, webpage, dataset, and interface.

Transforming technical infrastructure takes time and care. That’s also true of transforming a library. But experimenting, taking some risks, and growing together with our faculty and students is, I believe, the only responsible way for us to grow our capacity.

At Haverford, when we started the Digital Scholarship program, there were a few projects that the library had committed to. As we began to take on more, our friends in IT were understandably hesitant to let us have access to campus servers, as we didn’t really know how to use them or keep them safe. So, we started on Amazon cloud, and did what we could. We asked for their advice, and we took their advice when they offered it. But we were extremely committed to keeping the scholarly question at the center of each of our projects, and making sure that the technology was following from that. At this point, we have a great collaboration with them and 7 virtual machines that they help us maintain. We have shared understandings about how we’ll grow together. That said, we still have our Amazon account, so that we can be free to experiment and do stupid things, and make mistakes.

Envisioning the library as a productive space also means making sure it is productive for librarians. There is absolutely a professional development aspect of it. I will point here to the work that’s been done at UNC and Columbia where librarians took two different approaches to learning tools for multimodal scholarship.

At Haverford, we did a summer workshop series for librarians, covering things like mapping, visualization, project planning, etc. We actually spent more time in ours talking about approaches that worked and didn’t work with incorporating tools into courses, but that’s just the nature of Haverford. However it’s done, providing ways for librarians with subject experience to explore the multimodal approaches to scholarship that are emerging in their field is an incredibly important aspect of expanding our infrastructure, and one I think we’re all still working on. As we think about what librarians need to learn, we can turn to the third shift in scholarship that I want to call out, which is really closely aligned with the first two.

Interdisciplinary and community engaged scholarship

Here, I want to talk about how very frequently disciplines are borrowing approaches and objects of study from one another these days. The humanities have important work to do in offering approaches to computational studies and in thinking about how algorithms embed various and complex layers of meaning into our world. And of course methods from computational studies are making their way into the humanities. I think this is happening for a few reasons. Partially because it feels right. We do this work to create knowledge for the world. And we have all of these tools that make that possible. It just feels right to share things earlier and more widely with a lot of people. This kind of sharing leads to more borrowing and a more open process.

Borrowing methods across disciplines, sharing objects of study across disciplines, and engaging with communities we study and serve are parts of my work I’m really interested in. In part because I think that deep interdisciplinary and multimodal scholarship can challenge us to understand that the expertise to make sense of a problem might not live in a single mind, but be arrived at together in a group.

Libraries should: Create and collect repurposable scholarship, when possible

I’m not going to say more about this as a lot of the work of thinking of materials as data can help with this. Part of what librarians can do is make methods and processes interdisciplinary, as well.

Libraries should: Make library expertise and resources interdisciplinary and community engaged

By this, I mean that we can get better at thinking about our own methods (research methods, archival methods, digital preservation methods, metadata creation) as interdisciplinary. Partially we should make our own work the objects of study, and partially we should throw these methods into the pot of scholarly creation. Of course, we should be careful to make sure that the projects we work on expand our infrastructure and our skill sets, and help us make more projects sustainable. But, if everyone is getting in on the party of using our skill sets on other people’s objects of study, let’s join the party. (I’m thinking here about how it feels like the whole world keeps discovering that archives are crazy interesting like they just made it up and there isn’t a whole field thinking and writing about this.)

At Haverford, we created a digital scholarship fellowship program to introduce students to the methods of digital scholarship. For our pilot year, we decided to test out these skills on a public art and research project co-curated by Paul Farber, a Haverford faculty member, Will Brown, a curatorial assistant at RISD, and Ken Lum, of the Fine Arts department here at Penn. The fellowship was designed so that we could introduce students to the methods of digital scholarship that we’d been working with, and to help them incorporate them into their work, but the Monument Lab project that we worked on is an example of the kind of community-engaged scholarship that really excites me.

Fellows at work

As the students in our digital scholarship fellows program went through the year, with every skill or approach they learned, they used Monument Lab as the project. Monument Lab was a public art and civic engagement project that took place in the center of the City Hall courtyard in the spring. It was framed around a central guiding question: What is an appropriate monument for the current city of Philadelphia? There was a lab at City Hall next to a fantastic sculpture proposed by the late artist (and Penn faculty member) Terry Adkins. At the lab, members of the public were asked to propose speculative monuments that might help answer this question, and their answers were added to a map of the city, and to the OpenDataPhilly repository of Philadelphia data. The fellows were basically consultants on the project. As they thought about interface design, they thought about it for Monument Lab. When they worked on visualizing data, we used a Philly monuments dataset, etc. Engaging students with the methods of digital scholarship on a real project that would also be engaged with by the citizens of Philadelphia was powerful both for them and for us. This year’s cohort of students will be working on the data produced for the Friends Asylum project I talked about earlier, and it will be interesting for us to see how the students engage with this totally different subject matter using the same set of tools and methods.

Libraries should: Create communities of practice around methods

I’m happy to see that you’re already doing this in WordLab and in Vitale II in fantastic ways here at Penn. On our own scale, we’re doing it as well. Over the last year or so, we’ve had a devoted group including librarians and students who get together to learn together. We call it Server Summer School, because it started in the summer, but at this point, it’s really more of a linked data study group. But through it, we’ve learned about Maven and Docker, and we continue working through linked data experiments and tutorials. I’ve become completely converted to the power of a group of people choosing to learn hard things together. And I think that libraries should absolutely be a place for that.

I’m going to try to pull together these three areas of data-rich, multimodal, interdisciplinary community engaged scholarship by using the example of a project I’ve been involved in for the last few years, the Ticha project.

The team is led by Brook Lillehaugen, a linguist at Haverford, and includes Aaron Broadwell, another linguist who’s at the University of Florida, and Michel Oudijk, an ethnohistorian at UNAM in Mexico City, as well as some incredible students.

Ticha Arte Page

Brook, Aaron, and Michel study Zapotec, which is the third largest indigenous language family spoken in Mexico. The two larger language families are Nahuatl, which you might have heard called Aztec, and Mayan. Zapotec languages are currently spoken by hundreds of thousands of people in the state of Oaxaca, in Southern Mexico, and in communities where they’ve emigrated, notably Southern California. There’s a lot of discrimination targeting speakers of indigenous languages in Mexico, and Spanish is the language currently used for writing across Mexico. However, Zapotec was written in Colonial Mexico. It was written by Spanish missionaries in an effort to convert Zapotec people to Christianity, and it was used in administrative and religious documents. So the Ticha project aims to find and present documents written in Zapotec from Colonial Mexico, and to make those documents legible to a variety of audiences. The site is designed to be layered, so that visitors can view images of the documents, can read their transcriptions (and soon, we hope, they’ll be able to help in the transcription), and can see the linguistic analysis of the Zapotec, gaining a richer understanding of the language itself than they would be able to using translation alone. While not every text is available in every form, we’ve chosen to use a collection of technologies appropriate to each stage of the project, and tie them together loosely, rather than trying to build a giant, stable site all at once.

The site has been built by me and student developers, so by keeping it really modular, as we learn new techniques, we can substitute them in, and as our community of users grows and changes, we can make adjustments to the site to reflect the new needs. This means that the data behind the site: the images and their derivatives, and metadata, the transcriptions, the encoded versions of those transcriptions, and the text analyzed by linguists are all separate data sources entering the project independently, and we privilege making our work accessible at early stages over getting everything right all at once. This is a project from which I’ve learned an incredible amount. And in part I think I’ve learned so much because it has been an experiment the whole time, but one where we’ve kept the focus on using good and realistic practices, and on a strong commitment to the scholarly and community values of the project.
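To make the idea of separate, loosely tied layers a little more concrete, here is a minimal Python sketch of a document record in which every layer is optional and can arrive independently. This is not the Ticha site’s code; the class name, field names, and the example identifier and path are all hypothetical, chosen only to mirror the layers named above (images, transcriptions, encoded transcriptions, and linguistic analysis).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DocumentLayers:
    """One document, with each layer supplied by a separate data source."""

    identifier: str
    image_path: Optional[str] = None           # page images and derivatives
    transcription: Optional[str] = None        # plain-text transcription
    encoded_text: Optional[str] = None         # encoded version of the transcription
    linguistic_analysis: Optional[str] = None  # text analyzed by linguists

    def available_layers(self) -> List[str]:
        """List the layers that exist so far, in display order."""
        layers = {
            "image": self.image_path,
            "transcription": self.transcription,
            "encoded_text": self.encoded_text,
            "linguistic_analysis": self.linguistic_analysis,
        }
        return [name for name, value in layers.items() if value is not None]


if __name__ == "__main__":
    # A document can be published with only its image and gain layers later.
    doc = DocumentLayers("example-doc", image_path="images/example-doc.jpg")
    print(doc.available_layers())
```

Because no layer is required, a record can go public as soon as one layer exists, and any single layer can be replaced as techniques change without touching the others.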

We’re nearly done. I’ve gone through three changes to academic scholarship, and talked about how I think libraries should engage with these changes. But before I stop talking and ask to hear from you, I want to point to a bunch of changes that are affecting academic institutions and scholarship that I haven’t talked about. First, I haven’t really spoken directly to open access, and I’d love to. I haven’t talked about stresses on the publishing industry or the host of pressures on higher education. I also think we need to frame our work in the context of social justice, in the face of this environmental emergency, in light of #blacklivesmatter and the huge refugee crisis going on in the world.

So my prescription for all of the changes in the first part and all of these bigger ones is this.

Be the library

The world has always been changing and there have always been huge, massive problems with it. And libraries have been around and changing with technologies for thousands of years. We get to work in an institution, the library, that is devoted to being an environment where learning comes first. Where people come to change course in their thinking, to develop new methods, to find things out. So, when thinking about the changes to academic scholarship that directly affect us, I think we should make sure that our work is iterative, collaborative, and that we focus on building human capacity while building technical capacity. Our primary goal should be to focus on our values, on learning, on enabling learning, on creating great scholarship, and on being great collaborators. Libraries grow together with scholarship and with the world in which we live.

So, basically I think the most important thing is not what things we should do, or what services we should offer, but what approach we should take to our work and to our communities. Thank you so much for your time, and I look forward to questions and conversations as we continue.