Ogmius Newsletter

New Data For Old Problems

by Justin Farrell, Assistant Professor of Sociology at Yale University and CIRES Sabbatical Fellow (2016-2017)

What should social scientific research look like in this so-called age of “big” data, where everything is connected, and seemingly everything is digitized? Here I want to briefly reflect on some of the promises of new data and research methods, and consider the ways that we might integrate these computational approaches with traditional qualitative fieldwork. My main claim is that while the Internet has certainly transformed the world, our methods for understanding and explaining social life have not kept pace.

We live our lives in a huge connected network. We check emails, make cell phone calls, text our friends, swipe our credit cards, communicate on social media, post videos, send money, and purchase goods. Almost every transaction is recorded digitally: doctors create digital records of our health, stores log our buying patterns, and so on. Until recently, these behaviors, such as a simple phone call or store purchase, were not easily traceable. These digital “breadcrumbs” were not gathered. There were no digital timestamps, and no digital text duplicates of a handwritten note or a cash exchange. Of course, this raises ethical concerns about privacy, which certainly need to be front and center as scholars working outside of the private sector figure out how to incorporate this data into research for the public good.

In addition to the things we use every day, such as cell phones, tablets, and computers, there is also a burgeoning “Internet of Things” that provides opportunities for data collection to inform social scientific study. Examples include the environmental monitoring commonly used in other fields: sensors for water quality, atmospheric and soil conditions, wildlife movements, earthquakes and tsunamis, and gas and wind turbine sensors that measure the efficiency and cleanliness of energy. All of these can and should be of use for social research. Or consider human health devices, such as heart monitors or movement monitors, which provide real-time streams of data that can be monitored and collected remotely. All of these types of data are much more accurate than self-reports gathered through a survey.

On top of all of this new data that is created and recorded every day is the digitization of old information, such as books, newspapers, photographs, speeches, television programs, websites, and any other written or spoken word. For example, Google is currently archiving all books ever written. They write, “Our ultimate goal is to work with publishers and libraries to create a comprehensive, searchable, virtual card catalog of all books in all languages that helps users discover new books and publishers discover new readers.” Google has now scanned more than 25 million books, available to read, search, and analyze.

Or consider the Internet Archive, where you can search the history of more than 286 billion historical web pages (!!!), 3.3 million movies, or 200 terabytes of government material. Still more, consider the HathiTrust, a large-scale collaboration between dozens of universities and libraries, which has archived tens of millions of books and articles that are all full-text searchable.

This flood of new data is exciting, and those of us in academia must take advantage of it. Our methods training must adapt, especially to include text analysis and network analysis, not because of an obsession with shiny new objects, or because these methods are trendy, but because it is our responsibility as researchers to use the best data available in service of our research questions, theories, and applied solutions.

To conclude, I want to provide a few concrete examples. The first is a study I conducted to map out, in great detail and at full scale, the climate change counter-movement. Drawing on some of the sources described above, I collected every text ever written from every climate contrarian organization (more than 39 million words), and mapped out the entire social network of organizations and individuals with ties to the movement. You can find links to the papers here.

In the end, we must use all the tools at our disposal to keep moving forward and to creatively address the problems at the intersection of society, politics, and environmental science.