Chapter 4 - Through the Lens of Big Data: Toward a Macroanalysis of the USC Shoah Foundation's Visual History Archive

Containing more than fifty-five thousand individual testimonies and over seven million tables of metadata, the USC Shoah Foundation’s Visual History Archive (VHA) is the largest digital archive of Holocaust and genocide testimonies in the world. This chapter takes a macrohistorical view of the data and database related to the Holocaust and genocide testimonies in the VHA. It turns to the fundamental question of what makes a digital archive, database, information system, and interface ethical. The analysis moves between comparative, whole corpus visualizations and individual witness accounts in order to examine the changing content and narrative structures of testimonies. The chapter articulates the limits of literalist indexing systems and the need to develop methods of “saying” and “unsaying” the database to continually unleash new potentialities in testimony. At the same time, we raise questions about the “computational tractability” of testimony in the first place and what it means for the Holocaust to be a paradigm or template for developing digital archival systems for other genocides, as with the testimonies recorded by the Armenian Film Foundation.

1. Time Stream of topics in 44,429 Shoah survivor testimonies

Developed by Lizhou Fan, David Shepard, and Todd Presner

The “time stream” visualization shows the occurrence of parent topics (the highest level indexing categories) over the course of 44,429 Jewish Holocaust survivor testimonies. Testimony percentile (narrative time expressed as a percentage of each testimony’s length) is shown along the x-axis, and the 23 “parent topics” are shown along the y-axis relative to one another. Based on metadata provided by the USC Shoah Foundation’s Visual History Archive (2019).
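For readers curious how such a view could be assembled from segment-level indexing, the following is a minimal sketch, not the project’s actual pipeline. It assumes each testimony is available as an ordered list of segments, each tagged with zero or more parent topics. The function name `time_stream`, the bin count, and the topic labels are hypothetical and chosen only for illustration.

```python
from collections import Counter


def time_stream(testimonies, n_bins=100):
    """Return, for each percentile bin, the relative share of every parent topic.

    testimonies: list of testimonies; each testimony is an ordered list of
                 segments; each segment is a list of parent-topic labels.
    n_bins:      number of bins along the narrative-time (x) axis.
    """
    counts = [Counter() for _ in range(n_bins)]
    for segments in testimonies:
        n = len(segments)
        for i, topics in enumerate(segments):
            # Position of the segment within its own testimony, mapped to a bin,
            # so that long and short testimonies share the same x-axis.
            b = min(int(i / n * n_bins), n_bins - 1)
            counts[b].update(topics)

    shares = []
    for c in counts:
        total = sum(c.values())
        # Topics are expressed relative to one another within each bin.
        shares.append({t: k / total for t, k in c.items()} if total else {})
    return shares


# Illustrative input only: two short testimonies with made-up topic labels.
example = [
    [["family"], ["camps"]],
    [["family"], ["family", "liberation"], ["camps"], ["liberation"]],
]
shares = time_stream(example, n_bins=2)
```

Each entry of `shares` corresponds to one vertical slice of the stream, with the topic shares in a slice summing to 1.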


2. Time Stream of topics in 67 Rwandan Genocide survivor testimonies

Developed by Lizhou Fan, David Shepard, and Todd Presner

Time Stream showing the occurrence of parent topics over the course of 67 Rwandan genocide survivor testimonies. Based on metadata provided by the USC Shoah Foundation’s Visual History Archive (2019).


3. Time Stream of topics in 246 Armenian Genocide survivor testimonies

Developed by Lizhou Fan, David Shepard, and Todd Presner

Time Stream showing the occurrence of parent topics over the course of 246 Armenian Genocide survivor testimonies. Based on metadata provided by the USC Shoah Foundation’s Visual History Archive (2019).


4. Time Stream of topics in 39 Nanjing Massacre survivor testimonies

Developed by Lizhou Fan, David Shepard, and Todd Presner

Time Stream showing the occurrence of parent topics over the course of 36 Nanjing Massacre survivor testimonies. Based on metadata provided by the USC Shoah Foundation’s Visual History Archive (2019).


5. Boder’s index keyed to the USC Shoah Foundation’s top level indexing categories

Developed by Lizhou Fan, Anna Bonazzi, and Todd Presner

This visualization shows the occurrence of the USC Shoah Foundation’s parent indexing terms throughout Boder’s testimonies. For this visualization, Boder’s original indexing terms were grouped using the parent categories later developed by the USC Shoah Foundation. We were interested in comparing testimonial narratives in order to examine changes in both the genre of Holocaust testimony and approaches to indexing.
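The regrouping step described above can be pictured as a lookup from each original index term to a later parent category. The sketch below is an illustration under stated assumptions: the mapping entries, the term strings, and the function name `regroup` are all hypothetical, and they do not reproduce Boder’s actual index or the USC Shoah Foundation’s actual category names.

```python
from collections import Counter

# Hypothetical mapping from original index terms to parent categories.
# Neither the terms nor the categories are taken from the real indexes.
TERM_TO_PARENT = {
    "hunger": "daily life",
    "deportation": "movement",
    "selection": "camps",
}


def regroup(index_terms, mapping, unmapped="unmapped"):
    """Count occurrences of parent categories for a list of original index terms.

    Terms with no entry in the mapping are collected under `unmapped`
    rather than being silently dropped.
    """
    return Counter(mapping.get(term, unmapped) for term in index_terms)


# Illustrative input only.
grouped = regroup(["hunger", "selection", "selection", "bread"], TERM_TO_PARENT)
```

Keeping an explicit bucket for unmapped terms makes visible where an earlier index does not fit the later categories.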

Data source: David Boder, Topical Autobiographies (1957), UCLA Young Research Library Special Collections.


6. Time Stream of topics in Boder’s Topical Autobiographies

Developed by Lizhou Fan, David Shepard, and Todd Presner

Time Stream showing the occurrence of parent topics over the course of 70 testimonies from Boder’s Topical Autobiographies. Data sources: David Boder, Topical Autobiographies (1957), UCLA Young Research Library Special Collections; Voices of the Holocaust (Paul V. Galvin Library, Illinois Institute of Technology); and metadata provided by the USC Shoah Foundation’s Visual History Archive.


7. Five Time Streams: Cross-corpus Comparison

Developed by Lizhou Fan and Todd Presner

A comparison of the five Time Streams: Boder’s Topical Autobiographies and testimonies from the USC Shoah Foundation’s Holocaust (Jewish survivors), Rwandan Genocide, Armenian Genocide, and Nanjing Massacre collections (2019).

The visible differences in the shapes of the time streams are due to the different sizes of the five archives as well as variations in the internal order of the interview topics.

You can select specific topics in the “Term Name” menu on the right to compare their appearance and distribution across the five corpora (normalized to 100%, relative to their proportional frequency).

Visualizations are based on data provided by the USC Shoah Foundation’s Visual History Archive (2019); David Boder, Topical Autobiographies (1957), UCLA Young Research Library Special Collections; and Voices of the Holocaust (Paul V. Galvin Library, Illinois Institute of Technology).