Corpus of Spoken Yiddish in Europe

A digital language archive sourced from Holocaust survivor testimonies

Discover the sounds of Yiddish

The Corpus of Spoken Yiddish in Europe (CSYE) is an Open Access digital language archive sourced from hundreds of video-recorded interviews with survivors of the Holocaust. The materials contained in the corpus are a testament to the social and linguistic diversity of Yiddish-speaking Jewish society and an invaluable resource for linguistic research, Yiddish language instruction, and Holocaust education and commemoration.

The CSYE is a multi-year project, developed with the support of grants and research fellowships. Upon its completion, the corpus will be the most extensive source of conversational Yiddish ever compiled.

Explore our resources

The CSYE consists of testimony interviews from the USC Shoah Foundation and time-aligned transcripts, both in the Yiddish alphabet and in transliteration. These materials are available free of charge to researchers, students and teachers, and the broader public, subject to our Terms of Use. Visitors to the corpus website can:

  • Discover interviews through the Testimonies Index
  • Explore materials by place of origin and dialect area on our interactive Map
  • Access metadata, video with subtitles (in both orthographies), and downloadable audio and transcripts in various formats

For more details, see our User Guide.

The corpus website also features Glosses, a section of original articles offering linguistic insights, survivor profiles, and pedagogical materials for Yiddish language learning drawn from the corpus.

Future developments

Additional digital resources planned for the CSYE include:

  • Transcripts with word- and phoneme-level alignments
  • A pronunciation dictionary and acoustic model for use with forced alignment software

The CSYE website is constantly expanding with new materials.