Monographs

Medieval Manuscripts and the Computational Humanities: Big Data, Scribes, and the “Paris Bible.”
Arc Humanities Press, 2025.
This book examines the transformations in medieval studies, and in the humanities more broadly, enabled by decades of digitization and advances in machine learning. Centring on the Paris Bible, a widely copied thirteenth- and fourteenth-century manuscript genre, we show how automatic transcription produces scribal data at a scale once inaccessible, and how automation can support new approaches to localizing, dating, and contextualizing manuscripts. We argue that applying artificial intelligence to medieval studies requires re-centring expert human intelligence within computational systems. Beyond this case study, the book models how digital medieval studies can rethink interpretation, highlighting both the promise and risks of machine learning in manuscript research, and uses bibliometrics to trace the field’s move toward co-authorship and the infrastructures needed for collaborative scholarship.
Peer-reviewed articles

Everyone Leaves a Trace: Exploring Transcriptions of Medieval Manuscripts with Computational Methods.
Digital Studies in Language and Literature, 2024.
This paper examines a thirteenth-century manuscript from the French National Library (Paris, BnF français 24428) containing three popular texts: an encyclopedic work, a bestiary and a collection of animal fables. We have automatically transcribed the manuscript using a custom handwritten text recognition (HTR) model for Old French. Rather than a content-based analysis of the manuscript’s transcription, we adapt quantitative methods normally used for authorship attribution and clustering to the analysis of scribal contribution in the manuscript. Furthermore, we explore the traces that are left when texts are copied, transcribed and/or edited, and the importance of those traces for computational textual analysis of orthographically unstable historical languages. We argue that the method of transcription is fundamental to thinking about the complex modes of authorship that are so important for understanding medieval textual transmission. The paper is inspired by trends in digital scholarship in the mid-2020s, such as public transcribe-a-thons in the GLAM (Galleries, Libraries, Archives and Museums) sector, the opening up of digitized archival collections with methods such as HTR, and computational textual analysis of the transcriptions.

Is Medieval Distant Viewing Possible? : Extending and Enriching Annotation of Legacy Image Collections using Visual Analytics.
Digital Scholarship in the Humanities, Volume 39, Issue 2, June 2024, Pages 638–656.
Distant viewing approaches have typically used image datasets close to the contemporary image data used to train machine learning models. Working with images from other historical periods requires expert-annotated data, and the quality of labels is crucial for the quality of results. Especially when working with cultural heritage collections that contain myriad uncertainties, annotating data, or re-annotating legacy data, is an arduous task. In this paper, we describe working with two pre-annotated sets of medieval manuscript images that exhibit conflicting and overlapping metadata. Since a manual reconciliation of the two legacy ontologies would be very expensive, we aim (1) to create a more uniform set of descriptive labels to serve as a “bridge” in the combined dataset, and (2) to establish a high-quality hierarchical classification that can be used as a valuable input for subsequent supervised machine learning. To achieve these goals, we developed visualization and interaction mechanisms, enabling medievalists to combine, regularize and extend the vocabulary used to describe these, and other cognate, image datasets. The visual interfaces provide experts with an overview of relationships in the data that goes beyond the sum total of the metadata. Word and image embeddings, as well as co-occurrences of labels across the datasets, enable batch re-annotation of images and recommendation of label candidates, and support the composition of a hierarchical classification of labels.

Transcribing Medieval Manuscripts for Machine Learning.
Journal of Data Mining and Digital Humanities, 2024.
This article focuses on the transcription of medieval manuscripts. Although problems of transcription have long interested medievalists, few workable options besides normalisation were available in the era of printed editions. The automation of this process, known as handwritten text recognition (HTR), has made new kinds of digital text creation possible, but has also foregrounded the necessity of theorising transcription in our scholarly practices. We reflect here on different notions of transcription against the backdrop of changing text technologies. Moreover, drawing on our own research on medieval Latin Bibles, we present general guidelines for customizing transcription schemes, arguing that they must be designed with specific research questions and scholarly end use in mind. Since we are particularly interested in the scribal contribution to the production of codices, our transcription guidelines aim to capture abbreviations and orthographic variation between different textual witnesses for downstream machine learning tasks. In the final section of the article, we discuss a few examples of how the HTR-created transcriptions allow us to address new questions at scale in medieval manuscripts, such as textual variance across witnesses, the prediction of a change in scribal hands within a single manuscript, and the profiling of individual and regional scribal characteristics.

Les manuscrits médiévaux occidentaux dans la collection du Louvre Abu Dhabi. 2009-2017.
Le manuscrit médiéval: texte, objet et outil de transmission. Volume I. Brepols: Pecia. Le livre et l’écrit. N°22, p. 105-153.
Between 2009 ‒ the year of the first acquisition ‒ and 2017, the year the museum opened to the public, the Louvre Abu Dhabi acquired twenty-one manuscripts or manuscript leaves. Among them are six western medieval manuscripts, representative of the religious, courtly, literary and scientific culture of the end of the Middle Ages, and instruments for the dissemination of knowledge and faith. Because these objects were previously kept in private hands, and some of them are new or little known, this paper aims to make them accessible to the scholarly community by presenting, for each of them, the codicological elements, the content, the decoration and the history, as well as bibliographic references.

Forthcoming: Automatic Transcription and the Fragment: New Technologies for Situating Large Manuscript Traditions.
Proceedings of the conference From Fragments to Whole: Interpreting Medieval Manuscript Fragments. Bristol University, UK, September 2021.
Editorially-reviewed articles

Creating New Audiences for Digital Objects Through Museum-University Collaboration.
Museums in the Middle East Journal. Sharjah, UAE.
This article discusses the Paris Bible Project (PBP), a collaboration between the Louvre Abu Dhabi (LAD) and New York University Abu Dhabi (NYUAD) that began during the pandemic, when a professional relationship between researchers on Saadiyat Island, Abu Dhabi, turned virtual. This inter-institutional partnership has proven to be a vibrant scholarly exchange, with a growing list of international conference presentations and papers. In this article, we argue that by studying digitized objects from the museum, not only do researchers break new ground in their fields of study, but the museum objects they study also gain new audiences.

Les souvenirs des Val-de-marnais collectés par les archives départementales.
Culture et Recherche, N°131, 2025, p. 42.