<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20120330//EN" "http://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1.dtd">
<!--<?xml-stylesheet type="text/xsl" href="article.xsl"?>-->
<article article-type="research-article" dtd-version="1.2" xml:lang="en" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
<journal-meta>
<journal-id journal-id-type="issn">2371-4549</journal-id>
<journal-title-group>
<journal-title>Journal of Cultural Analytics</journal-title>
</journal-title-group>
<issn pub-type="epub">2371-4549</issn>
<publisher>
<publisher-name>Center for Digital Humanities</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.22148/jca.1097</article-id>
<article-categories>
<subj-group>
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Bibliographic Metadata as Relational Data: A Cross-Disciplinary Methodological Reflection</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<contrib-id contrib-id-type="orcid">https://orcid.org/0000-0003-4303-6874</contrib-id>
<name>
<surname>Scebba</surname>
<given-names>Rossana</given-names>
</name>
<email>rossana.scebba@kuleuven.be</email>
<email>rossana.scebba@uclouvain.be</email>
<xref ref-type="aff" rid="aff-1">1</xref>
</contrib>
</contrib-group>
<aff id="aff-1"><label>1</label>IRES/LIDAM, Universit&#233; Catholique de Louvain and Research Unit of Early Modern History, Katholieke Universiteit Leuven</aff>
<pub-date publication-format="electronic" date-type="pub" iso-8601-date="2026-04-25">
<day>25</day>
<month>04</month>
<year>2026</year>
</pub-date>
<pub-date pub-type="collection">
<year>2026</year>
</pub-date>
<volume>11</volume>
<issue>1</issue>
<fpage>1</fpage>
<lpage>34</lpage>
<permissions>
<copyright-statement>Copyright: &#x00A9; 2026 The Author(s)</copyright-statement>
<copyright-year>2026</copyright-year>
<license license-type="open-access" xlink:href="http://creativecommons.org/licenses/by/4.0/">
<license-p>This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. See <uri xlink:href="http://creativecommons.org/licenses/by/4.0/">http://creativecommons.org/licenses/by/4.0/</uri>.</license-p>
</license>
</permissions>
<self-uri xlink:href="https://culturalanalytics.org/articles/10.22148/jca.1097/"/>
<abstract>
<p>In this paper, I reflect on the growing cross-disciplinary convergence of the use of bibliographic metadata as empirical material in digital humanities and computational history as well as in quantitative economic history. I discuss the main challenges involved in preprocessing metadata from library catalogs, comparing techniques from the abovementioned fields. A case study based on the <italic>Collectio academica antiqua</italic> of the Old University of Louvain serves to demonstrate how its metadata can be parsed, disambiguated, and reorganized into relational format, that is, into linked tabular datasets describing different aspects of the collection&#8217;s records. Building on this foundation, I examine the use of bibliographic data as empirical material in historical network analysis, with particular attention to the assumptions underlying co-occurrence representations. I show that role-differentiated participation encoded in the original metadata is not naturally accommodated by flat network projections, and I argue that a multilayer network representation provides a coherent way to preserve this heterogeneity. I use this to assess how differences in participation across roles relate to the institutional context in which publications were produced. I also highlight both the analytical limits of bibliographic metadata when used in isolation and the gains that arise when relational representations derived from catalog data are integrated with complementary sources.</p>
</abstract>
</article-meta>
</front>
<body>
<sec>
<title>Section 1: Introduction</title>
<p>Traditionally compiled by librarians for purposes of bibliographic control and collection management, metadata from union catalogs and curated library collections are increasingly being repurposed as empirical material in historical research, both in digital humanities and computational history and in the quantitative social sciences. Digital humanities scholars have been at the forefront of this reappropriation. Building on the framework introduced by <xref ref-type="bibr" rid="B30">Lahti et al., &#8220;Bibliographic Data Science and the History of the Book,&#8221;</xref> who coined the term <italic>bibliographic data science</italic> to describe the systematic use of catalog metadata for historical inquiry, a growing body of work has shown how information encoded in title pages, imprint statements, dedications, approbations, and physical descriptions can support large-scale analyses of intellectual and book history. Key contributions include studies drawing on large-scale book catalogs to examine cultural production, canon formation, book pricing, and the circulation of classical and vernacular texts (see <xref ref-type="bibr" rid="B31">Lahti et al., &#8220;A Quantitative Study of History in the English Short-Title Catalogue,&#8221;</xref> <xref ref-type="bibr" rid="B48">Tolonen et al., &#8220;A Quantitative Approach to Book-Printing in Sweden and Finland,&#8221;</xref> <xref ref-type="bibr" rid="B49">Tolonen et al., &#8220;Examining the Early Modern Canon,&#8221;</xref> <xref ref-type="bibr" rid="B24">Hill et al.</xref>, &#8220;Reconstructing Intellectual Networks,&#8221; <xref ref-type="bibr" rid="B47">Tiihonen and Tolonen</xref>, <xref ref-type="bibr" rid="B17">Fantoli et al.</xref>), as well as network analyses of print culture that trace influence through dedications, printers&#8217; relationships, and publishing communities (<xref ref-type="bibr" rid="B18">Gavin</xref>, <xref ref-type="bibr" rid="B24">Hill et al., &#8220;Communication and Idea Transmission Across Historical Communities,&#8221;</xref> <xref ref-type="bibr" rid="B29">Ladd</xref>, <xref ref-type="bibr" rid="B22">Greteman</xref>, <xref ref-type="bibr" rid="B20">Gittel</xref>, <xref ref-type="bibr" rid="B50">Valleriani et al.</xref>, <xref ref-type="bibr" rid="B41">Ryan and Tolonen, &#8220;Networks of Influence in Scottish Enlightenment Publishing,&#8221;</xref> <xref ref-type="bibr" rid="B42">Ryan and Tolonen, &#8220;The Evolution of Scottish Enlightenment Publishing,&#8221;</xref> <xref ref-type="bibr" rid="B23">He&#223;br&#252;ggen-Walter</xref>). A parallel strand of scholarship draws on metadata of epistolary sources, as in <xref ref-type="bibr" rid="B26">Hotson and Wallnig</xref> on the Republic of Letters or <xref ref-type="bibr" rid="B40">Roller, &#8220;Tracing the Footsteps of Ideas&#8221;</xref> on the circulation of Reformation ideas through correspondence.</p>
<p>In parallel, researchers in the social sciences (particularly, though not exclusively, applied economic historians) have also begun to employ bibliographic data to address topics such as the decline of Islamic science and preindustrial city growth (<xref ref-type="bibr" rid="B10">Chaney</xref>, &#8220;Religion and the Rise and Fall of Islamic Science,&#8221; <xref ref-type="bibr" rid="B9">Chaney</xref>, &#8220;Modern Library Holdings and Historic City Growth&#8221;), teacher-directed scientific progress in early modern England (<xref ref-type="bibr" rid="B28">Koschnick</xref>), the diffusion of ideas through railroad networks in nineteenth-century German-speaking regions (<xref ref-type="bibr" rid="B11">Chiopris</xref>), the relationship between limited academic career prospects and the rise of dissenting religious print (<xref ref-type="bibr" rid="B14">de Pleijt and Koschnick</xref>), and the role of the Republic of Letters in Britain&#8217;s Industrial Revolution take-off (<xref ref-type="bibr" rid="B8">Cervellati et al.</xref>). This convergence around the same materials reveals both the expanding research potential of bibliographic data and the persistence of distinct disciplinary conventions in how these data are employed. The digital humanities have often developed without full integration with the quantitative social sciences (<xref ref-type="bibr" rid="B33">Lemercier 273</xref>), which helps explain why the two traditions, despite similar empirical aims, have adopted different, though occasionally overlapping, methodological conventions.</p>
<p>The way the very same bibliographic data are mobilized in the two literatures reveals this divergence most clearly. While not all digital humanities or computational history projects using these materials adopt network methods, the network perspective remains one of the most common modelling frameworks paired with bibliographic metadata in the field. Although catalog data do not inherently encode relationships, they are often interpreted as networked systems by treating the co-occurrence of agents within the same publication event as evidence of collaboration or shared involvement. This approach has been productive for mapping and exploration, though its analytical development remains uneven. From this perspective, skepticism toward digital approaches among historians and humanists trained in textual or archival traditions is not without foundation (<xref ref-type="bibr" rid="B21">Gregory 2</xref>). Yet this limitation is increasingly being addressed more broadly, with <xref ref-type="bibr" rid="B39">Roller&#8217;s &#8220;Theory-Driven Statistics for the Digital Humanities: Presenting Pitfalls and a Practical Guide by the Example of the Reformation,&#8221;</xref> insisting that quantitative methods be tied to explicit research questions and historiographical theories that establish testable relationships between measurable concepts of interest.</p>
<p>By contrast, in quantitative economic history, recent trends shaped in part by methodological norms associated with the so-called &#8220;credibility revolution&#8221; (<xref ref-type="bibr" rid="B1">Angrist and Pischke</xref>) have reinforced expectations about explicit research design and the use of linked datasets (<xref ref-type="bibr" rid="B7">Cantoni and Yuchtman 216</xref>). Within this framework, bibliographic data are routinely employed in research designs that integrate multiple data sources to produce theoretically grounded interpretations of historical phenomena. Economic history thus offers a useful point of comparison for digital humanities and computational history because it faces similar preprocessing and interpretative challenges and shows how formal analytical modeling can expand the scope of the same bibliographic metadata corpora.</p>
<p>The paper builds on this cross-disciplinary comparison by examining how metadata from library catalogs can be de-structured, parsed, and reorganized in relational form and how the resulting data can support analytical modeling within a network framework. In doing so, it responds to calls for closer alignment between quantitative analysis and explicit theoretical frameworks (<xref ref-type="bibr" rid="B39">Roller, &#8220;Theory-Driven Statistics&#8221;</xref>). Related work in the digital humanities has pursued more explicitly analytical modeling strategies, including multimodal combinations of network methods and text analysis (<xref ref-type="bibr" rid="B24">Hill et al., &#8220;Communication and Idea Transmission Across Historical Communities&#8221;</xref>) and temporal modeling approaches (<xref ref-type="bibr" rid="B40">Roller, &#8220;Tracing the Footsteps of Ideas&#8221;</xref>). More broadly, the present study aligns with the commitment articulated by <xref ref-type="bibr" rid="B32">Lahti et al., &#8220;Best Practices in Bibliographic Data Science,&#8221;</xref> to treat bibliographic metadata as a substantive source for historical inquiry.</p>
<p>Against this background, this paper is guided by three related research questions. First, how can bibliographic catalog records, originally designed for item-level description, be transformed into relational data structures suitable for longitudinal and network-based analysis? Second, what modeling assumptions are implicitly introduced when bibliographic metadata are interpreted as co-occurrence networks, and how do these assumptions shape the interpretation of relationships inferred from catalog data? Third, what analytical leverage is gained by preserving role-differentiated participation through a multilayer network representation compared with flat co-occurrence projections commonly used in digital history?</p>
<p>To address these questions, I focus on the metadata of the <italic>Collectio academica antiqua</italic>, the cultural heritage collection of the Old University of Louvain (1425&#8211;1797), and I examine how relational modeling choices affect the identification of systematic patterns in academic print production. The central claim of the paper is that bibliographic data encode role-differentiated forms of participation that are not naturally accommodated by standard, flat co-occurrence network representations. When bibliographic data are modeled as flat co-occurrence networks, distinct forms of involvement in book production are implicitly collapsed. I argue that, if one adopts a network perspective, a multilayer representation offers the most coherent way, within a co-occurrence framework, to preserve this role differentiation instead of flattening it into an undifferentiated relation. This modeling choice is justified because it preserves distinctions encoded in the original sources and because these distinctions might correspond to historically meaningful differences in modes of participation in book production. I use my case study of the old academic collection of Louvain to address whether holding multiple production roles is associated with a higher likelihood that a publication involves university professors.</p>
<p>This analysis also makes clear that recovering features of Louvain&#8217;s academic publishing ecosystem requires integrating bibliographic metadata with external sources beyond the catalog itself. However rich, bibliographic data <italic>alone</italic> offer limited leverage for addressing complex historical questions. Their analytical potential is substantially expanded when they are linked to complementary sources, such as prosopographical datasets (which record the background characteristics of historical individuals) or institutional datasets. The comparison with quantitative economic history reinforces this point: in that field, bibliographic data typically function as one component within a broader data infrastructure combining multiple sources.</p>
<p>The paper proceeds in five sections. Section 2 outlines the structure of metadata from union catalogs and library collections and discusses how key bibliographic elements can be identified, extracted, and transformed into a relational format. Section 3 details the conceptual and practical steps involved in preprocessing the resulting data for systematic quantitative analysis. Section 4 examines the assumptions involved in interpreting catalog metadata as network data and shows how a multilayer co-occurrence representation allows role-differentiated participation to be modeled and combined with external sources for analytical use. Section 5 concludes.</p>
</sec>
<sec>
<title>Section 2: What is in a Catalog? Anatomy of Bibliographic Metadata</title>
<p>Digital library catalogs are curated databases describing and indexing the holdings of one or more libraries. The structured information stored in these catalogs is known as bibliographic metadata. Metadata describe attributes of catalog entries rather than their full content, and they support organization, classification, and discovery. For printed book collections, such metadata include the title, the publication date and place, the names of agents involved in creation and distribution, the language, and the physical description of each holding.</p>
<p>Internally, library catalogs rely on two complementary layers: an encoding format and a content standard. The encoding determines how metadata are structured for storage and exchange, whereas the content standard governs what information is recorded and how it is expressed. Most libraries use MARC (MAchine-Readable Cataloging) as their encoding format. Developed at the Library of Congress by <xref ref-type="bibr" rid="B3">Henriette Avram</xref> and later formalized as MARC 21, this schema makes cataloging data legible to computers across libraries and associates text strings with numbered fields that are conventionally linked to specific bibliographic properties. Despite its limitations and its poor alignment with linked-data infrastructures (<xref ref-type="bibr" rid="B46">Tennant</xref>), MARC 21 continues to underpin most catalog systems. Even when library records are exported to relational tables or Extensible Markup Language (XML) formats such as MARCXML or Metadata Object Description Schema (MODS), they retain the underlying MARC 21 structure. Researchers must therefore parse fields, indicators, and subfields according to MARC conventions. Tools such as pymarc assist with extraction and conversion but still require familiarity with MARC 21 logic.<xref ref-type="fn" rid="n1">1</xref></p>
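<p>To make the distinction between fields, indicators, and subfields concrete, the following is a minimal sketch of how a MARCXML export could be read with the Python standard library alone. It is not the pipeline used in this paper: the sample record, the placeholder name and title, and the function <monospace>read_datafields</monospace> are illustrative assumptions.</p>
<code language="python"><![CDATA[
```python
import xml.etree.ElementTree as ET

# Namespace used by MARCXML ("MARC 21 slim") exports.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical record with a placeholder name and title (not real catalog data).
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Exemplum, Petrus,</subfield>
    <subfield code="d">1500-1570</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Titulus fictus</subfield>
  </datafield>
</record>"""


def read_datafields(marcxml):
    """Return one dict per data field: tag, both indicators, ordered subfields."""
    root = ET.fromstring(marcxml)
    fields = []
    for datafield in root.findall("marc:datafield", MARC_NS):
        # Keep subfields as ordered (code, value) pairs because codes may repeat.
        subfields = [
            (sf.get("code"), sf.text or "")
            for sf in datafield.findall("marc:subfield", MARC_NS)
        ]
        fields.append({
            "tag": datafield.get("tag"),
            "ind1": datafield.get("ind1"),
            "ind2": datafield.get("ind2"),
            "subfields": subfields,
        })
    return fields


fields = read_datafields(SAMPLE)
```
]]></code>
<p>Keeping subfields as ordered pairs instead of a dictionary is a deliberate choice in this sketch, since MARC 21 allows some subfield codes to repeat within a single field.</p>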
<p>While encoding formats define <italic>where</italic> information is stored, content standards determine <italic>what</italic> information is entered and how consistently. These standards shape decisions such as how titles are transcribed, how imprints are formatted, and how names, roles, and subject information are recorded. General collections typically follow Resource Description and Access (RDA), rare book collections often rely on more specialized descriptive standards such as Descriptive Cataloging of Rare Materials (DCRM), and archival holdings use standards such as Describing Archives: A Content Standard (DACS). As a result, records encoded in the same MARC 21 infrastructure may display distinct descriptive logics depending on the content standard applied. In other words, it is only in light of the cataloging conventions that one can interpret which MARC fields are actually available for analysis.</p>
<p>I work with the <italic>Collectio academica antiqua</italic>, a curated collection of early modern printed works associated with the Old University of Louvain. Founded in 1425 in the Brabant region (present-day Belgium), it was the first university in the historic Low Countries. The contemporary sister universities of Katholieke Universiteit Leuven (KU Leuven) and Universit&#233; Catholique de Louvain (UCLouvain) descend from it, following its abolition in 1797.<xref ref-type="fn" rid="n2">2</xref> The collection gathers works linked to the university, its affiliated scholars, its illustrious alumni, and its institutional history. Although printing activity in Louvain is attested from 1473 onward, incunabula at the KU Leuven Libraries are held in a separate preservation collection, so the <italic>Collectio academica antiqua</italic> begins artificially in 1501 and therefore does not reflect the chronology of local printing (<xref ref-type="bibr" rid="B44">Scebba and Fantoli</xref>). The catalog records of the <italic>Collectio academica antiqua</italic> are encoded in MARC 21 and follow the RDA-based cataloging standard used at the KU Leuven Libraries, supplemented by internal guidelines. For early printed materials, catalogers also draw on the Short Title Catalogue Vlaanderen (STCV) rule set, which provides descriptive guidance for rare books (i.e., books printed before 1801).</p>
<p><xref ref-type="table" rid="T1">Table 1</xref> summarizes the MARC 21 fields most commonly encountered in early modern and rare book contexts. It generalizes from the data structure of the <italic>Collectio academica antiqua</italic> and from comparable early modern catalogs. Whether these fields become analytically useful depends on the research question as well as on the content standards and cataloging practices that shaped the metadata. This dependence is evident when early modern catalogs are compared with contemporary circulating collections. For example, <xref ref-type="bibr" rid="B37">Petras et al.</xref> use metadata from modern library collections to construct time-period directories that allow users to navigate holdings by historical era and place. Their approach relies on subject strings containing explicit chronological and geographic subdivisions, which are common in contemporary catalogs but generally absent from early modern collections such as the <italic>Collectio academica antiqua</italic>.</p>
<table-wrap id="T1">
<label>Table 1</label>
<caption>
<p>The table lists selected MARC 21 fields with their Library of Congress definitions and brief content description.</p>
</caption>
<table>
<tbody>
<tr>
<td align="left" valign="top"><bold>Tag</bold></td>
<td align="left" valign="top"><bold>LoC Definition</bold></td>
<td align="left" valign="top"><bold>Content</bold></td>
</tr>
<tr>
<td align="left" valign="top" colspan="3"><bold>Agents</bold></td>
</tr>
<tr>
<td align="left" valign="top">100</td>
<td align="left" valign="top">Main Entry&#8211;Personal Name</td>
<td align="left" valign="top">Personal name primarily responsible for the work (e.g., author)</td>
</tr>
<tr>
<td align="left" valign="top">110</td>
<td align="left" valign="top">Main Entry&#8211;Corporate Name</td>
<td align="left" valign="top">Institutional body primarily responsible for the work (e.g., university press)</td>
</tr>
<tr>
<td align="left" valign="top">600</td>
<td align="left" valign="top">Subject Added Entry&#8211;Personal Name</td>
<td align="left" valign="top">Person discussed or referenced in the work (e.g., biography subject)</td>
</tr>
<tr>
<td align="left" valign="top">610</td>
<td align="left" valign="top">Subject Added Entry&#8211;Corporate Name</td>
<td align="left" valign="top">Corporate entity discussed or referenced (e.g., religious order, university)</td>
</tr>
<tr>
<td align="left" valign="top">700</td>
<td align="left" valign="top">Added Entry&#8211;Personal Name</td>
<td align="left" valign="top">Additional individual associated with the work (e.g., editor, translator)</td>
</tr>
<tr>
<td align="left" valign="top">710</td>
<td align="left" valign="top">Added Entry&#8211;Corporate Name</td>
<td align="left" valign="top">Additional institutional body involved (e.g., publishers, sponsors)</td>
</tr>
<tr>
<td align="left" valign="top" colspan="3"><bold>Content</bold></td>
</tr>
<tr>
<td align="left" valign="top">245</td>
<td align="left" valign="top">Title Statement</td>
<td align="left" valign="top">Full title of the work, including subtitles</td>
</tr>
<tr>
<td align="left" valign="top">650</td>
<td align="left" valign="top">Subject Added Entry&#8211;Topical Term</td>
<td align="left" valign="top">Keywords or subject headings describing the topic</td>
</tr>
<tr>
<td align="left" valign="top">655</td>
<td align="left" valign="top">Index Term&#8211;Genre/Form</td>
<td align="left" valign="top">Material type or genre designation (e.g., ephemera, pamphlets)</td>
</tr>
<tr>
<td align="left" valign="top" colspan="3"><bold>Paratext and imprint</bold></td>
</tr>
<tr>
<td align="left" valign="top">500</td>
<td align="left" valign="top">General Note</td>
<td align="left" valign="top">Miscellaneous notes, often paratextual (e.g., dedications, colophons)</td>
</tr>
<tr>
<td align="left" valign="top">260/264</td>
<td align="left" valign="top">Publication, Distribution, etc. (Imprint)</td>
<td align="left" valign="top">Publisher information, place, and date of publication</td>
</tr>
<tr>
<td align="left" valign="top">041</td>
<td align="left" valign="top">Language Code</td>
<td align="left" valign="top">Language(s) in which the work is written</td>
</tr>
<tr>
<td align="left" valign="top" colspan="3"><bold>Access information</bold></td>
</tr>
<tr>
<td align="left" valign="top">852</td>
<td align="left" valign="top">Location</td>
<td align="left" valign="top">Physical location and call number of the holding institution</td>
</tr>
<tr>
<td align="left" valign="top">856</td>
<td align="left" valign="top">Electronic Location and Access</td>
<td align="left" valign="top">URL or link to digital version or online resource</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>This closer examination of the technical structure of catalog data underscores that working with bibliographic metadata requires an explicit understanding of their internal organization and their transformation into relational form. It also makes immediately clear that the extracted data often require harmonization. Because cataloging practices prioritize item-level description over cross-record consistency, records frequently lack internal coherence, particularly in the identification of persons, corporate entities, and place names (<xref ref-type="bibr" rid="B36">Padilla 20</xref>). Unless a catalog relies on robust authority control or controlled vocabularies, these elements tend to appear in multiple, nonaligned forms. As a result, they must be interpreted, cleaned, and restructured before they can serve as input for empirical modeling. The next section outlines the conceptual and practical decisions involved in converting catalog records into harmonized relational data.</p>
</sec>
<sec>
<title>Section 3: From Structured to Relational Data</title>
<p>Transforming catalog metadata into a form suitable for quantitative modeling involves a series of preprocessing steps. This stage raises questions that are increasingly shared across disciplines. Scholars in both digital humanities and social sciences working with historical data are confronting the same challenges: how to identify and reconcile references to historical persons, places, and subjects across historical data, including bibliographic metadata. Digital humanists have long grappled with these issues in prosopographies, bibliographies, and textual corpora (<xref ref-type="bibr" rid="B16">Ehrmann et al.</xref>), whereas social scientists are more recently engaging with similar problems (<xref ref-type="bibr" rid="B2">Arora et al.</xref>). Both traditions are converging toward workflows that combine domain knowledge with scalable, semi-automated tools in order to prepare historical data for analysis. In this context, extracting and parsing structured bibliographic metadata is a crucial step because converting MARC&#8217;s heterogeneous field structure into an analytically usable form requires normalizing it into relational data, with entities separated into tables and linked through keys. In what follows, I show how these workflows unfold in practice, using the <italic>Collectio academica antiqua</italic> as a case study across three key stages: actor identification, spatial referencing of place names, and content classification.</p>
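<p>As a rough illustration of what &#8220;entities separated into tables and linked through keys&#8221; can mean in practice, the sketch below splits a few flat rows into a record table, an agent table, and a link table that carries the role. The rows, the names, the column labels, and the function <monospace>normalize</monospace> are hypothetical and do not reproduce the schema used for the <italic>Collectio academica antiqua</italic>.</p>
<code language="python"><![CDATA[
```python
# Hypothetical flattened catalog rows: one row per (record, agent, role) mention.
flat_rows = [
    {"record_id": "r1", "title": "Opus A", "year": 1550,
     "name": "Exemplum, Petrus", "role": "author"},
    {"record_id": "r1", "title": "Opus A", "year": 1550,
     "name": "Aliud, Johannes", "role": "printer"},
    {"record_id": "r2", "title": "Opus B", "year": 1562,
     "name": "Aliud, Johannes", "role": "printer"},
]


def normalize(rows):
    """Split flat rows into three linked tables: records, agents, participations."""
    records = {}         # record_id -> attributes of the publication
    agents = {}          # name string -> surrogate agent key
    participations = []  # link table: (record_id, agent_id, role)
    for row in rows:
        records.setdefault(
            row["record_id"], {"title": row["title"], "year": row["year"]}
        )
        # A new surrogate key is issued only the first time a name is seen.
        agent_id = agents.setdefault(row["name"], f"a{len(agents) + 1}")
        participations.append((row["record_id"], agent_id, row["role"]))
    return records, agents, participations


records, agents, participations = normalize(flat_rows)
```
]]></code>
<p>In this sketch the agent key is derived from the raw name string, so it is only as good as the disambiguation discussed in the next subsection. Keeping the role in the link table, not in the agent table, allows the same agent to appear under different roles in different records.</p>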
<sec>
<title>Identifying Historical Actors: Disambiguation and Authority Alignment</title>
<p>Catalogs often record the names of both personal and corporate entities associated with a given holding, in line with the MARC 21 schema.<xref ref-type="fn" rid="n3">3</xref> These entries may include additional identifying information such as life dates or floruit, numeration, pseudonyms, religious affiliation or noble titles, and places of activity, as well as the role under which the individual appears in the publication. One of the most challenging steps in working with bibliographic metadata is the identification and disambiguation of historical actors based on these name strings. This difficulty arises from variant spellings, inconsistent orthography, and uneven role attribution across entries. These challenges may be more or less pronounced depending on the uniformity of cataloging conventions and the degree of authority control implemented, and this may apply to curated collections and union catalogs alike. To construct meaningful relational data, it is essential to reconcile these inconsistencies and unambiguously identify historical individuals. The disambiguation step ensures that relationships in the data are constructed around unified entities instead of fragmented or duplicated name strings. This process relies on accompanying biographical attributes to distinguish homonyms and resolve pseudonyms. In the case of homonyms, these additional attributes can help differentiate distinct individuals with similar names, while in the case of pseudonyms or variant forms referring to the same person, shared dates or roles may allow for accurate consolidation even when string similarity alone is insufficient. Ideally, at the end of this reconciliation process, one should be able to assign a persistent identifier to each person or corporate entity.</p>
<p>In practice, two main strategies are typically employed: a bottom-up approach, based on string similarity and clustering, or a top-down strategy that aligns name strings with external authority files. Both techniques can be employed at different stages of the workflow, and both often benefit from a semi-automated approach. For instance, fuzzy string matching and clustering algorithms can be used to group likely name variants, which are then manually reviewed for validation. Similarly, a name search can be launched against a selected external authority file to identify potential matches, but the final assignment of identifiers should remain under human supervision.</p>
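<p>The bottom-up strategy can be sketched with a simple similarity ratio from the Python standard library. This is one possible way to propose candidate pairs for manual review, not the procedure applied in this paper. The names, the threshold of 0.85, and the function <monospace>candidate_pairs</monospace> are assumptions chosen for illustration.</p>
<code language="python"><![CDATA[
```python
from difflib import SequenceMatcher
from itertools import combinations


def candidate_pairs(names, threshold=0.85):
    """Propose pairs of distinct name strings that look like variants.

    The output is a list for human review, not an automatic merge.
    """
    pairs = []
    for left, right in combinations(sorted(set(names)), 2):
        score = SequenceMatcher(None, left.lower(), right.lower()).ratio()
        if score >= threshold:
            pairs.append((left, right, round(score, 3)))
    return pairs


# Hypothetical name strings; the second differs from the first by a u/v spelling.
names = ["Exemplum, Petrus", "Exemplum, Petrvs", "Aliud, Johannes"]
pairs = candidate_pairs(names)
```
]]></code>
<p>Comparing every pair of strings scales quadratically, so for a full catalog one would typically restrict comparisons to names that already share a key, such as the same surname initial or the same life dates.</p>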
<p>In my pilot study using the <italic>Collectio academica antiqua</italic>, I implemented a structured pipeline to extract, standardize, and cluster personal and corporate names from MARC 21 records. I began by identifying the relevant MARC 21 tags containing names of interest, then used pymarc to extract their content into a flat, tabular format while retaining the original tag associated with each entry. I grouped all name strings from subfield $a into a single column, regardless of the MARC field they originated from. I applied the same principle to other associated attributes, such as numeration, life dates, and roles, by creating standardized columns based on subfield content, across different MARC tags.</p>
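<p>The flattening step can be approximated as follows. To keep the example self-contained, parsed records are represented as plain dictionaries standing in for what a MARC parser such as pymarc would return, so no pymarc calls appear here. The record content, the column names, the choice of tags, and the function <monospace>flatten_names</monospace> are hypothetical. The subfield codes follow the usual MARC 21 layout for name fields ($a name, $b numeration, $d dates, $e relator term).</p>
<code language="python"><![CDATA[
```python
# Name-bearing tags to harvest; the selection is an assumption for this sketch.
NAME_TAGS = ("100", "110", "700", "710")

# Subfield code -> standardized column, applied identically across all tags.
SUBFIELD_COLUMNS = {"a": "name", "b": "numeration", "d": "dates", "e": "role"}


def flatten_names(records):
    """Return one row per name field, keeping the originating MARC tag."""
    rows = []
    for record in records:
        for tag in NAME_TAGS:
            for field in record["fields"].get(tag, []):
                row = {"record_id": record["id"], "tag": tag}
                for code, column in SUBFIELD_COLUMNS.items():
                    # Missing subfields become empty strings, not missing keys.
                    row[column] = field.get(code, "")
                rows.append(row)
    return rows


# Hypothetical parsed record with placeholder names.
sample_records = [
    {
        "id": "rec-001",
        "fields": {
            "100": [{"a": "Exemplum, Petrus,", "d": "1500-1570", "e": "author"}],
            "700": [{"a": "Aliud, Johannes,", "e": "printer"}],
        },
    }
]

rows = flatten_names(sample_records)
```
]]></code>
<p>Retaining the tag in each row makes it possible to trace a name back to its original field, for example to distinguish main entries from added entries at a later stage.</p>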
<p>At this stage, I cleaned the strings to ensure consistency: I removed leading and trailing spaces, standardized the formatting of numeration and dates, and eliminated unnecessary punctuation. While reviewing the grouped values, I also checked for misplacements (e.g., numeration incorrectly stored in a name subfield) and reassigned them to the appropriate column when necessary.</p>
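<p>A minimal sketch of this kind of string cleaning is given below. The exact rules applied in the pilot study are not reproduced here: the functions <monospace>clean_name</monospace> and <monospace>clean_dates</monospace>, the punctuation they strip, and the use of an ASCII hyphen as date separator are assumptions. The date function also discards qualifiers such as &#8220;ca.&#8221; or &#8220;fl.&#8221;, which one may prefer to store in a separate column.</p>
<code language="python"><![CDATA[
```python
import re


def clean_name(raw):
    """Collapse repeated whitespace and drop trailing separators."""
    text = re.sub(r"\s+", " ", raw).strip()
    # Trailing periods are kept on purpose: they may belong to an initial.
    return text.rstrip(" ,;:")


def clean_dates(raw):
    """Reduce a date statement to 'YYYY-YYYY', 'YYYY', or an empty string."""
    years = re.findall(r"\d{4}", raw)
    return "-".join(years[:2])


cleaned_name = clean_name("  Exemplum,  Petrus, ")
cleaned_range = clean_dates("ca. 1500-1570.")
cleaned_single = clean_dates("fl. 1620")
```
]]></code>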
<p>I then processed the resulting dataframe in OpenRefine, which allows for semi-automated clustering with high user oversight.<xref ref-type="fn" rid="n4">4</xref> I applied different clustering techniques and manually evaluated the suggested matches. I recommend working on a duplicate column and proceeding in successive passes: first clustering on the combination of surname, given name, numeration, and parsed life dates (possibly formatted as YYYY&#8211;YYYY or as YYYY in case only one date is available); then reapplying clustering without the dates; and lastly filtering by common life dates to identify remaining variants that earlier steps may have missed. After this bottom-up procedure, I adopted a top-down approach by matching the clustered names against an external authority file. Given that my case study mainly involved early modern persons active in Europe, the Consortium of European Research Libraries (CERL) Thesaurus proved particularly useful.<xref ref-type="fn" rid="n5">5</xref> I relied on the dedicated Python library, cerl, which is a wrapper for the CERL Thesaurus API.<xref ref-type="fn" rid="n6">6</xref> However, other reconciliation workflows are equally viable. Many thesauri, including Virtual International Authority File (VIAF), Gemeinsame Normdatei (GND), and Wikidata, offer dedicated reconciliation services within OpenRefine or provide public APIs, which allow users to build custom pipelines for automated queries. Crucially, querying authority files often returns more than one potential match per name. These results must be carefully reviewed and manually approved, ideally by inspecting supporting metadata such as life dates, places of activity, or roles. This human validation step is essential to ensure accurate reconciliation, especially when dealing with common names or minimal contextual information.</p>
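<p>The logic of the first two clustering passes can be illustrated outside OpenRefine with a key-collision approach, in which strings that reduce to the same normalized key are grouped together. The sketch below is loosely modeled on fingerprint-style keying and is not OpenRefine&#8217;s own implementation. The rows, the functions <monospace>fingerprint</monospace> and <monospace>cluster</monospace>, and the key format are assumptions.</p>
<code language="python"><![CDATA[
```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(text):
    """Normalize a name into an order-insensitive, accent-insensitive key."""
    ascii_text = (
        unicodedata.normalize("NFKD", text)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    tokens = re.sub(r"[^\w\s]", " ", ascii_text.lower()).split()
    return " ".join(sorted(set(tokens)))


def cluster(rows, use_dates=True):
    """Group rows whose keys collide; only groups with several members are returned."""
    groups = defaultdict(set)
    for row in rows:
        key = fingerprint(row["name"])
        if use_dates:
            key = key + "|" + row["dates"]
        groups[key].add((row["name"], row["dates"]))
    return [sorted(members) for members in groups.values() if len(members) > 1]


# Hypothetical rows: two variants share life dates, a third lacks them.
rows = [
    {"name": "Exemplum, Petrus", "dates": "1500-1570"},
    {"name": "Petrus Exemplum", "dates": "1500-1570"},
    {"name": "Exemplum, P\u00e9trus", "dates": ""},
]

first_pass = cluster(rows, use_dates=True)    # strict: name key plus dates
second_pass = cluster(rows, use_dates=False)  # looser: name key only
```
]]></code>
<p>In this example the strict pass groups only the two variants that share life dates, while the looser pass also pulls in the variant without dates. The looser pass is also the one more likely to merge homonyms, which is why its suggestions call for closer manual review.</p>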
<p>External authority files offer three main advantages. First, they often have already resolved duplications across records by grouping known name variants. Second, they enrich the dataset with additional information such as birthplaces, deathplaces, or institutional affiliations, which can be repurposed in later stages of our own analysis. Third, if the metadata needs to be merged with other datasets, records reconciled against the same thesaurus can be matched more reliably and seamlessly by relying on the same identifier. Authority files also have limitations: they are inevitably incomplete, and many lesser-known historical figures remain unlisted and will therefore go unmatched. Still, the majority of complex name variation tends to cluster around well-documented individuals, precisely those who are more likely to appear in authority files. In practice, researchers may need to consult multiple thesauri. To reconcile the resulting patchwork of matches, it becomes essential to generate a consistent internal identifier system, one that accounts for entities matched across authority files, those identified through clustering, and those that remain unmatched.</p>
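<p>One possible way to build such an internal identifier system is sketched below. The field names (<monospace>authority</monospace>, <monospace>authority_id</monospace>, <monospace>cluster_key</monospace>), the identifier format, and the sample identifiers are all hypothetical. The sketch also assumes that each entity carries at most one external identifier; reconciling the same person across several thesauri would require an additional crosswalk.</p>
<code language="python"><![CDATA[
```python
from itertools import count


def assign_internal_ids(entities, prefix="P"):
    """Give every distinct entity one internal identifier.

    Entities sharing an external authority identifier receive the same
    internal ID; the others fall back on their clustering key.
    """
    counter = count(1)
    seen = {}
    result = []
    for ent in entities:
        source = ent.get("authority")
        ext_id = ent.get("authority_id")
        if source and ext_id:
            key, match_type = (source, ext_id), "authority"
        else:
            key, match_type = ("internal", ent["cluster_key"]), "internal"
        if key not in seen:
            seen[key] = f"{prefix}{next(counter):05d}"
        result.append({**ent, "internal_id": seen[key], "match_type": match_type})
    return result


# Hypothetical input: two name clusters resolved to one thesaurus record,
# plus one entity without any external match.
entities = [
    {"cluster_key": "a", "authority": "cerl", "authority_id": "example-001"},
    {"cluster_key": "b", "authority": "cerl", "authority_id": "example-001"},
    {"cluster_key": "c"},
]
with_ids = assign_internal_ids(entities)
```
]]></code>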
<p>In the context of my pilot study on the <italic>Collectio academica antiqua</italic>, the disambiguation process began with 5,018 name strings extracted from MARC 21 records. I first applied a bottom-up procedure using OpenRefine&#8217;s clustering functions, combined with biographical information such as life dates and numeration. I then adopted a top-down approach by matching these entities to external authority files. This process yielded 4,278 distinct individuals. Of these, 47.5% (2,033) were matched to the CERL Thesaurus and a further 5% (213) to other authority sources such as Wikidata or Onderzoekssteunpunt en Databank Intermediaire Structuren (ODIS). The remaining 2,032 (47.5%) had no external match and were retained as internally disambiguated entities.</p>
<p>Disambiguation remains one of the most time-consuming steps in preparing bibliographic metadata, yet it is essential. Without it, individual identities would not be consistently established, and relationships would rest on unreliable associations. Crucially, it is what gives the data meaning for relational study. The process can also create synergies: matching entities to authority files often yields additional information that proves useful even beyond the immediate project.</p>
</sec>
<sec>
<title>Geolocating Place Names</title>
<p>Bibliographic metadata is typically rich in geographic information because it often records the place of publication, manufacture, and distribution for each library holding. As with personal and corporate names, place names are typically transcribed as they appear on the title page. This practice, once again, prioritizes fidelity to the original source over internal consistency of the catalog. As a result, the same city may appear under multiple variants. These may reflect different languages, historical spellings, political jurisdictions, or typographical inconsistencies. To make this information usable for relational analysis, place names must be standardized and geolocated. This task is not fundamentally different from disambiguating historical actors, but it is often somewhat easier due to the more stable nature of place identities. Nonetheless, caution is essential. Even with present-day place names, it is surprisingly easy to misidentify a location because many places can share the same name within or across countries. This issue may be less common for cities that are on average better known and thus more consistently recorded, like printing centers. Still, historical geolocation poses a challenge. The process often requires close attention to historical context and supporting metadata.</p>
<p>For harmonizing place names, best practice is to reconcile each string against a standardized entry in a historical urban gazetteer or geographic authority file. These resources contain a wide range of name variants, including multilingual and obsolete forms, which significantly facilitate disambiguation and geolocation. Among the most comprehensive options are Wikidata, GeoNames, the CERL Thesaurus, the RBMS/BSC Latin Place Names File, and the World Historical Gazetteer, all of which are particularly well suited to historical case studies.<xref ref-type="fn" rid="n7">7</xref> In my pilot study of Louvain&#8217;s <italic>Collectio academica antiqua</italic>, I processed 631 valid place-name strings, reconciled into 166 distinct locations. <xref ref-type="table" rid="T2">Table 2</xref> reports, for the ten most frequent printing locations, the number of distinct place-name variants and the corresponding number of imprints.</p>
<table-wrap id="T2">
<label>Table 2</label>
<caption>
<p>Top ten unified locations by number of imprints.</p>
</caption>
<table>
<tbody>
<tr>
<td align="left" valign="top"><bold>Place name</bold></td>
<td align="left" valign="top"><bold>Number of variants</bold></td>
<td align="left" valign="top"><bold>Number of imprints</bold></td>
</tr>
<tr>
<td align="left" valign="top">Louvain</td>
<td align="left" valign="top">64</td>
<td align="left" valign="top">1,143</td>
</tr>
<tr>
<td align="left" valign="top">Lyon</td>
<td align="left" valign="top">18</td>
<td align="left" valign="top">129</td>
</tr>
<tr>
<td align="left" valign="top">Antwerp</td>
<td align="left" valign="top">68</td>
<td align="left" valign="top">493</td>
</tr>
<tr>
<td align="left" valign="top">Ingolstadt</td>
<td align="left" valign="top">5</td>
<td align="left" valign="top">9</td>
</tr>
<tr>
<td align="left" valign="top">Mainz</td>
<td align="left" valign="top">5</td>
<td align="left" valign="top">17</td>
</tr>
<tr>
<td align="left" valign="top">Amsterdam</td>
<td align="left" valign="top">14</td>
<td align="left" valign="top">62</td>
</tr>
<tr>
<td align="left" valign="top">Brussels</td>
<td align="left" valign="top">30</td>
<td align="left" valign="top">176</td>
</tr>
<tr>
<td align="left" valign="top">Hanover</td>
<td align="left" valign="top">1</td>
<td align="left" valign="top">11</td>
</tr>
<tr>
<td align="left" valign="top">Rome</td>
<td align="left" valign="top">7</td>
<td align="left" valign="top">31</td>
</tr>
<tr>
<td align="left" valign="top">Cologne</td>
<td align="left" valign="top">16</td>
<td align="left" valign="top">215</td>
</tr>
</tbody>
</table>
</table-wrap>
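<p>Counts of the kind reported in <xref ref-type="table" rid="T2">Table 2</xref> can be derived once each place-name string has been reconciled to a canonical entry. The following sketch assumes a small, hand-made lookup table; in practice the lookup would come from reconciliation against a gazetteer, and the variant spellings shown here are illustrative only.</p>
<code language="python"><![CDATA[
```python
from collections import defaultdict

# Hypothetical variant-to-canonical lookup (illustrative entries only).
LOOKUP = {
    "lovanii": "Louvain",
    "leuven": "Louvain",
    "antverpiae": "Antwerp",
    "anvers": "Antwerp",
}


def summarize_places(imprint_places, lookup):
    """Return (place, number of variants, number of imprints) rows and unmatched strings."""
    variants = defaultdict(set)
    imprints = defaultdict(int)
    unmatched = []
    for raw in imprint_places:
        # Normalize: trim, drop surrounding punctuation and brackets, lowercase.
        key = raw.strip().strip(".,;:[]").lower()
        place = lookup.get(key)
        if place is None:
            unmatched.append(raw)  # left for manual review
            continue
        variants[place].add(key)
        imprints[place] += 1
    rows = [(p, len(variants[p]), imprints[p]) for p in imprints]
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows, unmatched


rows, unmatched = summarize_places(
    ["Lovanii", "Leuven", "Lovanii.", "Antverpiae", "[Anvers]", "S.l."], LOOKUP
)
```
]]></code>
<p>Strings that find no match, such as the "S.l." placeholder in the example, are returned separately so that they can be inspected by hand rather than silently dropped.</p>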
<p>After harmonization, locations can be geographically referenced. Although coordinates typically refer to a single point rather than the full spatial extent of a city or region, they still enable meaningful spatial representation. For example, in <xref ref-type="bibr" rid="B45">Schich et al.</xref> and <xref ref-type="bibr" rid="B13">de la Croix and Scebba</xref>, coordinates are used to visualize city attractiveness based on the mobility of, respectively, notable individuals and medieval and pre-modern scholars affiliated with European academic institutions.</p>
<p>Once coordinates are available, they can be plotted on vectorized shapefiles or digital atlases, which represent geopolitical or cultural regions as spatial polygons. In historical analyses, it is particularly important to work with maps that reflect time-consistent boundaries. Using modern administrative boundaries for early periods can misrepresent political geography and bias spatial patterns. A wide array of historical vectorized digital atlases and shapefiles is available to support this kind of analysis. Both computational humanists and applied economic historians are increasingly contributing projects in this direction. However, the limited integration between disciplines means that humanists are often unaware of the spatial datasets used by economists, and vice versa. These resources vary in geographic coverage and temporal resolution: some provide century-scale snapshots, while others offer yearly updates with shifting polity boundaries. Below, I present a selection of the most commonly encountered and widely used resources across computational humanities and quantitative economic history.</p>
<list list-type="bullet">
<list-item><p>The Centennia Historical Atlas: Academic Research Edition offers high-resolution, time-sensitive boundaries of European polities between the late Middle Ages and the nineteenth century (<xref ref-type="bibr" rid="B38">Reed</xref>).</p></list-item>
<list-item><p>The Cliopatria database records overlapping and successor polities from 3400 BCE to the present (<xref ref-type="bibr" rid="B4">Bennett et al.</xref>).</p></list-item>
<list-item><p>EurAtlas provides digital cartographic reconstructions of European boundaries spanning two millennia.<xref ref-type="fn" rid="n8">8</xref></p></list-item>
<list-item><p>Developed by economist <xref ref-type="bibr" rid="B19">Victor Gay</xref> and hosted by Harvard Dataverse, the Third Republic France Geographic Information System (TRF-GIS) maps French administrative constituencies spanning the period 1870&#8211;1940.<xref ref-type="fn" rid="n9">9</xref></p></list-item>
<list-item><p>The IPUMS Mosaic project, as the name suggests, provides a patchwork of harmonized historical census microdata and corresponding geographic boundary files for selected countries in the European area at different points in time.<xref ref-type="fn" rid="n10">10</xref></p></list-item>
<list-item><p>The Historical GIS Collection at ETH Zurich, compiled by computational sociologist Ramona Roller, supplies static polygon boundaries for territories within the sixteenth-century Holy Roman Empire, along with attributes such as foundation dates and confessional status.<xref ref-type="fn" rid="n11">11</xref></p></list-item>
<list-item><p>The project (Re)counting the Uncounted, led by historian and digital humanist Rombert Stapel, produced the Historical Atlas of the Low Countries, a detailed GIS dataset designed to anchor premodern population censuses to historically accurate administrative units.<xref ref-type="fn" rid="n12">12</xref></p></list-item>
</list>
<p>A comparative overview of these resources is provided in <xref ref-type="table" rid="T3">Table 3</xref>, which summarizes their temporal and geographic coverage, resolution, and accessibility. While the Centennia, Cliopatria, and EurAtlas shapefiles offer broad historical coverage&#8212;either pan-European or global&#8212;and span from antiquity to the modern era, they were primarily designed as general-purpose tools and are widely used in applied economic history. By contrast, the other atlases are more narrowly focused, both in geographic scope and in time frame. <xref ref-type="fig" rid="F1">Figure 1</xref> plots publication centers of the <italic>Collectio academica antiqua</italic>&#8217;s holdings over time, employing time-consistent and evolving boundaries.</p>
<table-wrap id="T3">
<label>Table 3</label>
<caption>
<p>The table compares selected historical shapefiles by time span, spatial scope, temporal resolution, and accessibility.</p>
</caption>
<table>
<tbody>
<tr>
<td align="left" valign="top"><bold>Name</bold></td>
<td align="left" valign="top"><bold>Coverage (Time)</bold></td>
<td align="left" valign="top"><bold>Coverage (Geography)</bold></td>
<td align="left" valign="top"><bold>Temporal Resolution</bold></td>
<td align="left" valign="top"><bold>Open Access</bold></td>
</tr>
<tr>
<td align="left" valign="top">Centennia Historical Atlas</td>
<td align="left" valign="top">ca. 1000&#8211;2000 CE</td>
<td align="left" valign="top">Europe</td>
<td align="left" valign="top">Variable, up to one-tenth of a year</td>
<td align="left" valign="top">No</td>
</tr>
<tr>
<td align="left" valign="top">Cliopatria</td>
<td align="left" valign="top">3400 BCE&#8211;present</td>
<td align="left" valign="top">Global</td>
<td align="left" valign="top">Variable</td>
<td align="left" valign="top">Yes</td>
</tr>
<tr>
<td align="left" valign="top">EurAtlas</td>
<td align="left" valign="top">1&#8211;2000 CE</td>
<td align="left" valign="top">Europe</td>
<td align="left" valign="top">100-year steps</td>
<td align="left" valign="top">No</td>
</tr>
<tr>
<td align="left" valign="top">TRF-GIS</td>
<td align="left" valign="top">1870&#8211;1940</td>
<td align="left" valign="top">France</td>
<td align="left" valign="top">Annual</td>
<td align="left" valign="top">Yes</td>
</tr>
<tr>
<td align="left" valign="top">IPUMS Mosaic</td>
<td align="left" valign="top">1770&#8211;2003</td>
<td align="left" valign="top">Europe and selected countries</td>
<td align="left" valign="top">30-year steps (Europe-wide); finer for some states</td>
<td align="left" valign="top">Yes</td>
</tr>
<tr>
<td align="left" valign="top">ETH Zurich HGIS</td>
<td align="left" valign="top">ca. 1500&#8211;1600</td>
<td align="left" valign="top">Holy Roman Empire</td>
<td align="left" valign="top">Static (single date)</td>
<td align="left" valign="top">Yes</td>
</tr>
<tr>
<td align="left" valign="top">Historical Atlas of the Low Countries</td>
<td align="left" valign="top">1350&#8211;1850</td>
<td align="left" valign="top">Low Countries</td>
<td align="left" valign="top">Variable (per census)</td>
<td align="left" valign="top">Yes</td>
</tr>
</tbody>
</table>
</table-wrap>
<fig id="F1">
<label>Figure 1</label>
<caption>
<p>The maps show printing centers in the Low Countries and surroundings at fifty-year intervals. Bubble area is proportional to the number of printed holdings from the <italic>Collectio academica antiqua</italic> associated with each city. Historical boundaries are time-varying and drawn from the Centennia Historical Atlas.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="jca-1097_scebba-g1.png"/>
</fig>
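<p>Working with time-consistent boundaries requires pairing each imprint with the boundary snapshot valid for its date. The sketch below shows one simple rule, which is an assumption rather than the procedure behind <xref ref-type="fig" rid="F1">Figure 1</xref>: take the latest snapshot dated at or before the publication year, and group imprints into fixed-width periods. The snapshot years, bin width, and origin year are hypothetical parameters.</p>
<code language="python"><![CDATA[
```python
from bisect import bisect_right


def snapshot_for_year(year, snapshot_years):
    """Return the latest boundary snapshot dated at or before `year`."""
    ordered = sorted(snapshot_years)
    idx = bisect_right(ordered, year)
    if idx == 0:
        raise ValueError(f"no boundary snapshot available for {year}")
    return ordered[idx - 1]


def period_start(year, width=50, origin=1450):
    """Return the first year of the fixed-width period containing `year`."""
    return origin + ((year - origin) // width) * width


# Hypothetical century-step snapshots, as in an atlas with 100-year resolution.
snapshots = [1500, 1600, 1700]
```
]]></code>
<p>With century-step snapshots, an imprint dated 1583 would be drawn on the 1500 boundaries, which illustrates why the temporal resolution of the chosen atlas matters for periods of rapid political change.</p>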
</sec>
<sec>
<title>Classifying Content: Assigning Topical Terms</title>
<p>Library catalogs typically include some indication of what each book is about. In the MARC 21 schema, this information tends to be stored in tags 650 and 655, which hold topical terms and genre or form terms, respectively. However, this metadata might be incomplete, inconsistently applied, or skewed by cataloging practices. While a close reading approach might allow for a direct understanding of the subject of each book, this method is not feasible in large-scale, data-driven studies. In these cases, a systematic and ideally automated workflow for classifying the content of library holdings becomes essential. Most such approaches rely on textual input, most commonly the book title, which in early modern imprints is often lengthy and content-rich. Although full-text content could, in theory, offer even richer input, the focus here is on completing and standardizing existing catalog metadata to enable structured comparisons across records. Assigning accurate subject metadata is an essential preprocessing step for higher-level analyses such as tracking thematic trends over time, identifying disciplinary clusters, or studying the diffusion of ideas across communities.</p>
<p>Automated classification techniques broadly fall into two categories: supervised and unsupervised methods. Supervised methods rely on a labeled dataset, that is, a subset of records for which the correct classification (e.g., field or topic) is already known. Machine-learning models are trained to learn these associations and predict labels for unclassified data. An example of supervised topic classification applied to bibliographic metadata is introduced in <xref ref-type="bibr" rid="B28">Koschnick</xref>; it involves fine-tuning a pretrained transformer model such as DistilBERT (<xref ref-type="bibr" rid="B43">Sanh et al.</xref>) on a dataset of titles or abstracts paired with subject headings. The model learns to classify each input into one of the predefined topics, based on patterns in the text. Unsupervised methods, by contrast, do not require labeled input. Topic modeling techniques, such as Latent Dirichlet Allocation (LDA) (<xref ref-type="bibr" rid="B5">Blei et al.</xref>), attempt to infer hidden thematic structures by analyzing word co-occurrence patterns in a corpus. Similarly, algorithms such as k-means clustering can group texts based on similarity in their vector representations without any prior knowledge of categories. Applications of k-means clustering to bibliographic metadata include the analysis of genre-indicating subtitles in German literature (<xref ref-type="bibr" rid="B20">Gittel</xref>) and the definition of academic fields from publication titles (<xref ref-type="bibr" rid="B12">Curtis and de la Croix</xref>).</p>
<p>While unsupervised methods can help identify thematic patterns, they are often less appropriate when the goal is to complete or standardize existing subject metadata. In such cases, including the present paper, supervised approaches are favored because they directly learn from cataloged examples and preserve alignment with established classification schemes.</p>
<p>In my case study of the <italic>Collectio academica antiqua</italic>, I used a supervised approach. I identified a topical classification term for 38.4% of the holdings (1,450 out of 3,777 records). This subset served as the labeled data for training. I then proceeded to derive a higher-level classification by grouping subject headings in more general categories. To assign topics to the books in the <italic>Collectio academica antiqua</italic> that lacked subject metadata, I fine-tuned a DistilBERT model to classify titles based on the labeled records. Each title was mapped to one of three broad disciplinary categories&#8212;theology, humanities, or sciences&#8212;based on the existing catalog labels. Because the model was pretrained on English text, I first translated titles to ensure compatibility. The model was then trained over four epochs, that is, it went through the entire training dataset four times to refine its predictions.</p>
<p>To evaluate the model, I set aside 20% of the labeled data (290 titles) as a validation set. Model performance differs markedly across categories. As displayed in <xref ref-type="table" rid="T4">Table 4</xref>, precision, recall, and F1 scores are high for theology and humanities, the two dominant classes in the labeled data, indicating that the classifier learns stable and interpretable patterns for these categories. By contrast, the sciences category is severely underrepresented in the training set, and the model fails to predict this class in the validation data, resulting in zero recall for this category. As a consequence, overall accuracy is driven by the two majority classes, while macro-averaged performance metrics provide a more informative summary under class imbalance.</p>
<table-wrap id="T4">
<label>Table 4</label>
<caption>
<p>The table reports precision, recall, and F1 scores by class for the validation set of the <italic>Collectio academica antiqua</italic>. Support indicates the number of titles per category in the held-out validation data. Performance is strong for theology and humanities, the two majority classes, while the sciences category is severely underrepresented and not predicted by the model, reflecting the limits imposed by class imbalance rather than classifier instability.</p>
</caption>
<table>
<tbody>
<tr>
<td align="left" valign="top"></td>
<td align="left" valign="top"><bold>Precision</bold></td>
<td align="left" valign="top"><bold>Recall</bold></td>
<td align="left" valign="top"><bold>F1-score</bold></td>
<td align="left" valign="top"><bold>Support</bold></td>
</tr>
<tr>
<td align="left" valign="top">Humanities</td>
<td align="left" valign="top">0.793</td>
<td align="left" valign="top">0.836</td>
<td align="left" valign="top">0.814</td>
<td align="left" valign="top">110</td>
</tr>
<tr>
<td align="left" valign="top">Sciences</td>
<td align="left" valign="top">0.000</td>
<td align="left" valign="top">0.000</td>
<td align="left" valign="top">0.000</td>
<td align="left" valign="top">6</td>
</tr>
<tr>
<td align="left" valign="top">Theology</td>
<td align="left" valign="top">0.897</td>
<td align="left" valign="top">0.897</td>
<td align="left" valign="top">0.897</td>
<td align="left" valign="top">174</td>
</tr>
</tbody>
</table>
</table-wrap>
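<p>The difference between accuracy and macro-averaged metrics can be made concrete using the per-class values reported in <xref ref-type="table" rid="T4">Table 4</xref>. The helper functions below are an illustrative sketch, not part of the original pipeline. Because the inputs are rounded to three decimals, the results are approximate.</p>
<code language="python"><![CDATA[
```python
# Per-class values copied from Table 4.
TABLE_4 = {
    "Humanities": {"precision": 0.793, "recall": 0.836, "f1": 0.814, "support": 110},
    "Sciences": {"precision": 0.000, "recall": 0.000, "f1": 0.000, "support": 6},
    "Theology": {"precision": 0.897, "recall": 0.897, "f1": 0.897, "support": 174},
}


def macro_average(table, metric):
    """Unweighted mean across classes: every class counts equally."""
    return sum(row[metric] for row in table.values()) / len(table)


def weighted_average(table, metric):
    """Mean weighted by class support: large classes dominate."""
    total = sum(row["support"] for row in table.values())
    return sum(row[metric] * row["support"] for row in table.values()) / total


macro_f1 = macro_average(TABLE_4, "f1")
# In single-label classification, support-weighted recall equals accuracy.
accuracy_estimate = weighted_average(TABLE_4, "recall")
```
]]></code>
<p>Under these values the macro-averaged F1 is roughly 0.57, whereas the support-weighted recall, which coincides with overall accuracy in single-label classification, is about 0.86. The gap between the two figures is entirely due to the sciences class.</p>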
<p><xref ref-type="fig" rid="F2">Figure 2</xref> illustrates this imbalance, reporting both raw counts and row-normalized percentages in order to make performance comparable across unevenly sized classes. Each row of the matrix represents the actual category of the entries from the validation set, while each column shows the predicted category assigned by the model. If the model worked perfectly, every count would fall on the diagonal, where predicted labels match the actual ones, and the off-diagonal cells, which indicate misclassification, would be empty. Most errors occurred between theology and humanities, which reflects their historical overlap. As noted above, the model fails to predict the severely underrepresented sciences category in the validation set, reflecting the limits imposed by class imbalance. Once the model was trained, I used it to assign topics to the remaining set of titles that lacked subject metadata. These predicted labels were then combined with the original metadata to generate a complete, standardized topical classification across the entire collection.</p>
<fig id="F2">
<label>Figure 2</label>
<caption>
<p>Confusion matrix for the validation set of the labeled holdings in the <italic>Collectio academica antiqua</italic>. Each row shows the true topic based on catalog metadata, and each column shows the model&#8217;s predicted topic. Values along the diagonal (off-diagonal) indicate correct (incorrect) classifications.</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="jca-1097_scebba-g2.png"/>
</fig>
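<p>Row normalization, as used in <xref ref-type="fig" rid="F2">Figure 2</xref>, divides each cell by the total of its row so that every true class sums to 100%. The sketch below uses a small made-up matrix purely for illustration; its counts are not those of the figure.</p>
<code language="python"><![CDATA[
```python
def row_normalize(matrix):
    """Convert raw confusion-matrix counts to percentages of each true class."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([100 * value / total if total else 0.0 for value in row])
    return normalized


# Illustrative counts only (rows: true class, columns: predicted class).
toy_counts = [
    [8, 0, 2],
    [3, 0, 1],
    [1, 0, 9],
]
toy_percentages = row_normalize(toy_counts)
```
]]></code>
<p>In this toy matrix the second class is never predicted, so its diagonal cell is 0% regardless of how few items it contains, which is the pattern that raw counts alone can obscure for a small class.</p>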
</sec>
</sec>
<sec>
<title>Section 4: Modeling Catalog Data for Analysis</title>
<p>This section returns to the research questions outlined in the Introduction by examining how different modeling choices condition the interpretation of bibliographic metadata when represented as relational and network-based data. Having outlined the steps required to transform bibliographic metadata into analyzable data, I now turn to their analytical modeling, focusing on network representations. First, I revisit the assumptions and limitations of interpreting catalog metadata as a graph. Then, drawing on a co-occurrence network derived from the <italic>Collectio academica antiqua</italic>, I show how the role attribute encoded in catalog metadata is not naturally accommodated by a single-layer projection and how a multilayer framework offers a coherent way to preserve this heterogeneity when adopting a co-occurrence representation. Lastly, I turn to quantitative economic history as a comparative analytical framework for the use of bibliographic metadata.</p>
<sec>
<title>Adopting a Network Perspective</title>
<p>Catalog records are not inherently networked systems. However, they do encode structured associations among entities: authors linked to books, books linked to subjects, people linked to other people via shared roles in a publication, and so on. A network representation emerges only when we interpret these associations under a specific set of assumptions. For example, if two agents appear together on a title page, one might interpret that co-occurrence as a connection between them. Many studies in the computational humanities adopt this strategy. Examples include <xref ref-type="bibr" rid="B18">Gavin</xref> and <xref ref-type="bibr" rid="B25">Hill et al., &#8220;Reconstructing Intellectual Networks&#8221;</xref> on early English literary criticism and book trade, respectively; <xref ref-type="bibr" rid="B22">Greteman</xref> on the English print network from its origins through the eighteenth century; <xref ref-type="bibr" rid="B29">Ladd</xref> on early modern dedications; <xref ref-type="bibr" rid="B50">Valleriani et al.</xref> on networks of printers and publishers and their influence on the evolution of scientific knowledge; Ryan and Tolonen on Scottish Enlightenment publishing; and <xref ref-type="bibr" rid="B23">He&#223;br&#252;ggen-Walter</xref> on interdisciplinarity in seventeenth-century German dissertations. All of these works use the co-occurrence of individuals in print artifacts as an interpretative lens to reconstruct intellectual, textual, or social communities. This raises two broader methodological questions: Under what interpretive assumptions can bibliographic metadata be treated as network data, and what are the limits of such representations?
The <italic>Collectio academica antiqua</italic> provides a particularly fitting test for these questions because it documents a well-bounded institutional world (i.e., an academic print production orbiting around the university center of Louvain) where the logic of collaboration can be observed directly in the paratext and imprint data. This discussion directly addresses the second research question mentioned early on in the paper, by making explicit the interpretive assumptions under which bibliographic metadata can be treated as network data and by delineating the limits of co-occurrence-based representations.</p>
<p>First and foremost, library catalog data were never intended to capture explicit relationships between historical actors. Their purpose is to provide bibliographic description and support information management. Nonetheless, converting a collection&#8217;s metadata into a co-occurrence network means establishing links based on the structured associations present in the records (people, texts, places, and topics connected via shared bibliographic entries). This reinterpretation is not without pitfalls. A key risk is assuming that co-occurrence automatically implies a meaningful social relationship. Simply sharing a title page does not guarantee contemporaneity, collaboration, or even mutual awareness, especially in cases like posthumous editions or honorary dedications.</p>
<p>Yet, despite this ambiguity, the connections we derive from bibliographic metadata are grounded in historical evidence, that is, the sources themselves: title pages, imprint statements, colophons, and dedications. They are not arbitrary, ad hoc constructions imposed by the researcher. While the meaning of each connection may be open to interpretation, the very co-occurrence in the production of a shared work constitutes material evidence derived directly from the historical record (<xref ref-type="bibr" rid="B25">Hill et al., &#8220;Reconstructing Intellectual Networks&#8221;</xref>). At the same time, bibliographic metadata can sometimes support a stronger definition of network ties than simple co-occurrence. For instance, using bibliographic metadata from the <italic>Sphaera</italic> corpus, a curated collection of 359 astronomy and cosmology textbooks published between 1472 and 1650, <xref ref-type="bibr" rid="B50">Valleriani et al.</xref> reconstruct a network of book producers by defining &#8220;awareness relationships&#8221; between early modern printers and publishers. Such relationships arise when two similar editions are attributed to different producers, typically indicating imitation or participation in the same print run. Here, the bibliographic fingerprints of editions and additional metadata on printers&#8217; and publishers&#8217; biographical timelines are used to establish, respectively, chains of editions and contemporaneity, yielding ties that can be interpreted more robustly than co-occurrence alone.</p>
</sec>
<sec>
<title>The Multilayer Approach</title>
<p>With these caveats in mind, bibliographic metadata have been used to construct various kinds of networks depending on the entities and relationships of interest. One of the most common projections is a person-to-person network where an edge represents two individuals&#8217; joint participation in the same publication. This co-occurrence network is frequently employed to study scholarly or intellectual communities. Even this straightforward representation involves important modeling choices. A first decision concerns an edge&#8217;s directionality. Defining edges as directed or undirected depends on the interpretative aim. For example, cases reflecting unilateral acknowledgment like links from authors to dedicatees might call for directed edges, whereas undirected edges more accurately capture the symmetric nature of shared participation in other contexts. A second choice concerns encoding some data as node attributes (e.g., relevant life dates, biographical information, and religious or institutional affiliation), or as edge attributes (e.g., the publication year and the number of shared works).</p>
<p>However, there is one critical dimension that resists both types of attribute encoding: the role of each person within the book. Roles are not fixed properties of individuals, because a node may be an author in one publication and a dedicatee in another. Nor do they characterize the tie itself, because each tie reflects individual involvement in a shared work, not the work as a whole. For example, the connection between a printer and an author cannot be reduced to either an attribute of the person or an attribute of their tie.<xref ref-type="fn" rid="n13">13</xref> In short, individuals&#8217; roles in each publication are asymmetric and two-way, which disqualifies both node and edge attributes as adequate carriers of this information. Capturing these asymmetries is essential for understanding early modern print economies, where the same person could occupy different positions across publications.</p>
<p>Hence, I argue that the most coherent solution is to model the co-occurrence network as a multilayer graph. Following the definition offered by <xref ref-type="bibr" rid="B27">Kivel&#228; et al.</xref>, a multilayer network extends beyond nodes and edges to include <italic>layers</italic> as core components, allowing each node to appear in multiple layers and enabling edges to connect any pair of node-layer instances. Additionally, multiple <italic>aspects</italic>, or features&#8212;that is, different sets of layers&#8212;can be modeled on top of each other (209). In practice, an aspect defines a distinct dimension of variation in the data, such as the type of activity, the temporal slice, or the institutional context, so that multiple aspects can coexist within one unified, multilayer structure.</p>
<p>In the present case, I only focus on a single aspect, the co-occurrence dimension in the catalog, where each layer corresponds to a distinct role and where nodes may participate in one or more role-layers depending on their involvement in each publication. This framework allows for both inter-layer edges (i.e., connections stemming from shared works in different roles) and intra-layer edges (i.e., ties between individuals sharing the same publication <italic>and</italic> role). This representation makes it possible to measure how role multiplicity structures collaboration and to distinguish between within-role cohesion and cross-role integration.</p>
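<p>The construction of intra-layer and inter-layer edges can be sketched in plain Python. This is a simplified illustration with placeholder names, not the code used to build the network analyzed here. It assumes that each publication is given as a list of (person, role) pairs and, as a further assumption, it does not link a person to themselves when they hold two roles in the same book.</p>
<code language="python"><![CDATA[
```python
from collections import Counter
from itertools import combinations


def multilayer_edges(publications):
    """Count shared publications between (person, role) node-layer pairs."""
    edges = Counter()
    for participants in publications:
        # Sorting gives every pair a canonical order across publications.
        node_layers = sorted(set(participants))
        for (p1, r1), (p2, r2) in combinations(node_layers, 2):
            if p1 == p2:
                continue  # assumption: no self-coupling across roles
            edges[((p1, r1), (p2, r2))] += 1
    return edges


def tie_shares(edges):
    """Return the shares of intra-layer and inter-layer ties."""
    total = len(edges)
    if total == 0:
        return 0.0, 0.0
    intra = sum(1 for (a, b) in edges if a[1] == b[1])
    return intra / total, (total - intra) / total


# Placeholder publications (hypothetical persons and roles).
publications = [
    [("person_a", "author"), ("person_b", "printer")],
    [("person_a", "author"), ("person_b", "printer"), ("person_c", "dedicatee")],
    [("person_d", "author"), ("person_e", "author")],
]
edges = multilayer_edges(publications)
intra_share, inter_share = tie_shares(edges)
```
]]></code>
<p>Each edge connects two node-layer pairs, so the same person can appear in several layers, and the edge weight records the number of shared publications. The toy data produce one intra-layer tie (two co-authors) and three inter-layer ties.</p>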
<p><xref ref-type="fig" rid="F3">Figure 3</xref> offers a visual example of what a multilayer co-occurrence network can look like.<xref ref-type="fn" rid="n14">14</xref> I built a person-to-person co-occurrence network from the bibliographic metadata of the <italic>Collectio academica antiqua</italic>. The layers are defined from the roles recorded in the metadata, which I grouped into broader categories as detailed in <xref ref-type="table" rid="T6">Table 6</xref> in Appendix A.1.<xref ref-type="fn" rid="n15">15</xref> The figure illustrates the methodological shift from descriptive catalog lists to an explicitly relational view, where each role layer isolates one channel of participation. Although the full network comprises several thousand individuals and ten role layers, the figure focuses on the fifty most active nodes and four main layers to improve legibility. In this representation, layers are the loci where nodes appear when a given role emerges in at least one shared publication. Thus, if an individual has different roles in different works, they appear in multiple layers. The layer classification preserves the co-occurrence principle: Edges are still defined by co-occurrence in publications. Dashed lines denote inter-layer edges and solid lines denote intra-layer edges.</p>
<fig id="F3">
<label>Figure 3</label>
<caption>
<p>Multilayer person-to-person co-occurrence network constructed from the bibliographic metadata of the <italic>Collectio academica antiqua</italic>. Each layer isolates one type of role-based connection. Nodes represent individuals (only those with the highest degree are shown for legibility). Intra-layer edges (blue lines) and inter-layer edges (red lines) connect individuals who co-occur in at least one book, respectively in the same or in different roles. The plots were generated with pymnet (<xref ref-type="bibr" rid="B35">Nurmi et al.</xref>).</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="jca-1097_scebba-g3.png"/>
</fig>
<p>Thinking of co-occurrence in multilayer terms allows for several modes of analysis. First, one can examine each layer separately to study the network of collaborations within a single role. This is particularly useful when intra-layer ties dominate or when the aim is to identify role-specific structural patterns. In practice, within-role co-occurrence in early modern data tends to be sparse. In the <italic>Collectio academica antiqua</italic> network, fewer than one-quarter of the ties link individuals who share the same role in a publication; more than 75% of connections are between people in different roles (inter-layer links; see <xref ref-type="table" rid="T8">Table 8</xref> in the Appendix). This is consistent with the low density observed in each individual layer (<xref ref-type="table" rid="T7">Table 7</xref>), which in turn reflects the historical reality that collaborative roles, such as co-authorship, were relatively rare in early modern print culture. If we were to apply this method to modern scientific publications, we would expect to see a much denser authorship co-occurrence layer, whereas the early modern dataset under analysis shows that authors typically connect to others through complementary roles (e.g., author to printer, author to dedicatee, etc.) rather than directly to other authors. This dominance of inter-layer ties underscores how production and intellectual labor were intertwined: Most connections bridge material and scholarly functions rather than occur within a single occupational sphere.</p>
<p>Second, the layers can also be collapsed into a single projection when applying classical metrics such as centrality or modularity, though doing so sacrifices the interpretive nuance that multilayer distinctions offer.</p>
<p>Third, and most distinctively, one can analyze patterns that involve the interplay between layers, something not possible in a single-layer projection. In this case, the layers linked to material production (printers, publishers, and booksellers) provide a controlled setting to study how the combination of roles across publications relates to the university ecosystem from which much of the corpus originated.<xref ref-type="fn" rid="n16">16</xref> Focusing on the production side is particularly revealing because the <italic>Collectio academica antiqua</italic> primarily contains academic works&#8212;dissertations, textbooks, and reprints of classical authors&#8212;commissioned for or produced within the orbit of the Old University of Louvain. In such a context, holding multiple production roles can be read as an indicator of alignment with the university&#8217;s printing economy, where a small number of trusted workshops frequently combined several functions under one roof. Analyzing these overlaps therefore allows one to test whether functional integration within print workshops corresponded to closer institutional or intellectual integration with the university world itself.</p>
<p>To examine this mechanism empirically, I traced the involvement of academic actors across the corpus. The <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://studium-ai.org/">Studium.AI</ext-link> research infrastructure is developing a comprehensive list of the students enrolled at the Old University of Louvain, but academic scholars can already be identified through the <italic>Repertorium Eruditorum Totius Europae</italic> (RETE) project, a pan-European prosopographical database documenting the university careers of medieval and early modern professors.<xref ref-type="fn" rid="n17">17</xref></p>
<p>Through the disambiguation process described in Section 3 (Identifying Historical Actors), I linked each person in the <italic>Collectio academica antiqua</italic> to a unique CERL and Wikidata identifier. This, combined with a manual search, allows the matching of individuals appearing in the collection to scholars from the RETE database. Given RETE&#8217;s pan-European scope, the linkage allowed me to identify which books involve professors in general and which specifically involve professors affiliated with Louvain. The integration with RETE data adds a socio-institutional dimension to the bibliographic co-occurrence network: Some nodes now carry information on whether the person held a university position, thus allowing the relationship between print production and academia to be tested empirically. I find that around three-fourths of the books in the <italic>Collectio academica antiqua</italic> involve a university professor: in most cases a Louvain professor, but also professors active at other European universities (around 7%). These figures confirm that although the collection is institutionally rooted in Louvain, its intellectual network extended beyond the city. After all, the transregional dimension of the book trade in the Low Countries is well established in the literature (<xref ref-type="bibr" rid="B15">De Ridder et al.</xref>).</p>
<p>I then tested whether production agents who held multiple roles, for instance those appearing as both printer and bookseller in different publications, are more frequently associated with books linked to professors. The test is based on a simple comparison of proportions. For each book, I classified the involved production agents as single-role or multi-role and recorded whether the book involves at least one professor. I computed the share of professor-linked books among multi-role and single-role agents and assessed whether the difference is larger than expected by chance using a difference-in-proportions (z) test.<xref ref-type="fn" rid="n18">18</xref></p>
<p>Results in <xref ref-type="table" rid="T5">Table 5</xref> show that books involving multi-role production agents are significantly more likely to involve professors. The share of professor-linked books is about 83% for multi-role agents, compared with 73% for single-role agents, a difference of 9 percentage points (<italic>z</italic> = 6.96, <italic>p</italic> = 3.35&#215;10<sup>&#8211;12</sup>). The association is even stronger for books involving professors affiliated with Louvain, where the corresponding shares are 77% and 63%, yielding a difference of 14 percentage points (<italic>z</italic> = 9.53, <italic>p</italic> = 1.61&#215;10<sup>&#8211;21</sup>). These patterns illustrate how preserving role differentiation in a multilayer co-occurrence representation makes visible systematic associations between production structure and academic involvement that would be obscured in a flattened network view. In this sense, the case study demonstrates that modeling choices are not merely technical but substantively shape the kinds of institutional and historical patterns that can be recovered from catalog data.</p>
<table-wrap id="T5">
<label>Table 5</label>
<caption>
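<p>The table reports the association of multi-role production agents with professors: the share of books involving at least one professor (Share<sub>profs</sub>) and at least one Louvain professor (Share<sub>Louvain</sub>) for each group, with the difference-in-proportions (<italic>z</italic>) test.</p>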
</caption>
<table>
<tbody>
<tr>
<td align="left" valign="top"><bold>Group</bold></td>
<td align="left" valign="top"><bold>Share<sub>profs</sub></bold></td>
<td align="left" valign="top"><bold>Share<sub>Louvain</sub></bold></td>
<td align="left" valign="top"><bold>Observations</bold></td>
</tr>
<tr>
<td align="left" valign="top">Single-role</td>
<td align="left" valign="top">0.734</td>
<td align="left" valign="top">0.631</td>
<td align="left" valign="top">1,589</td>
</tr>
<tr>
<td align="left" valign="top">Multi-role</td>
<td align="left" valign="top">0.826</td>
<td align="left" valign="top">0.770</td>
<td align="left" valign="top">2,403</td>
</tr>
<tr>
<td align="left" valign="top">Difference (multi minus single)</td>
<td align="left" valign="top">0.092</td>
<td align="left" valign="top">0.139</td>
<td align="left" valign="top"></td>
</tr>
<tr>
<td align="left" valign="top">z statistic</td>
<td align="left" valign="top">6.96</td>
<td align="left" valign="top">9.53</td>
<td align="left" valign="top"></td>
</tr>
<tr>
<td align="left" valign="top">p value</td>
<td align="left" valign="top">3.35&#215;10<sup>&#8211;12</sup></td>
<td align="left" valign="top">1.61&#215;10<sup>&#8211;21</sup></td>
<td align="left" valign="top"></td>
</tr>
</tbody>
</table>
</table-wrap>
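<p>The comparison can be retraced from the printed figures. The sketch below is a minimal illustration and not the article&#8217;s own code: it assumes the pooled, two-sided variant of the two-sample test, and the function name <monospace>two_proportion_z</monospace> is hypothetical. Because it starts from the rounded shares in Table 5, it yields statistics close to, but not exactly matching, the reported ones.</p>
<code language="python">
```python
from math import erfc, sqrt


def two_proportion_z(p1, n1, p2, n2):
    """Pooled two-sample z test for a difference in proportions (two-sided)."""
    # Pooled share under the null hypothesis of equal proportions.
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    # Standard error of the difference under the null.
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability under the standard normal distribution.
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


# Rounded shares and group sizes as printed in Table 5 (multi-role group first).
z_profs, p_profs = two_proportion_z(0.826, 2403, 0.734, 1589)
z_louvain, p_louvain = two_proportion_z(0.770, 2403, 0.631, 1589)

print(f"any professor:     z = {z_profs:.2f}, p = {p_profs:.2e}")
print(f"Louvain professor: z = {z_louvain:.2f}, p = {p_louvain:.2e}")
```
</code>
<p>An unpooled standard error would give somewhat smaller statistics; the pooled form is used here because it lies closer to the values reported in the table.</p>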
<p>The same reasoning applies when considering how many distinct layers each person appears in, sometimes called the actor&#8217;s &#8220;multiplexity,&#8221; that is, the number of role dimensions in their activity. In the <italic>Collectio academica antiqua</italic> network, only about 13% of individuals participate in more than one role layer (see <xref ref-type="table" rid="T9">Table 9</xref> in the Appendix). Those who do typically combine a primary role with a secondary one (e.g., authors who also edited other works or wrote dedications). <xref ref-type="table" rid="T10">Table 10</xref> in the Appendix shows the share of multi-role individuals in each layer. Unsurprisingly, roles like authorship and paratextual attributions account for a large fraction of the multiplex individuals, whereas very few censors, translators, or illustrators in this dataset also took on additional roles. The prevalence of paratextual connections (e.g., authors who share a printer also often share an editor or a contributor) partly reflects how thoroughly this particular catalog records those secondary roles. By contrast, the censorship role appears underrepresented in the network. Historically, from the 1520s onward, works printed in places like Louvain often included an approbation or censor&#8217;s note (<xref ref-type="bibr" rid="B6">Cammaerts 241</xref>), but in the data derived from the <italic>Collectio academica antiqua</italic>, such connections are few. This likely indicates that many approbations were not captured in the catalog metadata, not that censorship was absent. A comparison with a dataset where censorship information is fully recorded could yield a denser censorship layer and alter the overall network structure. This example highlights how the depth and focus of cataloging practices (which information was recorded or omitted) directly influence the networks we can extract. From this perspective, a comparative analysis using this analytical lens could substantiate how differences in cataloging practices shape resulting network structures.</p>
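<p>The multiplexity count itself is simple to compute. The following sketch, with hypothetical names (<monospace>node_layers</monospace>, <monospace>multiplexity</monospace>) and invented toy data, counts the distinct role layers in which each person appears and derives the share of multi-role individuals; it illustrates the measure and is not the code behind Table 9.</p>
<code language="python">
```python
from collections import defaultdict

# Hypothetical (person, role layer) observations; repeats are allowed.
node_layers = [
    ("A", "Authorship"),
    ("A", "Dedication"),
    ("B", "Authorship"),
    ("P", "Publication/Distribution"),
    ("P", "Publication/Distribution"),
]


def multiplexity(node_layers):
    """Return the number of distinct layers in which each person appears."""
    layers = defaultdict(set)
    for person, layer in node_layers:
        layers[person].add(layer)  # a set ignores repeated observations
    return {person: len(found) for person, found in layers.items()}


counts = multiplexity(node_layers)

# Individuals active in more than one role layer, and their share of all individuals.
multi_role = sorted(person for person, k in counts.items() if k > 1)
share_multi = len(multi_role) / len(counts)

print(counts, multi_role, round(share_multi, 3))
```
</code>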
</sec>
<sec>
<title>Insights from Quantitative Economic History</title>
<p>Studies in history and literary scholarship that use bibliographic metadata often combine meticulous preprocessing pipelines with network visualizations and descriptive commentary, typically focused on centrality measures. These exercises undoubtedly advance our understanding of the collections and catalogues used as material, yet, as <xref ref-type="bibr" rid="B34">Lemercier and Zalc</xref> observe, such work frequently remains at the stage of mapping and visual speculation, without fully mobilizing networks as formal analytical tools. The patterns observed in Section 4.1 (e.g., the sparsity of within-role links or the rarity of multi-role connections) highlight both the potential and the limitations of co-occurrence networks derived solely from catalog data. On one hand, such networks can reveal hidden structures in the catalog or collection studied; on the other, relying exclusively on networked bibliographic data, without complementary data or further analytical techniques, may limit the types of historical questions we can answer. To overcome this limitation, bibliographic network studies can draw on frameworks that have demonstrated how relational data can be embedded in explicitly articulated analytical designs, most notably quantitative economic history.</p>
<p>In recent years, bibliographic metadata have begun to feature in applied economic history, often alongside other sources. These studies go beyond description and exploration and use a range of empirical tools beyond network analysis. Importantly, bibliographic metadata are treated not as econometric data but as surviving historical evidence, interpreted and combined with other sources (for example, library catalogues, biographical sources, and manuscript data) within composite datasets tailored to specific historical questions. Chaney (&#8220;Religion and the Rise and Fall of Islamic Science&#8221;; &#8220;Modern Library Holdings and Historic City Growth&#8221;) constructs a georeferenced dataset of authors from which he derives metrics such as author counts, the share of authors engaged in scientific topics, and the number of authors&#8217; deaths. He uses this resource to address two distinct questions: the decline of scientific output in the medieval Islamic world and the improvement of estimates of preindustrial city growth, respectively.</p>
<p>Chiopris analyzes how spatial connections, namely the introduction of the railroad network in nineteenth-century German-speaking regions, shaped the creation and diffusion of ideas. Using comprehensive library catalogues from the German Collective Library Consortium, the author measures the emergence and spread of new ideas, broadly defined as novel publication topics, new words, or new combinations of existing concepts derived from enriched catalogue metadata.</p>
<p><xref ref-type="bibr" rid="B28">Koschnick</xref> links author-level publication data from the English Short Title Catalogue (ESTC) to collegiate records from Oxford and Cambridge Universities to examine the diffusion of ideas between teachers and students from 1600 to 1800. In another study, <xref ref-type="bibr" rid="B14">de Pleijt and Koschnick</xref> combine ESTC data with information on scholars&#8217; socioeconomic backgrounds, and using sentiment analysis on the titles of printed works, they find that limited career prospects for less advantaged academics coincided with a rise in dissenting religious publications.</p>
<p>Finally, Cervellati et al. use metadata from the epistolary union catalogue Early Modern Letters Online (EMLO) to study the role of the Republic of Letters in explaining why Britain experienced a sustained economic take-off at the time of the Industrial Revolution. These contributions do not leverage networks as an analytical lens, but they do use bibliographic metadata as part of broader quantitative frameworks to address questions about scholarly output, institutional change, and idea diffusion. They also show that bibliographic metadata can sustain formal empirical strategies once linked with complementary datasets in pursuit of a specific research question.</p>
<p>The disciplinary divide between quantitative economic history and digital or computational history has left analytical strategies developed in the former largely unexplored in the latter. From this perspective, large-scale patterns revealed by bibliographic metadata can be connected to deeper historical questions when analyzed with the appropriate combination of domain knowledge and formal methods. A crucial step in this direction involves enriching catalog metadata with external sources, based on the research question at hand. The multilayer model proposed in Section 4 might operationalize this integration as follows. The bibliographic co-occurrence network can be conceived as an initial aspect or feature, that is, a first set of layers, onto which additional layer sets could be stacked to encode relations for the same entities drawn from complementary sources. This approach facilitates the incorporation of external knowledge into digital history analyses and lays the groundwork for applying more sophisticated network techniques, such as peer-effects models.</p>
</sec>
</sec>
<sec>
<title>Section 5: Conclusions</title>
<p>In an effort to provide perspective on research materials and methods from neighboring fields, this paper has highlighted a growing convergence of computational history and quantitative economic history around shared sources and tools, most notably bibliographic metadata and network analysis.</p>
<p>Using a role-differentiated co-occurrence network representation of the <italic>Collectio academica antiqua</italic> of the Old University of Louvain, I demonstrated how different involvement types in book production (authorship, publication and distribution, dedication, censorship, and so on) can be modeled without being flattened into a single type of tie. The interplay of inter- and intra-layer connections reveals structural patterns that reflect both historical publishing practices and cataloging norms. Yet the exploration of my case study also shows that when bibliographic metadata are used in isolation and interpreted solely through network visualizations, their analytical power remains limited.</p>
<p>This underlines two distinct but complementary points. First, when bibliographic metadata are modeled as networks, analytical coherence depends on preserving the internal structure of the records rather than flattening heterogeneous forms of participation into undifferentiated ties. Second, even when such structure is preserved, bibliographic metadata alone offer limited leverage for addressing complex historical questions. Their analytical potential is fully realized only when relational representations derived from catalog data are linked to complementary sources that provide institutional, social, or biographical context. Together, these points respond to the research questions posed at the outset by clarifying both the conditions under which bibliographic metadata can be meaningfully modeled as relational data and the limits of such modeling when used in isolation.</p>
<p>In this sense, the contribution of the present article is threefold: a methodologically reflective case study, a role-aware relational modeling strategy, and a cross-field perspective that situates bibliographic metadata at the intersection of humanities and social science research.</p>
</sec>
</body>
<back>
<fn-group>
<fn id="n1"><p>pymarc is a Python library for reading, writing, and parsing MARC21 records. For the documentation, see <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://pypi.org/project/pymarc/">https://pypi.org/project/pymarc/</ext-link>.</p></fn>
<fn id="n2"><p>The institutional history is more complex: the original university, known as the Old University of Louvain, was suppressed under French revolutionary rule in 1797; it was refounded as the secular State University of Louvain in 1817, which was in turn abolished in 1835; and it was finally re-established as the Catholic University, founded in Mechelen in 1834 and transferred to Louvain in 1835. In 1968, amid the linguistic tensions between Flemish and Francophone communities that have long shaped Belgian public life, it was split into the Dutch-language Katholieke Universiteit Leuven, remaining in Louvain (in the province of Flemish Brabant), and the French-language Universit&#233; Catholique de Louvain, relocated to the newly built campus of Louvain-la-Neuve (in the province of Walloon Brabant).</p></fn>
<fn id="n3"><p>Corporate entities usually designate institutions such as religious orders, universities, academies, government bodies, or, in some cases, printing houses and workshops, which may be catalogued as such rather than as individual printers.</p></fn>
<fn id="n4"><p>OpenRefine is an open-source data cleaning and transformation tool designed for working with messy or semi-structured data. For more, see OpenRefine, <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://openrefine.org/">https://openrefine.org/</ext-link>.</p></fn>
<fn id="n5"><p>The CERL Thesaurus is maintained by the Consortium of European Research Libraries and aggregates historical name authorities for persons and corporate entities active in book production. For more on this, see <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://data.cerl.org/thesaurus/">https://data.cerl.org/thesaurus/</ext-link>.</p></fn>
<fn id="n6"><p>For more on the Python library cerl, see <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://pypi.org/project/cerl/">https://pypi.org/project/cerl/</ext-link>. The library was developed by Andreas Walker. A minor compatibility fix is required to use it with recent versions of urllib3.</p></fn>
<fn id="n7"><p>GeoNames is a geographical database providing rich geospatial metadata for each location, including names in multiple languages and spatial coordinates. For more on the World Historical Gazetteer, see <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://whgazetteer.org/">https://whgazetteer.org/</ext-link>. The World Historical Gazetteer is a curated index of historical place names with spatial and temporal metadata and can be used by uploading individual or group projects. For more on the RBMS/BSC Latin Place Names File, see <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://rbms.info/lpn/">https://rbms.info/lpn/</ext-link>. The RBMS/BSC Latin Place Names File is a curated list of Latin place names and their modern equivalents, maintained by the Bibliographic Standards Committee (BSC) of the Rare Books and Manuscripts Section (RBMS) of the American Library Association.</p></fn>
<fn id="n8"><p>Developed by Christos and Marc-Antoine N&#252;ssli. For more on EurAtlas, see <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://euratlas.com/">https://euratlas.com/</ext-link>.</p></fn>
<fn id="n9"><p>For more on the Harvard Dataverse, see <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://dataverse.harvard.edu/dataverse/TRF-GIS">https://dataverse.harvard.edu/dataverse/TRF-GIS</ext-link>.</p></fn>
<fn id="n10"><p>For more on the IPUMS Mosaic project, see <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://mosaic.ipums.org/historical-gis-datafiles">https://mosaic.ipums.org/historical-gis-datafiles</ext-link>.</p></fn>
<fn id="n11"><p>For more on the Historical GIS Collection at ETH Zurich, see <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://www.research-collection.ethz.ch/handle/20.500.11850/472583">https://www.research-collection.ethz.ch/handle/20.500.11850/472583</ext-link>.</p></fn>
<fn id="n12"><p>For more on <italic>(Re)counting the Uncounted</italic>, see <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://datasets.iisg.amsterdam/dataverse/recountingtheuncounted">https://datasets.iisg.amsterdam/dataverse/recountingtheuncounted</ext-link>.</p></fn>
<fn id="n13"><p>An exception arises when focusing solely on a specific mode of co-occurrence, such as author&#8211;dedicatee pairings (see <xref ref-type="bibr" rid="B29">Ladd</xref>).</p></fn>
<fn id="n14"><p>Multilayer networks are not to be confused with multiplex networks. As discussed in <xref ref-type="bibr" rid="B27">Kivel&#228; et al.</xref>, the latter are a special case of the former in which inter-layer links <italic>only</italic> connect identical nodes across layers (diagonal coupling), and layers are often, though not necessarily, node-aligned (i.e., all nodes exist in every layer). In the present case, we refer specifically to a multilayer network, as the very nature of co-occurrence entails inter-layer edges between <italic>different</italic> nodes, and nodes are not guaranteed to appear in every layer.</p></fn>
<fn id="n15"><p>This grouping is a practical necessity for visualization and interpretive clarity, yet it is not without consequences. Aggregating fine-grained roles into broader categories reduces the apparent number of ties and slightly alters degree distributions, since individuals active under multiple specific designations (e.g., &#8220;printer&#8221; and &#8220;publisher&#8221;) are counted as one.</p></fn>
<fn id="n16"><p>Unlike in the general multilayer representation, in the production-side analysis I rely on the original, fine-grained role designations instead of the aggregated macro-categories. The distinction between &#8220;printer,&#8221; &#8220;publisher,&#8221; and &#8220;bookseller&#8221; is historically meaningful in the early modern context and captures variations in economic function that would be obscured by broader grouping.</p></fn>
<fn id="n17"><p>For a provisional interface of the project, see <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://studium-ai.org/">https://studium-ai.org/</ext-link>. For more on the RETE project, see <ext-link ext-link-type="uri" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="https://ojs.uclouvain.be/index.php/RETE/about">https://ojs.uclouvain.be/index.php/RETE/about</ext-link>.</p></fn>
<fn id="n18"><p>The test compares proportions across two independent groups using a large-sample normal approximation to the difference in means of a binary outcome. Given the sample sizes involved, the approximation is appropriate.</p></fn>
</fn-group>
<sec>
<title>Appendix</title>
<sec>
<title>Mapping of Roles</title>
<p>This subsection presents the grouping strategy used to translate detailed role descriptors into broader layer categories for multilayer network construction. The table clarifies how fine-grained roles, such as &#8220;writer of preface,&#8221; &#8220;censor,&#8221; &#8220;engraver,&#8221; and others, are aggregated into interpretable layers like Paratextual Attributions, Censorship, and Engraving/Illustration. The grouping reflects a deliberate analytical choice designed to balance representational fidelity with visual and conceptual clarity. Other researchers may opt for alternate classifications according to their research aims.</p>
<table-wrap id="T6">
<label>Table 6</label>
<caption>
<p>The table describes the mapping of granular roles to broader layer categories.</p>
</caption>
<table>
<tbody>
<tr>
<td align="left" valign="top"><bold>Layer</bold></td>
<td align="left" valign="top"><bold>Granular roles</bold></td>
</tr>
<tr>
<td align="left" valign="top">Authorship</td>
<td align="left" valign="top">author; author original work; dubious author</td>
</tr>
<tr>
<td align="left" valign="top">Paratextual Attributions</td>
<td align="left" valign="top">writer of preface; compiler; adapter; readapted by; author in quotations or text abstracts; narrator; commentator for written text; author of introduction, etc.; corrector; compiler of index; eulogist; collaborator; contributor</td>
</tr>
<tr>
<td align="left" valign="top">Translation</td>
<td align="left" valign="top">translator</td>
</tr>
<tr>
<td align="left" valign="top">Publication/Distribution</td>
<td align="left" valign="top">printer; editor; bookseller; publisher; auctioneer; patron</td>
</tr>
<tr>
<td align="left" valign="top">Dissertant</td>
<td align="left" valign="top">dissertant</td>
</tr>
<tr>
<td align="left" valign="top">Promotor</td>
<td align="left" valign="top">praeses; thesis advisor</td>
</tr>
<tr>
<td align="left" valign="top">Censorship</td>
<td align="left" valign="top">approbation; censor</td>
</tr>
<tr>
<td align="left" valign="top">Engraving/Illustration</td>
<td align="left" valign="top">engraver; illustrator; woodcutter; artist</td>
</tr>
<tr>
<td align="left" valign="top">Dedication</td>
<td align="left" valign="top">dedicatee; honoree; depicted; addressee</td>
</tr>
<tr>
<td align="left" valign="top">Unknown</td>
<td align="left" valign="top">other; responsible party; collection from; missing or unspecified role</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec>
<title>Per-layer Network Analysis</title>
<table-wrap id="T7">
<label>Table 7</label>
<caption>
<p>The table reports per-layer counts of nodes, intra-layer edges, and density.</p>
</caption>
<table>
<tbody>
<tr>
<td align="left" valign="top"><bold>Layer</bold></td>
<td align="left" valign="top"><bold>Nodes</bold></td>
<td align="left" valign="top"><bold>Intra-layer edges</bold></td>
<td align="left" valign="top"><bold>Density</bold></td>
</tr>
<tr>
<td align="left" valign="top">Authorship</td>
<td align="left" valign="top">467</td>
<td align="left" valign="top">1,103</td>
<td align="left" valign="top">0.0101</td>
</tr>
<tr>
<td align="left" valign="top">Censorship</td>
<td align="left" valign="top">172</td>
<td align="left" valign="top">718</td>
<td align="left" valign="top">0.0488</td>
</tr>
<tr>
<td align="left" valign="top">Dedication</td>
<td align="left" valign="top">376</td>
<td align="left" valign="top">1,120</td>
<td align="left" valign="top">0.0159</td>
</tr>
<tr>
<td align="left" valign="top">Dissertant</td>
<td align="left" valign="top">42</td>
<td align="left" valign="top">50</td>
<td align="left" valign="top">0.0581</td>
</tr>
<tr>
<td align="left" valign="top">Engraving/Illustration</td>
<td align="left" valign="top">71</td>
<td align="left" valign="top">68</td>
<td align="left" valign="top">0.0274</td>
</tr>
<tr>
<td align="left" valign="top">Paratextual Attributions</td>
<td align="left" valign="top">419</td>
<td align="left" valign="top">1,574</td>
<td align="left" valign="top">0.0180</td>
</tr>
<tr>
<td align="left" valign="top">Promotor</td>
<td align="left" valign="top">4</td>
<td align="left" valign="top">2</td>
<td align="left" valign="top">0.3333</td>
</tr>
<tr>
<td align="left" valign="top">Publication/Distribution</td>
<td align="left" valign="top">948</td>
<td align="left" valign="top">1,200</td>
<td align="left" valign="top">0.0027</td>
</tr>
<tr>
<td align="left" valign="top">Translation</td>
<td align="left" valign="top">35</td>
<td align="left" valign="top">31</td>
<td align="left" valign="top">0.0521</td>
</tr>
<tr>
<td align="left" valign="top">Unknown</td>
<td align="left" valign="top">277</td>
<td align="left" valign="top">670</td>
<td align="left" valign="top">0.0175</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="T8">
<label>Table 8</label>
<caption>
<p>The table reports global multilayer network statistics, including node counts, layer combinations, and the number of intra- and inter-layer edges.</p>
</caption>
<table>
<tbody>
<tr>
<td align="left" valign="top"><bold>Metric</bold></td>
<td align="left" valign="top"><bold>Value</bold></td>
</tr>
<tr>
<td align="left" valign="top">Total intra-layer edges</td>
<td align="left" valign="top">6,536</td>
</tr>
<tr>
<td align="left" valign="top">Total inter-layer edges</td>
<td align="left" valign="top">20,487</td>
</tr>
<tr>
<td align="left" valign="top">Total edges overall</td>
<td align="left" valign="top">27,023</td>
</tr>
<tr>
<td align="left" valign="top">Total unique nodes across all layers</td>
<td align="left" valign="top">4,119</td>
</tr>
<tr>
<td align="left" valign="top">Total node-layer combinations</td>
<td align="left" valign="top">4,905</td>
</tr>
</tbody>
</table>
</table-wrap>
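<p>As a consistency check on Tables 7 and 8, the short sketch below recomputes a few of the printed values. It assumes that density is the standard measure for an undirected simple graph, that is, twice the number of edges divided by <italic>n</italic>(<italic>n</italic> &#8211; 1); this assumption reproduces the printed densities for the rows shown. The function and variable names are hypothetical.</p>
<code language="python">
```python
def density(nodes, edges):
    """Density of an undirected simple graph: realized ties over possible ties."""
    return 2 * edges / (nodes * (nodes - 1))


# (nodes, intra-layer edges) for three layers, copied from Table 7.
table_7 = {
    "Authorship": (467, 1103),
    "Promotor": (4, 2),
    "Publication/Distribution": (948, 1200),
}
densities = {layer: round(density(n, e), 4) for layer, (n, e) in table_7.items()}

# Edge totals copied from Table 8.
intra_edges, inter_edges = 6536, 20487
intra_share = intra_edges / (intra_edges + inter_edges)

print(densities)
print(f"share of intra-layer edges: {intra_share:.3f}")
```
</code>
<p>The resulting intra-layer share is just under one-quarter, in line with the proportions discussed in the main text.</p>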
</sec>
<sec>
<title>Inter-layer Network Analysis</title>
<table-wrap id="T9">
<label>Table 9</label>
<caption>
<p>The table reports descriptive statistics of the degree of multiplexity in the <italic>Collectio academica antiqua</italic> network.</p>
</caption>
<table>
<tbody>
<tr>
<td align="left" valign="top"><bold>Min</bold></td>
<td align="left" valign="top"><bold>Q1</bold></td>
<td align="left" valign="top"><bold>Median</bold></td>
<td align="left" valign="top"><bold>Mean</bold></td>
<td align="left" valign="top"><bold>Q3</bold></td>
<td align="left" valign="top"><bold>Max</bold></td>
<td align="left" valign="top"><bold>Single role</bold></td>
<td align="left" valign="top"><bold>Multiple roles</bold></td>
</tr>
<tr>
<td align="left" valign="top">1</td>
<td align="left" valign="top">1.0</td>
<td align="left" valign="top">1.0</td>
<td align="left" valign="top">1.2</td>
<td align="left" valign="top">1.0</td>
<td align="left" valign="top">7</td>
<td align="left" valign="top">3,595 (87.3%)</td>
<td align="left" valign="top">524 (12.7%)</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="T10">
<label>Table 10</label>
<caption>
<p>The table reports the shares of multiplex nodes present in each layer.</p>
</caption>
<table>
<tbody>
<tr>
<td align="left" valign="top"><bold>Layer</bold></td>
<td align="left" valign="top"><bold>Share</bold></td>
</tr>
<tr>
<td align="left" valign="top">Authorship</td>
<td align="left" valign="top">0.618</td>
</tr>
<tr>
<td align="left" valign="top">Unknown</td>
<td align="left" valign="top">0.529</td>
</tr>
<tr>
<td align="left" valign="top">Paratextual Attributions</td>
<td align="left" valign="top">0.471</td>
</tr>
<tr>
<td align="left" valign="top">Publication/Distribution</td>
<td align="left" valign="top">0.328</td>
</tr>
<tr>
<td align="left" valign="top">Dedication</td>
<td align="left" valign="top">0.313</td>
</tr>
<tr>
<td align="left" valign="top">Censorship</td>
<td align="left" valign="top">0.095</td>
</tr>
<tr>
<td align="left" valign="top">Translation</td>
<td align="left" valign="top">0.088</td>
</tr>
<tr>
<td align="left" valign="top">Promotor</td>
<td align="left" valign="top">0.034</td>
</tr>
<tr>
<td align="left" valign="top">Engraving/Illustration</td>
<td align="left" valign="top">0.017</td>
</tr>
<tr>
<td align="left" valign="top">Dissertant</td>
<td align="left" valign="top">0.006</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
</sec>
<sec>
<title>Data Availability</title>
<p>All scripts and data supporting this article are available through the <italic>Journal of Cultural Analytics</italic> Dataverse repository.</p>
</sec>
<sec>
<title>Acknowledgements</title>
<p>The author acknowledges the support of the Global PhD Partnerships between KU Leuven and UCLouvain. The author thanks the two anonymous referees and the editor, as well as Margherita Fantoli, Violet Soen, Dirk van Miert, and Mikko Tolonen for their constructive feedback and comments.</p>
</sec>
<sec>
<title>Competing Interests</title>
<p>The author has no competing interests to declare.</p>
</sec>
<ref-list>
<title>Works Cited</title>
<ref id="B1"><mixed-citation publication-type="journal"><string-name><surname>Angrist</surname>, <given-names>Joshua D.</given-names></string-name>, and <string-name><given-names>J&#246;rn-Steffen</given-names> <surname>Pischke</surname></string-name>. <article-title>&#8220;The Credibility Revolution in Empirical Economics: How Better Research Design is Taking the Con out of Econometrics.&#8221;</article-title> <source>The Journal of Economic Perspectives</source>, vol. <volume>24</volume>, no. <issue>2</issue>, <year>2010</year>, pp. <fpage>3</fpage>&#8211;<lpage>30</lpage>, DOI: <pub-id pub-id-type="doi">10.1257/jep.24.2.3</pub-id>.</mixed-citation></ref>
<ref id="B2"><mixed-citation publication-type="journal"><string-name><surname>Arora</surname>, <given-names>Abhishek</given-names></string-name>, <string-name><given-names>Emily</given-names> <surname>Silcock</surname></string-name>, <string-name><given-names>Leander</given-names> <surname>Heldring</surname></string-name>, and <string-name><given-names>Melissa</given-names> <surname>Dell</surname></string-name>. <article-title>&#8220;Contrastive Entity Coreference and Disambiguation for Historical Texts.&#8221;</article-title> <year>2024</year>, arXiv:2406.15576.</mixed-citation></ref>
<ref id="B3"><mixed-citation publication-type="book"><string-name><surname>Avram</surname>, <given-names>Henriette</given-names></string-name>. <chapter-title>&#8220;MARC: Its History and Implications.&#8221;</chapter-title> <publisher-name>Superintendent of Documents, U.S. Government Printing Office</publisher-name>, <publisher-loc>Washington, D.C.</publisher-loc> 20402 (Stock number LC-1.2:M18/16), <year>1975</year>.</mixed-citation></ref>
<ref id="B4"><mixed-citation publication-type="journal"><string-name><surname>Bennett</surname>, <given-names>James S.</given-names></string-name>, <string-name><given-names>Erin</given-names> <surname>Mutch</surname></string-name>, <string-name><given-names>Andrew</given-names> <surname>Tollefson</surname></string-name>, <string-name><given-names>Ed</given-names> <surname>Chalstrey</surname></string-name>, <string-name><given-names>Majid</given-names> <surname>Benam</surname></string-name>, <string-name><given-names>Enrico</given-names> <surname>Cioni</surname></string-name>, <string-name><given-names>Jenny</given-names> <surname>Reddish</surname></string-name>, et al. <article-title>&#8220;Cliopatria: A Geospatial Database of World-Wide Political Entities from 3400 BCE to 2024 CE.&#8221;</article-title> <source>Scientific Data</source>, vol. <volume>12</volume>, no. <issue>247</issue>, <year>2025</year>, DOI: <pub-id pub-id-type="doi">10.1038/s41597-025-04516-9</pub-id>.</mixed-citation></ref>
<ref id="B5"><mixed-citation publication-type="journal"><string-name><surname>Blei</surname>, <given-names>David M.</given-names></string-name>, <string-name><given-names>Andrew Y.</given-names> <surname>Ng</surname></string-name>, and <string-name><given-names>Michael I.</given-names> <surname>Jordan</surname></string-name>. <article-title>&#8220;Latent Dirichlet Allocation.&#8221;</article-title> <source>Journal of Machine Learning Research</source>, vol. <volume>3</volume>, <year>2003</year>, pp. <fpage>993</fpage>&#8211;<lpage>1022</lpage>, DOI: <pub-id pub-id-type="doi">10.5555/944919.944937</pub-id>.</mixed-citation></ref>
<ref id="B6"><mixed-citation publication-type="thesis"><string-name><surname>Cammaerts</surname>, <given-names>Dieter</given-names></string-name>. <chapter-title>&#8220;Manuale Lovaniense: Een sociaaleconomische en typografische studie van het gedrukte handboek aan de vroegmoderne Leuvense universiteit (1474&#8211;1650).&#8221;</chapter-title> <year>2024</year>. <publisher-name>KU Leuven</publisher-name>, PhD dissertation.</mixed-citation></ref>
<ref id="B7"><mixed-citation publication-type="book"><string-name><surname>Cantoni</surname>, <given-names>Davide</given-names></string-name>, and <string-name><given-names>Noam</given-names> <surname>Yuchtman</surname></string-name>. <chapter-title>&#8220;Historical Natural Experiments: Bridging Economics and Economic History.&#8221;</chapter-title> <source>The Handbook of Historical Economics</source>, <publisher-name>Elsevier</publisher-name>, <year>2021</year>, pp. <fpage>213</fpage>&#8211;<lpage>41</lpage>.</mixed-citation></ref>
<ref id="B8"><mixed-citation publication-type="journal"><string-name><surname>Cervellati</surname>, <given-names>Matteo</given-names></string-name>, <string-name><given-names>Sara</given-names> <surname>Lazzaroni</surname></string-name>, <string-name><given-names>Gianni</given-names> <surname>Marciante</surname></string-name>, and <string-name><given-names>Paolo</given-names> <surname>Masella</surname></string-name>. <article-title>&#8220;The Rise of the Knowledge Economy: Republic of Letters and Communication Infrastructures in Early Modern England.&#8221;</article-title> Working paper, <year>2025</year>.</mixed-citation></ref>
<ref id="B9"><mixed-citation publication-type="journal"><string-name><surname>Chaney</surname>, <given-names>Eric</given-names></string-name>. <article-title>&#8220;Modern Library Holdings and Historic City Growth.&#8221;</article-title> Working paper, <year>2024</year>.</mixed-citation></ref>
<ref id="B10"><mixed-citation publication-type="journal"><string-name><surname>Chaney</surname>, <given-names>Eric</given-names></string-name>. <article-title>&#8220;Religion and the Rise and Fall of Islamic Science.&#8221;</article-title> Working paper, <year>2023</year>.</mixed-citation></ref>
<ref id="B11"><mixed-citation publication-type="journal"><string-name><surname>Chiopris</surname>, <given-names>Caterina</given-names></string-name>. <article-title>&#8220;The Diffusion of Ideas.&#8221;</article-title> Working paper, <year>2024</year>.</mixed-citation></ref>
<ref id="B12"><mixed-citation publication-type="journal"><string-name><surname>Curtis</surname>, <given-names>Matthew</given-names></string-name>, and <string-name><given-names>David</given-names> <surname>de la Croix</surname></string-name>. <article-title>&#8220;Seeds of Knowledge: Premodern Scholarship, Academic Fields, and European Growth.&#8221;</article-title> <year>2025</year>, SSRN Working Paper, DOI: <pub-id pub-id-type="doi">10.2139/ssrn.5078307</pub-id>.</mixed-citation></ref>
<ref id="B13"><mixed-citation publication-type="journal"><string-name><surname>de la Croix</surname>, <given-names>David</given-names></string-name>, and <string-name><given-names>Rossana</given-names> <surname>Scebba</surname></string-name>. <article-title>&#8220;Geolocalization and the Birth-to-Death Distance.&#8221;</article-title> <source>Repertorium Eruditorum Totius Europae</source>, vol. <volume>14</volume>, <year>2024</year>, pp. <fpage>37</fpage>&#8211;<lpage>42</lpage>, DOI: <pub-id pub-id-type="doi">10.14428/rete.v14i0/Locations</pub-id>.</mixed-citation></ref>
<ref id="B14"><mixed-citation publication-type="journal"><string-name><surname>de Pleijt</surname>, <given-names>Alexandra</given-names></string-name>, and <string-name><given-names>Julius</given-names> <surname>Koschnick</surname></string-name>. <article-title>&#8220;Alienated Intellectuals? Exploring the Political Consequences of the Educational Revolution in Early Modern England.&#8221;</article-title> Working paper, <year>2025</year>.</mixed-citation></ref>
<ref id="B15"><mixed-citation publication-type="book"><string-name><surname>De Ridder</surname>, <given-names>Bram</given-names></string-name>, <string-name><given-names>Violet</given-names> <surname>Soen</surname></string-name>, <string-name><given-names>Werner</given-names> <surname>Thomas</surname></string-name>, and <string-name><given-names>Sophie</given-names> <surname>Verreyken</surname></string-name>. <source>Transregional Territories: Crossing Borders in the Early Modern Low Countries and Beyond</source>. <publisher-name>Brepols Publishers</publisher-name>, <year>2020</year>.</mixed-citation></ref>
<ref id="B16"><mixed-citation publication-type="journal"><string-name><surname>Ehrmann</surname>, <given-names>Maud</given-names></string-name>, <string-name><given-names>Ahmed</given-names> <surname>Hamdi</surname></string-name>, <string-name><given-names>Elvys Linhares</given-names> <surname>Pontes</surname></string-name>, <string-name><given-names>Matteo</given-names> <surname>Romanello</surname></string-name>, and <string-name><given-names>Antoine</given-names> <surname>Doucet</surname></string-name>. <article-title>&#8220;Named Entity Recognition and Classification in Historical Documents: A Survey.&#8221;</article-title> <source>ACM Computing Surveys</source>, vol. <volume>56</volume>, no. <issue>2</issue>, <year>2023</year>, pp. <fpage>1</fpage>&#8211;<lpage>47</lpage>, DOI: <pub-id pub-id-type="doi">10.1145/3604931</pub-id>.</mixed-citation></ref>
<ref id="B17"><mixed-citation publication-type="journal"><string-name><surname>Fantoli</surname>, <given-names>Margherita</given-names></string-name>, <string-name><given-names>Jukka</given-names> <surname>Suomela</surname></string-name>, <string-name><given-names>Toon</given-names> <surname>Van Hal</surname></string-name>, <string-name><given-names>Mark</given-names> <surname>Depauw</surname></string-name>, <string-name><given-names>Lari</given-names> <surname>Virkki</surname></string-name>, and <string-name><given-names>Mikko</given-names> <surname>Tolonen</surname></string-name>. <article-title>&#8220;Quantifying the Presence of Ancient Greek and Latin Classics in Early Modern Britain.&#8221;</article-title> <source>Journal of Cultural Analytics</source>, vol. <volume>10</volume>, no. <issue>1</issue>, <year>2025</year>, DOI: <pub-id pub-id-type="doi">10.22148/001c.128008</pub-id>.</mixed-citation></ref>
<ref id="B18"><mixed-citation publication-type="webpage"><string-name><surname>Gavin</surname>, <given-names>Michael</given-names></string-name>. <article-title>&#8220;Historical Text Networks: The Sociology of Early English Criticism.&#8221;</article-title> <source>Eighteenth-Century Studies</source>, vol. <volume>50</volume>, no. <issue>1</issue>, <year>2016</year>, pp. <fpage>53</fpage>&#8211;<lpage>80</lpage>, <uri>https://www.jstor.org/stable/43956564</uri>.</mixed-citation></ref>
<ref id="B19"><mixed-citation publication-type="journal"><string-name><surname>Gay</surname>, <given-names>Victor</given-names></string-name>. <article-title>&#8220;Mapping the Third Republic: A Geographic Information System of France (1870&#8211;1940).&#8221;</article-title> <source>Historical Methods: A Journal of Quantitative and Interdisciplinary History</source>, vol. <volume>54</volume>, no. <issue>4</issue>, <year>2021</year>, pp. <fpage>189</fpage>&#8211;<lpage>207</lpage>, DOI: <pub-id pub-id-type="doi">10.1080/01615440.2021.1937421</pub-id>.</mixed-citation></ref>
<ref id="B20"><mixed-citation publication-type="journal"><string-name><surname>Gittel</surname>, <given-names>Benjamin</given-names></string-name>. <article-title>&#8220;An Institutional Perspective on Genres: Generic Subtitles in German Literature from 1500&#8211;2020.&#8221;</article-title> <source>Journal of Cultural Analytics</source>, vol. <volume>6</volume>, no. <issue>1</issue>, <year>2021</year>, pp. <fpage>1</fpage>&#8211;<lpage>38</lpage>, DOI: <pub-id pub-id-type="doi">10.22148/001c.22086</pub-id>.</mixed-citation></ref>
<ref id="B21"><mixed-citation publication-type="journal"><string-name><surname>Gregory</surname>, <given-names>Ian</given-names></string-name>. <article-title>&#8220;Challenges and Opportunities for Digital History.&#8221;</article-title> <source>Frontiers in Digital Humanities</source>, vol. <volume>1</volume>, <year>2014</year>, pp. <fpage>1</fpage>&#8211;<lpage>2</lpage>, DOI: <pub-id pub-id-type="doi">10.3389/fdigh.2014.00001</pub-id>.</mixed-citation></ref>
<ref id="B22"><mixed-citation publication-type="book"><string-name><surname>Greteman</surname>, <given-names>Blaine</given-names></string-name>. <source>Networking Print in Shakespeare&#8217;s England: Influence, Agency, and Revolutionary Change</source>. <publisher-name>Stanford UP</publisher-name>, <year>2021</year>.</mixed-citation></ref>
<ref id="B23"><mixed-citation publication-type="journal"><string-name><surname>He&#223;br&#252;ggen-Walter</surname>, <given-names>Stefan</given-names></string-name>. <article-title>&#8220;Interdisciplinarity in the Seventeenth Century? A Co-occurrence Analysis of Early Modern German Dissertation Titles.&#8221;</article-title> <source>Synthese</source>, vol. <volume>203</volume>, no. <issue>2</issue>, <year>2024</year>, p. <fpage>67</fpage>, DOI: <pub-id pub-id-type="doi">10.1007/s11229-024-04494-2</pub-id>.</mixed-citation></ref>
<ref id="B24"><mixed-citation publication-type="journal"><string-name><surname>Hill</surname>, <given-names>Mark J.</given-names></string-name>, <string-name><given-names>Ville</given-names> <surname>Vaara</surname></string-name>, and <string-name><given-names>Mikko</given-names> <surname>Tolonen</surname></string-name>. <article-title>&#8220;Communication and Idea Transmission Across Historical Communities: A Quantitative Analysis of Early Modern Nonconformist Networks.&#8221;</article-title> <source>Huntington Library Quarterly</source>, vol. <volume>86</volume>, no. <issue>2</issue>, <year>2023</year>, pp. <fpage>377</fpage>&#8211;<lpage>407</lpage>, DOI: <pub-id pub-id-type="doi">10.1353/hlq.2023.a936422</pub-id>.</mixed-citation></ref>
<ref id="B25"><mixed-citation publication-type="journal"><string-name><surname>Hill</surname>, <given-names>Mark J.</given-names></string-name>, <string-name><given-names>Ville</given-names> <surname>Vaara</surname></string-name>, <string-name><given-names>Tanja</given-names> <surname>S&#228;ily</surname></string-name>, <string-name><given-names>Leo</given-names> <surname>Lahti</surname></string-name>, and <string-name><given-names>Mikko</given-names> <surname>Tolonen</surname></string-name>. <article-title>&#8220;Reconstructing Intellectual Networks: From the ESTC&#8217;s Bibliographic Metadata to Historical Material.&#8221;</article-title> <source>Proceedings of the Digital Humanities in the Nordic Countries</source>, <year>2019</year>.</mixed-citation></ref>
<ref id="B26"><mixed-citation publication-type="book"><string-name><surname>Hotson</surname>, <given-names>Howard</given-names></string-name>, and <string-name><given-names>Thomas</given-names> <surname>Wallnig</surname></string-name>, editors. <source>Reassembling the Republic of Letters in the Digital Age</source>. <publisher-name>G&#246;ttingen UP</publisher-name>, <year>2019</year>.</mixed-citation></ref>
<ref id="B27"><mixed-citation publication-type="journal"><string-name><surname>Kivel&#228;</surname>, <given-names>Mikko</given-names></string-name>, <string-name><given-names>Alex</given-names> <surname>Arenas</surname></string-name>, <string-name><given-names>Marc</given-names> <surname>Barthelemy</surname></string-name>, <string-name><given-names>James P.</given-names> <surname>Gleeson</surname></string-name>, <string-name><given-names>Yamir</given-names> <surname>Moreno</surname></string-name>, and <string-name><given-names>Mason A.</given-names> <surname>Porter</surname></string-name>. <article-title>&#8220;Multilayer Networks.&#8221;</article-title> <source>Journal of Complex Networks</source>, vol. <volume>2</volume>, no. <issue>3</issue>, <year>2014</year>, pp. <fpage>203</fpage>&#8211;<lpage>71</lpage>, DOI: <pub-id pub-id-type="doi">10.1093/comnet/cnu016</pub-id>.</mixed-citation></ref>
<ref id="B28"><mixed-citation publication-type="journal"><string-name><surname>Koschnick</surname>, <given-names>Julius</given-names></string-name>. <article-title>&#8220;Teacher-Directed Scientific Change: The Case of the English Scientific Revolution.&#8221;</article-title> EHES Working Paper Series, no. <issue>274</issue>, <year>2025</year>.</mixed-citation></ref>
<ref id="B29"><mixed-citation publication-type="journal"><string-name><surname>Ladd</surname>, <given-names>John R</given-names></string-name>. <article-title>&#8220;Imaginative Networks: Tracing Connections Among Early Modern Book Dedications.&#8221;</article-title> <source>Journal of Cultural Analytics</source>, vol. <volume>6</volume>, no. <issue>1</issue>, <year>2021</year>, DOI: <pub-id pub-id-type="doi">10.22148/001c.21993</pub-id>.</mixed-citation></ref>
<ref id="B30"><mixed-citation publication-type="journal"><string-name><surname>Lahti</surname>, <given-names>Leo</given-names></string-name>, <string-name><given-names>Jani</given-names> <surname>Marjanen</surname></string-name>, <string-name><given-names>Hege</given-names> <surname>Roivainen</surname></string-name>, and <string-name><given-names>Mikko</given-names> <surname>Tolonen</surname></string-name>. <article-title>&#8220;Bibliographic Data Science and the History of the Book (c. 1500&#8211;1800).&#8221;</article-title> <source>Cataloging and Classification Quarterly</source>, vol. <volume>57</volume>, no. <issue>1</issue>, <year>2019</year>, pp. <fpage>5</fpage>&#8211;<lpage>23</lpage>, DOI: <pub-id pub-id-type="doi">10.1080/01639374.2018.1543747</pub-id>.</mixed-citation></ref>
<ref id="B31"><mixed-citation publication-type="journal"><string-name><surname>Lahti</surname>, <given-names>Leo</given-names></string-name>, <string-name><given-names>Niko</given-names> <surname>Ilom&#228;ki</surname></string-name>, and <string-name><given-names>Mikko</given-names> <surname>Tolonen</surname></string-name>. <article-title>&#8220;A Quantitative Study of History in the English Short-Title Catalogue (ESTC), 1470&#8211;1800.&#8221;</article-title> <source>LIBER Quarterly</source>, vol. <volume>25</volume>, no. <issue>2</issue>, <year>2015</year>, pp. <fpage>87</fpage>&#8211;<lpage>116</lpage>, DOI: <pub-id pub-id-type="doi">10.18352/lq.10112</pub-id>.</mixed-citation></ref>
<ref id="B32"><mixed-citation publication-type="book"><string-name><surname>Lahti</surname>, <given-names>Leo</given-names></string-name>, <string-name><given-names>Ville</given-names> <surname>Vaara</surname></string-name>, <string-name><given-names>Jani</given-names> <surname>Marjanen</surname></string-name>, and <string-name><given-names>Mikko</given-names> <surname>Tolonen</surname></string-name>. <chapter-title>&#8220;Best Practices in Bibliographic Data Science.&#8221;</chapter-title> <source>Proceedings of the Research Data and Humanities (RDHum) 2019 Conference: Data, Methods and Tools</source>. Edited by <string-name><given-names>Juhani Harri</given-names> <surname>Jantunen</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Brunni</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Kunnas</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Palviainen</surname></string-name>, and <string-name><given-names>K.</given-names> <surname>V&#228;sti</surname></string-name>. <publisher-name>Studia Humaniora Ouluensia, U of Oulu</publisher-name>, <year>2019</year>, pp. <fpage>57</fpage>&#8211;<lpage>65</lpage>.</mixed-citation></ref>
<ref id="B33"><mixed-citation publication-type="journal"><string-name><surname>Lemercier</surname>, <given-names>Claire</given-names></string-name>. <article-title>&#8220;A History Without the Social Sciences?&#8221;</article-title> Translated by <string-name><given-names>Angela</given-names> <surname>Krieger</surname></string-name>. <source>Annales: Histoire, Sciences Sociales</source>, vol. <volume>70</volume>, no. <issue>2</issue>, <year>2015</year>, pp. <fpage>271</fpage>&#8211;<lpage>83</lpage>, DOI: <pub-id pub-id-type="doi">10.1017/S2398568200001163</pub-id>.</mixed-citation></ref>
<ref id="B34"><mixed-citation publication-type="book"><string-name><surname>Lemercier</surname>, <given-names>Claire</given-names></string-name>, and <string-name><given-names>Claire</given-names> <surname>Zalc</surname></string-name>. <source>Quantitative Methods in the Humanities: An Introduction</source>. Translated by <string-name><given-names>Arthur</given-names> <surname>Goldhammer</surname></string-name>. <publisher-name>U of Virginia P</publisher-name>, <year>2019</year>.</mixed-citation></ref>
<ref id="B35"><mixed-citation publication-type="journal"><string-name><surname>Nurmi</surname>, <given-names>Tarmo</given-names></string-name>, <string-name><given-names>Arash</given-names> <surname>Badie-Modiri</surname></string-name>, <string-name><given-names>Corinna</given-names> <surname>Coupette</surname></string-name>, and <string-name><given-names>Mikko</given-names> <surname>Kivel&#228;</surname></string-name>. <article-title>&#8220;pymnet: A Python Library for Multilayer Networks.&#8221;</article-title> <source>Journal of Open Source Software</source>, vol. <volume>9</volume>, no. <issue>99</issue>, <year>2024</year>, p. <fpage>6930</fpage>, DOI: <pub-id pub-id-type="doi">10.21105/joss.06930</pub-id>.</mixed-citation></ref>
<ref id="B36"><mixed-citation publication-type="book"><string-name><surname>Padilla</surname>, <given-names>Thomas</given-names></string-name>. <chapter-title>&#8220;Foreword.&#8221;</chapter-title> <source>Library Catalogues as Data: Research, Practice and Usage</source>. Edited by <string-name><given-names>Paul</given-names> <surname>Gooding</surname></string-name>, <string-name><given-names>Melissa</given-names> <surname>Terras</surname></string-name>, and <string-name><given-names>Sarah</given-names> <surname>Ames</surname></string-name>. <publisher-name>Facet Publishing</publisher-name>, <year>2025</year>, pp. <fpage>20</fpage>&#8211;<lpage>21</lpage>.</mixed-citation></ref>
<ref id="B37"><mixed-citation publication-type="book"><string-name><surname>Petras</surname>, <given-names>Vivien</given-names></string-name>, <string-name><given-names>Ray R.</given-names> <surname>Larson</surname></string-name>, and <string-name><given-names>Michael</given-names> <surname>Buckland</surname></string-name>. <chapter-title>&#8220;Time Period Directories: A Metadata Infrastructure for Placing Events in Temporal and Geographic Context.&#8221;</chapter-title> <source>Proceedings of the 6th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL &#8217;06)</source>. <publisher-name>ACM</publisher-name>, <year>2006</year>, pp. <fpage>151</fpage>&#8211;<lpage>60</lpage>, DOI: <pub-id pub-id-type="doi">10.1145/1141753.1141782</pub-id>.</mixed-citation></ref>
<ref id="B38"><mixed-citation publication-type="journal"><string-name><surname>Reed</surname>, <given-names>Frank</given-names></string-name>. <article-title>&#8220;The Centennia Historical Atlas: Academic Research Edition.&#8221;</article-title> <source>Clockwork Mapping</source>, <year>2016</year>.</mixed-citation></ref>
<ref id="B39"><mixed-citation publication-type="journal"><string-name><surname>Roller</surname>, <given-names>Ramona</given-names></string-name>. <article-title>&#8220;Theory-Driven Statistics for the Digital Humanities: Presenting Pitfalls and a Practical Guide by the Example of the Reformation.&#8221;</article-title> <source>Journal of Cultural Analytics</source>, vol. <volume>7</volume>, no. <issue>4</issue>, <year>2023</year>, DOI: <pub-id pub-id-type="doi">10.22148/001c.57764</pub-id>.</mixed-citation></ref>
<ref id="B40"><mixed-citation publication-type="journal"><string-name><surname>Roller</surname>, <given-names>Ramona</given-names></string-name>. <article-title>&#8220;Tracing the Footsteps of Ideas: Time-Respecting Paths Reveal Key Reformers and Communication Pathways in Protestant Letter Networks.&#8221;</article-title> <source>SocArXiv</source>, <year>2023</year>, DOI: <pub-id pub-id-type="doi">10.31235/osf.io/cfqry</pub-id>.</mixed-citation></ref>
<ref id="B41"><mixed-citation publication-type="journal"><string-name><surname>Ryan</surname>, <given-names>Yann Ciar&#225;n</given-names></string-name>, and <string-name><given-names>Mikko</given-names> <surname>Tolonen</surname></string-name>. <article-title>&#8220;Networks of Influence in Scottish Enlightenment Publishing.&#8221;</article-title> <source>Connections</source>, vol. <volume>44</volume>, no. <issue>1</issue>, <year>2024</year>, DOI: <pub-id pub-id-type="doi">10.21307/connections-2019.034</pub-id>.</mixed-citation></ref>
<ref id="B42"><mixed-citation publication-type="journal"><string-name><surname>Ryan</surname>, <given-names>Yann Ciar&#225;n</given-names></string-name>, and <string-name><given-names>Mikko</given-names> <surname>Tolonen</surname></string-name>. <article-title>&#8220;The Evolution of Scottish Enlightenment Publishing.&#8221;</article-title> <source>The Historical Journal</source>, vol. <volume>67</volume>, no. <issue>2</issue>, <year>2024</year>, pp. <fpage>223</fpage>&#8211;<lpage>55</lpage>, DOI: <pub-id pub-id-type="doi">10.1017/S0018246X23000614</pub-id>.</mixed-citation></ref>
<ref id="B43"><mixed-citation publication-type="journal"><string-name><surname>Sanh</surname>, <given-names>Victor</given-names></string-name>, <string-name><given-names>Lysandre</given-names> <surname>Debut</surname></string-name>, <string-name><given-names>Julien</given-names> <surname>Chaumond</surname></string-name>, and <string-name><given-names>Thomas</given-names> <surname>Wolf</surname></string-name>. <article-title>&#8220;DistilBERT: A Distilled Version of BERT&#8212;Smaller, Faster, Cheaper and Lighter.&#8221;</article-title> <year>2019</year>, DOI: <pub-id pub-id-type="doi">10.48550/arXiv.1910.01108</pub-id>.</mixed-citation></ref>
<ref id="B44"><mixed-citation publication-type="journal"><string-name><surname>Scebba</surname>, <given-names>Rossana</given-names></string-name>, and <string-name><given-names>Margherita</given-names> <surname>Fantoli</surname></string-name>. <article-title>&#8220;Integrating Library and Prosopographical Data in the Publication Network of the Old University of Louvain.&#8221;</article-title> <source>Students, Scholars and Their Books at the University of Louvain (1425&#8211;1797)</source>, edited by <string-name><given-names>Violet</given-names> <surname>Soen</surname></string-name>, <string-name><given-names>Wouter</given-names> <surname>Druw&#233;</surname></string-name>, <string-name><given-names>Wim</given-names> <surname>Fran&#231;ois</surname></string-name>, and <string-name><given-names>Ralph</given-names> <surname>Dekoninck</surname></string-name>, vol. <volume>17</volume>, Lectio Series Studium Lovaniense, 1, Brepols, forthcoming <year>2026</year>.</mixed-citation></ref>
<ref id="B45"><mixed-citation publication-type="journal"><string-name><surname>Schich</surname>, <given-names>Maximilian</given-names></string-name>, <string-name><given-names>Chaoming</given-names> <surname>Song</surname></string-name>, <string-name><given-names>Yong-Yeol</given-names> <surname>Ahn</surname></string-name>, <string-name><given-names>Alexander</given-names> <surname>Mirsky</surname></string-name>, <string-name><given-names>Mauro</given-names> <surname>Martino</surname></string-name>, <string-name><given-names>Albert-L&#225;szl&#243;</given-names> <surname>Barab&#225;si</surname></string-name>, and <string-name><given-names>Dirk</given-names> <surname>Helbing</surname></string-name>. <article-title>&#8220;A Network Framework of Cultural History.&#8221;</article-title> <source>Science</source>, vol. <volume>345</volume>, no. <issue>6196</issue>, <year>2014</year>, pp. <fpage>558</fpage>&#8211;<lpage>62</lpage>, DOI: <pub-id pub-id-type="doi">10.1126/science.1240064</pub-id>.</mixed-citation></ref>
<ref id="B46"><mixed-citation publication-type="webpage"><string-name><surname>Tennant</surname>, <given-names>Roy</given-names></string-name>. <article-title>&#8220;MARC Must Die.&#8221;</article-title> <source>Library Journal</source>, vol. <volume>127</volume>, no. <issue>17</issue>, <year>2002</year>, pp. <fpage>26</fpage>&#8211;<lpage>27</lpage>, <uri>https://www.libraryjournal.com/story/marc-must-die</uri>.</mixed-citation></ref>
<ref id="B47"><mixed-citation publication-type="journal"><string-name><surname>Tiihonen</surname>, <given-names>Iiro</given-names></string-name>, <string-name><given-names>Leo</given-names> <surname>Lahti</surname></string-name>, and <string-name><given-names>Mikko</given-names> <surname>Tolonen</surname></string-name>. <article-title>&#8220;Print Culture and Economic Constraints: A Quantitative Analysis of Book Prices in Eighteenth-Century Britain.&#8221;</article-title> <source>Explorations in Economic History</source>, vol. <volume>94</volume>, <year>2024</year>, DOI: <pub-id pub-id-type="doi">10.1016/j.eeh.2024.101614</pub-id>.</mixed-citation></ref>
<ref id="B48"><mixed-citation publication-type="journal"><string-name><surname>Tolonen</surname>, <given-names>Mikko</given-names></string-name>, <string-name><given-names>Leo</given-names> <surname>Lahti</surname></string-name>, <string-name><given-names>Hege</given-names> <surname>Roivainen</surname></string-name>, and <string-name><given-names>Jani</given-names> <surname>Marjanen</surname></string-name>. <article-title>&#8220;A Quantitative Approach to Book-Printing in Sweden and Finland, 1640&#8211;1828.&#8221;</article-title> <source>Historical Methods: A Journal of Quantitative and Interdisciplinary History</source>, vol. <volume>52</volume>, no. <issue>1</issue>, <year>2019</year>, pp. <fpage>57</fpage>&#8211;<lpage>78</lpage>, DOI: <pub-id pub-id-type="doi">10.1080/01615440.2018.1526657</pub-id>.</mixed-citation></ref>
<ref id="B49"><mixed-citation publication-type="journal"><string-name><surname>Tolonen</surname>, <given-names>Mikko</given-names></string-name>, <string-name><given-names>Mark J.</given-names> <surname>Hill</surname></string-name>, <string-name><given-names>Ali Zeeshan</given-names> <surname>Ijaz</surname></string-name>, <string-name><given-names>Ville</given-names> <surname>Vaara</surname></string-name>, and <string-name><given-names>Leo</given-names> <surname>Lahti</surname></string-name>. <article-title>&#8220;Examining the Early Modern Canon: The English Short Title Catalogue and Large-Scale Patterns of Cultural Production.&#8221;</article-title> <source>Data Visualization in Enlightenment Literature and Culture</source>, <year>2021</year>, pp. <fpage>63</fpage>&#8211;<lpage>119</lpage>, DOI: <pub-id pub-id-type="doi">10.1007/978-3-030-54913-8_3</pub-id>.</mixed-citation></ref>
<ref id="B50"><mixed-citation publication-type="journal"><string-name><surname>Valleriani</surname>, <given-names>Matteo</given-names></string-name>, <string-name><given-names>Malte</given-names> <surname>Vogl</surname></string-name>, <string-name><given-names>Hassan</given-names> <surname>el-Hajj</surname></string-name>, and <string-name><given-names>Kim</given-names> <surname>Pham</surname></string-name>. <article-title>&#8220;The Network of Early Modern Printers and Its Impact on the Evolution of Scientific Knowledge: Automatic Detection of Awareness Relationships.&#8221;</article-title> <source>Histories</source>, vol. <volume>2</volume>, no. <issue>4</issue>, <year>2022</year>, pp. <fpage>466</fpage>&#8211;<lpage>503</lpage>, DOI: <pub-id pub-id-type="doi">10.3390/histories2040033</pub-id>.</mixed-citation></ref>
</ref-list>
</back>
</article>