<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.2 20190208//EN" "https://jats.nlm.nih.gov/publishing/1.2/JATS-journalpublishing1-mathml3.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="research-article" dtd-version="1.2" xml:lang="en">
  <front>
    <journal-meta>
      <journal-id journal-id-type="publisher-id">1832</journal-id>
      <journal-title-group>
        <journal-title>Journal of Cultural Analytics</journal-title>
      </journal-title-group>
      <issn pub-type="epub">2371-4549</issn>
      <publisher>
        <publisher-name>Center for Digital Humanities, Princeton University</publisher-name>
      </publisher>
      <self-uri xlink:href="https://culturalanalytics.org/">Website: Journal of Cultural Analytics</self-uri>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="publisher-id">17585</article-id>
      <article-id pub-id-type="doi">10.22148/001c.17585</article-id>
      <article-categories>
        <subj-group subj-group-type="heading">
          <subject>Article</subject>
        </subj-group>
      </article-categories>
      <title-group>
        <article-title>Divergence and the Complexity of Difference in Text and Culture</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <name>
            <surname>Chang</surname>
            <given-names>Kent K.</given-names>
          </name>
        </contrib>
        <contrib contrib-type="author">
          <name>
            <surname>DeDeo</surname>
            <given-names>Simon</given-names>
          </name>
        </contrib>
      </contrib-group>
      <pub-date publication-format="electronic" date-type="pub" iso-8601-date="2020-10-07">
        <day>7</day>
        <month>10</month>
        <year>2020</year>
      </pub-date>
      <pub-date publication-format="electronic" date-type="collection" iso-8601-date="2021-09-02">
        <year>2020</year>
      </pub-date>
      <volume>5</volume>
      <issue seq="3">2</issue>
      <issue-title>Articles in 2020</issue-title>
      <elocation-id>17585</elocation-id>
      <permissions>
        <license license-type="open-access">
          <ali:license_ref xmlns:ali="http://www.niso.org/schemas/ali/1.0/">http://creativecommons.org/licenses/by/4.0</ali:license_ref>
          <license-p>
              This is an open access article distributed under the terms of the <ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0">Creative Commons Attribution License (4.0)</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
            </license-p>
        </license>
      </permissions>
      <self-uri content-type="pdf" xlink:href="https://culturalanalytics.org/article/17585.pdf"/>
      <self-uri content-type="xml" xlink:href="https://culturalanalytics.org/article/17585.xml"/>
      <self-uri content-type="json" xlink:href="https://culturalanalytics.org/article/17585.json"/>
      <self-uri content-type="html" xlink:href="https://culturalanalytics.org/article/17585"/>
      <abstract>
        <p>Measuring how much two documents differ is a basic task in the quantitative analysis of text. Because difference is a complex, interpretive concept, researchers often operationalize it as distance, a mathematical function that represents documents through a metaphor of physical space. Yet the constraints of that metaphor mean that distance can capture only some of the ways in which documents relate to each other. We show how a more general concept, divergence, can help solve this problem by revealing relationships that distance cannot express. In contrast to distance, which must be symmetric and obey the triangle inequality, divergence can capture enclosure relationships, where two documents differ because the patterns found in one are a partial subset of those in the other, and the emergence of shortcuts, where two documents can be brought closer through mediation by a third. We present one such measure, Kullback–Leibler divergence, and apply it in two worked examples: the presentation of scientific arguments in Charles Darwin’s <italic>Origin of Species</italic> (1859) and the rhetorical structure of philosophical texts by Aristotle, David Hume, and Immanuel Kant. These examples illuminate the complex relationship between time and what we call an archive’s “enclosure architecture”, and they show how divergence can be used in the quantitative analysis of historical, literary, and cultural texts to reveal cognitive structures invisible to spatial metaphors.</p>
      </abstract>
      <kwd-group>
        <kwd>textual difference</kwd>
        <kwd>philosophy of language</kwd>
        <kwd>computational linguistics</kwd>
        <kwd>data</kwd>
        <kwd>information theory</kwd>
        <kwd>theory</kwd>
      </kwd-group>
    </article-meta>
  </front>
</article>
