<article xmlns:ns0="http://www.w3.org/1999/xlink" xmlns:ns1="http://www.niso.org/schemas/ali/1.0/" article-type="research-article" dtd-version="1.2" xml:lang="en">
  <front>
    <journal-meta>
      <journal-id journal-id-type="publisher-id">1832</journal-id>
      <journal-title-group>
        <journal-title>Journal of Cultural Analytics</journal-title>
      </journal-title-group>
      <issn pub-type="epub">2371-4549</issn>
      <publisher>
        <publisher-name>Center for Digital Humanities, Princeton University</publisher-name>
      </publisher>
      <self-uri ns0:href="https://culturalanalytics.org/">Website: Journal of Cultural Analytics</self-uri>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="publisher-id">37588</article-id>
      <article-id pub-id-type="doi">10.22148/001c.37588</article-id>
      <article-categories>
        <subj-group subj-group-type="heading">
          <subject>Article</subject>
        </subj-group>
      </article-categories>
      <title-group>
        <article-title>The Evolution of the Idiolect over the Lifetime: A Quantitative and Qualitative Study of French 19th Century Literature</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <contrib-id contrib-id-type="orcid" authenticated="false">https://orcid.org/0000-0003-4617-5992</contrib-id>
          <name>
            <surname>Seminck</surname>
            <given-names>Olga</given-names>
          </name>
          <xref ref-type="aff" rid="author-aff-1">
            <sup>1</sup>
          </xref>
        </contrib>
        <contrib contrib-type="author">
          <contrib-id contrib-id-type="orcid" authenticated="false">https://orcid.org/0000-0001-7062-0262</contrib-id>
          <name>
            <surname>Gambette</surname>
            <given-names>Philippe</given-names>
          </name>
          <xref ref-type="aff" rid="author-aff-2">
            <sup>2</sup>
          </xref>
        </contrib>
        <contrib contrib-type="author">
          <name>
            <surname>Legallois</surname>
            <given-names>Dominique</given-names>
          </name>
          <xref ref-type="aff" rid="author-aff-1">
            <sup>1</sup>
          </xref>
        </contrib>
        <contrib contrib-type="author">
          <contrib-id contrib-id-type="orcid" authenticated="false">https://orcid.org/0000-0003-3669-4051</contrib-id>
          <name>
            <surname>Poibeau</surname>
            <given-names>Thierry</given-names>
          </name>
          <xref ref-type="aff" rid="author-aff-1">
            <sup>1</sup>
          </xref>
        </contrib>
      </contrib-group>
      <aff id="author-aff-1">
        <label>1</label>
        <institution-wrap>
          <institution content-type="edu">Laboratoire Langues, Textes, Traitements informatiques, Cognition UMR 8094</institution>
        </institution-wrap>
      </aff>
      <aff id="author-aff-2">
        <label>2</label>
        <institution-wrap>
          <institution content-type="edu">Laboratoire d’Informatique Gaspard-Monge UMR 8049</institution>
        </institution-wrap>
      </aff>
      <pub-date publication-format="electronic" date-type="pub" iso-8601-date="2022-09-01">
        <day>1</day>
        <month>9</month>
        <year>2022</year>
      </pub-date>
      <pub-date publication-format="electronic" date-type="collection" iso-8601-date="2022-09-01">
        <year>2022</year>
      </pub-date>
      <volume>7</volume>
      <issue seq="0">3</issue>
      <elocation-id>37588</elocation-id>
      <history>
        <date date-type="received" iso-8601-date="2022-02-02">
          <day>2</day>
          <month>2</month>
          <year>2022</year>
        </date>
        <date date-type="accepted" iso-8601-date="2022-04-26">
          <day>26</day>
          <month>4</month>
          <year>2022</year>
        </date>
      </history>
      <permissions>
        <license license-type="open-access">
          <ns1:license_ref>http://creativecommons.org/licenses/by/4.0</ns1:license_ref>
          <license-p>
              This is an open access article distributed under the terms of the <ext-link ext-link-type="uri" ns0:href="http://creativecommons.org/licenses/by/4.0">Creative Commons Attribution License (4.0)</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
            </license-p>
        </license>
      </permissions>
      <self-uri content-type="pdf" ns0:href="https://culturalanalytics.org/article/37588.pdf" />
      <self-uri content-type="xml" ns0:href="https://culturalanalytics.org/article/37588.xml" />
      <self-uri content-type="json" ns0:href="https://culturalanalytics.org/article/37588.json" />
      <self-uri content-type="html" ns0:href="https://culturalanalytics.org/article/37588" />
      <abstract>
        <p>The way in which authors express themselves is unique but changes over their lifetime. However, quantitative studies of this idiolectal evolution are rare. Using the Corpus for Idiolectal Research (CIDRE), which contains the dated works of 11 prolific 19th-century French fiction writers, we propose new methods to identify, quantify and describe the grammatical-stylistic changes that take place, using lexico-morphosyntactic patterns, also called motifs. To examine the strength of the chronological signal of change, we developed a method to calculate whether a distance matrix of literary works contains a stronger chronological signal than expected by chance. Ten out of 11 corpora showed a higher-than-chance chronological signal, leading us to conclude that the evolution of the idiolect is, in a mathematical sense, monotonic, which supports the rectilinearity hypothesis previously put forward in the stylometric literature. The rectilinear property of the evolution of the idiolect found for most authors in CIDRE subsequently enabled us to propose a machine learning task: predicting the year in which a work was written. For the majority of the authors in our corpus, the accuracy and the amount of variance explained by the model were high, and we discuss why the technique might fail for others. After applying a feature selection algorithm, we examined the most important features, i.e. the motifs that have the greatest influence on idiolectal evolution. We find that some of those features are stylistic and have been previously identified in qualitative literature studies. We report some remarkable stylistic constructions revealed by our algorithm to illustrate the kinds of stylistic patterns that can be extracted using our method.</p>
      </abstract>
      <kwd-group>
        <kwd>idiolect</kwd>
        <kwd>French literature</kwd>
        <kwd>authorship</kwd>
        <kwd>stylometry</kwd>
        <kwd>literature</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec>
      <title>1. Introduction</title>
      <p>Is it true that we do not speak at 20 as we do at 60? In this article we examine whether an individual’s representation of a language (the idiolect), and the utterances that are its product, are fixed once and for all. Little research has been carried out to characterize and measure the evolution of the idiolect in an extensive manner. Our main goal was thus to develop methods in this direction that can also be applied to other longitudinal corpora. We evaluated our methods on a corpus containing the idiolects of 11 French fiction writers.</p>
      <p>There are several reasons why it is interesting to take into account interpersonal variation over time. First, the notion of idiolect is relevant for corpus linguistics. As <xref ref-type="bibr" rid="ref-137967">Heck</xref> stressed, idiolects are the primary objects of study in linguistics (in the end, we can only observe utterances that are the products of idiolects). However, corpora are most often treated as homogeneous, without taking into account the influence of individual authors on their content. It is implicitly assumed that the large number of different authors in a resource will erase any individual differences. But taking these differences into account could help us better understand to what extent some features are specific to a genre, a community or, on the contrary, to an individual author or speaker. In this article, we focus on the chronological evolution of the idiolect.</p>
      <p>Let us start by introducing some important terminology. The first term that needs to be clarified is idiolect. In <xref ref-type="bibr" rid="ref-137968">Bloch</xref>’s original definition, the idiolect represents: “The totality of the possible utterances of one speaker at one time in using a language to interact with one other speaker.” More recently, <xref ref-type="bibr" rid="ref-137969">Dittmar</xref> (111) proposed this definition: an idiolect is “the language of the individual, which because of the acquired habits and the stylistic features of the personality differs from that of other individuals and in different life phases shows, as a rule, different or differently weighted CM [communicative means]”.</p>
      <p>In addition to this, we believe that the definition of idiolect should take into account the fact that every utterance (written or oral) of an individual is part of a particular discursive practice — or, put differently, of a particular textual genre (informal conversation, tweet, philosophical essay, etc.). The idiolect should thus necessarily be considered in relation to a particular practice: it corresponds to the use by an individual of only part of the possible linguistic forms related to a discursive practice. <xref ref-type="bibr" rid="ref-137968">Bloch</xref> takes this into account when he states “that a given speaker may have different idiolects at successive stages of his career, and […] that he may have two or more different idiolects at the same time.”</p>
      <p>Another relevant notion is style. Style corresponds to linguistic forms that an observer considers remarkable from an aesthetic point of view, in a particular discourse, compared to the discourse of others. Although the definitions of idiolect and style are intertwined, especially for those working with literary corpora, the main difference is that stylistic studies generally require a judgement on the stylistic value of the linguistic phenomena under study. It should be noted that stylistic judgement is in itself highly subjective, and no clear criteria seem to be available to determine what has stylistic value and what does not. Consequently, we focused on the evolution of idiolects rather than styles, so as to avoid aesthetic judgements. We therefore use <xref ref-type="bibr" rid="ref-137968">Bloch</xref>’s definition of the idiolect, which includes “the totality of possible utterances”, instead of <xref ref-type="bibr" rid="ref-137969">Dittmar</xref>’s, which focuses on stylistic features.</p>
      <p>In this article, after presenting related work, we report two computational experiments on the chronological evolution of the idiolect, followed by a qualitative analysis. We start by examining the rectilinearity hypothesis mentioned in the work of <xref ref-type="bibr" rid="ref-137970">Stamou</xref>, i.e. <italic>“the hypothesis that certain aspects of an author’s writing style evolve rectilinearly over the course of an author’s lifetime, hence with appropriate methods and stylistic markers, such changes ought to be detectable”</italic>. First, we evaluate the chronological signal in corpora of dated works by 19th-century French authors using so-called Robinsonian matrices. Second, we build linear regression models for each author studied to see whether it is possible to predict the year in which a particular novel was written by extrapolation from other works by the same author. Our regression models rely on special linguistic-stylistic patterns, called <italic>motifs</italic> <xref ref-type="bibr" rid="ref-137971">(Legallois et al.)</xref>, and make it possible to identify the patterns that play the greatest role in the chronological evolution of the idiolect. We discuss the stylistic value of some of these motifs in a qualitative analysis presented in Section 6. The article ends with a discussion and conclusion.</p>
    </sec>
    <sec>
      <title>2. Literature Review</title>
      <p>Our work is directly in line with the notion of stylochronometry, a special research “niche” that studies the diachronic evolution of style. The term was coined by <xref ref-type="bibr" rid="ref-137972">Forsyth</xref> and encompasses the characterization of style according to different time periods, as well as the attribution of tentative dates to literary works. <xref ref-type="bibr" rid="ref-137970">Stamou</xref> reviewed a large number of studies on this topic, covering works that range from the poetry of W.B. Yeats <xref ref-type="bibr" rid="ref-137973">(Jaynes)</xref> and the prose of Samuel Beckett <xref ref-type="bibr" rid="ref-137974">(Opas)</xref> to the lyrics of the Beatles <xref ref-type="bibr" rid="ref-137975">(Whissell)</xref>. The review draws some important conclusions that are still valid today. The main one is that, even though dating methods would ultimately be most useful for works with an uncertain date of creation (such as texts by Plato, Euripides and Shakespeare), these methods should be developed, tested and evaluated using gold standard corpora (where texts can be reasonably associated with a precise date of creation so as to get a solid performance evaluation), a criterion that was already underlined by <xref ref-type="bibr" rid="ref-137972">Forsyth</xref>. Despite this, there is a large literature on stylochronometry with experiments investigating only works with problematic dating (e.g. <xref ref-type="bibr" rid="ref-137976 ref-137977 ref-137978 ref-137979">Cox and Brandwood; Wishart and Leach; T. M. Robinson; Ledger</xref>; <xref ref-type="bibr" rid="ref-137980">Temple</xref> on Plato’s works; <xref ref-type="bibr" rid="ref-137981 ref-137982">Devine and Stephens; Cropp and Fick</xref>; <xref ref-type="bibr" rid="ref-137983">Smith and Kelly</xref> on Euripides, and <xref ref-type="bibr" rid="ref-137984 ref-137985">Brainerd; Derks</xref>; <xref ref-type="bibr" rid="ref-137986">Jackson</xref> on Shakespeare), making it difficult to evaluate the results. Experiments in which the methods used are carefully tested on reference corpora with known dates are rather rare (however, see <xref ref-type="bibr" rid="ref-137987">Can and Patton</xref>, on two modern Turkish writers). <xref ref-type="bibr" rid="ref-137988">Daelemans</xref> also underlines this evaluation problem and suggests using evaluation methods from the field of Natural Language Processing (NLP). In the same vein, <xref ref-type="bibr" rid="ref-137989">Craig</xref> states that “stylistic analysis needs finally to pass the same tests of rigor, repeatability, and impartiality as authorship analysis if it is to offer new knowledge”.</p>
      <p>Specifically for works on French, we came across studies using off-the-shelf methods developed for statistical textual analysis <xref ref-type="bibr" rid="ref-137990">(Pincemin)</xref>, for instance stylo R <xref ref-type="bibr" rid="ref-137991">(Eder et al.)</xref>, Lexico <xref ref-type="bibr" rid="ref-137992">(Lamalle et al.)</xref>, TXM <xref ref-type="bibr" rid="ref-137993">(Heiden et al.)</xref>, Le Trameur <xref ref-type="bibr" rid="ref-137994">(Fleury and Zimina)</xref>, and Hyperdeep <xref ref-type="bibr" rid="ref-137995">(Vanni et al.)</xref>. For example, <xref ref-type="bibr" rid="ref-137996">Guaresi et al.</xref> study the evolution of style in a corpus of the annals of the congress of the French Communist Party from 1936 to 2018 using correspondence analysis on the vocabulary. However, a drawback of off-the-shelf methods is that they mainly provide exploratory analyses or visualizations, which leave considerable room for interpretation and cannot be used directly to address specific stylochronometric questions or hypotheses with a rigorous evaluation procedure. We will therefore not go into detail about this type of study but will instead discuss more focused studies that have, in our opinion, developed interesting and relevant approaches for the task.</p>
      <p><xref ref-type="bibr" rid="ref-137997">Mollin</xref> investigated Tony Blair’s idiolect. Her goal was to identify “maximizer collocations” (collocations involving adverbs such as <italic>fully</italic>, <italic>entirely</italic> or <italic>absolutely</italic>) that were specific to Blair’s idiolect when compared to the English language in general (comparing a three-million-word corpus of Tony Blair with the British National Corpus <xref ref-type="bibr" rid="ref-137998">(BNC XML Edition)</xref>). Collocations were selected using three measures: relative frequency, Mutual Information <xref ref-type="bibr" rid="ref-137999">(Church et al.)</xref> and the log-likelihood measure <xref ref-type="bibr" rid="ref-138000">(Dunning)</xref>. A series of collocations typical of Blair’s idiolect was identified using these three measures.</p>
      <p><xref ref-type="bibr" rid="ref-137997">Mollin</xref>’s article nicely combines quantitative and qualitative analysis and presents a clear methodology. Unfortunately, we could not find an online repository of the Tony Blair Corpus, but the methods are explained in enough detail that the research should be replicable. <xref ref-type="bibr" rid="ref-137997">Mollin</xref>’s results suggest that the notion of idiolect is indeed a relevant linguistic concept, and that there are some linguistic patterns that are highly idiosyncratic to a speaker (even though her study only included one individual).</p>
      <p>A second researcher working on the notion of idiolect is <xref ref-type="bibr" rid="ref-138001">Barlow</xref>. He studied the idiolects of five White House Press Secretaries, each of whom held the position for one to four years. For each person, he collected a corpus of approximately 200K to 1200K tokens and compared the frequencies of the most frequent bigrams (lexical and part-of-speech) of each press secretary against those of the others. He showed that individual patterns are highly recognizable and that inter-speaker variability is much larger than intra-speaker variability. Moreover, he found that the inter-speaker differences were “core aspects of language and not peripheral idiosyncrasies”, meaning that they affect the use of function words and high-frequency words, such as ‘by the’ and ‘we have’. He also found that the speech of an individual remained remarkably stable over time, although one needs to keep in mind that the maximum period covered for a secretary in the corpus was only four years.</p>
      <p>Another study whose findings point towards the stability of the idiolect is <xref ref-type="bibr" rid="ref-138002">Meyerhoff and Walker</xref>. They tried to determine to what extent the grammar of individuals is morpho-syntactically similar to that of their community. They studied the absence of the verb ‘be’ among speakers of an English-based creole, comparing those who stayed in the community with members who had joined an urban community speaking a more ‘standard’ English. Their conclusions are mixed, suggesting that, despite the possibility of the idiolect evolving, conservatism can also play an important role. However, it should be noted that this study only examined one grammatical construction in a multilingual setting, so the reported results may be hard to generalize.</p>
      <p><xref ref-type="bibr" rid="ref-138003">Evans</xref> wrote her PhD thesis on diachronic morpho-syntactic changes in the idiolect of Queen Elizabeth I from a sociolinguistic perspective. She compared Elizabeth’s letters, speeches and translations (a corpus of 78K tokens) from before her accession with those from the period after it, an event that other scholars have often speculated had the greatest influence on Elizabeth’s language. Interestingly, <xref ref-type="bibr" rid="ref-138003">Evans</xref> used a reference corpus, the Corpus of Early English Correspondence <xref ref-type="bibr" rid="ref-138004">(Raumolin-Brunberg and Nevalainen)</xref>, together with previous studies based on it, to identify nine morpho-syntactic features present both in this corpus and in the corpus of Queen Elizabeth. The goal was to see whether the two corpora evolved in the same way. The author found that the accession to the throne only influenced two features (the increase in the use of the royal ‘we’ and the decrease in the use of periphrastic superlative adjectives), and that time in general and the long education of Queen Elizabeth had a constant influence on the development of her idiolect. <xref ref-type="bibr" rid="ref-138003">Evans</xref>’ study provides a good example of how corpus linguistics and qualitative analysis can jointly contribute to the study of the evolution of the idiolect.</p>
      <p>In the field of stylochronometry, the work of <xref ref-type="bibr" rid="ref-138005">Klaussner and Vogel</xref> of 2015 and 2018<xref ref-type="bibr" rid="ref-138006" /> and <xref ref-type="bibr" rid="ref-138007">Klaussner</xref> of 2017 should be mentioned. They developed regression models and evaluated them on two single-author corpora of North American writers and on a reference corpus, comparing the results against a baseline. They used a machine learning task that aimed to predict the year of writing of a given work, using a relevant evaluation metric. Their methods play an important role in the second part of our quantitative study (Section 5) and will therefore be discussed in more detail below. However, we can already conclude that the methods proposed by <xref ref-type="bibr" rid="ref-138006">Klaussner and Vogel</xref> show how quantitative machine learning methods can be used to fuel qualitative research on stylistic changes.</p>
      <p>We have said that relevant large-scale resources in this domain are scarce. We should, however, mention a large recent resource: the EMMA corpus <xref ref-type="bibr" rid="ref-138008">(Petré et al.)</xref>. It features 87 million words by prolific 17th-century English writers. Various studies on the evolution of the idiolect have been conducted using this resource. For example, <xref ref-type="bibr" rid="ref-138009">Petré and Van de Velde</xref>, using an earlier and slightly smaller version of the corpus, investigated the role of individual language users and the language community in the semantic and morphosyntactic process of grammaticalization of a specific construction: ‘be going to INF’. They show how the rate of this construction changed before, during and after its conventionalization and observe differences between generations. Their method shows how the process of grammaticalization can be studied for individuals and for a community at the same time. <xref ref-type="bibr" rid="ref-138010">Anthonissen and Petré</xref> also show how the use of larger corpora helps to study lifespan changes affecting syntactic constructions; using the example of the construction ‘be going to’, they demonstrate that individual writers in the EMMA corpus can adopt and continue to participate in grammatical innovations during adulthood.</p>
      <p>Contrary to most of the studies examined in this section, our approach takes into account all kinds of patterns (called motifs) and not only a handful of predefined and carefully selected sequences. With this more comprehensive approach, we hope to produce a more robust and reliable model of the evolution of the idiolect.</p>
    </sec>
    <sec>
      <title>3. Corpus</title>
      <p>For this study, we used the Corpus for Idiolectal Research (CIDRE) <xref ref-type="bibr" rid="ref-138011">(Seminck et al.)</xref>. This corpus features 11 prolific French fiction writers of the 19th and early 20th centuries, with a total of 37 million words. All the works have been carefully dated and the corpus includes only works of fiction, so that the problem of individuals having different idiolects related to their social situation does not interfere.</p>
      <p>To distinguish diachronic language evolution in general from idiolectal evolution, we assembled a ‘reference corpus’. This corpus contains 361 works of fiction by French authors from the same time period as the works in CIDRE, but no particular attention was paid to individual authors: an author can be included even with a single work. To assemble the reference corpus, we used the online tool GutenTag <xref ref-type="bibr" rid="ref-138012">(Brooke et al.)</xref>, which makes it possible to download a subcorpus from Project Gutenberg. Using a semi-automatic approach (running GutenTag several times, keeping only sufficiently long books and automatically dating the first edition of each work using the catalogue of the Bibliothèque nationale de France), we obtained a total of 361 works of fiction in French, dated with reasonably good precision (mean error of 1.71 years per book, measured on the books also present in the CIDRE corpus). The quality of this corpus can be considered sufficient for it to serve as a reference corpus that accounts for language change in general. Note that there is a substantial overlap between the content of the CIDRE corpus and our reference corpus (146 novels out of the 361).</p>
    </sec>
    <sec>
      <title>4. Testing the rectilinearity hypothesis</title>
      <p>In the introduction, we mentioned the rectilinearity hypothesis, which posits that some aspects of the idiolect evolve in a linear fashion over time and that this evolution should be detectable. Importantly, the hypothesis does not claim that this evolution applies to all linguistic features, nor that it affects the same features for each individual. The use of some linguistic features may remain stable, but some evolution should nevertheless be observed for others, contradicting the conservatist hypothesis (which assumes no linguistic changes). Furthermore, we add the prediction that idiolects evolve constantly and do not return to earlier stages, even if some linguistic features might.</p>
      <sec>
        <title>4.1. Methods: Robinsonian Matrices</title>
        <p>Robinsonian matrices are distance matrices whose cell values never decrease when moving away from the diagonal along a row or column. They were introduced in the context of archeological deposits, to study the chronological evolution of the style of pottery fragments <xref ref-type="bibr" rid="ref-138013">(W. S. Robinson)</xref>. More formally, applying this concept to texts, given a matrix δ expressing the distance between novels, we say that δ is Robinsonian if, for any set of three distinct texts <italic>text<sub>i</sub></italic>, <italic>text<sub>j</sub></italic> and <italic>text<sub>k</sub></italic> such that date(<italic>text<sub>i</sub></italic>) &lt; date(<italic>text<sub>j</sub></italic>) &lt; date(<italic>text<sub>k</sub></italic>),</p>
        <p>max(δ(<italic>text<sub>i</sub></italic>, <italic>text<sub>j</sub></italic>), δ(<italic>text<sub>j</sub></italic>, <italic>text<sub>k</sub></italic>)) ≤ δ(<italic>text<sub>i</sub></italic>, <italic>text<sub>k</sub></italic>).</p>
        <table-wrap id="attachment-96265">
          <object-id pub-id-type="publisher-id">96265</object-id>
          <label>Table 1.</label>
          <caption>
            <title>An example of a Robinsonian distance matrix: both δ(<italic>text</italic><sub>1</sub>, <italic>text</italic><sub>2</sub>) = 2 and δ(<italic>text</italic><sub>2</sub>, <italic>text</italic><sub>3</sub>) = 1 are lower than δ(<italic>text</italic><sub>1</sub>, <italic>text</italic><sub>3</sub>) = 4.</title>
          </caption>
          <table>
            <tbody>
              <tr>
                <td />
                <td>
                  <italic>text</italic>
                  <sub>1</sub>
                </td>
                <td>
                  <italic>text</italic>
                  <sub>2</sub>
                </td>
                <td>
                  <italic>text</italic>
                  <sub>3</sub>
                </td>
              </tr>
              <tr>
                <td>
                  <italic>text</italic>
                  <sub>1</sub>
                </td>
                <td>0</td>
                <td>2</td>
                <td>4</td>
              </tr>
              <tr>
                <td>
                  <italic>text</italic>
                  <sub>2</sub>
                </td>
                <td />
                <td>0</td>
                <td>1</td>
              </tr>
              <tr>
                <td>
                  <italic>text</italic>
                  <sub>3</sub>
                </td>
                <td />
                <td />
                <td>0</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
        <fig id="attachment-96266">
          <object-id pub-id-type="publisher-id">96266</object-id>
          <label>Figure 1.</label>
          <caption>
            <title>Illustration of the idea of Robinsonian distances between texts.</title>
            <p>The colored dots (diagram on the left) and arrows (diagram on the right) represent distances. If the chronology of the texts in the corpus is reflected in the measured distances, we expect max(δ(<italic>text<sub>i</sub></italic>, <italic>text<sub>j</sub></italic>), δ(<italic>text<sub>j</sub></italic>, <italic>text<sub>k</sub></italic>)) ≤ δ(<italic>text<sub>i</sub></italic>, <italic>text<sub>k</sub></italic>) to hold.</p>
          </caption>
          <graphic ns0:href="culturalanalytics_2022_7_3_37588_96266.png" />
        </fig>
        <p>To evaluate the rectilinearity hypothesis on a distance matrix reflecting changes in the idiolect, we measure to what extent the distance matrices corresponding to the texts of the CIDRE corpus are Robinsonian. To do this, we compute the <italic>Robinsonian score</italic>, which we define as the percentage of triples of cells (δ(<italic>text<sub>i</sub></italic>, <italic>text<sub>j</sub></italic>), δ(<italic>text<sub>j</sub></italic>, <italic>text<sub>k</sub></italic>), δ(<italic>text<sub>i</sub></italic>, <italic>text<sub>k</sub></italic>)), taken over chronologically ordered texts, for which the inequality above holds.</p>
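        <p>The Robinsonian score defined above can be illustrated with a minimal Python sketch. It is not the implementation used in this study: the function name <monospace>robinsonian_score</monospace>, the list-of-lists matrix layout and the assumption that all texts have distinct dates are ours, chosen for illustration only.</p>
        <code language="python"><![CDATA[
```python
from itertools import combinations


def robinsonian_score(dist, order):
    """Percentage of chronologically ordered triples (i, j, k) for which
    max(dist[i][j], dist[j][k]) <= dist[i][k].

    dist  -- symmetric distance matrix (list of lists); hypothetical layout
    order -- text indices sorted by date, assumed to be all distinct
    """
    satisfied = 0
    total = 0
    # combinations() keeps the input order, so each triple is chronological.
    for i, j, k in combinations(order, 3):
        total += 1
        if max(dist[i][j], dist[j][k]) <= dist[i][k]:
            satisfied += 1
    if total == 0:
        raise ValueError("at least three texts are needed")
    return 100.0 * satisfied / total


# Symmetric version of the three-text example of Table 1.
table1 = [
    [0, 2, 4],
    [2, 0, 1],
    [4, 1, 0],
]
score = robinsonian_score(table1, [0, 1, 2])
print(score)  # 100.0: the single triple satisfies the inequality
```
]]></code>
        <p>With only three texts there is a single triple, so the score is either 0 or 100; on a real corpus the number of triples grows cubically with the number of texts.</p>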
        <p>It is also possible to estimate a <italic>p</italic>-value, i.e. the probability that a Robinsonian score at least as high as the one being tested could be obtained by chance, by recomputing the score repeatedly after randomly permuting the order of the texts and counting how often the permuted score reaches the observed one. A low <italic>p</italic>-value would support the rectilinearity hypothesis.</p>
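        <p>A permutation test of this kind could be sketched in Python as follows. This is an illustration under our own assumptions, not the procedure used in this study: the number of permutations, the random seed, the add-one correction and the toy distance matrix are hypothetical choices.</p>
        <code language="python"><![CDATA[
```python
import random
from itertools import combinations


def robinsonian_score(dist, order):
    """Percentage of ordered triples satisfying the Robinsonian inequality."""
    triples = list(combinations(order, 3))
    ok = sum(1 for i, j, k in triples
             if max(dist[i][j], dist[j][k]) <= dist[i][k])
    return 100.0 * ok / len(triples)


def permutation_p_value(dist, order, n_permutations=1000, seed=0):
    """Estimate how often a random ordering scores at least as high as
    the chronological one (n_permutations and seed are arbitrary choices)."""
    rng = random.Random(seed)
    observed = robinsonian_score(dist, order)
    shuffled = list(order)
    at_least_as_high = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if robinsonian_score(dist, shuffled) >= observed:
            at_least_as_high += 1
    # Add-one correction so that the estimate is never exactly zero.
    return observed, (at_least_as_high + 1) / (n_permutations + 1)


# Toy data: eight texts placed on a line, so the true order is perfectly
# Robinsonian and a random order almost never is.
positions = [0, 1, 3, 6, 10, 15, 21, 28]
dist = [[abs(a - b) for b in positions] for a in positions]
observed, p_value = permutation_p_value(dist, list(range(len(positions))))
print(observed, p_value)
```
]]></code>
        <p>In this toy setting only the chronological order and its reverse reach a score of 100, so the estimated <italic>p</italic>-value is very small; real corpora are noisier and the observed score is usually well below 100.</p>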
        <p>Before evaluating the rectilinearity hypothesis on the CIDRE corpus, we first used a dated corpus of Maurice Leblanc as a testbed to compare different feature representations for the texts and different ways of measuring distance. We used tokens, characters, lemmas and so-called <italic>motifs</italic> <xref ref-type="bibr" rid="ref-137971">(Legallois et al.)</xref> as features. A motif is a sequence of lemmas and POS-tags. As function words tend to be the most relevant features of idiolectal signals <xref ref-type="bibr" rid="ref-138001">(Barlow)</xref>, grammatical information, i.e. function words and POS-tags, is crucial for the task. However, the tagset of our part-of-speech tagger is not fine-grained enough and loses important information for some categories. For example, the difference between ‘un’ and ‘le’ (‘a’ and ‘the’) is ignored, and both are tagged as determiners. We therefore used the following strategy: content words were replaced with their POS-tags, while function words were replaced with their lemma. This approach allowed us to keep relevant linguistic information, especially at the grammatical level. <xref ref-type="bibr" rid="ref-137971">Legallois et al.</xref> showed that these motifs were effective in finding author-specific style characteristics, making it possible to identify interesting examples in corpus studies.</p>
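        <p>The replacement strategy described above (POS-tag for content words, lemma for function words) might look as follows in Python. The tag names, the set of function-word categories and the treatment of auxiliaries are hypothetical and do not reproduce the tagger or tagset used in this study; the sample sentence is the one used in Table 2.</p>
        <code language="python"><![CDATA[
```python
# Hypothetical tag inventory: which categories count as function words
# is an assumption made for this sketch only.
FUNCTION_POS = {"PRO", "DET", "PREP", "CONJ", "PUNCT"}
AUXILIARY_LEMMAS = {"être", "avoir"}


def to_motif(tagged_tokens):
    """Map (token, lemma, pos) triples to motif items: function words keep
    their lemma, content words are replaced with their POS-tag."""
    motif = []
    for _token, lemma, pos in tagged_tokens:
        if pos in FUNCTION_POS or lemma in AUXILIARY_LEMMAS:
            motif.append(lemma)
        else:
            motif.append(pos)
    return motif


# Hand-tagged sentence for illustration; a real pipeline would obtain
# lemmas and tags from a part-of-speech tagger.
sentence = [
    ("Il", "il", "PRO"),
    ("est", "être", "PRES"),
    ("fâcheux", "fâcheux", "ADJ"),
    ("que", "que", "CONJ"),
    ("cela", "cela", "PRO"),
    ("traîne", "traîner", "PRES"),
    ("en", "en", "PREP"),
    ("longueur", "longueur", "NC"),
    ("…", "…", "PUNCT"),
]
motif_unigrams = to_motif(sentence)
print(motif_unigrams)
```
]]></code>
        <p>Keeping the lemma of determiners, pronouns and prepositions preserves distinctions such as ‘un’ versus ‘le’ that a coarse tagset would otherwise erase.</p>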
        <p>We compared different lengths of n-grams (unigrams to pentagrams) of tokens, characters, lemmas and motifs (see <xref ref-type="table" rid="attachment-96267">Table 2</xref> for examples of the different types of features). The texts of the corpus were represented by the 500 features with the highest relative frequency. The distance metrics we used come from the R package stylo <xref ref-type="bibr" rid="ref-137991">(Eder et al.)</xref>, to which we supplied our corpora. <xref ref-type="fig" rid="attachment-96268">Figure 2</xref> shows the percentage of the distance matrix (calculated for the Leblanc corpus) that is Robinsonian for different feature configurations.</p>
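        <p>As an illustration of this representation, the following Python sketch extracts n-grams, selects the most frequent ones and builds one vector of relative frequencies per text. It is a simplified stand-in for what stylo does internally: the function names are hypothetical, and ranking the features by their pooled frequency over the whole corpus is an assumption of this example.</p>
        <preformat>
```python
from collections import Counter


def ngrams(symbols, n):
    """All contiguous n-grams of a sequence, as tuples."""
    return [tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1)]


def top_features(texts, n, k=500):
    """The k n-grams that are most frequent over the pooled corpus.
    texts maps a text name to its sequence of symbols."""
    pooled = Counter()
    for symbols in texts.values():
        pooled.update(ngrams(symbols, n))
    return [gram for gram, _ in pooled.most_common(k)]


def feature_vector(symbols, n, features):
    """Relative frequency of each selected n-gram in one text."""
    counts = Counter(ngrams(symbols, n))
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(features)
    return [counts[gram] / total for gram in features]


# Hypothetical miniature corpus of motif sequences.
corpus = {
    "text_a": ["le", "NC", "le", "NC", "de"],
    "text_b": ["le", "NC", "de", "le"],
}
selected = top_features(corpus, n=1, k=2)
vector_a = feature_vector(corpus["text_a"], 1, selected)
```
</preformat>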
        <table-wrap id="attachment-96267">
          <object-id pub-id-type="publisher-id">96267</object-id>
          <label>Table 2.</label>
          <caption>
            <title>Examples of unigrams and bigrams for the different types of features</title>
          </caption>
          <table>
            <tbody>
              <tr>
                <td>“Il est fâcheux que cela traîne en longueur…”</td>
                <td>tokens</td>
                <td>characters</td>
                <td>lemmas</td>
                <td>motifs</td>
              </tr>
              <tr>
                <td>unigrams</td>
                <td>[‘Il’, ‘est’, ‘fâcheux’, ‘que’, ‘cela’, ‘traîne’, ‘en’, ‘longueur’, ‘...’]</td>
                <td>[‘I’, ‘l’, ‘ ’, ‘e’, ‘s’, ‘t’, ‘ ’, ‘f’, ‘â’, ‘c’, ‘h’, ‘e’, ‘u’, ‘x’, ‘ ’, ‘q’, ‘u’, ‘e’, ‘ ’, ‘c’, ‘e’, ‘l’, ‘a’, ‘ ’, ‘t’, ‘r’, ‘a’, ‘î’, ‘n’, ‘e’, ‘ ’, ‘e’, ‘n’, ‘ ’, ‘l’, ‘o’, ‘n’, ‘g’, ‘u’, ‘e’, ‘u’, ‘r’, ‘...’]</td>
                <td>[‘il’, ‘être’, ‘fâcheux’, ‘que’, ‘cela’, ‘traîner’, ‘en’,<break />‘longueur’, ‘...’]</td>
                <td>[‘il’, ‘être’, ‘ADJ’, ‘que’,<break />‘cela’, ‘PRES’, ‘en’, ‘NC’, ‘...’]</td>
              </tr>
              <tr>
                <td>bigrams</td>
                <td>[(‘Il’,‘est’), (‘est’,‘fâcheux’), (‘fâcheux’,‘que’), (‘que’,‘cela’), (‘cela’,‘traîne’), (‘traîne’,‘en’), (‘en’,‘longueur’), (‘longueur’,‘...’)]</td>
                <td>[‘Il’, ‘l ’, ‘ e’, ‘es’, ‘st’, ‘t ’, ‘ f’, ‘fâ’, ‘âc’, ‘ch’, ‘he’, ‘eu’, ‘ux’, ‘x ’, ‘ q’, ‘qu’, ‘ue’, ‘e ’, ‘ c’, ‘ce’, ‘el’, ‘la’, ‘a ’, ‘ t’, ‘tr’, ‘ra’, ‘aî’, ‘în’, ‘ne’, ‘e ’, ‘ e’, ‘en’, ‘n ’, ‘ l’, ‘lo’, ‘on’, ‘ng’, ‘gu’, ‘ue’, ‘eu’, ‘ur’, ‘r...’]</td>
                <td>[(‘il’,‘être’), (‘être’,‘fâcheux’), (‘fâcheux’,‘que’), (‘que’,‘cela’), (‘cela’,‘traîner’), (‘traîner’,‘en’), (‘en’,‘longueur’), (‘longueur’,‘...’)]</td>
                <td>[(‘il’,‘être’), (‘être’,‘ADJ’), (‘ADJ’,‘que’), (‘que’,‘cela’), (‘cela’,‘PRES’), (‘PRES’,‘en’), (‘en’,‘NC’), (‘NC’,‘...’)]</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
        <fig id="attachment-96268">
          <object-id pub-id-type="publisher-id">96268</object-id>
          <label>Figure 2.</label>
          <caption>
            <title>Robinsonian scores for different configurations of features used (eight different distance metrics, four different types of features, five different lengths of n-grams).</title>
            <p>Each combination of distance metric, type of feature and n-gram length was tested. Error bars represent the 95% confidence interval of the mean score of all configurations that include a given parameter. No configuration is significantly better than the others. The highest score, 0.50, is obtained using quadrigrams of lemmas with the Canberra metric, but this experiment does not allow us to identify a combination of features that is significantly better than the rest at capturing the chronological signal in the data.</p>
          </caption>
          <graphic ns0:href="culturalanalytics_2022_7_3_37588_96268.png" />
        </fig>
        <p>We therefore decided to detect the chronological signal in the corpora of CIDRE and the reference corpus using motif trigrams and the Canberra metric as the default option, since trigrams are of medium length and the Canberra metric performed slightly better than the others. The choice of motifs was motivated by their use in the second series of experiments (Section 5) and by the fact that the lower end of their error bar in <xref ref-type="fig" rid="attachment-96268">Figure 2</xref> is the highest of the four feature types tested. The feature vectors in this experiment contain the values of the 500 features with the highest relative frequencies.</p>
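        <p>For readers unfamiliar with the Canberra metric, a minimal Python sketch of its textbook definition follows: the sum, over all features, of the absolute difference divided by the sum of the absolute values. Skipping features that are zero in both texts is an assumption of this example; implementations such as the one in R may handle that case differently.</p>
        <preformat>
```python
def canberra(x, y):
    """Textbook Canberra distance between two equally long vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    total = 0.0
    for a, b in zip(x, y):
        denominator = abs(a) + abs(b)
        # Assumption: a feature absent from both texts contributes nothing.
        if denominator > 0:
            total += abs(a - b) / denominator
    return total


def distance_matrix(vectors):
    """Pairwise Canberra distances between feature vectors."""
    return [[canberra(u, v) for v in vectors] for u in vectors]


# Hypothetical relative frequencies of three features in three texts.
vectors = [[0.2, 0.1, 0.0], [0.1, 0.1, 0.0], [0.0, 0.3, 0.1]]
matrix = distance_matrix(vectors)
```
</preformat>
        <p>Because every feature contributes at most 1 to the sum, rare features weigh as much as frequent ones, which distinguishes this metric from distances based on raw frequency differences.</p>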
      </sec>
      <sec>
        <title>4.2. Results</title>
        <p>The scores for the different authors of the CIDRE corpus and the reference corpus can be found in <xref ref-type="table" rid="attachment-96269">Table 3</xref>. To determine whether these scores are meaningful, we compared them with a distribution of random permutations of the distance matrix. For 10 000 random permutations, we calculated the percentage that obtained a Robinsonian score higher than the score of the actual distance matrix. <xref ref-type="table" rid="attachment-96269">Table 3</xref> shows that the distance matrices obtained are significantly more Robinsonian than random permutations for all the authors except Comtesse de Ségur.</p>
        <table-wrap id="attachment-96269">
          <object-id pub-id-type="publisher-id">96269</object-id>
          <label>Table 3.</label>
          <caption>
            <title>The Robinsonian scores for the authors in CIDRE and the reference corpus, followed by the probability of obtaining these scores by chance if the chronological signal in the data is absent.</title>
          </caption>
          <table>
            <tbody>
              <tr>
                <td>Corpus</td>
                <td>Robinsonian score</td>
                <td>Probability of Robinsonian score if no chronological signal is present in the data</td>
              </tr>
              <tr>
                <td>Comtesse de Ségur</td>
                <td>0.38</td>
                <td>0.14</td>
              </tr>
              <tr>
                <td>Daniel-Lesueur</td>
                <td>0.41</td>
                <td>0.00</td>
              </tr>
              <tr>
                <td>Pierre Alexis Ponson du Terrail</td>
                <td>0.41</td>
                <td>0.00</td>
              </tr>
              <tr>
                <td>Gustave Aimard</td>
                <td>0.42</td>
                <td>0.01</td>
              </tr>
              <tr>
                <td>Honoré de Balzac</td>
                <td>0.44</td>
                <td>0</td>
              </tr>
              <tr>
                <td>Michel Zévaco</td>
                <td>0.46</td>
                <td>0</td>
              </tr>
              <tr>
                <td>Jules Verne</td>
                <td>0.47</td>
                <td>0</td>
              </tr>
              <tr>
                <td>George Sand</td>
                <td>0.49</td>
                <td>0</td>
              </tr>
              <tr>
                <td>Paul Féval</td>
                <td>0.49</td>
                <td>0.00</td>
              </tr>
              <tr>
                <td>Henry Gréville</td>
                <td>0.62</td>
                <td>0</td>
              </tr>
              <tr>
                <td>Émile Zola</td>
                <td>0.63</td>
                <td>0</td>
              </tr>
              <tr>
                <td>Reference Corpus</td>
                <td>0.34</td>
                <td>0</td>
              </tr>
            </tbody>
          </table>
          <table-wrap-foot>
            <p>Values shown with decimals have been rounded; a plain 0 denotes an exact zero.</p>
          </table-wrap-foot>
        </table-wrap>
      </sec>
      <sec>
        <title>4.3. Discussion</title>
        <p>First, it should be noted that the percentage of Robinsonian cells in the matrix depends on the number of works in the corpus. Larger corpora lower the probability of obtaining a Robinsonian matrix by chance. The score of 0.34 for the reference corpus may therefore seem low, but it is actually very high for this number of works. Second, it should be kept in mind that only the chronological order of the works plays a role in our method and that it does not take into account the exact difference in years between works. Likewise, each triple counts in a binary fashion: if max(δ(<italic>text<sub>i</sub></italic>, <italic>text<sub>j</sub></italic>), δ(<italic>text<sub>j</sub></italic>, <italic>text<sub>k</sub></italic>)) ≤ δ(<italic>text<sub>i</sub></italic>, <italic>text<sub>k</sub></italic>) is false, it does not matter by how much max(δ(<italic>text<sub>i</sub></italic>, <italic>text<sub>j</sub></italic>), δ(<italic>text<sub>j</sub></italic>, <italic>text<sub>k</sub></italic>)) exceeds δ(<italic>text<sub>i</sub></italic>, <italic>text<sub>k</sub></italic>).</p>
        <p>The fact that different types of features produce similar results on the Maurice Leblanc corpus is not unexpected in light of the literature. <xref ref-type="bibr" rid="ref-137970">Stamou</xref> identified a number of stylistic markers that were of interest in many stylochronometric studies, namely: punctuation, characters, part-of-speech tags, most common words including function words, frequencies of selected content words, hapax legomena and vocabulary richness. She suggested that there might not be a “single universal stylochronometer” that can apply to every corpus.</p>
        <p>The results from our experiments show that there is a strong chronological signal in the data, except for the corpus of Comtesse de Ségur. A possible explanation for this exception could be that this corpus is too small, representing only 3.8% of the total tokens in CIDRE. Another explanation is that this corpus might be heterogeneous, as it includes children’s stories, Bible stories for children and fairy tales. In general, however, our results are in line with the rectilinearity hypothesis: the style of an author generally evolves smoothly over time. No regression (texts stylistically similar to earlier texts) can be observed.</p>
        <p>In the next section, we discuss our second series of experiments in which we trained a linear regression model to automatically predict the date of writing of various novels from our reference corpus.</p>
      </sec>
    </sec>
    <sec>
      <title>5. Predicting Year of Writing using Linear Regression</title>
      <p>In this section, we first examine the chronological evolution of the idiolect (and the reference corpus) by training models on the corpora of idiolects in CIDRE and then predicting the year of writing of different works using cross-validation. The hypothesis is simple: if this type of experiment is successful, the results are in favor of the rectilinearity hypothesis. In other words, the frequency of some linguistic forms increases or decreases in a linear fashion to such an extent that we can detect the year of writing. Second, we want to verify not only whether there is a chronological signal, but also whether we can identify the linguistic material at the heart of this evolution. Therefore, we will present a feature selection method that identifies the features that change the most in frequency over time. These features will be used for the qualitative study in Section 6. Furthermore, besides these hypotheses and goals, we also have the general objective of proposing new ways to evaluate stylometric methods. For this purpose, we will draw on the state-of-the-art literature on linear regression models and verify that these models can be used in the stylochronometry context.</p>
      <sec>
        <title>5.1. Methods: Regression Models</title>
        <p>Various previous studies have used regression techniques to date literary works. For example, <xref ref-type="bibr" rid="ref-138014">Frischer</xref> used regression techniques (among other methods) to date the <italic>Ars Poetica</italic> of Horace. However, by today’s standards, the number of features in this regression was very low, so we will mainly discuss more recent work. A representative study using regression is the 2018 work of <xref ref-type="bibr" rid="ref-138006">Klaussner and Vogel</xref> (henceforth K&amp;V). They used regression in a machine learning task that consisted in predicting the year of writing of a work, focusing on two corpora in English: the work of Henry James and that of Mark Twain. They also used the years 1860-1919 of The Corpus of Historical American English (COHA; <xref ref-type="bibr" rid="ref-138015">Davies et al.</xref>) as a reference corpus to capture the ‘general’ language in North America at the time, to check whether the changes detected in the work of James and Twain were shared by the community or were idiosyncratic. Four types of features were considered: character n-grams, part-of-speech tags, word stems, and lemmas with POS-tags; for each of them unigrams to quadrigrams were tested. This resulted in a total of 32 models (2 authors, 4 types of features and 4 lengths of n-grams). On each model, the elastic net algorithm was applied to reduce the number of parameters. The models were evaluated using the root mean squared error (RMSE), which reflects the difference (measured in years) between the prediction and the real year of writing. Note that this metric is quite sensitive to outliers, as the error is squared. A baseline performance was also measured “<italic>by using the mean of the data for the prediction of every instance</italic>”, meaning that every work dated by the baseline received the same prediction (the mean of all training instances). This would correspond to a model that has an R² score of 0 (<xref ref-type="bibr" rid="ref-138016">Field</xref>). The best results were obtained by K&amp;V using lemmas with POS-tags in unigrams and bigrams.</p>
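        <p>These evaluation measures can be made concrete with a small Python sketch. The years and predictions below are invented for illustration, and for simplicity the baseline mean is computed on the same works that are evaluated, which is what makes its R² exactly 0.</p>
        <preformat>
```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean squared error, in the unit of the target (years)."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared) / len(squared))


def mean_baseline(train_years, n_predictions):
    """Predict the mean training year for every work."""
    mean = sum(train_years) / len(train_years)
    return [mean] * n_predictions


def r_squared(actual, predicted):
    """1 minus the ratio of residual to total sum of squares."""
    mean = sum(actual) / len(actual)
    residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    total = sum((a - mean) ** 2 for a in actual)
    return 1 - residual / total


# Hypothetical years of writing and model predictions.
years = [1870, 1880, 1890, 1900]
predictions = [1872, 1879, 1893, 1899]
baseline = mean_baseline(years, len(years))

model_rmse = rmse(years, predictions)
baseline_rmse = rmse(years, baseline)
model_r2 = r_squared(years, predictions)
baseline_r2 = r_squared(years, baseline)
```
</preformat>
        <p>When the baseline mean is estimated on training works and evaluated on held-out works, as in cross-validation, a model can do worse than this baseline, in which case R² becomes negative.</p>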
        <p>Although our experiments share many similarities with K&amp;V, we made some different choices for our models. First, we used only motifs consisting of n-grams (unigram to pentagram, but all incorporated in the same model instead of different models as in K&amp;V). The notion of motif is anchored in previous studies: qualitative analysis has shown motifs to be helpful <xref ref-type="bibr" rid="ref-137971">(Legallois et al.)</xref>. Second, the feature selection algorithm we chose is Lasso LARS <xref ref-type="bibr" rid="ref-138017">(Efron et al.)</xref> with 5-fold cross-validation (80% training, 20% testing in each fold) and not elastic nets. We chose this algorithm because, unlike K&amp;V, our aim was not to find the most compact model but one that reduces the number of features drastically enough for them to be inspected manually (see our qualitative study, Section 6 of this paper). Moreover, as a selection criterion, we require that a feature be present in at least 20% of an author’s texts, whereas in the work of K&amp;V, features had to be present in all data points. This much lower threshold was chosen here because we think it is possible, at least theoretically, for a language innovation to be totally new or for some structures to disappear entirely. Also, K&amp;V concatenated texts written in the same year into one data point by appending them one after another in the same file, whereas we kept them as separate data points with the same value for the year, since we believe that this better represents the data. However, to ensure comparability with K&amp;V, we decided to measure the RMSE and the RMSE-baseline for our experiments. We also compared our results to our reference corpus, and tried elastic nets,<xref ref-type="fn" rid="fn1">1</xref> as well as cross-validated elastic nets.<xref ref-type="fn" rid="fn2">2</xref> In the end, however, we found that cross-validated Lasso LARS performed much better on most of our corpora and that the number of features it selected was better suited for qualitative studies (the elastic nets selected either no features or thousands of features, making a qualitative study impossible). Details about the comparison of feature selection algorithms can be found in the supplementary material.<xref ref-type="fn" rid="fn3">3</xref></p>
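        <p>The presence criterion applied before feature selection can be sketched as follows in Python. The function name and the miniature data are hypothetical; only the thresholds (20% of an author’s texts here, all texts in K&amp;V) come from the description above. The regression step itself is not shown.</p>
        <preformat>
```python
def frequent_enough(texts, min_percent=20):
    """Keep the features that occur in at least min_percent percent of
    the texts. texts is a list with one collection of features per text."""
    n_texts = len(texts)
    document_frequency = {}
    for features in texts:
        for feature in set(features):
            document_frequency[feature] = document_frequency.get(feature, 0) + 1
    # Integer arithmetic avoids rounding issues at the threshold.
    return {
        feature
        for feature, count in document_frequency.items()
        if count * 100 >= min_percent * n_texts
    }


# Hypothetical motifs observed in five texts by one author.
texts = [
    ["le NC", "et ce être"],
    ["le NC", "et ce être"],
    ["le NC", "autrui"],
    ["le NC"],
    ["le NC"],
]
kept_20 = frequent_enough(texts)          # threshold used in this study
kept_all = frequent_enough(texts, 100)    # stricter: present in every text
```
</preformat>
        <p>With the 20% threshold, a motif that appears in a single one of the five texts is retained, whereas requiring presence in every text would keep only the first motif. This is what allows emerging or disappearing structures to enter the model.</p>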
      </sec>
      <sec>
        <title>5.2. Results</title>
        <p>For every author, we measured the correlation between the actual year and the predicted year and the value of R², which represents the proportion of variation in the data that is explained by the model <xref ref-type="bibr" rid="ref-138016">(Field)</xref>; it normally lies between 0 and 1 and becomes negative when a model predicts worse than the mean baseline. The results can be found in <xref ref-type="table" rid="attachment-96270">Table 4</xref> and <xref ref-type="fig" rid="attachment-96271">Figure 3</xref>. Excellent results were obtained for Jules Verne, Daniel-Lesueur, Émile Zola, Honoré de Balzac and George Sand: the models (selected n-grams of motifs) explained the large majority of the variation in the data (R² of 0.77 or more). Good results were obtained for Henry Gréville and Michel Zévaco, for whom the models explained just over half of the variation (R² = 0.55). For Gustave Aimard, la Comtesse de Ségur and Paul Féval, the models explained a substantial amount of the variation in the data, but less than half of it. Lastly, for Pierre Alexis Ponson du Terrail, the model was not able to explain any variance in the data, and thus the experiment was not successful at all. The same observations can be made by comparing the root mean squared error (RMSE) with the baseline metric (RMSE-baseline) put forward by K&amp;V.</p>
        <table-wrap id="attachment-96270">
          <object-id pub-id-type="publisher-id">96270</object-id>
          <label>Table 4.</label>
          <caption>
            <title>The regression experiment was very successful in explaining the variance of the corpora shaded in gray and moderately successful for the other corpora, except for Pierre Alexis Ponson du Terrail, where it failed. #β: number of features selected by the model; RMSE-baseline: RMSE of the mean baseline.</title>
          </caption>
          <table>
            <tbody>
              <tr>
                <td>Author</td>
                <td>Correlation</td>
                <td>R²</td>
                <td>#β</td>
                <td>RMSE</td>
                <td>RMSE-baseline</td>
                <td>Remarks</td>
              </tr>
              <tr>
                <td style="background-color:rgb(204,204,204)">Jules Verne</td>
                <td style="background-color:rgb(204,204,204)">0.94</td>
                <td style="background-color:rgb(204,204,204)">0.89</td>
                <td style="background-color:rgb(204,204,204)">57</td>
                <td style="background-color:rgb(204,204,204)">3.91</td>
                <td style="background-color:rgb(204,204,204)">11.83</td>
                <td style="background-color:rgb(204,204,204)" />
              </tr>
              <tr>
                <td style="background-color:rgb(204,204,204)">Daniel-Lesueur</td>
                <td style="background-color:rgb(204,204,204)">0.92</td>
                <td style="background-color:rgb(204,204,204)">0.84</td>
                <td style="background-color:rgb(204,204,204)">14</td>
                <td style="background-color:rgb(204,204,204)">3.39</td>
                <td style="background-color:rgb(204,204,204)">8.46</td>
                <td style="background-color:rgb(204,204,204)" />
              </tr>
              <tr>
                <td style="background-color:rgb(204,204,204)">Émile Zola</td>
                <td style="background-color:rgb(204,204,204)">0.92</td>
                <td style="background-color:rgb(204,204,204)">0.83</td>
                <td style="background-color:rgb(204,204,204)">34</td>
                <td style="background-color:rgb(204,204,204)">4.50</td>
                <td style="background-color:rgb(204,204,204)">11.02</td>
                <td style="background-color:rgb(204,204,204)" />
              </tr>
              <tr>
                <td style="background-color:rgb(204,204,204)">Honoré de Balzac</td>
                <td style="background-color:rgb(204,204,204)">0.90</td>
                <td style="background-color:rgb(204,204,204)">0.78</td>
                <td style="background-color:rgb(204,204,204)">42</td>
                <td style="background-color:rgb(204,204,204)">2.44</td>
                <td style="background-color:rgb(204,204,204)">5.26</td>
                <td style="background-color:rgb(204,204,204)" />
              </tr>
              <tr>
                <td style="background-color:rgb(204,204,204)">George Sand</td>
                <td style="background-color:rgb(204,204,204)">0.88</td>
                <td style="background-color:rgb(204,204,204)">0.77</td>
                <td style="background-color:rgb(204,204,204)">61</td>
                <td style="background-color:rgb(204,204,204)">6.13</td>
                <td style="background-color:rgb(204,204,204)">12.78</td>
                <td style="background-color:rgb(204,204,204)" />
              </tr>
              <tr>
                <td style="background-color:rgb(204,204,204)">Henry Gréville</td>
                <td style="background-color:rgb(204,204,204)">0.78</td>
                <td style="background-color:rgb(204,204,204)">0.55</td>
                <td style="background-color:rgb(204,204,204)">31</td>
                <td style="background-color:rgb(204,204,204)">2.85</td>
                <td style="background-color:rgb(204,204,204)">4.27</td>
                <td style="background-color:rgb(204,204,204)">ConvergenceWarning: Regressors in active set degenerate (1/5 folds)</td>
              </tr>
              <tr>
                <td style="background-color:rgb(204,204,204)">Michel Zévaco</td>
                <td style="background-color:rgb(204,204,204)">0.75</td>
                <td style="background-color:rgb(204,204,204)">0.55</td>
                <td style="background-color:rgb(204,204,204)">23</td>
                <td style="background-color:rgb(204,204,204)">3.52</td>
                <td style="background-color:rgb(204,204,204)">5.22</td>
                <td style="background-color:rgb(204,204,204)">ConvergenceWarning: Regressors in active set degenerate (2/5 folds)</td>
              </tr>
              <tr>
                <td>Gustave Aimard</td>
                <td>0.70</td>
                <td>0.49</td>
                <td>21</td>
                <td>5.96</td>
                <td>8.21</td>
                <td />
              </tr>
              <tr>
                <td>Paul Féval</td>
                <td>0.51</td>
                <td>0.26</td>
                <td>17</td>
                <td>8.72</td>
                <td>10.14</td>
                <td>ConvergenceWarning: Regressors in active set degenerate (3/5 folds)</td>
              </tr>
              <tr>
                <td>Comtesse de Ségur</td>
                <td>0.45</td>
                <td>0.18</td>
                <td>18</td>
                <td>3.57</td>
                <td>3.96</td>
                <td>ConvergenceWarning: Regressors in active set degenerate (1/5 folds)</td>
              </tr>
              <tr>
                <td>Pierre Alexis Ponson du Terrail</td>
                <td>-0.04</td>
                <td>-0.55</td>
                <td>10</td>
                <td>5.69</td>
                <td>4.57</td>
                <td />
              </tr>
              <tr>
                <td style="background-color:rgb(204,204,204)">Reference corpus</td>
                <td style="background-color:rgb(204,204,204)">0.84</td>
                <td style="background-color:rgb(204,204,204)">0.70</td>
                <td style="background-color:rgb(204,204,204)">208</td>
                <td style="background-color:rgb(204,204,204)">11.30</td>
                <td style="background-color:rgb(204,204,204)">20.50</td>
                <td style="background-color:rgb(204,204,204)" />
              </tr>
            </tbody>
          </table>
        </table-wrap>
        <fig id="attachment-96271">
          <object-id pub-id-type="publisher-id">96271</object-id>
          <label>Figure 3.</label>
          <caption>
            <title>The results of the regression experiment for all corpora in CIDRE, sorted by performance (left to right, top to bottom).</title>
            <p>The blue line represents a perfect correlation.</p>
          </caption>
          <graphic ns0:href="culturalanalytics_2022_7_3_37588_96271.png" />
        </fig>
        <p>It is important to mention that model fitting does not always converge (completely) in every fold of the cross-validation (see the Remarks column of <xref ref-type="table" rid="attachment-96270">Table 4</xref>). This problem is most noticeable for Paul Féval. There seems to be a relation between performance and this issue, but non-convergence cannot explain the poor performance of the model on the corpus of Pierre Alexis Ponson du Terrail: all the models on this corpus converged. We also examined which works were well predicted and which ones were outliers. See for example <xref ref-type="fig" rid="attachment-96272">Figure 4</xref>, where it can be seen that <italic>L’Auberge des Saules</italic> by Daniel-Lesueur was predicted about 6 years too late, but <italic>Comédienne</italic> exactly at the time it was written. Other figures in the same style can be found in the supplementary material (directory <italic>plots_regression_par_auteur</italic>).</p>
        <fig id="attachment-96272">
          <object-id pub-id-type="publisher-id">96272</object-id>
          <label>Figure 4.</label>
          <caption>
            <title>The result of the regression experiment for the corpus of Daniel-Lesueur with annotated data points on the scatterplot.</title>
            <p>The blue line represents a perfect correlation.</p>
          </caption>
          <graphic ns0:href="culturalanalytics_2022_7_3_37588_96272.png" />
        </fig>
      </sec>
      <sec>
        <title>5.3. Discussion</title>
        <p>The average proportion of each author in our whole corpus is 9% since we have 11 authors. Our results show that the models perform very well on larger corpora (Sand: 15.3% of the whole corpus, Verne: 14%, Zola: 13% and Zévaco: 10.5%) and obtain lower scores on smaller corpora (Ségur: 3.8%, Aimard: 5.4%, Féval: 6.4%). This suggests that <xref ref-type="bibr" rid="ref-138009">Petré and Van de Velde</xref> are right when they say that corpus size matters a lot, even if it does not explain why our worst performing corpus is Pierre Alexis Ponson du Terrail, which is a medium-sized corpus (representing 9.1% of the total size of CIDRE). It is of course possible that we are not aware of certain circumstances in the publication process of the different authors that might explain these results or that the dating of this corpus is of lesser quality. For example, the literary work of Ponson du Terrail is less well known than that of some other writers in CIDRE: his works were dated using information mostly from Wikipedia, which is not as reliable a source as those used for other authors. If the poorer dating of some novels by this writer is the source of the failure of our model, it means that our method is sensitive to individual data points. Indeed, going back to the previous experiment and <xref ref-type="table" rid="attachment-96269">Table 3</xref>, we see that there is a highly significant chronological signal for this corpus, which means that the approach works globally and that specific cases of failure should be further investigated.</p>
        <p>However that may be, for most of the authors we get a high value of R², which means that the models explain a large share of the variance in the year of writing. This is confirmed by the fact that the value of RMSE is much lower than the RMSE-baseline value. We can thus conclude that the results on all but one of our corpora are in line with the rectilinearity hypothesis.</p>
        <p>A second interesting result is the number of features selected per model (column #β in <xref ref-type="table" rid="attachment-96270">Table 4</xref>). For all the corpora in CIDRE the number lies between 10 and 61, which makes it possible to examine all the features of a corpus in a qualitative study. A direct comparison with K&amp;V is difficult, as we did not work on the same corpora. However, we observe that the number of predictors per model (column #β in <xref ref-type="table" rid="attachment-96270">Table 4</xref>) has a smaller range for the models we developed (the models of K&amp;V range from 1 to 315 predictors). Nevertheless, it should be kept in mind that our feature selection algorithm probably removes correlated features and that the random seed plays a role in which ones are removed. Therefore, for a complete qualitative study of one author, it might be worthwhile to repeat the regression experiment a number of times with different random seeds.</p>
        <p>The features that characterize the evolution of an author do not necessarily have to be frequent. While some features are quite frequent, such as the increasing motif ‘y’ (anaphor referring to a prepositional phrase beginning with the preposition ‘à’) found for Zola, which obtains the relative frequency of 0.0027 at its maximum, some others, such as the decreasing motif ‘autrui’ (other people), have a low relative frequency, 0.0008 at its maximum. This conclusion is similar to a finding in <xref ref-type="bibr" rid="ref-138018">Koppel and Schler</xref>, who found that idiosyncratic features that play an important role in authorship attribution also tend to be of low frequency. How we explore the features of the models will be discussed in the next section.</p>
      </sec>
    </sec>
    <sec>
      <title>6. A Qualitative Study of some Motifs Sensitive to Diachronic Change</title>
      <p>In this section, we look more closely at the features selected by the regression algorithm (presented in the previous section). These features, or motifs, are at the heart of idiolectal evolution and are assumed to be easily interpretable. We examine whether this is the case.</p>
      <sec>
        <title>6.1. Methods: Manual Inspection</title>
        <p>We scrutinized the motifs relevant for four authors on whom our models obtained good results: Balzac, Daniel-Lesueur, Sand and Zola (looking deeper into the motifs attached to authors for whom we obtained poor results would not make much sense). For these authors, we inspected whether the selected motifs were interpretable by looking at examples from the corpus in context, to see if they corresponded to meaningful linguistic patterns.</p>
        <p>First, it is clear that some forms are not interpretable: in Balzac, for example, there is a decrease in the use of adverbs over time, but the adverbial category is relatively heterogeneous, so that it is difficult to interpret this phenomenon. The same can be observed with the motif “NC_,_avoir” (also in Balzac), which increases; this motif is realized in sequences (<italic>patronne, avait</italic> - mistress, had; <italic>frère, ont</italic> - brother, have; <italic>ciel, as</italic> - heaven, have) about which, at first sight, nothing really relevant can be said. Another group of difficult motifs in Balzac is the more frequent use of the pronoun “<italic>on</italic>” (“on PRES”, “. on”, “ADJ que on”). These patterns are hard to interpret, especially since the pronoun “<italic>on</italic>” has a fairly wide range of referential values.</p>
        <p>Another example from Zola concerns verbs such as <italic>bouleverser</italic> (to upset) and <italic>convaincre</italic> (to convince), which have an increasing frequency over time. Here again, there is no immediate explanation for this usage. However, we were pleased to see that most motifs are interpretable (for example, we estimated that about three quarters of the motifs retained for Zola were). In the rest of this section, we will discuss some of the (groups of) motifs that we found interesting, or that have been previously noticed by researchers in the field of stylistic analysis.</p>
      </sec>
      <sec>
        <title>6.2. Results</title>
        <p>A number of interpretable motifs can be considered as stylemes. For example, in Zola, there is a set of motifs organized around “. Et” (the conjunction “and” at the beginning of the sentence): “. Et le NC être”; “. Et, dès”; “. Et ce être”, whose use increases, as shown in example 1:</p>
        <disp-quote>
          <list list-type="order">
            <list-item>
              <p>Quoi donc ? Était-ce la fin ? Un souffle glacé avait couru sur le camp, anéanti de sommeil et d’angoisse. <bold>Et ce fut</bold> alors que Jean et Maurice reconnurent le colonel de Vineuil […]</p>
            </list-item>
          </list>
        </disp-quote>
        <preformat>(Zola, <italic>La débâcle</italic>)</preformat>
        <disp-quote>
          <p>What then? Was it the end? An icy breath had run over the camp, annihilated by sleep and anguish. <bold>And it was</bold> then that Jean and Maurice recognized Colonel de Vineuil […]</p>
        </disp-quote>
        <p>This use (the <italic>et de relance</italic>, or revival “et”) was noticed very early by stylisticians (<xref ref-type="bibr" rid="ref-138019">Thibaudet</xref>; but see <xref ref-type="bibr" rid="ref-138020">Bordas</xref> and <xref ref-type="bibr" rid="ref-138021">Badiou-Monferrand</xref> for modern accounts). It is not considered an idiosyncrasy, since this motif was also used by Flaubert, whom Zola can be said to have imitated <xref ref-type="bibr" rid="ref-138019 ref-138022">(Thibaudet; Gauthier)</xref>. Flaubert considered it “an old biblical tic which is annoying”.</p>
        <p>Another set of motifs is, meanwhile, on the decline, including “NCCOR”, a tag for parts of the human body, which are used in the physical description of the characters (<italic>épaules; tête; main; yeux</italic>, etc. - shoulders; head; hand; eyes, etc.). It is difficult to interpret the reason for this decrease; perhaps the form, or at least its repetition, was considered a cliché by the author.</p>
        <p>In the same way, in Sand, we also notice that motifs linked to units referring to parts of the body tend to decrease, for example “NCCOR avec NCABS”: <italic>et se jeta dans mes <bold>bras avec joie</bold>; Suzanne baissa la <bold>tête avec embarras</bold></italic>… (and threw herself into my <bold>arms with joy</bold>; Suzanne lowered her <bold>head in embarrassment</bold>). This motif associates a movement of the body with a feeling. Again, this change could be due to the avoidance of a cliché, but this hypothesis will have to be verified in further analysis.</p>
        <p>Among the positive motifs of Sand, we note two forms (“, et, comme”, “, et, si ce”) which share the same rhythmic pattern (see examples 2 and 3). Without going into detail, these patterns may highlight the subordinate clause by a kind of tension (↗), while the main clause is constructed as a release of that tension (↙).</p>
        <disp-quote>
          <list list-type="order">
            <list-item>
              <p>Après avoir fait quelques tours sous les galeries, il se crut assez calme pour retourner à l’atelier, <bold>et, comme</bold> il redescendait l’escalier des Géants, il se trouva tout à coup face à face avec le Bozza.</p>
            </list-item>
          </list>
        </disp-quote>
        <preformat>(Sand, <italic>Les Maîtres mosaïstes</italic>)</preformat>
        <disp-quote>
          <p>After having taken a few turns under the galleries, he believed himself calm enough to return to the workshop, <bold>and, as</bold> he went back down the staircase of the Giants, he found himself suddenly face to face with the Bozza.</p>
          <list list-type="order">
            <list-item>
              <p>Dès lors, j’espérais qu’elle pourrait aimer Narcisse, <bold>et, si cet</bold> excellent jeune homme pouvait être heureux par elle, c’était à la condition de ne plus souffrir du passé. (Sand, <italic>Narcisse</italic>)</p>
            </list-item>
          </list>
          <p>From then on, I hoped that she would be able to love Narcisse, <bold>and, if this</bold> excellent young man could be happy thanks to her, it was on the condition of not suffering from the past anymore.</p>
        </disp-quote>
        <p>For Balzac, we found the decreasing motif “tout_à_NC”, which corresponds in the vast majority of cases to the adverb “tout à coup” (<italic>all of a sudden</italic>). Again, we suspect that the decrease of this motif could be caused by the avoidance of clichés. An interesting example of an increasing motif in Balzac is “dire_à_NP” (<italic>say to Proper Name</italic>). When inspecting the corpus, we noticed that this phrase is used in different ways: sometimes it is inserted inside a dialogue, as illustrated in example (4); often it is used to mark the transition from narration to dialogue, as in (5), and vice versa, as in (6). We consider it a stylistic means of enlivening the transitions between narration and dialogue. Often this construction is used to put a long grammatical subject after the verb and its indirect object (as in 4 and 6), which also creates a stylistic effect.</p>
        <disp-quote>
          <list list-type="order">
            <list-item>
              <p>Il ne faut pas demander à monsieur pourquoi il vient, <bold>dit à Castanier</bold> une vieille portière, vous ressemblez trop à ce pauvre cher défunt.</p>
            </list-item>
          </list>
          <p>(Balzac, <italic>Melmoth réconcilié</italic>)</p>
          <p>One shouldn’t ask this gentleman why he came, <bold>said</bold> an old doorkeeper <bold>to Castanier</bold>, you look too much like the poor, dear deceased.</p>
          <list list-type="order">
            <list-item>
              <p>Le commandant, qui l’étudiait, s’apercevant de cette insensibilité, <bold>dit à Gérard</bold> : Le serin n’en sait pas long.</p>
            </list-item>
          </list>
        </disp-quote>
        <preformat>(Balzac, <italic>Les Chouans</italic>)</preformat>
        <disp-quote>
          <p>The commander, who was studying him, and noticed this insensitivity, <bold>said to Gérard</bold>: The fool does not know much.</p>
          <list list-type="order">
            <list-item>
              <p>J’attends la réponse, <bold>dit à Rastignac</bold> le commissionnaire de madame de Nucingen.</p>
            </list-item>
          </list>
        </disp-quote>
        <preformat>(Balzac, <italic>Le père Goriot</italic>)</preformat>
        <disp-quote>
          <p>I’m waiting for an answer, <bold>said</bold> the messenger of Madame de Nucingen <bold>to Rastignac</bold>.</p>
        </disp-quote>
        <p>Finally, for Daniel-Lesueur, it is worth mentioning the increasing motif “…_DETPOSS_NC_…” (see examples 7 and 8), in which a noun preceded by a possessive determiner and placed between two ellipsis marks dramatizes reported thoughts and speech by invoking a close relation: <italic>mon enfant; ma soeur; mon amie</italic> (<italic>my child; my sister; my friend</italic>).</p>
        <disp-quote>
          <list list-type="order">
            <list-item>
              <p>Ah ! ma mère <bold>… ma mère …</bold> pensait Hervé, […]</p>
            </list-item>
          </list>
        </disp-quote>
        <preformat>(Daniel-Lesueur, <italic>Le Masque d'Amour II - Madame de Ferneuse</italic>)</preformat>
        <disp-quote>
          <p>Ah! my mother <bold>… my mother …</bold> thought Hervé, […]</p>
          <list list-type="order">
            <list-item>
              <p>Je suis perdue ! … Perdue ! <bold>… Ma chérie …</bold> Invente quelque chose ! … Ah ! sauve-moi !</p>
            </list-item>
          </list>
        </disp-quote>
        <preformat>(Daniel-Lesueur, <italic>Justice de femme</italic>)</preformat>
        <disp-quote>
          <p>I’m lost! … Lost! <bold>… My darling…</bold> Think of something! … Ah! save me!</p>
        </disp-quote>
      </sec>
      <sec>
        <title>6.3. Discussion</title>
        <p>As already mentioned, not all the motifs identified automatically are interpretable. Many, however, are stylistic in nature, although it is not possible to determine whether these uses are a deliberate choice by the author or a form of automatism. To shed light on this question, a more precise analysis involving literary expertise should be undertaken. Our analysis provides the literary scholar, the stylistician and the linguist with statistically relevant evidence of the evolution of certain forms. It is up to these specialists to show correlations between forms and to propose interpretations. This type of approach can provide an empirical basis for more theoretical research <xref ref-type="bibr" rid="ref-138023">(Philippe)</xref>. Our hope is to have demonstrated that our method, which combines the use of motifs with Lasso LARS feature selection, identifies a large number of stylistically interesting patterns and can be a useful tool in the qualitative analysis of the evolution of the idiolect.</p>
      </sec>
    </sec>
    <sec>
      <title>7. General Discussion and Future Work</title>
      <sec>
        <title>7.1. Contribution of the Work</title>
        <p>In this article we investigated the chronological evolution of the idiolect. We examined whether support could be found for the rectilinearity hypothesis, which states that the evolution of the idiolect is rectilinear in time, and whether the linguistic structures at the heart of idiolectal change could be identified. Using the Corpus for Idiolectal Research (CIDRE), we developed two methods that could help reach these goals. First, we introduced the idea of evaluating to what extent the distance matrices of the works of one author are Robinsonian. For ten out of eleven corpora in CIDRE, we found that the Robinsonian score was significantly high, suggesting that chronology plays a crucial role in the idiolect of an author. Second, we developed linear regression methods to predict the year of writing of a work and selected linguistic features that are key in the process of idiolectal change. We found that the majority of regression models were highly successful, again supporting the rectilinearity hypothesis. In addition, these models allowed us to find a number of features (in the form of <italic>motifs</italic>) that lent themselves to manual examination in a qualitative study, demonstrating both the usefulness of these features and the validity of our methods. We believe that the use of motifs is complementary to the use of lemmas and tokens. As <xref ref-type="bibr" rid="ref-138024">Brunet</xref> illustrates, for example, in his study of the vocabulary used by Zola, using lemmas allows a researcher to interpret the topics of a writer. In the present study we demonstrate that motifs, on the other hand, might give more insight into stylistic forms.</p>
        <p>We believe that working on the concepts of idiolect and chronological change can have an impact on related research themes. Modeling the idiolect could be useful, for example, for the task of automatic text dating that was included in the 2015 SemEval campaign: <italic>Task 7: Diachronic Text Evaluation</italic> <xref ref-type="bibr" rid="ref-138025">(Popescu and Strapparava)</xref>. For that task, a corpus of snippets from newspaper articles dating from 1700 to 2010 was compiled, and the task consisted in dating these snippets. It could be interesting to see whether the idiolect plays a role and whether it can enhance the classification results.</p>
        <p>Another theme for which the concept of chronological variation could be interesting is authorship attribution and authorship verification, which involves checking whether a pair of documents were written by the same person <xref ref-type="bibr" rid="ref-138026">(Kestemont et al.)</xref>. Currently, the chronology of the writing is not taken into account; only the idiolect of each author in the corpus is modeled. It is quite possible, however, that taking the date of writing into consideration would enhance the modeling. Many different features have been explored to model the idiolect of authors for this task: n-grams of words or characters (e.g. <xref ref-type="bibr" rid="ref-138027 ref-138028 ref-138029">Stamatatos; Antonia et al.; Sari et al.</xref>), syntactic structures <xref ref-type="bibr" rid="ref-138030 ref-138031">(Sundararajan and Woodard; Zhang et al.)</xref> and even discourse structure <xref ref-type="bibr" rid="ref-141939">(Ding et al.)</xref>, but we are not aware of models that take idiolectal variation over time into account. Doing so could be meaningful, especially for writers with long careers.</p>
        <p>In this study, we focused on methods and on the evaluation of results. We argue that the use of standard corpora, baselines and evaluation metrics could enhance the comparability of studies in the field of stylometry, and that this would help the research community gain greater insight into the robustness of results. In our experiment on the Robinsonian matrices, we used random results as a baseline. For research questions that have not yet been addressed in the literature, this is a useful starting point, as shown in the work of <xref ref-type="bibr" rid="ref-138032">Bulteau et al.</xref>, who developed two algorithms to estimate the probability that a tree produced by a hierarchical clustering algorithm, for instance by stylo R <xref ref-type="bibr" rid="ref-137991">(Eder et al.)</xref>, reflects a chronological order by chance. In our experiment using regression models, we compared our methods with those of <xref ref-type="bibr" rid="ref-138005">Klaussner and Vogel</xref> from their 2018 publication, using their baseline RMSE and the standard baseline of regression models, R<sup>2</sup> <xref ref-type="bibr" rid="ref-138016">(Field)</xref>.</p>
        <p>An important contribution of this study is that it addresses questions of evaluation. We have seen that the development of off-the-shelf packages has made it possible to shed new light on long-standing research questions. For example, <xref ref-type="bibr" rid="ref-138033">Schmidt-Petri et al.</xref> used the rolling-classification algorithm from stylo R <xref ref-type="bibr" rid="ref-137991">(Eder et al.)</xref> to examine the contribution of Harriet Taylor Mill to the essay <italic>On Liberty</italic>, which is officially attributed solely to John Stuart Mill, her husband. They found stylometric evidence that she should indeed be considered a co-author of the work. However, as stylo R does not enable any statistical evaluation of the classification results, the authors had no straightforward means of interpreting their reliability and had to undertake considerable extra work to estimate the robustness of the results. We therefore think that the evaluation of stylometric methods is a topic that needs to be developed further, and we hope to have made a useful contribution to it.</p>
      </sec>
      <sec>
        <title>7.2. Future Work</title>
        <p>The most obvious future work that should result from this study is a detailed qualitative analysis of the selected motifs for the authors in CIDRE for whom the regression models obtain good results. Such an analysis should also include a detailed comparison with the reference corpus in order to decide whether an observed change reflects a general diachronic language change or an idiosyncrasy, using for example the methods that Mollin used to identify idiosyncratic collocations. This should be done, however, in collaboration with literary experts on the writers in question, in order to compare the findings of the method with what is already known in the fields of stylometry and stylistics. In addition to the identification of idiosyncratic motifs, collaborations with literary experts would allow us to arrive at a more precise interpretation of the motifs of an author. Indeed, we could for example examine the role that dialogs, narratives, and descriptions play when experts provide us with theoretically and empirically motivated hypotheses on specific authors.</p>
        <p>Another straightforward direction for future work is to repeat our experiments on other text genres, for example drama or correspondence. We are considering trying our methods on plays, for example by using the <italic>Théâtre Classique</italic> corpus of <xref ref-type="bibr" rid="ref-138034">Fièvre</xref>. However, as theatrical works might be influenced or partly written by the actors in the plays, the idiolectal signal of the playwright may not be as strong as for works of fiction. Correspondence could be interesting in order to investigate idiolectal changes with respect to the addressee of the letter. We could for example use the Corpus of Early English Correspondence <xref ref-type="bibr" rid="ref-138004">(Raumolin-Brunberg and Nevalainen)</xref> and the correspondence of George Sand. Another advantage of using correspondence is that the dating of letters might be more precise than the dating of works of fiction. However, it would probably result in corpora that are significantly smaller than the corpora of the authors in CIDRE. As we suspect a strong relation between corpus size and the statistical power of the experiments, success is not guaranteed for smaller corpora. On the other hand, the number of letters per author is probably higher than the number of books per author in CIDRE, which could enhance the statistical power.</p>
        <p>A third direction for future work is to evaluate how different people influence each other with their idiolects. <xref ref-type="bibr" rid="ref-138003">Evans</xref> investigated how the idiolect of Queen Elizabeth I was influenced by others. It could be interesting to develop a methodology on how influence could be established between authors or even between literary movements.</p>
      </sec>
    </sec>
    <sec>
      <title>8. Conclusion</title>
      <p>Our experiments demonstrate that there is a significant evolution of the idiolect during an author’s lifetime. Our experiments also suggest that some features evolve in a rectilinear manner, steadily increasing or decreasing as the years go by. These features are sufficiently clear-cut to be used to predict the year of writing of a book very accurately. We therefore conclude that we found strong support for the rectilinearity hypothesis and that the evolution of the idiolect is a relevant type of intrapersonal variation that exists alongside the strong signal of interpersonal variation. We thus dismiss the proposal that idiolects are stable over time, even though it is true that not all linguistic features evolve. A second contribution of our article is the development of new methods whose usefulness we have demonstrated for 1) assessing the chronological signal of the idiolect in corpora and 2) identifying linguistic structures that are at the heart of this evolution. These features can in turn be used for qualitative studies with stylistic objectives.</p>
      <p>Peer-Reviewers: Simon DeDeo, David Mimno</p>
      <p>All scripts, supplementary materials and data used for our experiments are available in the online Harvard Dataverse directory: <ext-link ext-link-type="uri" ns0:href="https://doi.org/10.7910/DVN/WCMZOK">https://doi.org/10.7910/DVN/WCMZOK</ext-link>, except for the CIDRE corpus, which is freely available in the following Zenodo repository: <ext-link ext-link-type="uri" ns0:href="https://doi.org/10.5281/zenodo.4707812">https://doi.org/10.5281/zenodo.4707812</ext-link>.</p>
    </sec>
  </body>
  <back>
    <ack>
      <title>Acknowledgments</title>
      <p>We thank the two reviewers for their insightful comments and suggestions. This work has been developed in the framework of the IRN (International Research Network) Cyclades (Corpora and Computational Linguistics for Digital Humanities). This work was also supported in part by the French government under management of Agence Nationale de la Recherche as part of the “Investissements d’avenir” program, reference ANR-19-P3IA-0001 (PRAIRIE 3IA Institute) and reference ANR-16-IDEX-0003 (I-Site Future, programme “Cité des dames, créatrices dans la cité”).</p>
    </ack>
    <fn-group>
      <fn id="fn1">
        <label>1</label>
        <p>https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.ElasticNet.html</p>
      </fn>
      <fn id="fn2">
        <label>2</label>
        <p>https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.ElasticNetCV.html#sklearn.linear_model.ElasticNetCV</p>
      </fn>
      <fn id="fn3">
        <label>3</label>
        <p>The filename is: ‘results_LassoLars_vs_ElasticNet.txt’.</p>
      </fn>
    </fn-group>
    <ref-list>
      <ref id="ref-138010">
        <element-citation publication-type="article-journal">
          <article-title>Grammaticalization and the linguistic individual: New avenues in lifespan research</article-title>
          <source>Linguistics Vanguard</source>
          <person-group person-group-type="author">
            <name>
              <surname>Anthonissen</surname>
              <given-names>Lynn</given-names>
            </name>
            <name>
              <surname>Petré</surname>
              <given-names>Peter</given-names>
            </name>
          </person-group>
          <date>
            <day>22</day>
            <month>6</month>
            <year>2019</year>
          </date>
          <volume>5</volume>
          <issue>s2</issue>
          <issn>2199-174X</issn>
          <pub-id pub-id-type="doi">10.1515/lingvan-2018-0037</pub-id>
          <ext-link ext-link-type="uri" ns0:href="https://doi.org/10.1515/lingvan-2018-0037">https://doi.org/10.1515/lingvan-2018-0037</ext-link>
        </element-citation>
      </ref>
      <ref id="ref-138028">
        <element-citation publication-type="article-journal">
          <article-title>Language chunking, data sparseness, and the value of a long marker list: Explorations with word n-grams and authorial attribution</article-title>
          <source>Literary and Linguistic Computing</source>
          <person-group person-group-type="author">
            <name>
              <surname>Antonia</surname>
              <given-names>A.</given-names>
            </name>
            <name>
              <surname>Craig</surname>
              <given-names>H.</given-names>
            </name>
            <name>
              <surname>Elliott</surname>
              <given-names>J.</given-names>
            </name>
          </person-group>
          <date>
            <day>22</day>
            <month>5</month>
            <year>2013</year>
          </date>
          <volume>29</volume>
          <issue>2</issue>
          <fpage>147</fpage>
          <lpage>163</lpage>
          <issn>0268-1145</issn>
          <pub-id pub-id-type="doi">10.1093/llc/fqt028</pub-id>
          <ext-link ext-link-type="uri" ns0:href="https://doi.org/10.1093/llc/fqt028">https://doi.org/10.1093/llc/fqt028</ext-link>
        </element-citation>
      </ref>
      <ref id="ref-138021">
        <element-citation publication-type="article-journal">
          <article-title>Rémanence des Et de relance en français moderne et contemporain: du “résidu” au “reliquat”</article-title>
          <source>Le français moderne</source>
          <person-group person-group-type="author">
            <name>
              <surname>Badiou-Monferrand</surname>
              <given-names>Claire</given-names>
            </name>
          </person-group>
          <date>
            <year>2020</year>
          </date>
          <volume>88</volume>
          <issue>2</issue>
          <fpage>295</fpage>
          <lpage>312</lpage>
        </element-citation>
      </ref>
      <ref id="ref-138001">
        <element-citation publication-type="paper-conference">
          <source>Individual usage: a corpus-based study of idiolects</source>
          <person-group person-group-type="author">
            <name>
              <surname>Barlow</surname>
              <given-names>Michael</given-names>
            </name>
          </person-group>
          <date>
            <year>2010</year>
          </date>
        </element-citation>
      </ref>
      <ref id="ref-137968">
        <element-citation publication-type="article-journal">
          <article-title>A set of postulates for phonemic analysis</article-title>
          <source>Language</source>
          <person-group person-group-type="author">
            <name>
              <surname>Bloch</surname>
              <given-names>Bernard</given-names>
            </name>
          </person-group>
          <date>
            <month>1</month>
            <year>1948</year>
          </date>
          <volume>24</volume>
          <issue>1</issue>
          <fpage>3</fpage>
          <lpage>46</lpage>
          <issn>0097-8507</issn>
          <pub-id pub-id-type="doi">10.2307/410284</pub-id>
          <ext-link ext-link-type="uri" ns0:href="https://doi.org/10.2307/410284">https://doi.org/10.2307/410284</ext-link>
        </element-citation>
      </ref>
      <ref id="ref-138020">
        <element-citation publication-type="article-journal">
          <article-title>Et la conjonction resta tensive. Sur le et de relance rythmique</article-title>
          <source>Le français moderne</source>
          <person-group person-group-type="author">
            <name>
              <surname>Bordas</surname>
              <given-names>Éric</given-names>
            </name>
          </person-group>
          <date>
            <year>2005</year>
          </date>
          <volume>73</volume>
          <issue>1</issue>
          <fpage>23</fpage>
          <lpage>39</lpage>
        </element-citation>
      </ref>
      <ref id="ref-137984">
        <element-citation publication-type="article-journal">
          <article-title>The chronology of Shakespeare's plays: A statistical study</article-title>
          <source>Computers and the Humanities</source>
          <person-group person-group-type="author">
            <name>
              <surname>Brainerd</surname>
              <given-names>Barron</given-names>
            </name>
          </person-group>
          <date>
            <month>12</month>
            <year>1980</year>
          </date>
          <volume>14</volume>
          <issue>4</issue>
          <fpage>221</fpage>
          <lpage>230</lpage>
          <issn>0010-4817</issn>
          <pub-id pub-id-type="doi">10.1007/bf02404431</pub-id>
          <ext-link ext-link-type="uri" ns0:href="https://doi.org/10.1007/bf02404431">https://doi.org/10.1007/bf02404431</ext-link>
        </element-citation>
      </ref>
      <ref id="ref-138012">
        <element-citation publication-type="article-journal">
          <article-title>GutenTag: An NLP-driven tool for digital humanities research in the Project Gutenberg corpus</article-title>
          <source>Proceedings of the Fourth Workshop on Computational Linguistics for Literature</source>
          <person-group person-group-type="author">
            <name>
              <surname>Brooke</surname>
              <given-names>Julian</given-names>
            </name>
            <name>
              <surname>Hammond</surname>
              <given-names>Adam</given-names>
            </name>
            <name>
              <surname>Hirst</surname>
              <given-names>Graeme</given-names>
            </name>
          </person-group>
          <date>
            <year>2015</year>
          </date>
          <fpage>42</fpage>
          <lpage>47</lpage>
          <pub-id pub-id-type="doi">10.3115/v1/w15-0705</pub-id>
          <ext-link ext-link-type="uri" ns0:href="https://doi.org/10.3115/v1/w15-0705">https://doi.org/10.3115/v1/w15-0705</ext-link>
        </element-citation>
      </ref>
      <ref id="ref-138024">
        <element-citation publication-type="book">
          <source>Le vocabulaire de Zola</source>
          <person-group person-group-type="author">
            <name>
              <surname>Brunet</surname>
              <given-names>Etienne</given-names>
            </name>
          </person-group>
          <publisher-name>Slatkine, Champion</publisher-name>
          <date>
            <year>1985</year>
          </date>
        </element-citation>
      </ref>
      <ref id="ref-138032">
        <element-citation publication-type="paper-conference">
          <source>Reordering a tree according to an order on its leaves</source>
          <person-group person-group-type="author">
            <name>
              <surname>Bulteau</surname>
              <given-names>Laurent</given-names>
            </name>
            <name>
              <surname>Gambette</surname>
              <given-names>Philippe</given-names>
            </name>
            <name>
              <surname>Seminck</surname>
              <given-names>Olga</given-names>
            </name>
          </person-group>
          <publisher-name>Schloss Dagstuhl-Leibniz-Zentrum für Informatik</publisher-name>
          <date>
            <year>2022</year>
          </date>
          <pub-id pub-id-type="doi">10.4230/LIPIcs.CPM.2022.24</pub-id>
        </element-citation>
      </ref>
      <ref id="ref-137987">
        <element-citation publication-type="article-journal">
          <article-title>Change of writing style with time</article-title>
          <source>Computers and the Humanities</source>
          <person-group person-group-type="author">
            <name>
              <surname>Can</surname>
              <given-names>Fazli</given-names>
            </name>
            <name>
              <surname>Patton</surname>
              <given-names>Jon M.</given-names>
            </name>
          </person-group>
          <date>
            <month>2</month>
            <year>2004</year>
          </date>
          <volume>38</volume>
          <issue>1</issue>
          <fpage>61</fpage>
          <lpage>82</lpage>
          <issn>0010-4817</issn>
          <pub-id pub-id-type="doi">10.1023/b:chum.0000009225.28847.77</pub-id>
          <ext-link ext-link-type="uri" ns0:href="https://doi.org/10.1023/b:chum.0000009225.28847.77">https://doi.org/10.1023/b:chum.0000009225.28847.77</ext-link>
        </element-citation>
      </ref>
      <ref id="ref-137999">
        <element-citation publication-type="chapter">
          <chapter-title>Using statistics in lexical analysis</chapter-title>
          <source>Lexical acquisition: exploiting on-line resources to build a lexicon</source>
          <person-group person-group-type="author">
            <name>
              <surname>Church</surname>
              <given-names>Kenneth</given-names>
            </name>
            <name>
              <surname>Gale</surname>
              <given-names>William</given-names>
            </name>
            <name>
              <surname>Hanks</surname>
              <given-names>Patrick</given-names>
            </name>
            <name>
              <surname>Hindle</surname>
              <given-names>Donald</given-names>
            </name>
          </person-group>
          <publisher-name>Psychology Press</publisher-name>
          <date>
            <year>1991</year>
          </date>
          <fpage>115</fpage>
          <lpage>164</lpage>
        </element-citation>
      </ref>
      <ref id="ref-137976">
        <element-citation publication-type="article-journal">
          <article-title>On a discriminatory problem connected with the works of Plato</article-title>
          <source>Journal of the Royal Statistical Society: Series B (Methodological)</source>
          <person-group person-group-type="author">
            <name>
              <surname>Cox</surname>
              <given-names>D. R.</given-names>
            </name>
            <name>
              <surname>Brandwood</surname>
              <given-names>Leonard</given-names>
            </name>
          </person-group>
          <date>
            <month>1</month>
            <year>1959</year>
          </date>
          <volume>21</volume>
          <issue>1</issue>
          <fpage>195</fpage>
          <lpage>200</lpage>
          <issn>0035-9246</issn>
          <pub-id pub-id-type="doi">10.1111/j.2517-6161.1959.tb00329.x</pub-id>
          <ext-link ext-link-type="uri" ns0:href="https://doi.org/10.1111/j.2517-6161.1959.tb00329.x">https://doi.org/10.1111/j.2517-6161.1959.tb00329.x</ext-link>
        </element-citation>
      </ref>
      <ref id="ref-137989">
        <element-citation publication-type="article-journal">
          <article-title>Stylistic analysis and authorship studies</article-title>
          <source>A companion to digital humanities</source>
          <person-group person-group-type="author">
            <name>
              <surname>Craig</surname>
              <given-names>Hugh</given-names>
            </name>
          </person-group>
          <date>
            <year>2004</year>
          </date>
          <volume>3</volume>
          <fpage>233</fpage>
          <lpage>334</lpage>
          <comment>publisher: Blackwell Publishing Oxford, UK</comment>
        </element-citation>
      </ref>
      <ref id="ref-137982">
        <element-citation publication-type="article-journal">
          <article-title>Resolutions and chronology in Euripides: the fragmentary tragedies</article-title>
          <source>Bulletin Supplement (University of London. Institute of Classical Studies)</source>
          <person-group person-group-type="author">
            <name>
              <surname>Cropp</surname>
              <given-names>Martin</given-names>
            </name>
            <name>
              <surname>Fick</surname>
              <given-names>Gordon</given-names>
            </name>
          </person-group>
          <date>
            <year>1985</year>
          </date>
          <fpage>iii</fpage>
          <lpage>92</lpage>
          <comment>publisher: JSTOR</comment>
        </element-citation>
      </ref>
      <ref id="ref-137988">
        <element-citation publication-type="chapter">
          <chapter-title>Explanation in Computational Stylometry</chapter-title>
          <source>Computational Linguistics and Intelligent Text Processing</source>
          <person-group person-group-type="author">
            <name>
              <surname>Daelemans</surname>
              <given-names>Walter</given-names>
            </name>
          </person-group>
          <person-group person-group-type="editor">
            <name>
              <surname>Gelbukh</surname>
              <given-names>Alexander</given-names>
            </name>
          </person-group>
          <publisher-name>Springer Berlin Heidelberg</publisher-name>
          <date>
            <year>2013</year>
          </date>
          <volume>7817</volume>
          <fpage>451</fpage>
          <lpage>462</lpage>
          <pub-id pub-id-type="doi">10.1007/978-3-642-37256-8_37</pub-id>
          <ext-link ext-link-type="uri" ns0:href="https://doi.org/10.1007/978-3-642-37256-8_37">https://doi.org/10.1007/978-3-642-37256-8_37</ext-link>
        </element-citation>
      </ref>
      <ref id="ref-138015">
        <element-citation publication-type="chapter">
          <chapter-title>The 400 million word corpus of historical American English (1810–2009)</chapter-title>
          <source>English Historical Linguistics 2010: Selected Papers from the Sixteenth International Conference on English Historical Linguistics (ICEHL 16), Pécs, 23-27 August 2010</source>
          <person-group person-group-type="author">
            <name>
              <surname>Davies</surname>
              <given-names>Mark</given-names>
            </name>
            <etal />
          </person-group>
          <publisher-name>John Benjamins Publishing</publisher-name>
          <date>
            <year>2012</year>
          </date>
          <volume>325</volume>
          <fpage>231</fpage>
          <lpage>262</lpage>
          <pub-id pub-id-type="doi">10.1075/cilt.325.11dav</pub-id>
          <ext-link ext-link-type="uri" ns0:href="https://doi.org/10.1075/cilt.325.11dav">https://doi.org/10.1075/cilt.325.11dav</ext-link>
        </element-citation>
      </ref>
      <ref id="ref-137985">
        <element-citation publication-type="article-journal">
          <article-title>Clockwork Shakespeare: The Bard Meets the Regressive Imagery Dictionary</article-title>
          <source>Empirical Studies of the Arts</source>
          <person-group person-group-type="author">
            <name>
              <surname>Derks</surname>
              <given-names>Peter L.</given-names>
            </name>
          </person-group>
          <date>
            <month>7</month>
            <year>1994</year>
          </date>
          <volume>12</volume>
          <issue>2</issue>
          <fpage>131</fpage>
          <lpage>139</lpage>
          <issn>0276-2374</issn>
          <pub-id pub-id-type="doi">10.2190/h489-jh64-lq8c-l4t1</pub-id>
          <ext-link ext-link-type="uri" ns0:href="https://doi.org/10.2190/h489-jh64-lq8c-l4t1">https://doi.org/10.2190/h489-jh64-lq8c-l4t1</ext-link>
        </element-citation>
      </ref>
      <ref id="ref-137981">
        <element-citation publication-type="article-journal">
          <article-title>A New Aspect of the Evolution of the Trimeter in Euripides</article-title>
          <source>Transactions of the American Philological Association (1974-)</source>
          <person-group person-group-type="author">
            <name>
              <surname>Devine</surname>
              <given-names>A. M.</given-names>
            </name>
            <name>
              <surname>Stephens</surname>
              <given-names>Laurence D.</given-names>
            </name>
          </person-group>
          <date>
            <year>1981</year>
          </date>
          <volume>111</volume>
          <fpage>43</fpage>
          <issn>0360-5949</issn>
          <pub-id pub-id-type="doi">10.2307/284118</pub-id>
          <ext-link ext-link-type="uri" ns0:href="https://doi.org/10.2307/284118">https://doi.org/10.2307/284118</ext-link>
        </element-citation>
      </ref>
      <ref id="ref-141939">
        <element-citation publication-type="article-journal">
          <article-title>Learning Stylometric Representations for Authorship Analysis</article-title>
          <source>IEEE Transactions on Cybernetics</source>
          <person-group person-group-type="author">
            <name>
              <surname>Ding</surname>
              <given-names>Steven H. H.</given-names>
            </name>
            <name>
              <surname>Fung</surname>
              <given-names>Benjamin C. M.</given-names>
            </name>
            <name>
              <surname>Iqbal</surname>
              <given-names>Farkhund</given-names>
            </name>
            <name>
              <surname>Cheung</surname>
              <given-names>William K.</given-names>
            </name>
          </person-group>
          <publisher-name>Institute of Electrical and Electronics Engineers (IEEE)</publisher-name>
          <date>
            <month>1</month>
            <year>2019</year>
          </date>
          <volume>49</volume>
          <issue>1</issue>
          <fpage>107</fpage>
          <lpage>121</lpage>
          <issn>2168-2267</issn>
          <pub-id pub-id-type="doi">10.1109/tcyb.2017.2766189</pub-id>
          <ext-link ext-link-type="uri" ns0:href="https://doi.org/10.1109/tcyb.2017.2766189">https://doi.org/10.1109/tcyb.2017.2766189</ext-link>
        </element-citation>
      </ref>
      <ref id="ref-137969">
        <element-citation publication-type="article-journal">
          <article-title>Explorations in 'Idiolects'</article-title>
          <source>Amsterdam Studies in the Theory and History of Linguistic Science Series 4</source>
          <person-group person-group-type="author">
            <name>
              <surname>Dittmar</surname>
              <given-names>Norbert</given-names>
            </name>
          </person-group>
          <date>
            <year>1996</year>
          </date>
          <fpage>109</fpage>
          <lpage>128</lpage>
          <comment>publisher: JOHN BENJAMINS BV</comment>
        </element-citation>
      </ref>
      <ref id="ref-138000">
        <element-citation publication-type="article-journal">
          <article-title>Accurate methods for the statistics of surprise and coincidence</article-title>
          <source>Computational linguistics</source>
          <person-group person-group-type="author">
            <name>
              <surname>Dunning</surname>
              <given-names>Ted E.</given-names>
            </name>
          </person-group>
          <date>
            <year>1993</year>
          </date>
          <volume>19</volume>
          <issue>1</issue>
          <fpage>61</fpage>
          <lpage>74</lpage>
        </element-citation>
      </ref>
      <ref id="ref-137991">
        <element-citation publication-type="article-journal">
          <article-title>Stylometry with R: A package for computational text analysis</article-title>
          <source>The R Journal</source>
          <person-group person-group-type="author">
            <name>
              <surname>Eder</surname>
              <given-names>Maciej</given-names>
            </name>
            <name>
              <surname>Rybicki</surname>
              <given-names>Jan</given-names>
            </name>
            <name>
              <surname>Kestemont</surname>
              <given-names>Mike</given-names>
            </name>
          </person-group>
          <date>
            <year>2016</year>
          </date>
          <volume>8</volume>
          <issue>1</issue>
          <issn>2073-4859</issn>
          <pub-id pub-id-type="doi">10.32614/rj-2016-007</pub-id>
          <ext-link ext-link-type="uri" ns0:href="https://doi.org/10.32614/rj-2016-007">https://doi.org/10.32614/rj-2016-007</ext-link>
        </element-citation>
      </ref>
      <ref id="ref-138017">
        <element-citation publication-type="article-journal">
          <article-title>Least angle regression</article-title>
          <source>The Annals of statistics</source>
          <person-group person-group-type="author">
            <name>
              <surname>Efron</surname>
              <given-names>Bradley</given-names>
            </name>
            <name>
              <surname>Hastie</surname>
              <given-names>Trevor</given-names>
            </name>
            <name>
              <surname>Johnstone</surname>
              <given-names>Iain</given-names>
            </name>
            <name>
              <surname>Tibshirani</surname>
              <given-names>Robert</given-names>
            </name>
          </person-group>
          <date>
            <day>1</day>
            <month>4</month>
            <year>2004</year>
          </date>
          <volume>32</volume>
          <issue>2</issue>
          <fpage>407</fpage>
          <lpage>499</lpage>
          <issn>0090-5364</issn>
          <pub-id pub-id-type="doi">10.1214/009053604000000067</pub-id>
          <ext-link ext-link-type="uri" ns0:href="https://doi.org/10.1214/009053604000000067">https://doi.org/10.1214/009053604000000067</ext-link>
        </element-citation>
      </ref>
      <ref id="ref-138003">
        <element-citation publication-type="thesis">
          <source>Aspects of the idiolect of Queen Elizabeth I: A diachronic study on sociolinguistic principles</source>
          <person-group person-group-type="author">
            <name>
              <surname>Evans</surname>
              <given-names>Mel</given-names>
            </name>
          </person-group>
          <publisher-name>University of Sheffield</publisher-name>
          <date>
            <year>2011</year>
          </date>
        </element-citation>
      </ref>
      <ref id="ref-138016">
        <element-citation publication-type="book">
          <source>Discovering statistics using SPSS: Book plus code for E version of text</source>
          <person-group person-group-type="author">
            <name>
              <surname>Field</surname>
              <given-names>Andy</given-names>
            </name>
          </person-group>
          <publisher-name>SAGE Publications Limited</publisher-name>
          <date>
            <year>2009</year>
          </date>
        </element-citation>
      </ref>
      <ref id="ref-138034">
        <element-citation publication-type="article-journal">
          <article-title>Théâtre classique</article-title>
          <source>Université Paris-IV Sorbonne http://www.theatreclassique.fr</source>
          <person-group person-group-type="author">
            <name>
              <surname>Fièvre</surname>
              <given-names>Paul</given-names>
            </name>
          </person-group>
          <date>
            <year>2007</year>
          </date>
        </element-citation>
      </ref>
      <ref id="ref-137994">
        <element-citation publication-type="paper-conference">
          <source>Trameur: A framework for annotated text corpora exploration</source>
          <person-group person-group-type="author">
            <name>
              <surname>Fleury</surname>
              <given-names>Serge</given-names>
            </name>
            <name>
              <surname>Zimina</surname>
              <given-names>Maria</given-names>
            </name>
          </person-group>
          <date>
            <year>2014</year>
          </date>
          <fpage>57</fpage>
          <lpage>61</lpage>
        </element-citation>
      </ref>
      <ref id="ref-137972">
        <element-citation publication-type="article-journal">
          <article-title>Stylochronometry with substrings, or: A poet young and old</article-title>
          <source>Literary and Linguistic Computing</source>
          <person-group person-group-type="author">
            <name>
              <surname>Forsyth</surname>
              <given-names>R.</given-names>
            </name>
          </person-group>
          <date>
            <day>1</day>
            <month>12</month>
            <year>1999</year>
          </date>
          <volume>14</volume>
          <issue>4</issue>
          <fpage>467</fpage>
          <lpage>478</lpage>
          <issn>0268-1145</issn>
          <pub-id pub-id-type="doi">10.1093/llc/14.4.467</pub-id>
          <ext-link ext-link-type="uri" ns0:href="https://doi.org/10.1093/llc/14.4.467">https://doi.org/10.1093/llc/14.4.467</ext-link>
        </element-citation>
      </ref>
      <ref id="ref-138014">
        <element-citation publication-type="article-journal">
          <article-title>Shifting Paradigms: New Approaches to Horace's Ars Poetica</article-title>
          <person-group person-group-type="author">
            <name>
              <surname>Frischer</surname>
              <given-names>Bernard</given-names>
            </name>
          </person-group>
          <date>
            <year>1991</year>
          </date>
        </element-citation>
      </ref>
      <ref id="ref-138022">
        <element-citation publication-type="article-journal">
          <article-title>Zola as Imitator of Flaubert's Style</article-title>
          <source>Modern Language Notes</source>
          <person-group person-group-type="author">
            <name>
              <surname>Gauthier</surname>
              <given-names>E. Paul</given-names>
            </name>
          </person-group>
          <date>
            <month>5</month>
            <year>1960</year>
          </date>
          <volume>75</volume>
          <issue>5</issue>
          <fpage>423</fpage>
          <issn>0149-6611</issn>
          <pub-id pub-id-type="doi">10.2307/3039860</pub-id>
          <ext-link ext-link-type="uri" ns0:href="https://doi.org/10.2307/3039860">https://doi.org/10.2307/3039860</ext-link>
        </element-citation>
      </ref>
      <ref id="ref-137996">
        <element-citation publication-type="article-journal">
          <article-title>Entre rupture et continuité, le discours du PCF (1920-2020)</article-title>
          <source>Histoire &amp; mesure</source>
          <person-group person-group-type="author">
            <name>
              <surname>Guaresi</surname>
              <given-names>Magali</given-names>
            </name>
            <name>
              <surname>Mayaffre</surname>
              <given-names>Damon</given-names>
            </name>
            <name>
              <surname>Vanni</surname>
              <given-names>Laurent</given-names>
            </name>
          </person-group>
          <date>
            <day>31</day>
            <month>12</month>
            <year>2021</year>
          </date>
          <volume>XXXVII-1</volume>
          <issue>2</issue>
          <fpage>125</fpage>
          <lpage>162</lpage>
          <issn>0982-1783</issn>
          <pub-id pub-id-type="doi">10.4000/histoiremesure.14904</pub-id>
          <ext-link ext-link-type="uri" ns0:href="https://doi.org/10.4000/histoiremesure.14904">https://doi.org/10.4000/histoiremesure.14904</ext-link>
        </element-citation>
      </ref>
      <ref id="ref-137967">
        <element-citation publication-type="chapter">
          <chapter-title>Idiolects</chapter-title>
          <source>Content and modality: Themes from the philosophy of Robert Stalnaker</source>
          <person-group person-group-type="author">
            <name>
              <surname>Heck</surname>
              <given-names>Richard</given-names>
            </name>
          </person-group>
          <publisher-name>Oxford University Press on Demand</publisher-name>
          <date>
            <year>2006</year>
          </date>
          <fpage>61</fpage>
          <lpage>92</lpage>
        </element-citation>
      </ref>
      <ref id="ref-137993">
        <element-citation publication-type="article">
          <article-title>Manuel de TXM, Version 0.7.9</article-title>
          <person-group person-group-type="author">
            <name>
              <surname>Heiden</surname>
              <given-names>S.</given-names>
            </name>
            <name>
              <surname>Decorde</surname>
              <given-names>M.</given-names>
            </name>
            <name>
              <surname>Jacquot</surname>
              <given-names>S.</given-names>
            </name>
            <name>
              <surname>Pincemin</surname>
              <given-names>B.</given-names>
            </name>
          </person-group>
          <publisher-name>ENS de Lyon &amp; Université de Franche-Comté</publisher-name>
          <date>
            <year>2018</year>
          </date>
          <ext-link ext-link-type="uri" ns0:href="http://textometrie.ens-lyon.fr/files/documentation/Manuel%20de%20TXM%200.7%20FR.pdf">http://textometrie.ens-lyon.fr/files/documentation/Manuel%20de%20TXM%200.7%20FR.pdf</ext-link>
        </element-citation>
      </ref>
      <ref id="ref-137986">
        <element-citation publication-type="article-journal">
          <article-title>Pause Patterns in Shakespeare's Verse: Canon and Chronology</article-title>
          <source>Literary and Linguistic Computing</source>
          <person-group person-group-type="author">
            <name>
              <surname>Jackson</surname>
              <given-names>MacD. P.</given-names>
            </name>
          </person-group>
          <date>
            <day>1</day>
            <month>4</month>
            <year>2002</year>
          </date>
          <volume>17</volume>
          <issue>1</issue>
          <fpage>37</fpage>
          <lpage>46</lpage>
          <issn>0268-1145</issn>
          <pub-id pub-id-type="doi">10.1093/llc/17.1.37</pub-id>
          <ext-link ext-link-type="uri" ns0:href="https://doi.org/10.1093/llc/17.1.37">https://doi.org/10.1093/llc/17.1.37</ext-link>
        </element-citation>
      </ref>
      <ref id="ref-137973">
        <element-citation publication-type="article-journal">
          <article-title>A search for trends in the poetic style of WB Yeats</article-title>
          <source>ALLC Journal</source>
          <person-group person-group-type="author">
            <name>
              <surname>Jaynes</surname>
              <given-names>Joseph T.</given-names>
            </name>
          </person-group>
          <date>
            <year>1980</year>
          </date>
          <volume>1</volume>
          <fpage>11</fpage>
          <lpage>18</lpage>
        </element-citation>
      </ref>
      <ref id="ref-138026">
        <element-citation publication-type="paper-conference">
          <source>Overview of the Cross-domain Authorship Attribution Task at PAN 2019</source>
          <person-group person-group-type="author">
            <name>
              <surname>Kestemont</surname>
              <given-names>Mike</given-names>
            </name>
            <name>
              <surname>Stamatatos</surname>
              <given-names>Efstathios</given-names>
            </name>
            <name>
              <surname>Manjavacas</surname>
              <given-names>Enrique</given-names>
            </name>
            <name>
              <surname>Daelemans</surname>
              <given-names>Walter</given-names>
            </name>
            <name>
              <surname>Potthast</surname>
              <given-names>Martin</given-names>
            </name>
            <name>
              <surname>Stein</surname>
              <given-names>Benno</given-names>
            </name>
          </person-group>
          <date>
            <year>2019</year>
          </date>
        </element-citation>
      </ref>
      <ref id="ref-138007">
        <element-citation publication-type="article-journal">
          <article-title>Elements of Style Change</article-title>
          <source>University of Dublin, Ireland</source>
          <person-group person-group-type="author">
            <name>
              <surname>Klaussner</surname>
              <given-names>Carmen</given-names>
            </name>
          </person-group>
          <date>
            <year>2017</year>
          </date>
        </element-citation>
      </ref>
      <ref id="ref-138005">
        <element-citation publication-type="chapter">
          <chapter-title>Stylochronometry: Timeline Prediction in Stylometric Analysis</chapter-title>
          <source>Research and Development in Intelligent Systems XXXII</source>
          <person-group person-group-type="author">
            <name>
              <surname>Klaussner</surname>
              <given-names>Carmen</given-names>
            </name>
            <name>
              <surname>Vogel</surname>
              <given-names>Carl</given-names>
            </name>
          </person-group>
          <person-group person-group-type="editor">
            <name>
              <surname>Bremer</surname>
              <given-names>Max</given-names>
            </name>
            <name>
              <surname>Petridis</surname>
              <given-names>Miltos</given-names>
            </name>
          </person-group>
          <publisher-name>Springer International Publishing</publisher-name>
          <date>
            <year>2015</year>
          </date>
          <fpage>91</fpage>
          <lpage>106</lpage>
          <pub-id pub-id-type="doi">10.1007/978-3-319-25032-8_6</pub-id>
          <ext-link ext-link-type="uri" ns0:href="https://doi.org/10.1007/978-3-319-25032-8_6">https://doi.org/10.1007/978-3-319-25032-8_6</ext-link>
        </element-citation>
      </ref>
      <ref id="ref-138006">
        <element-citation publication-type="article-journal">
          <article-title>Temporal predictive regression models for linguistic style analysis</article-title>
          <source>Journal of Language Modelling</source>
          <person-group person-group-type="author">
            <name>
              <surname>Klaussner</surname>
              <given-names>Carmen</given-names>
            </name>
            <name>
              <surname>Vogel</surname>
              <given-names>Carl</given-names>
            </name>
          </person-group>
          <date>
            <day>31</day>
            <month>8</month>
            <year>2018</year>
          </date>
          <volume>6</volume>
          <issue>1</issue>
          <issn>2299-8470</issn>
          <pub-id pub-id-type="doi">10.15398/jlm.v6i1.177</pub-id>
          <ext-link ext-link-type="uri" ns0:href="https://doi.org/10.15398/jlm.v6i1.177">https://doi.org/10.15398/jlm.v6i1.177</ext-link>
        </element-citation>
      </ref>
      <ref id="ref-138018">
        <element-citation publication-type="paper-conference">
          <source>Exploiting stylistic idiosyncrasies for authorship attribution</source>
          <person-group person-group-type="author">
            <name>
              <surname>Koppel</surname>
              <given-names>Moshe</given-names>
            </name>
            <name>
              <surname>Schler</surname>
              <given-names>Jonathan</given-names>
            </name>
          </person-group>
          <date>
            <year>2003</year>
          </date>
          <volume>69</volume>
          <fpage>72</fpage>
          <lpage>80</lpage>
        </element-citation>
      </ref>
      <ref id="ref-137992">
        <element-citation publication-type="article">
          <article-title>Lexico 3 version 3.41 février 03. Outils de statistique textuelle. Manuel d’Utilisation</article-title>
          <person-group person-group-type="author">
            <name>
              <surname>Lamalle</surname>
              <given-names>C.</given-names>
            </name>
            <name>
              <surname>Martinez</surname>
              <given-names>W.</given-names>
            </name>
            <name>
              <surname>Fleury</surname>
              <given-names>S.</given-names>
            </name>
            <name>
              <surname>Salem</surname>
              <given-names>A.</given-names>
            </name>
            <name>
              <surname>Fracchiolla</surname>
              <given-names>B.</given-names>
            </name>
            <name>
              <surname>Kuncova</surname>
              <given-names>A.</given-names>
            </name>
            <name>
              <surname>Maisondieu</surname>
              <given-names>A.</given-names>
            </name>
          </person-group>
          <publisher-name>Laboratoire SYLED-CLA2T, Université de la Sorbonne nouvelle - Paris 3</publisher-name>
          <date>
            <year>2003</year>
          </date>
          <ext-link ext-link-type="uri" ns0:href="http://www.lexi-co.com/ressources/manuel-3.41.pdf">http://www.lexi-co.com/ressources/manuel-3.41.pdf</ext-link>
        </element-citation>
      </ref>
      <ref id="ref-137979">
        <element-citation publication-type="article-journal">
          <article-title>Re-Counting Plato: A Computer Analysis of Plato's Style</article-title>
          <person-group person-group-type="author">
            <name>
              <surname>Ledger</surname>
              <given-names>Gerard R.</given-names>
            </name>
          </person-group>
          <date>
            <year>1989</year>
          </date>
        </element-citation>
      </ref>
      <ref id="ref-137971">
        <element-citation publication-type="article-journal">
          <article-title>The Balance Between Quantitative and Qualitative Literary Stylistics: How the Method of ‘Motifs’ Can Help</article-title>
          <source>The Grammar of Genres and Styles: From Discrete to Non-discrete Units</source>
          <person-group person-group-type="author">
            <name>
              <surname>Legallois</surname>
              <given-names>Dominique</given-names>
            </name>
            <name>
              <surname>Charnois</surname>
              <given-names>Thierry</given-names>
            </name>
            <name>
              <surname>Larjavaara</surname>
              <given-names>Meri</given-names>
            </name>
          </person-group>
          <date>
            <year>2018</year>
          </date>
          <fpage>164</fpage>
          <lpage>193</lpage>
        </element-citation>
      </ref>
      <ref id="ref-138002">
        <element-citation publication-type="article-journal">
          <article-title>The persistence of variation in individual grammars: Copula absence in ‘urban sojourners’ and their stay-at-home peers, Bequia (St Vincent and the Grenadines)</article-title>
          <source>Journal of Sociolinguistics</source>
          <person-group person-group-type="author">
            <name>
              <surname>Meyerhoff</surname>
              <given-names>Miriam</given-names>
            </name>
            <name>
              <surname>Walker</surname>
              <given-names>James A.</given-names>
            </name>
          </person-group>
          <date>
            <month>6</month>
            <year>2007</year>
          </date>
          <volume>11</volume>
          <issue>3</issue>
          <fpage>346</fpage>
          <lpage>366</lpage>
          <issn>1360-6441</issn>
          <pub-id pub-id-type="doi">10.1111/j.1467-9841.2007.00327.x</pub-id>
          <ext-link ext-link-type="uri" ns0:href="https://doi.org/10.1111/j.1467-9841.2007.00327.x">https://doi.org/10.1111/j.1467-9841.2007.00327.x</ext-link>
        </element-citation>
      </ref>
      <ref id="ref-137997">
        <element-citation publication-type="article-journal">
          <article-title>“I entirely understand” is a Blairism: The methodology of identifying idiolectal collocations</article-title>
          <source>International Journal of Corpus Linguistics</source>
          <person-group person-group-type="author">
            <name>
              <surname>Mollin</surname>
              <given-names>Sandra</given-names>
            </name>
          </person-group>
          <date>
            <day>20</day>
            <month>8</month>
            <year>2009</year>
          </date>
          <volume>14</volume>
          <issue>3</issue>
          <fpage>367</fpage>
          <lpage>392</lpage>
          <issn>1384-6655</issn>
          <pub-id pub-id-type="doi">10.1075/ijcl.14.3.04mol</pub-id>
          <ext-link ext-link-type="uri" ns0:href="https://doi.org/10.1075/ijcl.14.3.04mol">https://doi.org/10.1075/ijcl.14.3.04mol</ext-link>
        </element-citation>
      </ref>
      <ref id="ref-137974">
        <element-citation publication-type="chapter">
          <chapter-title>A Multi-Dimensional Analysis of Style in Samuel Beckett’s Prose Works</chapter-title>
          <source>Research in Humanities Computing 4</source>
          <person-group person-group-type="author">
            <name>
              <surname>Opas</surname>
              <given-names>L. L.</given-names>
            </name>
          </person-group>
          <person-group person-group-type="editor">
            <name>
              <surname>Hocking</surname>
              <given-names>S.</given-names>
            </name>
            <name>
              <surname>Ide</surname>
              <given-names>N.</given-names>
            </name>
          </person-group>
          <publisher-name>Clarendon Press</publisher-name>
          <publisher-loc>Oxford</publisher-loc>
          <date>
            <year>1996</year>
          </date>
          <conf-loc>Oxford</conf-loc>
        </element-citation>
      </ref>
      <ref id="ref-138008">
        <element-citation publication-type="article-journal">
          <article-title>Early Modern Multiloquent Authors (EMMA): Designing a large-scale corpus of individuals’ languages</article-title>
          <source>ICAME journal</source>
          <person-group person-group-type="author">
            <name>
              <surname>Petré</surname>
              <given-names>Peter</given-names>
            </name>
            <name>
              <surname>Anthonissen</surname>
              <given-names>Lynn</given-names>
            </name>
            <name>
              <surname>Budts</surname>
              <given-names>Sara</given-names>
            </name>
            <name>
              <surname>Manjavacas</surname>
              <given-names>Enrique</given-names>
            </name>
            <name>
              <surname>Silva</surname>
              <given-names>Emma-Louise</given-names>
            </name>
            <name>
              <surname>Standing</surname>
              <given-names>William</given-names>
            </name>
            <name>
              <surname>Strik</surname>
              <given-names>Odile A.O.</given-names>
            </name>
          </person-group>
          <date>
            <day>1</day>
            <month>3</month>
            <year>2019</year>
          </date>
          <volume>43</volume>
          <issue>1</issue>
          <fpage>83</fpage>
          <lpage>122</lpage>
          <issn>1502-5462</issn>
          <pub-id pub-id-type="doi">10.2478/icame-2019-0004</pub-id>
          <ext-link ext-link-type="uri" ns0:href="https://doi.org/10.2478/icame-2019-0004">https://doi.org/10.2478/icame-2019-0004</ext-link>
        </element-citation>
      </ref>
      <ref id="ref-138009">
        <element-citation publication-type="article-journal">
          <article-title>The real-time dynamics of the individual and the community in grammaticalization</article-title>
          <source>Language</source>
          <person-group person-group-type="author">
            <name>
              <surname>Petré</surname>
              <given-names>Peter</given-names>
            </name>
            <name>
              <surname>Van de Velde</surname>
              <given-names>Freek</given-names>
            </name>
          </person-group>
          <date>
            <year>2018</year>
          </date>
          <volume>94</volume>
          <issue>4</issue>
          <fpage>867</fpage>
          <lpage>901</lpage>
          <issn>1535-0665</issn>
          <pub-id pub-id-type="doi">10.1353/lan.2018.0056</pub-id>
          <ext-link ext-link-type="uri" ns0:href="https://doi.org/10.1353/lan.2018.0056">https://doi.org/10.1353/lan.2018.0056</ext-link>
        </element-citation>
      </ref>
      <ref id="ref-138023">
        <element-citation publication-type="book">
          <source>Pourquoi le style change-t-il?</source>
          <person-group person-group-type="author">
            <name>
              <surname>Philippe</surname>
              <given-names>Gilles</given-names>
            </name>
          </person-group>
          <publisher-name>Les Impressions Nouvelles</publisher-name>
          <date>
            <day>8</day>
            <month>4</month>
            <year>2021</year>
          </date>
          <isbn>9782874498640</isbn>
          <pub-id pub-id-type="doi">10.14375/np.9782874498671</pub-id>
          <ext-link ext-link-type="uri" ns0:href="https://doi.org/10.14375/np.9782874498671">https://doi.org/10.14375/np.9782874498671</ext-link>
        </element-citation>
      </ref>
      <ref id="ref-137990">
        <element-citation publication-type="article-journal">
          <article-title>Sept logiciels de textométrie</article-title>
          <person-group person-group-type="author">
            <name>
              <surname>Pincemin</surname>
              <given-names>Bénédicte</given-names>
            </name>
          </person-group>
          <date>
            <year>2018</year>
          </date>
        </element-citation>
      </ref>
      <ref id="ref-138025">
        <element-citation publication-type="article-journal">
          <article-title>SemEval 2015, task 7: Diachronic text evaluation</article-title>
          <source>Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)</source>
          <person-group person-group-type="author">
            <name>
              <surname>Popescu</surname>
              <given-names>Octavian</given-names>
            </name>
            <name>
              <surname>Strapparava</surname>
              <given-names>Carlo</given-names>
            </name>
          </person-group>
          <date>
            <year>2015</year>
          </date>
          <fpage>870</fpage>
          <lpage>878</lpage>
          <pub-id pub-id-type="doi">10.18653/v1/s15-2147</pub-id>
          <ext-link ext-link-type="uri" ns0:href="https://doi.org/10.18653/v1/s15-2147">https://doi.org/10.18653/v1/s15-2147</ext-link>
        </element-citation>
      </ref>
      <ref id="ref-138004">
        <element-citation publication-type="chapter">
          <chapter-title>Historical sociolinguistics: The Corpus of Early English Correspondence</chapter-title>
          <source>Creating and digitizing language corpora</source>
          <person-group person-group-type="author">
            <name>
              <surname>Raumolin-Brunberg</surname>
              <given-names>Helena</given-names>
            </name>
            <name>
              <surname>Nevalainen</surname>
              <given-names>Terttu</given-names>
            </name>
          </person-group>
          <person-group person-group-type="editor">
            <name>
              <surname>Beal</surname>
              <given-names>Joan C.</given-names>
            </name>
            <etal/>
          </person-group>
          <publisher-name>Palgrave Macmillan UK</publisher-name>
          <date>
            <year>2007</year>
          </date>
          <fpage>148</fpage>
          <lpage>171</lpage>
          <pub-id pub-id-type="doi">10.1057/9780230223202_7</pub-id>
          <ext-link ext-link-type="uri" ns0:href="https://doi.org/10.1057/9780230223202_7">https://doi.org/10.1057/9780230223202_7</ext-link>
        </element-citation>
      </ref>
      <ref id="ref-137978">
        <element-citation publication-type="article-journal">
          <article-title>Plato and the Computer</article-title>
          <source>Ancient Philosophy</source>
          <person-group person-group-type="author">
            <name>
              <surname>Robinson</surname>
              <given-names>T.M.</given-names>
            </name>
          </person-group>
          <date>
            <year>1992</year>
          </date>
          <volume>12</volume>
          <issue>2</issue>
          <fpage>375</fpage>
          <lpage>382</lpage>
          <issn>0740-2007</issn>
          <pub-id pub-id-type="doi">10.5840/ancientphil19921228</pub-id>
          <ext-link ext-link-type="uri" ns0:href="https://doi.org/10.5840/ancientphil19921228">https://doi.org/10.5840/ancientphil19921228</ext-link>
        </element-citation>
      </ref>
      <ref id="ref-138013">
        <element-citation publication-type="article-journal">
          <article-title>A Method for Chronologically Ordering Archaeological Deposits</article-title>
          <source>American Antiquity</source>
          <person-group person-group-type="author">
            <name>
              <surname>Robinson</surname>
              <given-names>W. S.</given-names>
            </name>
          </person-group>
          <date>
            <month>4</month>
            <year>1951</year>
          </date>
          <volume>16</volume>
          <issue>4</issue>
          <fpage>293</fpage>
          <lpage>301</lpage>
          <issn>0002-7316</issn>
          <pub-id pub-id-type="doi">10.2307/276978</pub-id>
          <ext-link ext-link-type="uri" ns0:href="https://doi.org/10.2307/276978">https://doi.org/10.2307/276978</ext-link>
        </element-citation>
      </ref>
      <ref id="ref-138029">
        <element-citation publication-type="chapter">
          <chapter-title>Continuous n-gram representations for authorship attribution</chapter-title>
          <source>Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers</source>
          <person-group person-group-type="author">
            <name>
              <surname>Sari</surname>
              <given-names>Yunita</given-names>
            </name>
            <name>
              <surname>Vlachos</surname>
              <given-names>Andreas</given-names>
            </name>
            <name>
              <surname>Stevenson</surname>
              <given-names>Mark</given-names>
            </name>
          </person-group>
          <date>
            <year>2017</year>
          </date>
          <fpage>267</fpage>
          <lpage>273</lpage>
          <pub-id pub-id-type="doi">10.18653/v1/e17-2043</pub-id>
          <ext-link ext-link-type="uri" ns0:href="https://doi.org/10.18653/v1/e17-2043">https://doi.org/10.18653/v1/e17-2043</ext-link>
        </element-citation>
      </ref>
      <ref id="ref-138033">
        <element-citation publication-type="article-journal">
          <article-title>Who Authored <italic>On Liberty</italic>? Stylometric Evidence on Harriet Taylor Mill's Contribution</article-title>
          <source>Utilitas</source>
          <person-group person-group-type="author">
            <name>
              <surname>Schmidt-Petri</surname>
              <given-names>Christoph</given-names>
            </name>
            <name>
              <surname>Schefczyk</surname>
              <given-names>Michael</given-names>
            </name>
            <name>
              <surname>Osburg</surname>
              <given-names>Lilly</given-names>
            </name>
          </person-group>
          <date>
            <day>2</day>
            <month>12</month>
            <year>2021</year>
          </date>
          <volume>34</volume>
          <issue>2</issue>
          <fpage>120</fpage>
          <lpage>138</lpage>
          <issn>0953-8208</issn>
          <pub-id pub-id-type="doi">10.1017/s0953820821000339</pub-id>
          <ext-link ext-link-type="uri" ns0:href="https://doi.org/10.1017/s0953820821000339">https://doi.org/10.1017/s0953820821000339</ext-link>
        </element-citation>
      </ref>
      <ref id="ref-138011">
        <element-citation publication-type="article-journal">
          <article-title>The Corpus for Idiolectal Research (CIDRE)</article-title>
          <source>Journal of Open Humanities Data</source>
          <person-group person-group-type="author">
            <name>
              <surname>Seminck</surname>
              <given-names>Olga</given-names>
            </name>
            <name>
              <surname>Gambette</surname>
              <given-names>Philippe</given-names>
            </name>
            <name>
              <surname>Legallois</surname>
              <given-names>Dominique</given-names>
            </name>
            <name>
              <surname>Poibeau</surname>
              <given-names>Thierry</given-names>
            </name>
          </person-group>
          <date>
            <year>2021</year>
          </date>
          <volume>7</volume>
          <fpage>15</fpage>
          <issn>2059-481X</issn>
          <pub-id pub-id-type="doi">10.5334/johd.42</pub-id>
          <ext-link ext-link-type="uri" ns0:href="https://doi.org/10.5334/johd.42">https://doi.org/10.5334/johd.42</ext-link>
        </element-citation>
      </ref>
      <ref id="ref-137983">
        <element-citation publication-type="article-journal">
          <article-title>Stylistic Constancy and Change across Literary Corpora: Using Measures of Lexical Richness to Date Works</article-title>
          <source>Computers and the Humanities</source>
          <person-group person-group-type="author">
            <name>
              <surname>Smith</surname>
              <given-names>Joseph A.</given-names>
            </name>
            <name>
              <surname>Kelly</surname>
              <given-names>Coleen</given-names>
            </name>
          </person-group>
          <date>
            <year>2002</year>
          </date>
          <volume>36</volume>
          <issue>4</issue>
          <fpage>411</fpage>
          <lpage>430</lpage>
          <issn>0010-4817</issn>
          <pub-id pub-id-type="doi">10.1023/a:1020201615753</pub-id>
          <ext-link ext-link-type="uri" ns0:href="https://doi.org/10.1023/a:1020201615753">https://doi.org/10.1023/a:1020201615753</ext-link>
        </element-citation>
      </ref>
      <ref id="ref-138027">
        <element-citation publication-type="article-journal">
          <article-title>On the robustness of authorship attribution based on character n-gram features</article-title>
          <source>Journal of Law and Policy</source>
          <person-group person-group-type="author">
            <name>
              <surname>Stamatatos</surname>
              <given-names>Efstathios</given-names>
            </name>
          </person-group>
          <date>
            <year>2012</year>
          </date>
          <volume>21</volume>
          <fpage>421</fpage>
        </element-citation>
      </ref>
      <ref id="ref-137970">
        <element-citation publication-type="article-journal">
          <article-title>Stylochronometry: Stylistic Development, Sequence of Composition, and Relative Dating</article-title>
          <source>Literary and Linguistic Computing</source>
          <person-group person-group-type="author">
            <name>
              <surname>Stamou</surname>
              <given-names>C.</given-names>
            </name>
          </person-group>
          <date>
            <day>1</day>
            <month>10</month>
            <year>2007</year>
          </date>
          <volume>23</volume>
          <issue>2</issue>
          <fpage>181</fpage>
          <lpage>199</lpage>
          <issn>0268-1145</issn>
          <pub-id pub-id-type="doi">10.1093/llc/fqm029</pub-id>
          <ext-link ext-link-type="uri" ns0:href="https://doi.org/10.1093/llc/fqm029">https://doi.org/10.1093/llc/fqm029</ext-link>
        </element-citation>
      </ref>
      <ref id="ref-138030">
        <element-citation publication-type="paper-conference">
          <source>What represents “style” in authorship attribution?</source>
          <person-group person-group-type="author">
            <name>
              <surname>Sundararajan</surname>
              <given-names>Kalaivani</given-names>
            </name>
            <name>
              <surname>Woodard</surname>
              <given-names>Damon</given-names>
            </name>
          </person-group>
          <date>
            <year>2018</year>
          </date>
          <fpage>2814</fpage>
          <lpage>2822</lpage>
        </element-citation>
      </ref>
      <ref id="ref-137980">
        <element-citation publication-type="article-journal">
          <article-title>A Multivariate Synthesis of Published Platonic Stylometric Data</article-title>
          <source>Literary and Linguistic Computing</source>
          <person-group person-group-type="author">
            <name>
              <surname>Temple</surname>
              <given-names>J. T.</given-names>
            </name>
          </person-group>
          <date>
            <day>1</day>
            <month>6</month>
            <year>1996</year>
          </date>
          <volume>11</volume>
          <issue>2</issue>
          <fpage>67</fpage>
          <lpage>75</lpage>
          <issn>0268-1145</issn>
          <pub-id pub-id-type="doi">10.1093/llc/11.2.67</pub-id>
          <ext-link ext-link-type="uri" ns0:href="https://doi.org/10.1093/llc/11.2.67">https://doi.org/10.1093/llc/11.2.67</ext-link>
        </element-citation>
      </ref>
      <ref id="ref-138019">
        <element-citation publication-type="book">
          <source>Gustave Flaubert</source>
          <person-group person-group-type="author">
            <name>
              <surname>Thibaudet</surname>
              <given-names>Albert</given-names>
            </name>
          </person-group>
          <publisher-name>Éditions Gallimard</publisher-name>
          <date>
            <year>1922</year>
          </date>
        </element-citation>
      </ref>
      <ref id="ref-137995">
        <element-citation publication-type="paper-conference">
          <source>Hyperdeep: deep learning descriptif pour l'analyse de données textuelles</source>
          <person-group person-group-type="author">
            <name>
              <surname>Vanni</surname>
              <given-names>Laurent</given-names>
            </name>
            <name>
              <surname>Corneli</surname>
              <given-names>Marco</given-names>
            </name>
            <name>
              <surname>Longrée</surname>
              <given-names>Dominique</given-names>
            </name>
            <name>
              <surname>Mayaffre</surname>
              <given-names>Damon</given-names>
            </name>
            <name>
              <surname>Precioso</surname>
              <given-names>Frédéric</given-names>
            </name>
          </person-group>
          <date>
            <year>2020</year>
          </date>
        </element-citation>
      </ref>
      <ref id="ref-137975">
        <element-citation publication-type="article-journal">
          <article-title>Traditional and emotional stylometric analysis of the songs of Beatles Paul McCartney and John Lennon</article-title>
          <source>Computers and the Humanities</source>
          <person-group person-group-type="author">
            <name>
              <surname>Whissell</surname>
              <given-names>Cynthia</given-names>
            </name>
          </person-group>
          <date>
            <year>1996</year>
          </date>
          <volume>30</volume>
          <issue>3</issue>
          <fpage>257</fpage>
          <lpage>265</lpage>
          <issn>0010-4817</issn>
          <pub-id pub-id-type="doi">10.1007/bf00055109</pub-id>
          <ext-link ext-link-type="uri" ns0:href="https://doi.org/10.1007/bf00055109">https://doi.org/10.1007/bf00055109</ext-link>
        </element-citation>
      </ref>
      <ref id="ref-137977">
        <element-citation publication-type="article-journal">
          <article-title>A multivariate analysis of Platonic prose rhythm</article-title>
          <source>Computer Studies in the Humanities and Verbal Behavior</source>
          <person-group person-group-type="author">
            <name>
              <surname>Wishart</surname>
              <given-names>David</given-names>
            </name>
            <name>
              <surname>Leach</surname>
              <given-names>Stephen V.</given-names>
            </name>
          </person-group>
          <date>
            <year>1970</year>
          </date>
          <volume>3</volume>
          <issue>2</issue>
          <fpage>90</fpage>
          <lpage>99</lpage>
        </element-citation>
      </ref>
      <ref id="ref-137998">
        <element-citation publication-type="article">
          <article-title>The British National Corpus XML Edition DVD</article-title>
          <person-group person-group-type="author">
            <collab>BNC Consortium</collab>
          </person-group>
          <publisher-name>Oxford University Press</publisher-name>
          <publisher-loc>Oxford</publisher-loc>
          <date>
            <year>2007</year>
          </date>
          <ext-link ext-link-type="uri" ns0:href="http://www.natcorp.ox.ac.uk/docs/URG/">http://www.natcorp.ox.ac.uk/docs/URG/</ext-link>
        </element-citation>
      </ref>
      <ref id="ref-138031">
        <element-citation publication-type="article-journal">
          <article-title>Syntax encoding with application in authorship attribution</article-title>
          <source>Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing</source>
          <person-group person-group-type="author">
            <name>
              <surname>Zhang</surname>
              <given-names>Richong</given-names>
            </name>
            <name>
              <surname>Hu</surname>
              <given-names>Zhiyuan</given-names>
            </name>
            <name>
              <surname>Guo</surname>
              <given-names>Hongyu</given-names>
            </name>
            <name>
              <surname>Mao</surname>
              <given-names>Yongyi</given-names>
            </name>
          </person-group>
          <date>
            <year>2018</year>
          </date>
          <fpage>2742</fpage>
          <lpage>2753</lpage>
          <pub-id pub-id-type="doi">10.18653/v1/d18-1294</pub-id>
          <ext-link ext-link-type="uri" ns0:href="https://doi.org/10.18653/v1/d18-1294">https://doi.org/10.18653/v1/d18-1294</ext-link>
        </element-citation>
      </ref>
    </ref-list>
  </back>
</article>