The Afterlives of Shakespeare and Company in Online Social Readership

The growth of social reading platforms such as Goodreads and LibraryThing enables us to analyze reading activity at very large scale and in remarkable detail. But twenty-first-century systems give us a perspective only on contemporary readers. Meanwhile, the digitization of the lending library records of Shakespeare and Company (SC) provides a window into the reading activity of an earlier, smaller community in interwar Paris. In this article, we explore the extent to which we can make comparisons between the Shakespeare and Company and Goodreads communities. By quantifying similarities and differences, we can identify patterns in how works have risen or fallen in popularity across these datasets. We can also measure differences in how works are received by comparing co-reading patterns. Finally, by examining the complete networks of co-readership, we can observe changes in the overall structures of literary reception.

In this article, we pursue three main strategies for computational comparison of SC and Goodreads. First, we begin by comparing the basic properties of individual works across the two datasets. Of the books that appear in both datasets, we find that, even after a century has passed, almost twenty percent of the hundred most borrowed works in SC are still among the hundred most reviewed books from this set on Goodreads. These enduringly popular books were mostly contemporary to the SC collection; the older books that were favorites of SC readers are less likely to be read and reviewed today. Relatedly, we find that the books that rose the most in popularity from SC to Goodreads include titles by nineteenth-century authors whose work was out of fashion among SC readers (e.g., Jane Austen), while books that fell in popularity include then-contemporary authors (e.g., Dorothy Richardson) who are today primarily the province of scholars. These changes demonstrate the simultaneous expansion and winnowing over time to reach one modern community understanding of "the classics."2 Our results also suggest that readers today treat many of the modernist-era texts that were first consumed by SC members as an era cohort (that is, as a historically distinct and coherent set) that has remained broadly legible for nearly a century.
Second, we consider the two communities from a network perspective, comparing works on the basis of their connection to other works. We find that many of the books that retain popularity from SC to Goodreads also maintain the same patterns of co-readership between the two groups. This fact again suggests that, at least in the case of books read by SC and Goodreads members, the literary contexts of reception can be stable across widely diverse environments.
Third, we consider the structure of the two networks as a whole. We find that, rather than showing which books were most central, or core, to each community, network analysis magnifies the reading habits of two prolific SC readers (and friends) in a way that may be useful for newly directed historical research.

DATA COLLECTION AND PROCESSING
We match 4,460 (74.1 percent) of the 6,018 SC books to records in Goodreads. We use the Goodreads API to search for the book IDs corresponding to each title and author. We then manually check each match and update the book IDs when no match is found or when the matched book is incorrect. Many books have multiple versions in Goodreads, and we prioritize the most specific (e.g., individual volumes rather than collected works) and popular (by numbers of ratings and reviews). From the ranked set of results for each book, we select the book with the highest number of ratings as our final match.
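The selection rule described above can be illustrated with a minimal Python sketch. It is not the code used for the study: the `Candidate` record, its `is_collection` flag, and the `pick_match` function are hypothetical names introduced here, and the sketch assumes that candidate records have already been retrieved and that "most specific" can be reduced to a single boolean.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """One search result for an SC title (hypothetical record layout)."""
    book_id: str
    title: str
    author: str
    ratings_count: int
    is_collection: bool = False  # assumed flag: True for collected works


def pick_match(candidates: List[Candidate]) -> Optional[Candidate]:
    """Prefer individual volumes over collections, then the most-rated edition."""
    if not candidates:
        return None  # left for manual matching
    return max(candidates, key=lambda c: (not c.is_collection, c.ratings_count))
```

A manual check, as the paragraph notes, would still be needed after this automatic step; the sketch covers only the tie-breaking among versions.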
After matching each SC book with its Goodreads book ID, we scrape the Goodreads page for that book. This data includes the ISBN, the number of pages, number of reviews, number of ratings, average and distribution of the ratings, original publication year, and the shelves, genres, and lists to which the book is assigned by Goodreads members. After scraping, we have a set of 4,454 books by 1,726 authors, with 3,940 of those books receiving at least one rating and 3,223 books receiving at least one review. We release this connected metadata at https://github.com/gyauney/shakespeare-and-company-social-readership.

COMPARING POPULARITY IN SC AND GOODREADS
To measure the relative popularity of books in the two datasets, we count the number of times the book was borrowed from SC and the number of text reviews the book received on Goodreads. As shown in the plots below, these popularity metrics have a much wider range for Goodreads (maximum reviews: 77,817) than for SC (maximum borrows: fifty-six). Since many SC books have the same (small) number of borrows, fine-grained comparisons of popularity rank are possible only in the Goodreads data. In all the following results, we show only books that appeared in both datasets (according to the matching process described above) (fig. 1).
We begin by listing the most popular books in each of the datasets. James Joyce, Dorothy Richardson, and Katherine Mansfield (all authors contemporary to SC) dominate the SC most popular books, while Jane Austen, Charlotte Brontë, and F. Scott Fitzgerald (an SC contemporary and, briefly, member, but not a particular community insider) dominate the Goodreads most popular books. The lists of most popular authors in the two datasets reflect the same pattern. One difference is that D. H. Lawrence is the most borrowed author in SC, despite having only one top-ten most-borrowed book.
In order to reduce bias toward the most popular works, we next convert our popularity metrics to ranks, where the 0-ranked book or author is the most popular. We scale these ranks to a [0, 1] range and compare the ranks in SC and Goodreads in the following plots. According to these ranks, the popularity of both the books and authors is correlated across the SC and Goodreads datasets; if a book or author is popular on Goodreads, it is also likely to be popular in SC. But there are many outliers, indicating cases where a book or author is much more popular in one dataset than in the other. These outliers represent a rise or fall in popularity rank. Of the shared books, eighteen of the hundred most borrowed SC books are among the hundred most reviewed Goodreads books (of the books included in our datasets).

ROSE THE MOST IN RANK (FROM SC TO GOODREADS)
To tease apart which books were popular in Goodreads in comparison with SC, we calculate each book's change in rank: its Goodreads rank subtracted from its SC rank. Because rank 0 denotes the most popular book, a positive change means that the book rose in popularity on Goodreads. Of the books that rose the most in popularity on Goodreads, many are older books, published before SC opened.
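The rank scaling and the change-in-rank calculation can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the study's code: the function names are hypothetical, and the handling of ties (tied books share the average of their positions) is an assumption, since the text does not say how equal borrow counts are ranked.

```python
from typing import Dict


def scaled_ranks(popularity: Dict[str, int]) -> Dict[str, float]:
    """Map {item: count} to ranks in [0, 1], where 0 is the most popular.

    Assumption: tied items share the average of their positions.
    """
    items = sorted(popularity, key=lambda k: -popularity[k])
    n = len(items)
    ranks: Dict[str, float] = {}
    i = 0
    while i < n:
        j = i
        # Extend j to cover the whole run of items tied with items[i].
        while j + 1 < n and popularity[items[j + 1]] == popularity[items[i]]:
            j += 1
        average_position = (i + j) / 2
        for k in range(i, j + 1):
            ranks[items[k]] = average_position / (n - 1) if n > 1 else 0.0
        i = j + 1
    return ranks


def rank_change(sc_borrows: Dict[str, int], gr_reviews: Dict[str, int]) -> Dict[str, float]:
    """SC rank minus Goodreads rank, over books present in both datasets.

    Positive values: the book rose in rank from SC to Goodreads.
    """
    shared = sc_borrows.keys() & gr_reviews.keys()
    sc = scaled_ranks({b: sc_borrows[b] for b in shared})
    gr = scaled_ranks({b: gr_reviews[b] for b in shared})
    return {b: sc[b] - gr[b] for b in shared}
```

With toy counts such as `rank_change({"A": 10, "B": 5, "C": 1}, {"A": 1, "B": 50, "C": 100})`, book "C" (least borrowed, most reviewed) receives the largest positive change and book "A" the largest negative one.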

FELL THE MOST IN RANK (FROM SC TO GOODREADS)
Of the books that fell most in rank, all were contemporary to the active years of SC.
While many of these authors are still well known in academic or literary contexts (e.g., Katherine Mansfield), they are not popular with the majority of readers on Goodreads. For example, even in the 1930s, the modernist and experimental author Dorothy M. Richardson did not enjoy mainstream popularity, but her books were borrowed fairly often by SC patrons, who would have been reading her work alongside contemporaries such as James Joyce and Virginia Woolf. Richardson was also in personal contact with Sylvia Beach. But her work is today largely ignored within the Goodreads community.

LISTS: CHANGE IN RANK (SC TO GOODREADS)
The previous section shows that there are substantial differences in interactions for both works and authors between the two datasets. But Goodreads also includes features, such as lists, that allow us to explore how readers perceive these changes; these lists can indicate why certain titles rose or fell in popularity. Goodreads lists are ranked sets of books, created by users, to which any user may add any book. Users influence rankings on these lists by voting individual books up or down. These lists are frequent sources of both creative experimentation and the codification of community values.
Examining the lists that Goodreads users assign to the SC books, we find strong evidence of the canonical status of many SC titles. However, popular books are more likely to be added to lists, and so unpopular books are missing from these categorizations. What about the books that were popular among SC patrons but are not as popular on Goodreads? Did SC members only read what they themselves would have recognized as the "best books" and "books everyone should read," or was their reading more self-consciously avant-garde in historical context?
To answer these questions, we rank the lists by their change in mean rank from SC to Goodreads. For each list, we represent the rank as the mean of the ranks of the books that have been assigned to that list by at least five users. We also require that at least ten books in our matched dataset be assigned to each list; other lists are discarded.
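The list-level aggregation just described can be sketched as follows. The function names and the input layout (`{list_name: {book_id: number_of_users}}`) are hypothetical, and the sketch assumes that the ten-book minimum is applied after the five-user filter, which the text leaves open. The same procedure would apply to shelves.

```python
from statistics import mean
from typing import Dict


def list_mean_ranks(
    assignments: Dict[str, Dict[str, int]],
    book_rank: Dict[str, float],
    min_users: int = 5,
    min_books: int = 10,
) -> Dict[str, float]:
    """Mean book rank per list, keeping only sufficiently supported lists.

    assignments maps list name -> {book_id: number of users who assigned it}.
    """
    result: Dict[str, float] = {}
    for name, books in assignments.items():
        kept = [b for b, n in books.items() if n >= min_users and b in book_rank]
        if len(kept) >= min_books:  # assumption: threshold applied after filtering
            result[name] = mean(book_rank[b] for b in kept)
    return result


def list_rank_change(
    assignments: Dict[str, Dict[str, int]],
    sc_rank: Dict[str, float],
    gr_rank: Dict[str, float],
    min_users: int = 5,
    min_books: int = 10,
) -> Dict[str, float]:
    """Mean SC rank minus mean Goodreads rank for each retained list."""
    sc = list_mean_ranks(assignments, sc_rank, min_users, min_books)
    gr = list_mean_ranks(assignments, gr_rank, min_users, min_books)
    return {name: sc[name] - gr[name] for name in sc.keys() & gr.keys()}
```

Sorting the returned dictionary by value would give the lists that rose the most (largest positive change) and fell the most (largest negative change).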

SHELVES: CHANGE IN RANK (FROM SC TO GOODREADS)
Similar to lists are Goodreads shelves. Where lists are ranked, shelves function more like free-text tags. Users employ shelves both for personal tracking (e.g., to-read, my-favorites, read-in-2020) and to help build the community's mapping of books. In particular, shelves, unlike lists, often function as genre labels (e.g., romance, historical fiction) on Goodreads, and so their examination can reveal new perceptions and aspects of the books that rose or fell in popularity.3 As we did with lists, we can compare the shelves associated with books that rose and fell the most in popularity. Again, we measure the rank for each shelf as the mean of the ranks of the books assigned to that shelf by at least five users and discard shelves that were assigned to fewer than ten books.
We find that shelves associated with books that rose the most in popularity generally mention genres (thriller-mystery, chick-lit, childrens-lit), classics (the-classics), or school (read-in-school), themes that prior work has connected to popular discussions of canonization (Walsh and Antoniak, "The Goodreads 'Classics'"). Shelves associated with books that fell in popularity tend to mention specific authors (joyce, d-h-lawrence) or literary criticism (lit-crit, literary-criticism). These results indicate that books that were read in school or that fit genre classifications have remained popular, while books that do not fit genre classifications have decreased in popularity.
In sum, our work on lists and shelves shows that even the sustained prominence of authors and books that readers today associate with canonical modernism does not match the status that those authors and books enjoyed among SC members. While literary status can indeed be very durable, our results suggest that the early heights of popularity reached by a few authors within the relatively small and distinctive SC group have not been sustained even at the highest end of those evaluated by the larger, later, and more diverse Goodreads community.

CHRONOLOGICAL COMPARISON
In the previous section we noted that the most extreme changes in popularity appeared to be correlated with year of publication. Here we provide a more complete analysis of this relationship. We focus on books published between 1800 and 1940, for which we have the most reliable data. We find that books that are more popular in Goodreads are spread roughly uniformly over the time period, while the works more popular in SC are concentrated between 1910 and 1940. The top hundred works in Goodreads show much greater variation in publication date than do the top hundred works in SC. Of the books that appear in both SC and Goodreads, older titles were less popular in SC but have a greater chance of still being read in Goodreads.

COMPARING CONTEMPORARY LITERATURE TO US BESTSELLERS
Although we observe that the works that have fallen most in popularity tend to be contemporary literature of the 1920s and 1930s, SC readers were nevertheless reading many works that have remained popular. In order to contextualize their reading patterns, we compare them with one available source: the lists of the ten bestselling novels in the US reported by Publishers Weekly.4 Focusing on works published from 1920 to 1929, we compare the number of Goodreads ratings for the ninety-seven distinct US bestsellers (three works were bestsellers in two consecutive years) to the hundred most-borrowed SC works from that decade. The lists are measuring different events (purchases by year vs. borrowing over two decades) and the SC values include some works that are not novels and would not be counted, but they provide a point of comparison. The US bestsellers are mostly unfamiliar to modern readers, with a median of eighty-two ratings and a mean of 7,050. Only six works have more than ten thousand ratings, with The Age of Innocence and All Quiet on the Western Front skewing the mean. In contrast, twenty-six of the top SC works have more than ten thousand ratings, with a median of 1,030 and a mean of 64,151 (skewed by The Great Gatsby at #84). Moreover, many of the books borrowed most often by SC patrons remain popular today, with fifteen of the top twenty-five SC works having more than ten thousand ratings by Goodreads users. The lists of bestsellers and of most-borrowed SC titles are not similar, with only eight works in common, of which five are by Sinclair Lewis. In short, popularity among SC readers was a significantly better predictor of enduring literary status than was commercial success at the time of publication.

COMPARING READING PATTERNS OF POPULAR BOOKS
Reception is defined not just by the frequency with which a book is read, but also by whom and in what contexts it is read. In the previous section, we measured the popularity of individual books, but we can also consider patterns in reading behavior between books. There is a limit to what we can learn from individual book popularity alone, while book co-reading patterns provide a more detailed and nuanced picture. Additionally, such patterns allow us to study co-reading across the entire community of readers rather than limiting ourselves to which books were checked out by a single reader. To compare co-reading patterns between the two periods, we further restrict our dataset of matched titles to 1,685 books for which we have user information from the UCSD Book Graph.5 This dataset contains more user information than is easily available from Goodreads. Each dataset induces a network: in SC, two books are connected if they were borrowed by the same member; in Goodreads, books are connected if they were reviewed by the same user. We restrict our network analysis to the 1,511 books that are connected in both datasets.
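The way each dataset induces a network can be made concrete with a short Python sketch. It assumes, hypothetically, that each dataset has been reduced to (reader, book) pairs; the function names are introduced here for illustration, and weighting each edge by the number of shared readers is one reasonable choice rather than a documented detail of the study.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]


def co_reading_counts(events: Iterable[Tuple[str, str]]) -> Dict[Edge, int]:
    """Count, for each pair of books, how many readers interacted with both.

    events holds (reader_id, book_id) pairs: borrows in SC, reviews in Goodreads.
    Edge keys are sorted so that each unordered pair appears once.
    """
    books_by_reader: Dict[str, Set[str]] = defaultdict(set)
    for reader, book in events:
        books_by_reader[reader].add(book)  # repeat borrows count once per reader
    weights: Dict[Edge, int] = defaultdict(int)
    for books in books_by_reader.values():
        for a, b in combinations(sorted(books), 2):
            weights[(a, b)] += 1
    return dict(weights)


def connected_in_both(sc_edges: Dict[Edge, int], gr_edges: Dict[Edge, int]) -> Set[str]:
    """Books that have at least one neighbor in each network."""
    sc_nodes = {book for pair in sc_edges for book in pair}
    gr_nodes = {book for pair in gr_edges for book in pair}
    return sc_nodes & gr_nodes
```

For large review datasets a sparse matrix or a graph library would likely be more efficient than enumerating pairs per reader, since prolific readers generate a quadratic number of pairs.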
We can compare the neighbors of each book in the two communities. If, for a given book, these neighborhoods of books are similar in the two graphs, we have evidence that the book was read in similar textual company by the members of the two communities. If, in contrast, these neighbor graphs are markedly different, we have evidence that a book has been received in the company of different books. For example, a work might be seen as representing a specific genre in SC, but modern readers might consider the same work to be defined primarily by its prestige or "classic" status.
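As a rough illustration of what comparing neighborhoods can mean computationally, the following Python sketch treats a book's co-read counts in each community as a smoothed probability distribution and scores the pair with Jensen-Shannon divergence, the measure used in this section. The function names are hypothetical. Two details are assumptions rather than documented choices: the smoothing constant is added to raw counts before normalization, and the logarithm is taken in base 2, which bounds the divergence between 0 and 1.

```python
import math
from typing import Dict, List, Sequence


def neighbor_distribution(
    counts: Dict[str, int], vocabulary: Sequence[str], smoothing: float = 0.01
) -> List[float]:
    """Smoothed, L1-normalized distribution over co-read books.

    Assumption: the constant is added to raw counts, then the vector is normalized.
    """
    vector = [counts.get(book, 0) + smoothing for book in vocabulary]
    total = sum(vector)
    return [value / total for value in vector]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence in bits; zero-probability terms contribute 0."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence: 0 for identical distributions, 1 for disjoint ones."""
    midpoint = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl_divergence(p, midpoint) + 0.5 * kl_divergence(q, midpoint)
```

Given a shared vocabulary of books, `js_divergence(neighbor_distribution(sc_counts, vocab), neighbor_distribution(gr_counts, vocab))` yields one score per book; lower scores indicate more similar reading company. Library implementations should be checked before substitution, since some return the square root of the divergence (a distance) rather than the divergence itself.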
We operationalize similarity between reading patterns across the two graphs by representing the neighbors of each book as a numeric vector. For each book we define two vectors, the first representing reading patterns in SC and the other representing reading patterns in Goodreads. A given book's vector is indexed by books, where each entry is proportional to the number of readers who interacted with the given pair of books, and the vectors are L1-normalized into distributions. We use Jensen-Shannon divergence as a standard method for comparing each book's two vectors.6 A persistent challenge in this work is that the number of borrowing events in SC is much smaller than the number of reviews on Goodreads. Since most books are borrowed rarely, many of the interactions between books are even more sparse and potentially noisy. We calculate Jensen-Shannon divergence between each book's distributions over co-occurring books after adding a small constant (0.01) to all vector entries to make the problem well-posed with a uniform prior. We limit this analysis to only the 216 books in the top quartile of popularity in both datasets (that is, books borrowed at least four times in SC and rated at least 2,600 times in the UCSD Book Graph) in order to focus on reading patterns of enduringly popular books.
Books with the most similar distributions across SC and Goodreads are listed in Table 9. These popular books were popular in the same way in SC and Goodreads: often extremely popular and read in conjunction with the same sets of other popular books.

Fig. 1. Kernel density estimate (KDE) plots of the popularity distributions for SC and Goodreads.

Fig. 2. A comparison of author popularity in Goodreads and SC. Each point represents an author, while the x-axis represents the ranked popularity in SC (by number of borrows) and the y-axis represents the ranked popularity in Goodreads (by number of text reviews). We sum the popularity metrics across books for each author. Popularity in Goodreads and SC is correlated (Pearson r=0.51, p<0.05), but there are many outliers representing authors that are much more popular in one dataset than the other. Leo Tolstoy and Louisa May Alcott, for example, are much more popular in Goodreads than they were among SC patrons.

Fig. 3. Relative popularity of titles across SC and Goodreads for the 2,225 titles published in the nineteenth and twentieth centuries with non-zero popularity in both datasets. The y-value for each work is the proportion of total borrows in SC accounted for by the work, divided by the proportion of total reviews in Goodreads accounted for by the work. Positive y-values mean that a book was more popular in SC. Negative y-values mean a book is more popular in Goodreads. The most relatively popular books in SC are all from the twentieth century, while the most relatively popular books in Goodreads are drawn more uniformly across the nineteenth and early twentieth centuries. (For reference, the top five SC titles by this metric are The Midas Touch [1938] by Margaret Kennedy, Ripeness is All [1935] by Eric Linklater, This is Mr. Fortune [1938] by H. C. Bailey, The White Horses of Vienna [1937] by Kay Boyle, and The Washington Legation Murders [1935] by F. Van Wyck Mason. The top five Goodreads titles are Little Women [1880], Dracula [1897], Anna Karenina [1877], The Wonderful Wizard of Oz [1900], and Madame Bovary [1857].)

Fig. 4. Box plots of publication years for the most popular books in each dataset, truncated to remove lowest outliers for Goodreads.

et al., "Fine-Grained Spoiler Detection from Large-Scale Review Corpora," Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (July 2019): 2605-10.

Table 1. The books borrowed most often in the SC dataset. (Only books also included in the Goodreads dataset are shown here.)

Table 2. The books reviewed most often in the Goodreads dataset. (Only books also included in the SC dataset are shown here.)

Table 3. The most popular authors in the SC and Goodreads datasets.

Table 4. The books that rose the most in popularity rank from SC to Goodreads.

Table 5. The books that fell the most in popularity rank from SC to Goodreads.

The lists to which SC books are most frequently assigned by Goodreads users are lists of best books (e.g., Best Books Ever, Best Books of the 20th Century) and lists of books everyone should read (e.g., Books That Everyone Should Read At Least Once, Must Read Classics).

Table 6. The most popular lists to which Goodreads users assign books matched with SC.

The results show which lists are associated with books that rose or fell in popularity from SC to Goodreads. For example, books assigned to lists associated with particular modernist authors (e.g., James Joyce Reading List, Best of D. H. Lawrence) fell in popularity, while books associated with what are today considered "classics" and with children (e.g., Proliferation of the Classics, My Favorite Childhood books) rose in popularity. Evidence from these lists indicates that books that rose in popularity became part of the canon, while books that fell in popularity were particular to the time and place of SC.

Table 7. Lists that rose and fell the most in rank from SC to Goodreads.

Table 8. Shelves that rose and fell the most in rank from SC to Goodreads.

Table 9. The popular books with the lowest Jensen-Shannon divergence have the most similar distributions over co-occurring books.

Even the books with the most similar reading patterns nevertheless have noticeably different neighbors in the two datasets. For example, in both datasets, Light in August by William Faulkner is read by people who also read Hemingway and other works by Faulkner. But in SC, top neighbors include now-less-popular works such as Sanctuary and Men Without Women. In Goodreads, the neighborhood instead includes The Sound and the Fury and The Sun Also Rises. The contemporary but now-less-read The Years by Virginia Woolf appears in SC, while the much older but now-more-popular Jane Eyre is more closely related in Goodreads.

Table 10. The books most frequently interacted with by people who read Light in August by William Faulkner, a book with high neighbor similarity across SC and Goodreads.

     SC                                        Goodreads
     Men Without Women by Ernest Hemingway     The Great Gatsby by F. Scott Fitzgerald
3    As I Lay Dying by William Faulkner        The Sound and the Fury by William Faulkner
4    A Farewell to Arms by Ernest Hemingway    Jane Eyre by Charlotte Brontë
5    The Years by Virginia Woolf               The Sun Also Rises by Ernest Hemingway