This post is a continuation of a series where I use natural language processing (NLP) to analyze the text of Proust’s In Search Of Lost Time (ISLT). Here is the Python code for this series.
In my previous post I showed that the five most frequently mentioned characters in ISLT are Albertine (2338 mentions), Swann (1338), Charlus (1303), Robert Saint-Loup (1091) and Odette (971). But when do these references occur?
Readers of ISLT know that characters come and go as the story progresses. Swann and Odette play prominent roles in Volume 1 (Swann’s Way), whereas Albertine does not enter the story until Volume 2, coming to the fore as an object of the Narrator’s obsessions in Volumes 5 (The Prisoner) and 6 (The Fugitive).
I thought it would be interesting to try to visualize these transitions, so I recorded the chapter and paragraph of each reference to each character, in effect producing “coordinates” for every portion of the text. For example, the first proper name in the text (appropriately, François Ier) is at (1, 1). I then counted the number of references to each name within each of the 486 chapters.
These counts constitute a kind of “character heat map”: the count is high in chapters where a character is referenced frequently, and zero in chapters where the character is not mentioned at all.
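To make the bookkeeping concrete, here is a minimal sketch of the coordinate-and-count idea. It is not the code from this series: the toy `chapters` data, the function names, and the choice to number paragraphs within each chapter are all assumptions made for illustration, and simple whole-word matching stands in for whatever name detection the real pipeline uses.

```python
import re
from collections import Counter

# Hypothetical toy input (NOT the real ISLT text): a list of chapters,
# each chapter being a list of paragraph strings.
chapters = [
    ["Swann came to dinner.", "My mother spoke of Swann and Odette."],
    ["Albertine was at Balbec.", "Albertine laughed; Swann was not there."],
]
names = ["Albertine", "Swann", "Odette"]


def find_references(chapters, names):
    """Return a list of (chapter, paragraph) coordinates for each name."""
    # Whole-word patterns, so a name is not matched inside a longer word.
    patterns = {name: re.compile(r"\b" + re.escape(name) + r"\b") for name in names}
    coords = {name: [] for name in names}
    # Assumption: chapters and paragraphs are both numbered from 1,
    # and paragraph numbering restarts in every chapter.
    for c, paragraphs in enumerate(chapters, start=1):
        for p, text in enumerate(paragraphs, start=1):
            for name, pattern in patterns.items():
                # Two mentions in one paragraph yield two identical coordinates.
                coords[name].extend([(c, p)] * len(pattern.findall(text)))
    return coords


def count_by_chapter(coords, n_chapters):
    """Collapse coordinates into per-chapter counts, keeping explicit zeros."""
    counts = {}
    for name, positions in coords.items():
        per_chapter = Counter(c for c, _ in positions)
        counts[name] = [per_chapter.get(c, 0) for c in range(1, n_chapters + 1)]
    return counts


coords = find_references(chapters, names)
counts = count_by_chapter(coords, len(chapters))
print(coords["Swann"])
print(counts)
```

Keeping the zeros explicit matters here: every character ends up with a list of the same length (one entry per chapter), which is what lets the counts be laid side by side later.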
I created the following heat map of the top five characters by ripping off some stuff I read on Stack Overflow:
The tick marks represent the seven volumes of ISLT and each slender vertical line represents a chapter. Of course, not all chapters are the same length, so horizontal distance in the diagram does not correspond exactly to the amount of text. From this visualization you can see the following:
- Albertine does enter the picture in Volume II and dominates Volumes V and VI;
- Odette features heavily in Volume I (especially in “Swann in Love”), with only intermittent appearances thereafter;
- Saint-Loup is introduced in Volume II, taking over the focus from the Swann-Odette relationship of Volume I;
- Swann is a central figure in Volume I and reappears only occasionally thereafter, but when he does enter the story he is typically at the center of it.
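For readers who want to draw something similar, the following is a rough sketch of how per-chapter counts can be turned into a heat map with matplotlib's `imshow`. It is an illustration rather than the code behind the figure above: the counts, the `volume_starts` positions, and both function names are made up for the example.

```python
# Hypothetical per-chapter counts for six chapters (NOT real ISLT data).
counts = {
    "Swann": [12, 30, 4, 0, 1, 0],
    "Albertine": [0, 0, 3, 25, 40, 18],
}
names = ["Albertine", "Swann"]
# Hypothetical chapter indices (0-based) at which each volume begins.
volume_starts = [0, 3]


def build_matrix(counts, names):
    """Stack the count lists into rows: one row per name, one column per chapter."""
    n_chapters = len(counts[names[0]])
    matrix = []
    for name in names:
        row = list(counts[name])
        if len(row) != n_chapters:
            raise ValueError(f"{name} has {len(row)} chapters, expected {n_chapters}")
        matrix.append(row)
    return matrix


def plot_heat_map(matrix, names, volume_starts):
    """Draw one horizontal strip per name; brighter cells mean more mentions."""
    # Imported here so that build_matrix can be used without matplotlib installed.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(12, 2.5))
    # aspect="auto" stretches the narrow matrix to fill the wide figure.
    image = ax.imshow(matrix, aspect="auto", cmap="hot", interpolation="nearest")
    ax.set_yticks(range(len(names)))
    ax.set_yticklabels(names)
    # Tick marks only where a volume begins, as in the figure described above.
    ax.set_xticks(volume_starts)
    ax.set_xticklabels([f"Vol. {i}" for i in range(1, len(volume_starts) + 1)])
    fig.colorbar(image, ax=ax, label="mentions per chapter")
    return fig


matrix = build_matrix(counts, names)
print(matrix)
```

Calling `plot_heat_map(matrix, names, volume_starts)` returns a figure that can be written out with `fig.savefig("heat_map.png")`. Because every chapter gets one column of equal width, this sketch shares the limitation noted above: long and short chapters look the same.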
Proust treats places as characters of their own, so it’s also interesting to examine where places are referenced. Here are four frequently referenced places. Combray is where the story starts, of course, whereas Balbec later becomes a place of relaxation and refuge to which Proust returns repeatedly. Doncières is where Saint-Loup (an aspiring officer) is stationed, and so references to Doncières are correlated with references to Saint-Loup. Paris, the beating heart of society, is mentioned throughout ISLT.
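The Doncières and Saint-Loup relationship is something one could check numerically rather than by eye. As a sketch only, here is a plain Pearson correlation between two per-chapter count series. The numbers below are invented for illustration and say nothing about the real text, and `pearson` is my own helper, not part of the code for this series.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys):
        raise ValueError("both series need one value per chapter")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and each series' sum of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-chapter counts (NOT real ISLT data).
doncieres = [0, 4, 6, 0, 1]
saint_loup = [2, 9, 12, 3, 1]

r = pearson(doncieres, saint_loup)
print(round(r, 3))
```

A value near 1 would mean the two names tend to rise and fall together from chapter to chapter; a value near 0 would mean their mentions are unrelated. With 486 chapters, many of them containing zero mentions of either name, a rank-based measure might be a more robust choice than Pearson.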
I also produced heat maps of five of the most frequently occurring words (each a theme):
There are other ways to visualize and juxtapose references to characters, places, and concepts, of course, and perhaps in future posts I will explore further.
For the next post in this series, however, we will examine the context in which characters are mentioned. Which characters are referred to in positive or negative terms, and when do changes in sentiment occur?