Since the 18th century, the novel has been one of the defining forms of English writing, a mainstay of popular entertainment and academic criticism. Despite its importance, however, there are few computational studies of the large-scale structure of novels—and many popular representations for discourse modeling do not work very well for novelistic texts. We propose a high-level representation of plot structure which tracks the frequency of mention for different characters, topics and emotional words over time. We show that this representation can distinguish between real novels and artificially permuted surrogates with high accuracy; characters are important for eliminating random permutations, while topics are effective at distinguishing beginnings from ends. We analyze the system by inducing clusters of characters which can be viewed as "roles" or "plot functions".
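As a rough illustration of the kind of representation described above, the following minimal sketch splits a tokenized text into equal segments, records how often each tracked group of words is mentioned per segment, and builds a randomly permuted surrogate for comparison. It is not the system presented in the talk: the function names, the fixed word lists standing in for character, topic and emotion detection, and the segment count are all assumptions made for illustration.

```python
import random
from collections import Counter


def split_chunks(tokens, n_chunks):
    """Split a token list into n_chunks contiguous, near-equal segments."""
    n = len(tokens)
    bounds = [i * n // n_chunks for i in range(n_chunks + 1)]
    return [tokens[bounds[i]:bounds[i + 1]] for i in range(n_chunks)]


def mention_trajectories(chunks, groups):
    """Relative frequency of each word group in each segment, in order.

    `groups` maps a label (a character, topic, or emotion category) to a
    set of lowercase word forms; this is a hypothetical stand-in for real
    character and topic detection.
    """
    trajectories = {label: [] for label in groups}
    for chunk in chunks:
        counts = Counter(tok.lower() for tok in chunk)
        total = max(1, len(chunk))  # guard against empty segments
        for label, words in groups.items():
            trajectories[label].append(sum(counts[w] for w in words) / total)
    return trajectories


def permuted_surrogate(chunks, seed=0):
    """Return the same segments in random order (an artificial 'novel')."""
    rng = random.Random(seed)
    shuffled = list(chunks)
    rng.shuffle(shuffled)
    return shuffled


if __name__ == "__main__":
    text = ("Emma met Harriet . Emma was happy . Harriet wept . "
            "Knightley arrived . Emma and Knightley were happy")
    groups = {
        "Emma": {"emma"},
        "Knightley": {"knightley"},
        "emotion": {"happy", "wept"},
    }
    chunks = split_chunks(text.split(), 4)
    print(mention_trajectories(chunks, groups))
    print(mention_trajectories(permuted_surrogate(chunks), groups))
```

A classifier that separates real novels from surrogates would take trajectories like these as input; how that comparison is scored is left out of the sketch.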
Micha Elsner is an Assistant Professor of Linguistics at The Ohio State University. He previously worked at the University of Edinburgh. He received his Ph.D. in Computer Science in 2011 from Brown University, where he worked on models of local discourse coherence.