Mining the Meaning (MiMe) is a project financed by the University of Copenhagen Data+ grant that aims to explore the literary and social modernization of Scandinavian societies during the latter part of the 19th century. Unlike traditional historiography, MiMe employs advanced natural language processing (NLP) techniques and semantic analysis to examine a broad corpus of 895 Danish and Norwegian novels published between 1870 and 1900. The project’s methodology includes developing state-of-the-art computational semantic methods and adapting large language models to written late 19th-century Danish and Norwegian. The project also plans to integrate existing conceptual knowledge resources and to develop methods for incorporating extra-linguistic information into semantic parsing and analysis.

The MiMe project is closely related to the Measuring Modernity (MeMo) project financed by the Carlsberg Foundation, but the two have distinct focuses. Both initiatives explore how societal change is reflected in Scandinavian literature from the same historical period. MeMo focuses primarily on literary analysis, investigating how Denmark and Scandinavia became modern and what role literature played in that process. MiMe, by contrast, is oriented towards developing computational methodology, aiming to raise the level of abstraction in text analysis so that various phenomena can be investigated in a fine-grained way at large scale. Both projects, however, share the goal of offering new insights into the processes of modernization in this formative period of Scandinavian literary and social history.

People

  • Jens Bjerring-Hansen, Department of Nordic Studies and Linguistics, University of Copenhagen (Co-PI)
  • Daniel Hershcovich, Department of Computer Science, University of Copenhagen (Co-PI)
  • Ali Al-Laith, Department of Computer Science, University of Copenhagen (Postdoc)

Sub-projects

  • The Unhappy Texts. This term is rooted in a literary hypothesis that 19th-century Scandinavian texts by female authors were characterized by negative sentiment, reflecting the societal constraints and patriarchal norms that women faced during this era. According to this hypothesis, the authors often depicted disillusioned characters who lacked agency, mirroring their own experiences in a restrictive society. The hypothesis, however, rests on a limited selection of texts and remains subject to ongoing analysis and interpretation. Sentiment analysis tools and methodologies, such as those developed for historical Danish and Norwegian literary texts, can provide a more nuanced understanding of these ‘unhappy texts’ and the societal conditions they reflect.
  • Language Models for Historical Literary Scandinavian Texts. This sub-project develops pre-trained language models specifically designed for historical Danish and Norwegian texts. It fills a crucial gap in NLP: despite a wealth of English-language resources, the field lacks models tailored to historical Scandinavian literature. Leveraging the unique MeMo corpus, we investigate both fine-tuning existing pre-trained language models on this historical data and training a language model on it from scratch. As part of this work, we will collect additional historical documents, including newspapers and novels from various time periods, to train new pre-trained language models that capture a richer historical context. We plan to experiment with several architectures: encoder-only, encoder-decoder, and decoder-only. We also create annotated datasets to benchmark these models and use the models to enable large-scale and nuanced literary analysis. Beyond this, we aim to explore how well these models apply to other corpora and whether they can support other research initiatives. While our project shares goals with initiatives such as the Danish Foundation Models project, it stands out through its specific focus on historical Danish and Norwegian texts. We anticipate synergies with such projects, including resource sharing and expertise exchange, and potentially using their contemporary models as a foundation for our historical models.
  • The Fate of the Modern Breakthrough. The modernization processes in Scandinavia in the latter half of the 19th century changed how we perceive the world around us and existence in general, but this change did not happen overnight. Through a conceptual-historical analysis of the concept of skæbne (fate/destiny), this sub-project explores the dialectical relationship between pre-modern and modern perceptions of the world as it unfolds in Scandinavian literary history. On the one hand, the literary use of the concept reflects the new secular and scientific ideals that gained ground during the period; on the other, it retains the concept’s religious and metaphysical roots. Because many of the novels are now forgotten, the project adapts unsupervised machine learning models to provide a conceptual overview of the period. It also creates annotated datasets to fine-tune and deploy language models for more fine-grained analyses.

Publications

Sentiment Classification of Historical Danish and Norwegian Literary Texts. Ali Al-Laith, Kirstine Nielsen Degn, Alexander Conroy, Bolette S. Pedersen, Jens Bjerring-Hansen and Daniel Hershcovich. NoDaLiDa 2023.