A research paper on a quantitative cultural analysis system has been published in Humanities and Social Sciences Communications, the only humanities and social sciences journal under Nature. The system was supported by the Institute for Artificial Intelligence at Peking University, researched and designed by the Center for Digital Humanities at Peking University, and developed in collaboration with the Wang Xuan Institute of Computer Technology. The publication comes less than a year after another paper, introducing the core algorithm of this research, appeared in the same journal. The details of the two papers are as follows:
· "Evol project: a comprehensive online platform for quantitative analysis of ancient literature"
Jun Wang, Siyu Duan*, Binghao Fu, Liangcai Gao & Qi Su, Humanities and Social Sciences Communications, volume 11, Article number: 291 (2024)
· "Disentangling the cultural evolution of ancient China: a digital humanities perspective"
Siyu Duan, Jun Wang, Hao Yang & Qi Su*, Humanities and Social Sciences Communications, volume 10, Article number: 310 (2023)
The "Evol Project" paper introduces a prototype system for tracing the origins of ancient literature, developed by the interdisciplinary team at the Center for Digital Humanities. The platform applies deep learning techniques to conduct quantitative cultural analyses on large-scale collections of classical literature, tracing the origins and evolutionary trajectories of Chinese ideological and cultural concepts. It provides data-driven research tools at the levels of vocabulary, sentences, and documents, supporting scholars of intellectual and cultural history with quantitative methods. The following images respectively display the book-level intertextual network, chapter-level intertextual distribution, and sentence-level intertextual frequency statistics of Laozi and various Taoist texts, illustrating how the ideas of a classical text spread and evolve in later works.

To achieve this, the platform aggregates all available digitized classics dating up to the Tang dynasty, along with the "Twenty-Four Histories," "Zizhi Tongjian," and several selected classics and anthologies, totaling 201 works, 30,880 articles, and over 50 million characters, and covering fields such as philosophy, history, politics, literature, and religion. Beyond common functions like browsing, searching, and frequency statistics, the platform specializes in quantitative cultural analysis features such as text reuse, word co-occurrence, and diachronic n-grams, accompanied by diverse visualizations. With a few clicks, users can observe the evolutionary trajectories of ideas over millennia.
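As an illustration of what a diachronic n-gram feature computes, the following is a minimal sketch, not the platform's implementation. It assumes character-level n-grams (a natural unit for classical Chinese) and a toy corpus keyed by period; the function names `char_ngrams` and `diachronic_frequency` and the sample passages are chosen here for illustration only.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return every contiguous character n-gram of a string."""
    return (text[i:i + n] for i in range(len(text) - n + 1))


def diachronic_frequency(corpus, term):
    """Relative frequency of `term` in each period.

    corpus: dict mapping a period label to a list of texts (hypothetical layout).
    The frequency is occurrences of `term` divided by the total number of
    n-grams of the same length in that period.
    """
    n = len(term)
    result = {}
    for period, texts in corpus.items():
        counts = Counter()
        for text in texts:
            counts.update(char_ngrams(text, n))
        total = sum(counts.values())
        result[period] = counts[term] / total if total else 0.0
    return result


# Toy corpus for illustration; punctuation is assumed to be stripped already.
corpus = {
    "pre-Qin": ["道可道非常道", "上善若水"],
    "Han": ["天道無親", "仁者愛人"],
}
freq = diachronic_frequency(corpus, "道")
for period, value in freq.items():
    print(period, round(value, 3))
```

Normalizing by the total number of n-grams in each period keeps periods with very different corpus sizes comparable, which matters when plotting a term across two millennia.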
The paper presents several cultural analysis cases based on this platform. The following image shows how the negative sentiment scores of words co-occurring with the names of nomadic tribes change across historical materials. It indicates a gradual decrease in negative sentiment towards nomadic tribes in historical texts, corroborating the mainstream view in traditional ethnic studies that, on a large historical scale, mutual dependence and integration among Chinese ethnic groups are the overarching trend.

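The co-occurrence sentiment analysis described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the paper's procedure: the tokenized sentences, the window size, the target name, and the negativity lexicon with its scores are all hypothetical placeholders.

```python
def cooccurring_words(sentences, targets, window=2):
    """Collect tokens that appear within `window` positions of a target token."""
    found = []
    for tokens in sentences:
        for i, token in enumerate(tokens):
            if token not in targets:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i and tokens[j] not in targets:
                    found.append(tokens[j])
    return found


def mean_negative_score(words, lexicon):
    """Average negativity of the words that the lexicon covers (0.0 if none)."""
    scored = [lexicon[w] for w in words if w in lexicon]
    return sum(scored) / len(scored) if scored else 0.0


# Hypothetical inputs: a target name, a toy negativity lexicon, two periods.
targets = {"tribe"}
lexicon = {"raid": 0.9, "plunder": 0.8, "trade": 0.1, "envoy": 0.0}
early_period = [["tribe", "raid", "border"], ["plunder", "by", "tribe"]]
late_period = [["tribe", "trade", "horses"], ["envoy", "from", "tribe"]]

early_score = mean_negative_score(cooccurring_words(early_period, targets), lexicon)
late_score = mean_negative_score(cooccurring_words(late_period, targets), lexicon)
print(early_score, late_score)
```

Computing one such score per dynasty and plotting the series gives the kind of trend line the case study describes.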
The prototype system is now open for use and can be accessed at: http://evolution.pkudh.xyz/. Building upon this prototype, the Center for Digital Humanities and the Wang Xuan Institute of Computer Technology have collaborated to develop an application-grade system for tracing and analyzing ancient literature, accessible at: https://ca.pkudh.org/.
The "Disentangling Cultural Evolution" paper describes the principles of the core algorithm behind the aforementioned system. It employs deep neural networks to compute millions of similar intertextual pairs across the dataset, then organizes related texts into an intertextual network using a hierarchical framework. Based on the node features of this network, standardized intertextual scores between any two texts are calculated to examine various cultural phenomena. The paper first calculates intertextual indices for several well-established cultural phenomena to validate the effectiveness of the intertextual analysis method. For example, it finds significant intertextual connections between Song-Ming Neo-Confucian texts and pre-Qin Confucian classics, while Taoist and Wei-Jin metaphysical texts such as Cantongqi, Wenshi Zhenjing, Collected Works of Ruan Ji, and Collected Works of Ji Kang show significant intertextual links with pre-Qin Taoist texts. The paper then applies intertextual metrics to contentious issues in traditional humanities research. It finds that Lüshi Chunqiu exhibits a relatively uniform intertextual distribution across pre-Qin academic schools but leans slightly towards Taoism, and that the chapters of disputed authorship in Collected Works of Tao Yuanming do deviate in intertextual distribution from the other sections.
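The pipeline described above (find similar sentence pairs, then derive a standardized score between two texts) can be illustrated with a deliberately simplified sketch. It is not the paper's algorithm: toy 2-D vectors stand in for the neural sentence representations, a fixed cosine threshold stands in for the learned similarity model, the hierarchical network step is omitted, and normalizing by the geometric mean of the two text lengths is an assumption made here for illustration.

```python
import math
from itertools import product


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similar_pairs(text_a, text_b, threshold=0.9):
    """Index pairs (i, j) whose sentence vectors meet the similarity threshold."""
    return [
        (i, j)
        for (i, u), (j, v) in product(enumerate(text_a), enumerate(text_b))
        if cosine(u, v) >= threshold
    ]


def normalized_score(text_a, text_b, threshold=0.9):
    """Similar-pair count scaled by the geometric mean of the text lengths.

    The scaling is an illustrative assumption so that long texts do not
    dominate purely because they contain more sentences.
    """
    if not text_a or not text_b:
        return 0.0
    pairs = similar_pairs(text_a, text_b, threshold)
    return len(pairs) / math.sqrt(len(text_a) * len(text_b))


# Toy sentence vectors (hypothetical): each text is a list of sentence vectors.
source_text = [[1.0, 0.0], [0.0, 1.0]]
related_text = [[0.99, 0.05], [0.1, 0.98], [0.7, 0.7]]
unrelated_text = [[0.7, 0.7], [-1.0, 0.0]]

related_score = normalized_score(source_text, related_text)
unrelated_score = normalized_score(source_text, unrelated_text)
print(related_score, unrelated_score)
```

Comparing such scores between one text and several groups of texts (for example, the canonical works of each pre-Qin school) yields the kind of intertextual distribution discussed in the case studies.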
The following image illustrates the strength of intertextual connections between Collected Works of Tao Yuanming and the five schools of Confucianism, Taoism, Mohism, Legalism, and Militarism. Collection1 represents the disputed chapters "Five Sons' Commentary" and "Eight Views," while Collection2 represents the remaining texts.

The paper treats historical materials and literary anthologies from various eras as diachronic data, calculating the intertextual strength between pre-Qin philosophical texts and texts from different periods. This makes it possible to observe the rise and fall of the various schools over two millennia, and to quantitatively measure and visually present the impact of a series of historical events. In the following image, one can clearly observe the heavy reliance on Legalism during the Qin dynasty, the exclusive promotion of Confucianism during the Han dynasty, and the revival of Taoist metaphysics during the Jin dynasties.

These research achievements are the result of close collaboration within an interdisciplinary team at Peking University. Professor Jun Wang from the Department of Information Management, Associate Professor Qi Su from the School of Foreign Languages, and Associate Researcher Hao Yang from the Institute for Artificial Intelligence (formerly a faculty member of the Department of Philosophy) jointly supervised the research. Doctoral student Siyu Duan from the Department of Information Management carried out the research independently, with assistance from peers such as Jiachun Li and Binghao Fu. Ruixuan Luo and Xiaohan Bi from the Institute of Computational Linguistics undertook the initial development of the prototype system. The system development received strong support from the Wang Xuan Institute of Computer Technology, where Deputy Director Liangcai Gao mobilized engineering and technical resources to develop an application-grade ancient document analysis system based on the Evol Project prototype, demonstrating the power of interdisciplinary cooperation at Peking University.