Towards automated genre conversion: aggregating thematic events in classical Chinese chronological histories
DOI:
https://doi.org/10.47989/ir31iConf64279
Keywords:
Chronological style, Historical-narrative style, Literary genre conversion, Event information aggregation, Thematic event extraction
Abstract
Introduction. This study explores the automatic restructuring of chronological historical texts into historical narratives. It aims to identify and aggregate dispersed event records to reconstruct macro-narrative structures.
Method. We propose a framework combining event extraction and unsupervised clustering. First, an event detection model tailored for classical Chinese is developed. Next, we employ contrastive learning to train a semantic representation model using the thematic text Tongjian Jishi Benmo. Finally, unsupervised clustering aggregates vectorised paragraphs into event-specific groups. A mapping dataset linking the chronological Zizhi Tongjian to thematic chapters was created for quantitative evaluation.
Results. Experiments indicate that the contrastive learning model combined with the DBSCAN algorithm performs best, with an adjusted Rand index (ARI) of 0.43 and a normalised mutual information (NMI) of 0.78. The model aggregates semantically related paragraphs, demonstrating an initial capability to transform chronological annals into event-centred accounts.
Conclusions. While precision in event boundaries needs improvement, this research validates the feasibility of automated narrative reconstruction, offering methodological insights for digital historical knowledge organisation.
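The two scores reported in the results compare predicted cluster labels against a gold paragraph-to-chapter mapping. The following is a minimal Python sketch of how such scores can be computed, not the authors' evaluation code. The function names and the toy labels are hypothetical, and the geometric-mean normalisation of NMI is an assumption, since the abstract does not state which normalisation was used.

```python
from collections import Counter
from math import comb, log, sqrt


def adjusted_rand_index(truth, pred):
    """Chance-corrected pair-counting agreement between two flat partitions."""
    n = len(truth)
    if n < 2:
        return 1.0  # fewer than two items: no pairs to disagree on
    # Pairs of items placed together in both partitions (contingency cells).
    together_both = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    together_truth = sum(comb(c, 2) for c in Counter(truth).values())
    together_pred = sum(comb(c, 2) for c in Counter(pred).values())
    expected = together_truth * together_pred / comb(n, 2)
    maximum = (together_truth + together_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both partitions are one cluster
    return (together_both - expected) / (maximum - expected)


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def normalised_mutual_information(truth, pred):
    """Mutual information divided by the geometric mean of the entropies.

    Assumption: geometric-mean normalisation; other normalisations
    (for example the arithmetic mean) give slightly different values.
    """
    n = len(truth)
    rows, cols = Counter(truth), Counter(pred)
    mi = sum(
        (c / n) * log(c * n / (rows[t] * cols[p]))
        for (t, p), c in Counter(zip(truth, pred)).items()
    )
    h_truth, h_pred = entropy(truth), entropy(pred)
    if h_truth == 0 or h_pred == 0:
        return 1.0 if h_truth == h_pred else 0.0
    return mi / sqrt(h_truth * h_pred)


if __name__ == "__main__":
    # Hypothetical toy data: gold thematic chapters vs. predicted clusters.
    gold_chapters = ["A", "A", "A", "B", "B", "C"]
    predicted_clusters = [0, 0, 1, 1, 1, 2]
    print("ARI:", adjusted_rand_index(gold_chapters, predicted_clusters))
    print("NMI:", normalised_mutual_information(gold_chapters, predicted_clusters))
```

Both scores ignore the cluster names themselves and depend only on which items are grouped together. ARI is corrected for chance and can drop to zero or below, while NMI is not chance-corrected, which is one possible reason an NMI of 0.78 can sit alongside an ARI of 0.43.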
License
Copyright (c) 2026 Litao Lin, Shiyan Ou

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
