Towards automated genre conversion: aggregating thematic events in classical Chinese chronological histories

Litao Lin; Shiyan Ou

doi:10.47989/ir31iConf64279

Authors

Litao Lin, Nanjing University, https://orcid.org/0000-0001-8287-936X
Shiyan Ou, Nanjing University, https://orcid.org/0000-0001-8617-6987

DOI:

https://doi.org/10.47989/ir31iConf64279

Keywords:

Chronological style, Historical-narrative style, Literary genre conversion, Event information aggregation, Thematic event extraction

Abstract

Introduction. This study explores the automatic restructuring of chronological historical texts into historical narratives. It aims to identify and aggregate dispersed event records to reconstruct macro-narrative structures.

Method. We propose a framework combining event extraction and unsupervised clustering. First, an event detection model tailored for classical Chinese is developed. Next, we employ contrastive learning to train a semantic representation model using the thematic text Tongjian Jishi Benmo. Finally, unsupervised clustering aggregates vectorised paragraphs into event-specific groups. A mapping dataset linking the chronological Zizhi Tongjian to thematic chapters was created for quantitative evaluation.
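The final clustering step can be illustrated with a minimal pure-Python sketch of DBSCAN over paragraph vectors. The toy three-dimensional vectors, the cosine distance, and the `eps` and `min_samples` values are all assumptions made for illustration; the study's actual embeddings, distance metric, and hyperparameters are not stated in this abstract.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def dbscan(vectors, eps, min_samples):
    """Cluster vectors with DBSCAN; return one label per vector (-1 = noise)."""
    n = len(vectors)
    # Precompute eps-neighbourhoods (each point counts as its own neighbour).
    neighbours = [
        [j for j in range(n) if cosine_distance(vectors[i], vectors[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_samples:
                queue.extend(neighbours[j])  # expand only from core points
    return labels


# Hypothetical paragraph embeddings: two thematic events plus one outlier.
paragraph_vectors = [
    [1.0, 0.1, 0.0], [0.9, 0.2, 0.0], [1.0, 0.0, 0.1],
    [0.0, 1.0, 0.1], [0.1, 0.9, 0.0], [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
event_labels = dbscan(paragraph_vectors, eps=0.1, min_samples=2)
print(event_labels)
```

Because DBSCAN does not require the number of clusters in advance, it suits a setting where the number of thematic events is unknown; paragraphs that belong to no dense group are labelled as noise rather than forced into a cluster.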

Results. Experiments indicate that the contrastive learning model combined with the DBSCAN algorithm yields the best performance, with an adjusted Rand index (ARI) of 0.43 and a normalised mutual information (NMI) of 0.78. The model successfully aggregates semantically related paragraphs, demonstrating an initial capability to transform chronological annals into event-centred accounts.
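For readers unfamiliar with the two reported metrics, the following is a minimal pure-Python sketch of how ARI and NMI compare a predicted clustering with reference labels. The function names and the toy label lists are hypothetical, and NMI is normalised here by the arithmetic mean of the two entropies, which is one of several common conventions and an assumption about the variant used in the study.

```python
import math
from collections import Counter


def _pairs(n):
    """Number of unordered pairs that can be drawn from n items."""
    return n * (n - 1) / 2


def adjusted_rand_index(truth, pred):
    """Chance-corrected pair-counting agreement between two partitions."""
    n = len(truth)
    sum_cells = sum(_pairs(c) for c in Counter(zip(truth, pred)).values())
    sum_rows = sum(_pairs(c) for c in Counter(truth).values())
    sum_cols = sum(_pairs(c) for c in Counter(pred).values())
    expected = sum_rows * sum_cols / _pairs(n)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate case, e.g. a single cluster
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def normalised_mutual_information(truth, pred):
    """Mutual information divided by the mean entropy of the two labelings."""
    n = len(truth)
    count_t, count_p = Counter(truth), Counter(pred)
    mutual_info = sum(
        (c / n) * math.log(c * n / (count_t[t] * count_p[p]))
        for (t, p), c in Counter(zip(truth, pred)).items()
    )
    entropy_t = -sum((c / n) * math.log(c / n) for c in count_t.values())
    entropy_p = -sum((c / n) * math.log(c / n) for c in count_p.values())
    mean_entropy = (entropy_t + entropy_p) / 2
    return mutual_info / mean_entropy if mean_entropy > 0 else 1.0


# Hypothetical labels: reference chapters vs. predicted clusters.
reference = [0, 0, 1, 1]
same_grouping = [1, 1, 0, 0]    # identical partition, different label names
crossed_grouping = [0, 1, 0, 1]  # every reference pair is split

print(adjusted_rand_index(reference, same_grouping))
print(normalised_mutual_information(reference, same_grouping))
print(adjusted_rand_index(reference, crossed_grouping))
print(normalised_mutual_information(reference, crossed_grouping))
```

Both metrics ignore the label names and score only the grouping. ARI is corrected for chance and can be negative, while NMI lies between 0 and 1, which is one reason the two values reported for the same clustering can differ considerably.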

Conclusions. While precision in event boundaries needs improvement, this research validates the feasibility of automated narrative reconstruction, offering methodological insights for digital historical knowledge organisation.

