UC-TAM: an unsupervised model for fine-grained semantic association construction across large-scale power system standards documents
DOI: https://doi.org/10.47989/ir31iConf64288
Keywords: Power system standards, Semantic association, Hierarchical attention, Domain specificity, Unsupervised machine learning
Abstract
Introduction. Power system standards constitute a vast, complex repository, yet existing methods struggle with fine-grained semantic association (e.g., clause-level) due to coarse-grained analysis and reliance on manual annotation. This study develops an unsupervised, domain-adapted approach to address this gap.
Method. We propose UC-TAM, which integrates a 134,945-term power lexicon with a hierarchical attention framework and unsupervised clustering. Clauses serve as atomic semantic units: intra- and cross-document relations among them are captured through multi-head attention and embedded in a shared space, after which K-means clustering is applied to discover semantic associations.
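A minimal sketch of a clause-level pipeline of this kind follows. It assumes a two-level hierarchy (multi-head attention over the tokens of each clause, mean-pooled into a clause vector, then multi-head attention across all clauses from all documents so they share one space) followed by plain Lloyd's K-means. It is untrained and illustrative only: `POWER_LEXICON`, `token_vectors`, the pseudo-random embeddings, the 2.0 lexicon up-weighting factor, and the random projection weights are hypothetical stand-ins, not the authors' implementation, which the abstract does not specify.

```python
import zlib
import numpy as np

# Hypothetical stand-in for the power lexicon (the paper's has 134,945 terms).
POWER_LEXICON = {"transformer", "busbar", "relay", "insulation", "grounding"}
DIM, HEADS = 16, 4


def token_vectors(tokens):
    """Hypothetical embedder: a deterministic pseudo-random vector per token,
    scaled up for lexicon terms so domain vocabulary carries more weight."""
    rows = []
    for tok in tokens:
        rng = np.random.default_rng(zlib.crc32(tok.encode("utf-8")))
        v = rng.standard_normal(DIM)
        rows.append(v * 2.0 if tok in POWER_LEXICON else v)
    return np.stack(rows)


def init_attention(seed):
    """Random (untrained) query/key/value projections for one attention layer."""
    rng = np.random.default_rng(seed)
    return [rng.standard_normal((DIM, DIM)) / np.sqrt(DIM) for _ in range(3)]


def multi_head_attention(x, weights):
    """Scaled dot-product self-attention split over HEADS heads.
    Heads are concatenated; the usual output projection is omitted."""
    n, d = x.shape
    dh = d // HEADS
    q, k, v = (x @ w for w in weights)
    # Reshape (n, d) -> (heads, n, dh) so each head attends independently.
    q, k, v = (m.reshape(n, HEADS, dh).transpose(1, 0, 2) for m in (q, k, v))
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dh)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)  # softmax over keys
    return (attn @ v).transpose(1, 0, 2).reshape(n, d)


def embed_clauses(documents, token_layer, clause_layer):
    """Level 1: tokens -> clause vector. Level 2: every clause attends to
    every clause in every document, giving one shared embedding space."""
    clause_vecs = []
    for doc in documents:
        for clause in doc:
            ctx = multi_head_attention(token_vectors(clause.lower().split()), token_layer)
            clause_vecs.append(ctx.mean(axis=0))  # mean-pool the tokens
    return multi_head_attention(np.stack(clause_vecs), clause_layer)


def kmeans(x, k, iters=100, seed=0):
    """Lloyd's K-means: assign points to the nearest centre, recompute centres."""
    rng = np.random.default_rng(seed)
    centres = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        dist = ((x[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
        labels = dist.argmin(axis=1)
        new = np.stack([x[labels == j].mean(axis=0) if np.any(labels == j)
                        else centres[j] for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return labels, centres


# Hypothetical clauses from two standards documents.
docs = [
    ["the transformer insulation shall be tested", "relay settings shall be recorded"],
    ["busbar grounding shall meet the rating", "transformer insulation test voltage"],
]
emb = embed_clauses(docs, init_attention(1), init_attention(2))
labels, _ = kmeans(emb, k=2)
print(labels)
```

Because the weights are random, the cluster assignments above carry no meaning; in practice the attention layers would be trained and K would be chosen with the evaluation indices.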
Analysis. UC-TAM was compared against a Sentence-BERT + K-means baseline on subsets of the standards corpus, using the Silhouette coefficient (SC), the Calinski–Harabasz (CH) index, and the Davies–Bouldin (DB) index.
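The three indices can be computed from clause embeddings and cluster labels alone, which is what makes them usable without manual annotation. Below is a NumPy sketch of their standard definitions (Euclidean distance; singleton clusters score 0 in the silhouette), not the authors' evaluation code; the toy points are hypothetical. SC and CH are higher-is-better; DB is lower-is-better.

```python
import numpy as np


def _pairwise(x):
    """Euclidean distance matrix between all rows of x."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def silhouette(x, labels):
    """Mean of (b - a) / max(a, b) over points; range [-1, 1], higher is better."""
    d, ks = _pairwise(x), np.unique(labels)
    scores = []
    for i in range(len(x)):
        same = labels == labels[i]
        if same.sum() == 1:
            scores.append(0.0)  # singleton-cluster convention
            continue
        a = d[i, same].sum() / (same.sum() - 1)  # mean intra-cluster distance, self excluded
        b = min(d[i, labels == k].mean() for k in ks if k != labels[i])  # nearest other cluster
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return float(np.mean(scores))


def calinski_harabasz(x, labels):
    """Between- vs within-cluster dispersion, scaled by degrees of freedom; higher is better."""
    n, ks = len(x), np.unique(labels)
    mean = x.mean(axis=0)
    between = within = 0.0
    for k in ks:
        pts = x[labels == k]
        c = pts.mean(axis=0)
        between += len(pts) * ((c - mean) ** 2).sum()
        within += ((pts - c) ** 2).sum()
    return float(between * (n - len(ks)) / (within * (len(ks) - 1)))


def davies_bouldin(x, labels):
    """Mean over clusters of the worst (scatter_i + scatter_j) / centroid distance; lower is better."""
    ks = np.unique(labels)
    cents = np.stack([x[labels == k].mean(axis=0) for k in ks])
    scatter = np.array([np.linalg.norm(x[labels == k] - c, axis=1).mean()
                        for k, c in zip(ks, cents)])
    cd = _pairwise(cents)
    worst = [max((scatter[i] + scatter[j]) / cd[i, j] for j in range(len(ks)) if j != i)
             for i in range(len(ks))]
    return float(np.mean(worst))


x = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
good = np.array([0, 0, 1, 1])  # two tight, well-separated clusters
bad = np.array([0, 1, 0, 1])   # clusters that cut across the structure
for name, lab in (("good", good), ("bad", bad)):
    print(name, silhouette(x, lab), calinski_harabasz(x, lab), davies_bouldin(x, lab))
```

For the "good" labelling the indices agree (high SC, high CH, low DB); for the "bad" one SC turns negative and DB grows, which is the behaviour the comparison in the study relies on.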
Results. UC-TAM outperformed the baseline on all three indices, with improvements of 27.7% (SC), 31.3% (CH), and 19.6% (DB, where lower values are better). A case study confirmed practical utility, with expert-rated clustering accuracy of 88% versus 75% for the baseline. Performance improved consistently with dataset size, indicating scalability and robustness.
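Because DB is lower-is-better, a percentage improvement on it has to be computed in the opposite direction from SC and CH. The helper below assumes the reported figures are relative changes against the baseline; the abstract does not state the formula, and the input values are hypothetical, not the study's results.

```python
def relative_improvement(baseline, model, higher_is_better=True):
    """Relative gain of model over baseline, in percent.
    For lower-is-better indices such as Davies-Bouldin, a decrease counts as a gain."""
    change = (model - baseline) if higher_is_better else (baseline - model)
    return 100.0 * change / abs(baseline)


# Hypothetical index values, chosen only to illustrate the arithmetic.
print(relative_improvement(0.40, 0.50))                          # SC rises
print(relative_improvement(1.00, 0.80, higher_is_better=False))  # DB falls
```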
Conclusion. UC-TAM demonstrates both the feasibility and the practical value of unsupervised, domain-aware approaches to standards document analysis, offering a pathway toward more intelligent, automated, and fine-grained management of standards.
License
Copyright (c) 2026 Weiwei Zheng, Lu Wang, Xiaoqun Yuan, Yanrong Zheng, Wei Xie, Yanan Liu, Liang Zhao

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
