UC-TAM: an unsupervised model for fine-grained semantic association construction across large-scale power system standards documents

Authors

  • Weiwei Zheng, Wuhan University
  • Lu Wang, Wuhan University
  • Xiaoqun Yuan, Wuhan University
  • Yanrong Zheng, Yingda Media Investment Group Co., Ltd
  • Wei Xie, State Grid Fujian Electric Power Research Institute
  • Yanan Liu, Yingda Media Investment Group Co., Ltd
  • Liang Zhao, Wuhan University

DOI:

https://doi.org/10.47989/ir31iConf64288

Keywords:

Power system standards, Semantic association, Hierarchical attention, Domain specificity, Unsupervised machine learning

Abstract

Introduction. Power system standards form a large and complex body of documents, yet existing methods struggle to build fine-grained (e.g., clause-level) semantic associations because they operate at coarse granularity and rely on manual annotation. This study develops an unsupervised, domain-adapted approach to address this gap.

Method. We propose UC-TAM, which integrates a 134,945-term power lexicon with a hierarchical attention framework and unsupervised clustering. Clauses serve as the atomic semantic units: intra- and cross-document relations among them are captured through multi-head attention, the resulting representations are embedded in a shared space, and K-means clustering is then applied to discover semantic associations.
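
The final clustering step can be illustrated with a minimal sketch of K-means (Lloyd's algorithm) in plain Python. This is not the authors' implementation: the two-dimensional `clause_vectors` are hypothetical stand-ins for clause embeddings, the initial centroids are hand-picked for reproducibility, and the lexicon and attention stages of UC-TAM are omitted entirely.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    centroids = [list(c) for c in initial_centroids]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = mean_vector(members)
    return labels, centroids


# Hypothetical 2-D "clause embeddings" forming two obvious groups.
clause_vectors = [
    [0.0, 0.1], [0.1, 0.0], [0.05, 0.05],
    [1.0, 0.9], [0.9, 1.0], [0.95, 0.95],
]
labels, centroids = kmeans(
    clause_vectors,
    initial_centroids=[clause_vectors[0], clause_vectors[3]],
)
print(labels)
```

Clauses that receive the same label would be treated as semantically associated; in practice the number of clusters and the initialisation strategy are choices the abstract does not specify.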

Analysis. Comparative experiments against a Sentence-BERT + K-means baseline were conducted on subsets of standards, evaluated with Silhouette (SC), Calinski–Harabasz (CH), and Davies–Bouldin (DB) indices.
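
For readers unfamiliar with the three indices, the following sketch computes them from their standard textbook definitions in plain Python. The four points and both labelings are toy data invented for illustration, not values from the study; higher SC and CH indicate better-separated clusters, whereas a lower DB is better.

```python
import math


def dist(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def group_by_label(points, labels):
    """Map each cluster label to the list of points assigned to it."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    return clusters


def silhouette(points, labels):
    """Mean of (b - a) / max(a, b) over all points; singletons score 0."""
    clusters = group_by_label(points, labels)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def calinski_harabasz(points, labels):
    """Between-cluster dispersion over within-cluster dispersion, scaled."""
    clusters = group_by_label(points, labels)
    n, k = len(points), len(clusters)
    overall = centroid(points)
    between = sum(len(m) * dist(centroid(m), overall) ** 2
                  for m in clusters.values())
    within = sum(dist(p, centroid(m)) ** 2
                 for m in clusters.values() for p in m)
    return (between / (k - 1)) / (within / (n - k))


def davies_bouldin(points, labels):
    """Mean over clusters of the worst (scatter_i + scatter_j) / separation."""
    clusters = group_by_label(points, labels)
    cents = {lab: centroid(m) for lab, m in clusters.items()}
    scatter = {lab: sum(dist(p, cents[lab]) for p in m) / len(m)
               for lab, m in clusters.items()}
    worst = [max((scatter[i] + scatter[j]) / dist(cents[i], cents[j])
                 for j in clusters if j != i)
             for i in clusters]
    return sum(worst) / len(worst)


# Toy data: two tight pairs far apart, labelled well and then badly.
points = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
good_labels = [0, 0, 1, 1]
bad_labels = [0, 1, 0, 1]

for name, labs in (("good", good_labels), ("bad", bad_labels)):
    print(name,
          round(silhouette(points, labs), 4),
          round(calinski_harabasz(points, labs), 4),
          round(davies_bouldin(points, labs), 4))
```

The well-separated labeling scores higher on SC and CH and lower on DB than the mixed labeling, which is the pattern a comparison between two clustering pipelines looks for.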

Results. UC-TAM outperformed the baseline with improvements of 27.7% (SC), 31.3% (CH), and 19.6% (DB). A case study confirmed practical utility, with expert-rated clustering accuracy of 88% versus 75% for the baseline. Performance improved consistently with dataset size, demonstrating scalability and robustness.
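
Because a lower Davies–Bouldin value is better while higher SC and CH values are better, an "improvement" percentage has to account for direction. The abstract does not state the formula used, so the sketch below shows one common convention (relative change against the baseline, sign-flipped for lower-is-better metrics); the function name and the input values are hypothetical and are not the study's raw scores.

```python
def relative_improvement(baseline, model, higher_is_better=True):
    """Relative change versus the baseline, oriented so positive means better."""
    change = (model - baseline) / abs(baseline)
    return change if higher_is_better else -change


# Hypothetical scores chosen only to illustrate the direction handling.
sc_gain = relative_improvement(0.5, 0.625)                         # higher is better
db_gain = relative_improvement(2.0, 1.5, higher_is_better=False)   # lower is better

print(f"SC improvement: {sc_gain:.1%}")
print(f"DB improvement: {db_gain:.1%}")
```

Under this convention a drop in DB from 2.0 to 1.5 is reported as a positive gain, matching how gains in SC and CH are reported.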

Conclusion. UC-TAM highlights both the feasibility and the practical value of unsupervised, domain-aware approaches to standard document analysis, offering a pathway toward more intelligent, automated, and fine-grained management of standards.

Published

2026-03-20

How to Cite

Zheng, W., Wang, L., Yuan, X., Zheng, Y., Xie, W., Liu, Y., & Zhao, L. (2026). UC-TAM: an unsupervised model for fine-grained semantic association construction across large-scale power system standards documents. Information Research an International Electronic Journal, 31(iConf), 276–286. https://doi.org/10.47989/ir31iConf64288

Section

Conference proceedings
