An LLM-powered framework for hierarchical topic discovery in LLM research

Authors

Lu, X., & Ko, Y. S.

DOI:

https://doi.org/10.47989/ir31iConf64158

Keywords:

Hierarchical topic discovery, LLM research, LLM framework, Trend analysis, Topic modeling

Abstract

Introduction. The advancement of large language models (LLMs) has led to a rapidly expanding body of research, making systematic mapping of the research landscape critical for informed policy and resource allocation. We present an LLM-powered framework that constructs hierarchical topics to reveal how research emerges and reorganizes.

Method. We gather Web of Science (WoS) titles and abstracts, extract topics via GPTopic, and prompt an LLM to organize them into a hierarchy. We integrate quantitative and qualitative analyses, building a bipartite network that links domains and methods.
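The bipartite domain–method network described in the Method section can be sketched minimally as weighted edges between the two node sets. This is a hypothetical illustration, not the authors' code: the paper records, domain labels, and method labels below are invented, and the actual framework derives them from WoS records via GPTopic.

```python
# Minimal sketch of a bipartite domain-method network stored as
# weighted edges. All records below are hypothetical examples.
from collections import Counter

papers = [
    {"domain": "medical imaging", "methods": ["prompt engineering", "retrieval-augmented generation"]},
    {"domain": "cybersecurity", "methods": ["prompt engineering"]},
    {"domain": "multimodal learning", "methods": ["instruction tuning"]},
]

# Edge weight = number of papers linking a domain to a method.
edges = Counter()
for paper in papers:
    for method in paper["methods"]:
        edges[(paper["domain"], method)] += 1

# A method's total weight shows how widely it is adopted across domains.
method_usage = Counter()
for (domain, method), weight in edges.items():
    method_usage[method] += weight
```

Ranking `method_usage` then surfaces which methods bridge the most domains, which is the kind of cross-cutting signal a bipartite projection supports.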

Analysis. The hierarchical topics provide an overview of the LLM research landscape. Trends are derived from monthly aggregations of publications, offering insight into how research topics evolve and may shift in the future.
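The monthly aggregation underlying the trend analysis can be sketched as bucketing dated publication records into per-topic monthly counts. A hypothetical illustration with invented dates and topic labels, not the study's code:

```python
# Minimal sketch: bucket dated publication records into per-topic
# monthly counts. Dates and topics are invented for illustration.
from collections import defaultdict

records = [
    ("2024-08-03", "AI/ML"),
    ("2024-08-21", "AI/ML"),
    ("2024-09-10", "multimodal learning"),
]

monthly_counts = defaultdict(lambda: defaultdict(int))
for date, topic in records:
    month = date[:7]  # truncate ISO date to a YYYY-MM bucket
    monthly_counts[topic][month] += 1
```

The resulting per-topic monthly series is the input one would plot or forecast when tracking how topics rise and fall over time.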

Results. AI/ML led overall output, peaking in August 2024, with multimodal learning, systems efficiency, and cybersecurity emerging as key growth engines. Medical LLM research evolved from exploration (2023) to workflow integration (2024), reaching specialized deployment by 2025, with a focus on fairness and guidelines across medical imaging topics. We successfully forecast two research directions that account for 83.75% of publications in a specific area over the next three months.

Conclusion. The framework can effectively generate hierarchical topics for LLM-related research for downstream analysis and can be generalized to other domains.

References

Abdel-Rehim, A., Zenil, H., Orhobor, O., Fisher, M., Collins, R. J., Bourne, E., ... & King, R. (2025). Scientific hypothesis generation by large language models: laboratory validation in breast cancer treatment. Journal of the Royal Society Interface, 22(227), 20240674. https://doi.org/10.1098/rsif.2024.0674

Abdurahman, S., Salkhordeh Ziabari, A., Moore, A. K., Bartels, D. M., & Dehghani, M. (2025). A primer for evaluating large language models in social-science research. Advances in Methods and Practices in Psychological Science, 8(2), 25152459251325174. https://doi.org/10.1177/25152459251325174

Agarwal, S., Wood, D., Murray, B. A., Wei, Y., Busaidi, A. A., Kafiabadi, S., ... & Booth, T. C. (2025). Impact of hospital-specific domain adaptation on BERT-based models to classify neuroradiology reports. European Radiology, 1-15. https://doi.org/10.1007/s00330-025-11500-9

Angelov, D. (2020). Top2vec: Distributed representations of topics. arXiv preprint arXiv:2008.09470. https://doi.org/10.48550/arXiv.2008.09470

Atsukawa, N., Tatekawa, H., Oura, T., Matsushita, S., Horiuchi, D., Takita, H., ... & Ueda, D. (2025). Evaluation of radiology residents’ reporting skills using large language models: an observational study. Japanese Journal of Radiology, 1-9. https://doi.org/10.1101/2024.11.06.24316838

Barman, K. G., Caron, S., Sullivan, E., de Regt, H. W., de Austri, R. R., Boon, M., ... & Weniger, C. (2025). Large physics models: Towards a collaborative approach with large language models and foundation models. arXiv preprint arXiv:2501.05382. https://doi.org/10.48550/arXiv.2501.05382

Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet Allocation. Journal of Machine Learning Research, 3(Jan), 993-1022. https://doi.org/10.7551/mitpress/1120.003.0082

Blei, D. M., Griffiths, T. L., & Jordan, M. I. (2010). The nested chinese restaurant process and bayesian nonparametric inference of topic hierarchies. Journal of the ACM (JACM), 57(2), 1-30. https://doi.org/10.1145/1667053.1667056

Brown, J. D., Lenchik, L., Doja, F., Kaviani, P., Judd, D., Probyn, L., ... & Retrouvey, M. (2025). Leveraging large language models in radiology research: a comprehensive user guide. Academic Radiology. https://doi.org/10.1016/j.acra.2024.11.053

Cao, Y., Lau, P. N., Chin, A. W., He, Z., Au Yeung, C. C., Zeng, K., ... & Shum, H. C. (2025). A Phase Separation‐Assisted Pre‐Enrichment Method for Ultrasensitive Respiratory Virus Detection. Advanced Science, e06578. https://doi.org/10.1002/advs.202506578

Chang, Y., Wang, X., Wang, J., Wu, Y., Yang, L., Zhu, K., ... & Xie, X. (2024). A survey on evaluation of large language models. ACM Transactions on Intelligent Systems and Technology, 15(3), 1-45. https://doi.org/10.1145/3641289

Cotfas, L. A., Sandu, A., Delcea, C., Diaconu, P., Frăsineanu, C., & Stănescu, A. (2025). From Transformers to ChatGPT: An Analysis of Large Language Models Research. IEEE Access. https://doi.org/10.1109/access.2025.3600739

Dave, T., Athaluri, S. A., & Singh, S. (2023). ChatGPT in medicine: an overview of its applications, advantages, limitations, future prospects, and ethical considerations. Frontiers in Artificial Intelligence, 6, 1169595. https://doi.org/10.3389/frai.2023.1169595

de Carvalho, S. C., Raboni, S. M., Bueno, L. B., Lapinscki, B. A., Lissa, S. M., Amadeu, L. L. M., ... & Nogueira, M. B. (2025). Assessment of the relationship between hematologic parameters (CPD) in screening for COVID-19 severity in women. Future Science OA, 11(1), 2540749. https://doi.org/10.1080/20565623.2025.2540749

Dennstädt, F., Windisch, P., Filchenko, I., Zink, J., Putora, P. M., Shaheen, A., Gaio, R., Cihoric, N., Wosny, M., Aeppli, S., Schmerder, M., Shelan, M., & Hastings, J. (2025). Application of a general large language model-based classification system to retrieve information about oncological trials. Oncology, 1-11. https://doi.org/10.1159/000546946

Elliott, J. H., Turner, T., Clavisi, O., Thomas, J., Higgins, J. P., Mavergames, C., & Gruen, R. L. (2014). Living systematic reviews: an emerging opportunity to narrow the evidence-practice gap. PLoS Medicine, 11(2), e1001603. https://doi.org/10.1371/journal.pmed.1001603

Gallifant, J., Chen, S., Jain, S. K., Moreira, P., Topaloglu, U., Aerts, H. J., ... & Bitterman, D. S. (2025). Reliability of large language model knowledge across brand and generic cancer drug names. JCO Clinical Cancer Informatics, 9, e2400257. https://doi.org/10.1200/cci-24-00257

Grootendorst, M. (2022). BERTopic: Neural topic modeling with a class-based TF-IDF procedure. arXiv preprint arXiv:2203.05794. https://doi.org/10.48550/arXiv.2203.05794

Habeshian, T. S., Park, S. Y., Conti, D., Wilkens, L. R., Le Marchand, L., & Setiawan, V. W. (2025). Inflammatory and insulinemic dietary and lifestyle patterns and incidence of endometrial cancer: the multiethnic cohort. The American Journal of Clinical Nutrition, 121(6), 1236-1245. https://doi.org/10.1016/j.ajcnut.2025.04.020

Hofmann, T. (1999, August). Probabilistic latent semantic indexing. In Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval (pp. 50-57). https://doi.org/10.1145/312624.312649

Huang, Y., Liu, Q., Liu, J., & Hu, Y. (2022, September). Topic Discovery in Scientific Literature. In CCF Conference on Computer Supported Cooperative Work and Social Computing (pp. 481-491). Singapore: Springer Nature Singapore. https://doi.org/10.1007/978-981-99-2356-4_38

Jeon, J., & Lee, S. (2023). Large language models in education: A focus on the complementary relationship between human teachers and ChatGPT. Education and Information Technologies, 28(12), 15873-15892. https://doi.org/10.1007/s10639-023-11834-1

Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., ... & Hassabis, D. (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873), 583-589. https://doi.org/10.1038/s41586-021-03819-2

Kabra, M., Nagpal, A., Sacheti, A., Kumar, M., & Joshi, S. (2024). GENWISE: Thematic Discovery from Textual Data. In Proceedings of the Eighth Financial Technology and Natural Language Processing and the 1st Agent AI for Scenario Planning (pp. 79-88).

Kam, T. T., Bui, C. H., Yeung, H. W., Wong, P. C., Chin, A. W., Nicholls, J. M., ... & Chan, M. C. (2025). Viral characterization of the reassortants between canine influenza H3N2 and human pandemic (2009) H1N1 and avian H9N2 viruses in canine ex vivo tracheal explants. Virology Journal, 22, 218. https://doi.org/10.1186/s12985-025-02836-1

Lee, D., & Seung, H. S. (2000). Algorithms for non-negative matrix factorization. Advances in Neural Information Processing Systems, 13. https://doi.org/10.5555/3008751.3008829

Li, P., Wang, Z., Zhang, X., Zhang, R., Jiang, L., Wang, P., & Zhou, Y. (2025). SciTopic: Enhancing Topic Discovery in Scientific Literature through Advanced LLM. arXiv preprint arXiv:2508.20514. https://doi.org/10.48550/arXiv.2508.20514

Li, R., Mao, S., Zhu, C., Yang, Y., Tan, C., Li, L., ... & Yang, Y. (2025). Enhancing Pulmonary Disease Prediction Using Large Language Models With Feature Summarization and Hybrid Retrieval-Augmented Generation: Multicenter Methodological Study Based on Radiology Report. Journal of Medical Internet Research, 27, e72638. https://doi.org/10.2196/72638

LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444. https://doi.org/10.1038/nature14539

Maulana, F. I., Adi, P. D. P., & Widartha, V. P. (2024, November). A Comprehensive Review and Research Trends of Large Language Models (LLMs) in Artificial Intelligence. In 2024 International Conference on Informatics, Multimedia, Cyber and Information System (ICIMCIS) (pp. 346-351). IEEE. https://doi.org/10.1109/icimcis63449.2024.10956533

McElroy, J. P., Song, M. A., Barr, J. R., Gardner, M. S., Kinnebrew, G., Kuklenyik, Z., ... & Shields, P. G. (2025). Lung lipids associated with smoking and ECIG use in a cross-sectional study and clinical trial. Respiratory Research, 26(1), 193. https://doi.org/10.1186/s12931-025-03267-w

Mendoza-Revilla, J., Trop, E., Gonzalez, L., Roller, M., Dalla-Torre, H., de Almeida, B. P., ... & Lopez, M. (2024). A foundational large language model for edible plant genomes. Communications Biology, 7(1), 835. https://doi.org/10.1038/s42003-024-06465-2

Mimno, D., Wallach, H., Talley, E., Leenders, M., & McCallum, A. (2011, July). Optimizing semantic coherence in topic models. In Proceedings of the 2011 conference on empirical methods in natural language processing (pp. 262-272). https://doi.org/10.3115/1699571.1699627

Moassefi, M., Houshmand, S., Faghani, S., Chang, P. D., Sun, S. H., Khosravi, B., ... & Erickson, B. J. (2025). Cross-Institutional Evaluation of Large Language Models for Radiology Diagnosis Extraction: A Prompt-Engineering Perspective. Journal of Imaging Informatics in Medicine, 1-6. https://doi.org/10.1007/s10278-025-01523-5

Mu, Y., Bai, P., Bontcheva, K., & Song, X. (2024). Addressing Topic Granularity and Hallucination in Large Language Models for Topic Modelling. arXiv preprint arXiv:2405.00611. https://doi.org/10.48550/arXiv.2405.00611

Naveed, H., Khan, A. U., Qiu, S., Saqib, M., Anwar, S., Usman, M., ... & Mian, A. (2025). A comprehensive overview of large language models. ACM Transactions on Intelligent Systems and Technology, 16(5), 1-72. https://doi.org/10.1145/3744746

Pan, H., Mudur, N., Taranto, W., Tikhanovskaya, M., Venugopalan, S., Bahri, Y., ... & Kim, E. A. (2025). Quantum many-body physics calculations with large language models. Communications Physics, 8(1), 49. https://doi.org/10.1038/s42005-025-01956-y

Patil, R., & Gudivada, V. (2024). A review of current trends, techniques, and challenges in large language models (llms). Applied Sciences, 14(5), 2074. https://doi.org/10.3390/app14052074

Pham, C., Hoyle, A., Sun, S., Resnik, P., & Iyyer, M. (2024). TopicGPT: A Prompt-based Topic Modeling Framework. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2956–2984. https://doi.org/10.18653/v1/2024.naacl-long.164

Reuter, A., Thielmann, A., Weisser, C., Fischer, S., & Säfken, B. (2024). GPTopic: Dynamic and interactive topic representations. arXiv preprint arXiv:2403.03628. https://doi.org/10.48550/arXiv.2403.03628

Röder, M., Both, A., & Hinneburg, A. (2015, February). Exploring the space of topic coherence measures. In Proceedings of the eighth ACM international conference on Web search and data mining (pp. 399-408). https://doi.org/10.1145/2684822.2685324

Spielberger, G., Artinger, F. M., Reb, J., & Kerschreiter, R. (2025). Retrieval Augmented Generation for Topic Modeling in Organizational Research: An Introduction with Empirical Demonstration. arXiv preprint arXiv:2502.20963. https://doi.org/10.48550/arXiv.2502.20963

Sun, K., Bai, Y., Qi, J., Hou, L., & Li, J. (2024). MM-MATH: Advancing Multimodal Math Evaluation with Process Evaluation and Fine-grained Classification. Findings of the Association for Computational Linguistics: EMNLP 2024, 1358–1375. https://doi.org/10.18653/v1/2024.findings-emnlp.73

Wang, D., Qiu, J., Li, R., & Tian, H. (2025). Development and interpretation of machine learning-based prognostic models for predicting high-risk prognostic pathological components in pulmonary nodules: integrating clinical features, serum tumor marker and imaging features. Journal of Cancer Research and Clinical Oncology, 151(6), 190. https://doi.org/10.1007/s00432-025-06241-7

Xu, R., Sun, Y., Ren, M., Guo, S., Pan, R., Lin, H., ... & Han, X. (2024). AI for social science and social science of AI: A survey. Information Processing & Management, 61(3), 103665. https://doi.org/10.1016/j.ipm.2024.103665

Zhang, D., Yu, Y., Dong, J., Li, C., Su, D., Chu, C., & Yu, D. (2024). MM-LLMs: Recent Advances in MultiModal Large Language Models. Findings of the Association for Computational Linguistics ACL 2024, 12401–12430. https://doi.org/10.18653/v1/2024.findings-acl.738

Zhao, W. X., Zhou, K., Li, J., Tang, T., Wang, X., Hou, Y., ... & Wen, J. R. (2023). A survey of large language models. arXiv preprint arXiv:2303.18223, 1(2). https://doi.org/10.48550/arXiv.2303.18223

Zhu, M., Lin, H., Jiang, J., Jinia, A. J., Jee, J., Pichotta, K., Waters, M., Rose, D., Schultz, N., Chalise, S., Valleru, L., Morin, O., Moran, J., Deasy, J. O., Pilai, S., Nichols, C., Riely, G., Braunstein, L. Z., & Li, A. (2025). Large language model trained on clinical oncology data predicts cancer progression. Npj Digital Medicine, 8(1). https://doi.org/10.1038/s41746-025-01780-2

Published

2026-03-20

How to Cite

Lu, X., & Ko, Y. S. (2026). An LLM-powered framework for hierarchical topic discovery in LLM research. Information Research: An International Electronic Journal, 31(iConf), 212–234. https://doi.org/10.47989/ir31iConf64158

Issue

Section

Conference proceedings
