Large language models na ciência da informação: perspectivas técnicas da inteligência artificial aplicadas à organização e à arquitetura da informação (Large language models in information science: perspectives on information organisation and architecture)

Authors

Procópio, D. C., & Nascimento Silva, P.
DOI:

https://doi.org/10.47989/ir31iConf64289

Keywords:

Information science, Information organisation, Information architecture, Artificial intelligence, Large language models

Abstract

Introduction. Despite the popularity of large language models (LLMs), there are still gaps in understanding how information is organised, represented and structured within these systems. This study therefore sought to identify and analyse aspects of information science (IS) in LLMs.

Method. This is a bibliographic study with a qualitative approach, conducted through a rigorous search protocol applied to the ACM Digital Library, ScienceDirect, Scopus and Web of Science databases.

Analysis. A total of 53 studies published between 2022 and 2025 were examined, from which 16 categories of artificial intelligence techniques applied in LLMs were identified.

Results. These techniques were mapped to the four information architecture systems (organisation, labelling, navigation and search) and to classical instruments of information organisation, highlighting the convergence between principles of IS and AI innovations.

Conclusions. The articulation between information architecture and information organisation provides conceptual and practical foundations to understand how LLMs structure, represent and make data available. The study points to future research directions, reinforces the relevance of IS in the debate on AI and highlights the role of information professionals in critically mediating between technology and society.

Published

2026-03-20

How to Cite

Procópio, D. C., & Nascimento Silva, P. (2026). Large language models na ciência da informação: perspectivas técnicas da inteligência artificial aplicadas à organização e à arquitetura da informação (Large language models in information science: perspectives on information organisation and architecture). Information Research: An International Electronic Journal, 31(iConf), 193–211. https://doi.org/10.47989/ir31iConf64289

Section

Conference proceedings
