Large language models na ciência da informação: perspectivas técnicas da inteligência artificial aplicadas à organização e à arquitetura da informação (Large language models in information science: perspectives on information organisation and architecture)
DOI: https://doi.org/10.47989/ir31iConf64289

Keywords: Information science, Information organisation, Information architecture, Artificial intelligence, Large language models

Abstract
Introduction. Despite the popularity of large language models (LLMs), there are still gaps in understanding how information is organised, represented and structured within these systems. This study therefore sought to identify and analyse aspects of information science (IS) in LLMs.
Method. This is a bibliographic study with a qualitative approach, conducted through a rigorous protocol applied to the ACM digital library, ScienceDirect, Scopus and Web of Science databases.
Analysis. A total of 53 studies published between 2022 and 2025 were examined, from which 16 categories of artificial intelligence techniques applied in LLMs were identified.
Results. These techniques were related to the four information architecture systems — organisation, labelling, navigation and search — and to classical instruments of information organisation, highlighting the convergence between principles of IS and AI innovations.
Conclusions. The articulation between information architecture and information organisation provides conceptual and practical foundations to understand how LLMs structure, represent and make data available. The study points to future research directions, reinforces the relevance of IS in the debate on AI and highlights the role of information professionals in critically mediating between technology and society.
License
Copyright (c) 2026 Daiane Campos Procópio , Patrícia Nascimento Silva

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
