Evaluating software academic impact in biomedical research based on large-scale full-text analysis

Authors

Wang, Y., Zhang, H., Hu, H., & Zhao, Y.

DOI:

https://doi.org/10.47989/ir31iConf64193

Keywords:

Influence of software, Full-text content, Large-scale data analysis, Bibliometrics

Abstract

Objective. Analysing patterns of software mentions in academic publications and their academic impact is essential for understanding the scholarly ecosystem and optimising research resources.

Methods. This study focuses on the biomedical domain and investigates the prevalence and impact of software entities in scientific research. Rather than treating impact as a causal effect on research outcomes, we operationalise software impact as scholarly presence, measured through mention frequency. Based on 1,500,334 articles from the PubMed Central Open Access subset (PMC-OA), we collected disambiguated software entities mentioned in the full texts and extracted their features, including mention time, in-text location, and research subfield.
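
As a rough illustration (not the authors' pipeline), the sketch below shows how mention records of this kind could be aggregated once software entities and their features have been extracted. The Mention fields and sample records are hypothetical stand-ins for the extracted corpus; impact here follows the paper's operationalisation as mention frequency.

```python
# A minimal sketch, assuming mention records with a disambiguated software
# name, publication year, in-text section, and research subfield.
# All field names and sample data below are illustrative.
from collections import Counter
from dataclasses import dataclass

@dataclass
class Mention:
    software: str   # disambiguated software name
    year: int       # publication year of the citing article (mention time)
    section: str    # in-text location, e.g. "Methods", "Results"
    subfield: str   # biomedical research subfield of the article

# Hypothetical records standing in for the extracted mention corpus.
mentions = [
    Mention("SPSS", 2019, "Methods", "Public Health"),
    Mention("SPSS", 2021, "Methods", "Oncology"),
    Mention("MEGA", 2021, "Methods", "Genetics"),
    Mention("GraphPad", 2022, "Results", "Immunology"),
]

# Overall mention frequency per software: impact as scholarly presence.
impact = Counter(m.software for m in mentions)

# Frequency by in-text section, to see where software presence concentrates.
by_section = Counter((m.section, m.software) for m in mentions)

# Frequency by year, to trace growth of software impact over time.
by_year = Counter((m.year, m.software) for m in mentions)

print(impact.most_common())
print(by_section.most_common())
print(by_year.most_common())
```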

Results. Overall, the impact of software in biomedical research continues to grow, with programs such as SPSS, GraphPad, and MEGA demonstrating high academic influence. Within papers, software influence is concentrated in the Methods section and is dominated by general-purpose statistical tools, while other sections display greater diversity. Across fields, domains with high software-mention rates tend to rely on general-purpose software, whereas more specialised domains adopt software tailored to specific biomedical tasks.

Conclusion. By tackling these questions, this study advances a systematic understanding of the role of software in biomedical research and provides a basis for further methodological refinement and empirical analysis.

Published

2026-03-20

How to Cite

Wang, Y., Zhang, H., Hu, H., & Zhao, Y. (2026). Evaluating software academic impact in biomedical research based on large-scale full-text analysis. Information Research: An International Electronic Journal, 31(iConf), 1002–1010. https://doi.org/10.47989/ir31iConf64193

Issue

Vol. 31 No. iConf (2026)

Section

Conference proceedings
