<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "http://jats.nlm.nih.gov/publishing/1.0/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" article-type="research-article" xml:lang="en">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">IR</journal-id>
<journal-title-group>
<journal-title>Information Research</journal-title>
</journal-title-group>
<issn pub-type="epub">1368-1613</issn>
<publisher>
<publisher-name>University of Bor&#x00E5;s</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">ir31141005</article-id>
<article-id pub-id-type="doi">10.47989/ir31141005</article-id>
<article-categories>
<subj-group xml:lang="en">
<subject>Research article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Citation count prediction based on Google Scholar profiles and Clarivate&#x2019;s journal citation reports</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author"><name><surname>Bahaghighat</surname><given-names>Mahdi</given-names></name>
<xref ref-type="aff" rid="aff0001"/></contrib>
<contrib contrib-type="author"><name><surname>Akbari</surname><given-names>Leila</given-names></name>
<xref ref-type="aff" rid="aff0002"/></contrib>
<contrib contrib-type="author"><name><surname>Ghasemi</surname><given-names>Majid</given-names></name>
<xref ref-type="aff" rid="aff0003"/></contrib>
<contrib contrib-type="author"><name><surname>Xin</surname><given-names>Qin</given-names></name>
<xref ref-type="aff" rid="aff0004"/></contrib>
<aff id="aff0001"><bold>Mahdi Bahaghighat</bold> is an Associate Professor in the Department of Computer Engineering at Imam Khomeini International University, Qazvin, Iran. His primary research interests include Artificial Intelligence, Computer Vision, Natural Language Processing, and applications of AI in Finance. Dr. Bahaghighat leads the Artificial Intelligence in Science and Technologies (AIST) laboratory. As the corresponding author, he can be reached via email at <email xlink:href="Bahaghighat&#x0040;eng.ikiu.ac.ir">Bahaghighat&#x0040;eng.ikiu.ac.ir</email>.</aff>
<aff id="aff0002"><bold>Leila Akbari</bold> is a Research Assistant in the AIST Lab. She holds an M.Sc. in Electrical Engineering from Islamic Azad University, Qazvin, Iran. Her research interests include artificial intelligence, signal processing, and image processing.</aff>
<aff id="aff0003"><bold>Majid Ghasemi</bold> is an undergraduate student in the Computer Engineering Department, Imam Khomeini International University, Qazvin, Iran. His research interests include artificial intelligence and machine learning.</aff>
<aff id="aff0004"><bold>Qin Xin</bold> is a Full Professor of Computer Science at the University of the Faroe Islands. He earned his Ph.D. from the University of Liverpool in 2004 and has held research positions at renowned institutions, including Simula Research Laboratory and UCLouvain. With over 200 peer-reviewed publications, his research focuses on algorithms for wireless networks, cryptography, and combinatorial optimization.</aff>
</contrib-group>
<pub-date pub-type="epub"><day>06</day><month>02</month><year>2026</year></pub-date>
<pub-date pub-type="collection"><year>2026</year></pub-date>
<volume>31</volume>
<issue>1</issue>
<fpage>46</fpage>
<lpage>69</lpage>
<permissions>
<copyright-year>2026</copyright-year>
<copyright-holder>&#x00A9; 2026 The Author(s).</copyright-holder>
<license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by-nc/4.0/">
<license-p>This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by-nc/4.0/">http://creativecommons.org/licenses/by-nc/4.0/</ext-link>), permitting all non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<abstract xml:lang="en">
<title>Abstract</title>
<p><bold>Introduction.</bold> Citation count prediction (CCP) models are vital for assessing research impact, yet existing approaches suffer from critical limitations. Prior studies often rely on restricted datasets (e.g., journal metrics alone) or fail to account for the multidimensional factors influencing citations, leading to suboptimal accuracy.</p>
<p><bold>Method.</bold> We propose an accurate CCP regression model for the Computer Science and Electrical Engineering disciplines, based on twenty-three novel features extracted from public data in Google Scholar profiles and Clarivate&#x2019;s Journal Citation Reports (JCR) annual reports. The features are split into four datasets: the author information database (AI DB), the journal information database (JI DB), the paper information database (PI DB), and finally the author &#x0026; paper &#x0026; journal information database (APJ DB).</p>
<p><bold>Analysis.</bold> Our evaluation employed Mean Absolute Error (MAE), Mean Squared Error (MSE), Mean Absolute Percentage Error (MAPE), and the Coefficient of Determination (R<sup>2</sup>) to assess model performance. Dimensionality reduction techniques like Principal Component Analysis (PCA) and t-Distributed Stochastic Neighbor Embedding (t-SNE) were also applied, and their effect on CCP was assessed.</p>
<p><bold>Results.</bold> We identified that paper-level features (PI DB) were significantly more predictive than author or journal attributes, resolving a key debate in CCP research.</p>
<p><bold>Conclusions.</bold> This study enhances CCP research by introducing scalable, publicly available features, demonstrating the superiority of paper-level attributes through empirical evidence, and identifying Nu-SVR as the most effective algorithm for accurate and interpretable citation prediction, supporting researchers, institutions, and policymakers in assessing research impact.</p>
</abstract>
</article-meta>
</front>
<body>
<sec id="sec1">
<title>Introduction</title>
<p>In scientometrics, a citation is the reference of one published document in another document. It also serves as a measure of the prominence of research outputs, because it shows how often a specific work is referenced in other scholarly literature. A high citation count often indicates that a work has significantly impacted its field, influencing subsequent research and scholarship (<xref ref-type="bibr" rid="R14">Bl&#x00FC;mel &#x0026; Schniedermann, 2020</xref>; <xref ref-type="bibr" rid="R18">Broadus, 1987</xref>; <xref ref-type="bibr" rid="R46">Moed, 2006</xref>; <xref ref-type="bibr" rid="R40">Khokhlov, 2020</xref>). It can therefore act as an indicator of research quality; works with high citation counts are often thought to be more valuable or credible than less cited peer-reviewed literature (<xref ref-type="bibr" rid="R15">Bornmann &#x0026; Daniel, 2008</xref>; <xref ref-type="bibr" rid="R19">Butler &#x0026; Visser, 2006</xref>). Citation counts can also help to spot trends in research over time, indicating areas of growing interest and shifts in the scientific landscape. Besides, citation counts play a key role in resource allocation decisions, as funding agencies and institutions rely on these metrics to decide which research areas or researchers are worthy of support. Overall, citation counts play a vital role in academic communication, evaluation, and the advancement of knowledge in various fields (<xref ref-type="bibr" rid="R1">Abramo et al., 2023</xref>; <xref ref-type="bibr" rid="R11">Belikov &#x0026; Belikov, 2015</xref>; <xref ref-type="bibr" rid="R21">Cao et al., 2016</xref>; <xref ref-type="bibr" rid="R25">Durieux &#x0026; Gevenois, 2010</xref>; <xref ref-type="bibr" rid="R30">Gao et al., 2024</xref>; <xref ref-type="bibr" rid="R33">Groos &#x0026; Pritchard, 1969</xref>; <xref ref-type="bibr" rid="R58">Sohrabi &#x0026; Iraj, 2017</xref>; <xref ref-type="bibr" rid="R66">Yu et al., 2014</xref>).</p>
<p>On the other hand, several well-known bibliometric measures, including the Impact Factor (IF) and the h-index, are also based on citations of publications and journals. In the early 1960s, the IF was introduced by the Institute for Scientific Information (ISI) (<xref ref-type="bibr" rid="R31">Garfield, 2006</xref>). A journal&#x0027;s IF is calculated by dividing the number of citations it receives in the current year by the number of articles it published in the preceding two years; as a result, the performance of a journal can be directly estimated from its IF (<xref ref-type="bibr" rid="R5">Amin &#x0026; Mabe, 2003</xref>; <xref ref-type="bibr" rid="R16">Bornmann &#x0026; Daniel, 2009</xref>; <xref ref-type="bibr" rid="R17">Braun et al., 2006</xref>; <xref ref-type="bibr" rid="R25">Durieux &#x0026; Gevenois, 2010</xref>; <xref ref-type="bibr" rid="R37">Hirsch, 2010</xref>; <xref ref-type="bibr" rid="R45">Lundberg, 2006</xref>). The IF has been criticized for not taking into account the diversity of research practices across different fields of study and for being inaccurate with respect to its purpose. In addition, the h-index was introduced to measure the output of individual researchers: an author&#x0027;s h-index is the largest number h such that h of his or her articles have each been cited at least h times (<xref ref-type="bibr" rid="R17">Braun et al., 2006</xref>; <xref ref-type="bibr" rid="R27">Fassin, 2020</xref>; <xref ref-type="bibr" rid="R37">Hirsch, 2010</xref>; <xref ref-type="bibr" rid="R41">Khurana &#x0026; Sharma, 2022</xref>). In addition, Q1 to Q4 quartile rankings for journals are provided by two key sources: Clarivate Analytics and Elsevier. Clarivate&#x2019;s Journal Citation Reports (JCR) ranks journals based on their Impact Factor (IF) through its Web of Science platform, assigning quartiles from Q1 (top 25%) to Q4 (bottom 25%).
Elsevier&#x2019;s Scopus platform uses the SCImago Journal Rank (SJR), which evaluates journals based on citation impact and assigns similar quartiles. Both systems are widely recognized and used for academic journal rankings globally, guiding researchers in assessing and selecting journals for publishing their work (<xref ref-type="bibr" rid="R4">Almas et al., 2021</xref>; <xref ref-type="bibr" rid="R32">Gonz&#x00E1;lez-Betancor &#x0026; Dorta-Gonz&#x00E1;lez, 2017</xref>; <xref ref-type="bibr" rid="R42">Kosyakov &#x0026; Pislyakov, 2024</xref>; <xref ref-type="bibr" rid="R47">Moussa, 2023</xref>; <xref ref-type="bibr" rid="R51">Okagbue et al., 2020</xref>; <xref ref-type="bibr" rid="R50">Okagbue et al., 2021</xref>; <xref ref-type="bibr" rid="R60">Teixeira da Silva, 2020</xref>; <xref ref-type="bibr" rid="R61">Torres-Salinas et al., 2022</xref>).</p>
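<p>To make the two metrics above concrete, the following minimal Python sketch (with purely illustrative numbers, not data from this study) computes a two-year Impact Factor and an author&#x0027;s h-index as just defined:</p>

```python
def impact_factor(citations_this_year: int, articles_prev_two_years: int) -> float:
    """Two-year IF: citations received this year to items published in the
    preceding two years, divided by the number of those items."""
    return citations_this_year / articles_prev_two_years

def h_index(citations_per_paper: list[int]) -> int:
    """Largest h such that h papers each have at least h citations."""
    ranked = sorted(citations_per_paper, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)

# Illustrative values: 210 citations to 120 articles from the two prior years
print(impact_factor(210, 120))    # 1.75
print(h_index([10, 8, 5, 4, 3]))  # 4
```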
<p><xref ref-type="bibr" rid="R30">Gao et al. (2024)</xref> leveraged multi-layer academic networks to improve citation count prediction by fusing different types of relationships between publications, showing how this multi-layer perspective on academic relations can enhance precision. Their model is able to capture the complexity of citation dynamics. Nevertheless, the intricacy of a multi-layer network can hinder model interpretability, making it harder for researchers to understand the mechanisms that drive predictions. Moreover, the model&#x0027;s performance may depend on the quality of the network it is trained on, which can differ greatly across fields.</p>
<p>Peer review is widely accepted as a means of evaluating papers (<xref ref-type="bibr" rid="R44">Li et al., 2019</xref>). A reviewer is expected to evaluate the originality, creativity, contribution, integrity, and readability of a paper. Since peer-review data includes the assessment comments of relevant experts, it may also make it possible to predict the future influence of a paper, and such data has been obtained for the citation count prediction (CCP) task. Reviewers&#x2019; comments often address issues that are not directly related to a paper&#x0027;s main contribution; for instance, they may include reminders regarding formatting. Several people may review an article at the same time, resulting in differing opinions. Therefore, when determining the impact of a paper, both the coverage and the divergence of review comments should be considered (<xref ref-type="bibr" rid="R44">Li et al., 2019</xref>).</p>
<p>While citation count prediction is a relatively well-explored area, many existing studies did not comprehensively consider the multitude of factors influencing citations, particularly in specialized fields like Computer Science and Electrical Engineering (<xref ref-type="bibr" rid="R3">Aksnes et al., 2019</xref>; <xref ref-type="bibr" rid="R6">Baas et al., 2020</xref>; <xref ref-type="bibr" rid="R26">Enduri et al., 2022</xref>; <xref ref-type="bibr" rid="R29">Furman &#x0026; Teodoridis, 2020</xref>; He et al., n.d.; <xref ref-type="bibr" rid="R68">Zhang et al., 2025</xref>; <xref ref-type="bibr" rid="R38">Hutchins et al., 2016</xref>; <xref ref-type="bibr" rid="R51">Okagbue et al., 2020</xref>).</p>
<p>Citation patterns can vary across disciplines, so our work targets citation count prediction for academic papers in the computer science and electrical engineering disciplines. Because both Google Scholar and JCR are well-respected sources in the academic community and their data are publicly available, we gathered raw data mainly from Google Scholar Profiles (GSPs) and public JCR annual reports. A GSP offers a wide range of citation information, such as total citations, h-index, and i10-index, while the JCR reports provide metrics like IF and journal rankings (Q1, Q2, Q3, Q4). Combining these two main data sources enriches our feature set and helps capture more nuanced aspects of citation behavior, making our model more accurate. Raw data like citation counts, h-index, or publication year are useful but limited in their predictive power: citations are influenced by a variety of factors, and raw data may not fully capture the non-linear relationships between them. Creating new features can improve the performance of the citation count prediction model by uncovering hidden patterns and relationships that raw data alone cannot reveal. As a result, our approach includes twenty-three unique and novel features, offering a fundamental understanding of the factors that may impact citation count. The citation count was then estimated using a number of robust regression techniques, including SVR, Nu-SVR, Linear SVR, K-Nearest Neighbors (KNN), Decision Tree (DT), Bayesian Ridge, and SGD Regressor. We also assessed our method using several performance measures, including the Mean Absolute Percentage Error (MAPE), Mean Squared Error (MSE), Mean Absolute Error (MAE), and R-squared. The study also utilized PCA and t-SNE for dimensionality reduction and examined their impact on CCP.</p>
<p>The rest of this paper is organized as follows: Section two discusses related works. In section three, our methodology is introduced. Simulation results are presented in section four, and section five is the conclusion.</p>
</sec>
<sec id="sec2">
<title>Related work</title>
<p>Citation count prediction (CCP) has been approached from multiple perspectives, including network-based modeling, textual analysis, trend forecasting, and deep learning. While existing methods offer valuable insights, they often focus on isolated aspects of the problem, leaving room for a more holistic and generalizable approach (<xref ref-type="bibr" rid="R9">Bai et al., 2025</xref>; <xref ref-type="bibr" rid="R35">He et al., 2025</xref>; <xref ref-type="bibr" rid="R68">Zhang et al., 2025</xref>; <xref ref-type="bibr" rid="R69">Zhu et al., 2025</xref>).</p>
<p>Early work by <xref ref-type="bibr" rid="R53">Pobiedina and Ichise (2016)</xref> framed CCP as a link prediction problem, leveraging citation networks to model future citations. While this approach captures structural dependencies, it overlooks critical external factors such as author reputation and research competitiveness. Similarly, <xref ref-type="bibr" rid="R63">Wang et al. (2023)</xref> introduced AGSTA-NET, a spatio-temporal fusion model that improves dynamic citation network analysis. However, its computational complexity and reliance on heterogeneous network data may limit scalability. These studies highlight the potential of network-based methods but also reveal their dependence on well-structured citation data, which may not always be available. Recent work has explored the role of textual features in CCP. For example, <xref ref-type="bibr" rid="R44">Li et al. (2019)</xref> incorporated peer review text into a neural network model, demonstrating that qualitative feedback can enhance prediction accuracy. However, their reliance on peer review data&#x2014;which varies widely across disciplines&#x2014;poses a generalizability challenge. Similarly, <xref ref-type="bibr" rid="R58">Sohrabi and Iraj (2017)</xref> focused on keyword frequency, showing that strategic keyword use can improve visibility. Yet their model neglects broader contextual factors, such as journal prestige or research impact. Baba et al. (2019) extended textual analysis to paper abstracts but did not account for external citation influences. These studies suggest that while textual features are valuable, they must be integrated with other predictive factors for robust performance.</p>
<p>Historical citation trends have also been used for prediction. <xref ref-type="bibr" rid="R43">Li et al. (2015)</xref> demonstrated that temporal patterns improve out-of-time forecasts, but their model struggles with disruptive research that defies conventional citation trends. Meanwhile, <xref ref-type="bibr" rid="R2">Abrishami and Aliakbary (2019)</xref> applied deep learning, achieving superior accuracy over traditional methods. However, their approach requires large datasets and risks overfitting, limiting its applicability in low-data scenarios.</p>
<p>Existing CCP methods face key limitations: network models ignore external factors; textual approaches lack generalizability; trend-based methods fail with disruptive research; and deep learning requires excessive data. Most critically, no unified framework integrates multi-modal data while ensuring efficiency and interpretability (<xref ref-type="bibr" rid="R49">Nguyen et al., 2025</xref>; <xref ref-type="bibr" rid="R67">Zafar et al., 2024</xref>).</p>
</sec>
<sec id="sec3">
<title>Methodology</title>
<p>In this paper, we aim to predict the citation count of an article. An article&#x0027;s citation count is a suitable determinant for impact assessment. For this purpose, we have created and developed four datasets, called the Author Information Database (AI DB), Journal Information Database (JI DB), Paper Information Database (PI DB), and finally the Author &#x0026; Paper &#x0026; Journal Information Database (APJ DB). <xref ref-type="fig" rid="F1">Figure 1</xref> depicts the proposed algorithm. The initial preprocessing step normalizes the data. We then predict the citation count of a published paper using several robust regression algorithms, such as K-Nearest Neighbors (KNN), Decision Trees (DT), Support Vector Regression (SVR), and Bayesian regression methods, and evaluate the results with some of the most important performance metrics in regression problems: Mean Absolute Error (MAE), Mean Squared Error (MSE), Mean Absolute Percentage Error (MAPE), and the Coefficient of Determination (R<sup>2</sup>). To improve interpretability, we also assessed the influence of dimensionality reduction techniques (e.g., t-SNE and PCA) on the results and discussed their comparative effects.</p>
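<p>This pipeline of normalization, regression, and evaluation with MAE, MSE, MAPE, and R<sup>2</sup> can be sketched with scikit-learn. The snippet below is a minimal illustration on synthetic data, not the exact configuration used in our experiments:</p>

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import NuSVR
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             mean_absolute_percentage_error, r2_score)

rng = np.random.default_rng(0)
X = rng.random((200, 23))  # 23 features, mirroring the APJ DB
y = 5 * X[:, 0] + 2 * X[:, 1] + rng.normal(0, 0.1, 200)  # synthetic target

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
scaler = StandardScaler().fit(X_tr)  # preprocessing step: normalization
model = NuSVR().fit(scaler.transform(X_tr), y_tr)
pred = model.predict(scaler.transform(X_te))

print("MAE :", mean_absolute_error(y_te, pred))
print("MSE :", mean_squared_error(y_te, pred))
print("MAPE:", mean_absolute_percentage_error(y_te, pred))
print("R2  :", r2_score(y_te, pred))
```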
<sec id="sec3_1">
<title>From data to information: Creating some comprehensive database</title>
<p>The data collection process was conducted in two phases. In the first phase, at the end of December 2022, all input features were gathered. The second phase, at the end of 2023, involved collecting the output data (the citation counts to be predicted). Data sources included Google Scholar profiles, the Journal Citation Reports (JCR) annual report, and SCImago (to obtain the SJR metric). The dataset focuses only on papers in Computer Science (CS) and Electrical Engineering (EE). Additionally, we developed a specialized dataset called AoI2WoS (<xref ref-type="bibr" rid="R7">Bahaghighat et al., 2024</xref>; <xref ref-type="bibr" rid="R8">Jahani rad et al., 2024</xref>), which establishes a connection between Areas of Interest (AoI) in GSPs and Web of Science (WoS) scientific fields. This dataset was used to evaluate whether a given GSP is related to Computer Science (CS) or Electrical Engineering (EE), allowing us to filter out irrelevant profiles and focus on approximately 2,000 papers from randomly selected authors in these fields.</p>
<p>To create the AI DB, the author&#x0027;s scholarly background is examined based on ten suggested attributes. The AI DB includes information about the authors, such as <inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="c3-ie1.jpg"/> and <inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="c3-ie2.jpg"/>, which can be seen in more detail in <xref ref-type="table" rid="T1">Table 1</xref>. The second dataset is the JI DB. According to <xref ref-type="table" rid="T2">Table 2</xref>, we have gathered several critical pieces of information, such as the Impact Factor (IF), h-index, SJR, Q1, Q2, Q3, Q4, and Q (the best quartile among all disciplines), in the JI DB. The third dataset is called the PI DB, in which some features of published papers were defined. According to <xref ref-type="table" rid="T3">Table 3</xref>, its attributes include <inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="c3-ie3.jpg"/> (availability of the paper from publication year to current year), <inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="c3-ie4.jpg"/> (citations of the manuscript in the publication year), <inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="c3-ie5.jpg"/> (citations of the manuscript in the last year), and <italic>N</italic><sup><italic>A</italic></sup> (number of authors). Finally, the APJ DB was constructed from all the information available in the three datasets above (including all twenty-three defined features).</p>
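<p>The construction of the combined APJ DB from the three component databases can be sketched with pandas. The column and key names below (e.g., <italic>paper_id</italic>) are hypothetical placeholders for illustration, not the exact schema of our datasets:</p>

```python
import pandas as pd

# Hypothetical single-row examples of the three component databases
ai = pd.DataFrame({"paper_id": [1], "h_index_author": [12], "citations": [540]})
ji = pd.DataFrame({"paper_id": [1], "IF": [3.2], "Q": [1.0], "SJR": [0.9]})
pi = pd.DataFrame({"paper_id": [1], "N_A": [4], "OAA": [1]})

# APJ DB = join of author, journal, and paper information on a shared key
apj = ai.merge(ji, on="paper_id").merge(pi, on="paper_id")
print(list(apj.columns))
```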
<fig id="F1">
<label>Figure 1.</label>
<caption><p>An illustration of the proposed citation count prediction algorithm (CCP).</p></caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="c3-fig1.jpg"><alt-text>none</alt-text></graphic>
</fig>
<table-wrap id="T1">
<label>Table 1.</label>
<caption><p>Proposed attributes in author information dataset (AI DB).</p></caption>
<table>
<thead>
<tr>
<th align="center" valign="top">Row</th>
<th align="center" valign="top">Attributes</th>
<th align="center" valign="top">Comments</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" valign="top">1</td>
<td align="center" valign="top"><italic>N</italic><sub><italic>p</italic></sub></td>
<td align="left" valign="top">Total number of publications for the author</td>
</tr>
<tr>
<td align="center" valign="top">2</td>
<td align="center" valign="top"><inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="c3-ie6.jpg"/></td>
<td align="left" valign="top">Total number of publications without citations for the author</td>
</tr>
<tr>
<td align="center" valign="top">3</td>
<td align="center" valign="top"><inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="c3-ie7.jpg"/></td>
<td align="left" valign="top">The highest citation count for the author: max(<italic>C</italic><sub><italic>i</italic></sub>), for <italic>i</italic> = 1 to <italic>N</italic><sub><italic>p</italic></sub></td>
</tr>
<tr>
<td align="center" valign="top">4</td>
<td align="center" valign="top"><inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="c3-ie8.jpg"/></td>
<td align="left" valign="top">Sum of the top ten citation counts for an author: <inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="c3-ie9.jpg"/></td>
</tr>
<tr>
<td align="center" valign="top">5</td>
<td align="center" valign="top"><italic>citations</italic></td>
<td align="left" valign="top">Total citations for an author</td>
</tr>
<tr>
<td align="center" valign="top">6</td>
<td align="center" valign="top"><italic>h</italic> &#x2013; <italic>index<sup>Author</sup></italic></td>
<td align="left" valign="top">h-index for an author</td>
</tr>
<tr>
<td align="center" valign="top">7</td>
<td align="center" valign="top"><italic>i</italic>10 &#x2013; <italic>index</italic></td>
<td align="left" valign="top">i10-index for an author</td>
</tr>
<tr>
<td align="center" valign="top">8</td>
<td align="center" valign="top"><italic>Y<sub>FP</sub></italic></td>
<td align="left" valign="top">First publication year: Author&#x2019;s first publication (in year)</td>
</tr>
<tr>
<td align="center" valign="top">9</td>
<td align="center" valign="top"><italic>Y<sub>LP</sub></italic></td>
<td align="left" valign="top">Last publication year: Author&#x2019;s last publication (in year)</td>
</tr>
<tr>
<td align="center" valign="top">10</td>
<td align="center" valign="top"><inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="c3-ie10.jpg"/></td>
<td align="center" valign="top"><inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="c3-ie15.jpg"/></td>
</tr>
</tbody>
</table>
</table-wrap>
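<p>Most AI DB attributes in Table 1 can be derived directly from an author&#x0027;s per-paper citation list. The sketch below follows those definitions; the Python names are ours, chosen for illustration:</p>

```python
def author_features(citations_per_paper: list[int]) -> dict:
    """Derive author-level attributes from a list of per-paper citation counts."""
    ranked = sorted(citations_per_paper, reverse=True)
    return {
        "N_p": len(ranked),                              # total publications
        "N_p_uncited": sum(1 for c in ranked if c == 0), # publications without citations
        "C_max": ranked[0],                              # highest single-paper citations
        "C_top10": sum(ranked[:10]),                     # sum of top ten citation counts
        "citations": sum(ranked),                        # total citations
        "i10_index": sum(1 for c in ranked if c >= 10),  # papers with >= 10 citations
        "h_index": sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank),
    }

print(author_features([25, 12, 10, 3, 0]))
```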
<p><xref ref-type="fig" rid="F2">Figure 2</xref> shows an example of an author&#x0027;s Google Scholar profile. In addition, some characteristics of the proposed dataset can be seen in <xref ref-type="fig" rid="F3">Figure 3</xref>, and <xref ref-type="fig" rid="F4">Figure 4</xref>.</p>
<table-wrap id="T2">
<label>Table 2.</label>
<caption><p>Proposed attributes in journal information dataset (JI DB).</p></caption>
<table>
<thead>
<tr>
<th align="center" valign="top">Row</th>
<th align="center" valign="top">Attributes</th>
<th align="center" valign="top">Comments</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" valign="top">1</td>
<td align="center" valign="top">IF</td>
<td align="center" valign="top">The Impact Factor of a journal</td>
</tr>
<tr>
<td align="center" valign="top">2</td>
<td align="center" valign="top">Q1</td>
<td align="center" valign="top">Top 25% of journals (if a journal is Q1 then Q1=1, Q2=Q3=Q4=0)</td>
</tr>
<tr>
<td align="center" valign="top">3</td>
<td align="center" valign="top">Q2</td>
<td align="center" valign="top">25% to 50% of journals (if a journal is Q2 then Q2=1, Q1=Q3=Q4=0)</td>
</tr>
<tr>
<td align="center" valign="top">4</td>
<td align="center" valign="top">Q3</td>
<td align="center" valign="top">50% to 75% of journals (if a journal is Q3 then Q3=1, Q1=Q2=Q4=0)</td>
</tr>
<tr>
<td align="center" valign="top">5</td>
<td align="center" valign="top">Q4</td>
<td align="center" valign="top">75% to 100% of journals (if a journal is Q4 then Q4=1, Q1=Q2=Q3=0)</td>
</tr>
<tr>
<td align="center" valign="top">6</td>
<td align="center" valign="top">Q</td>
<td align="center" valign="top">Q is derived from Q1 to Q4. It is equal to 1.00 for <italic>Q</italic><sub>1</sub>; 0.75 for <italic>Q</italic><sub>2</sub>; 0.50 for <italic>Q</italic><sub>3</sub>; 0.25 for <italic>Q</italic><sub>4</sub> according to the best quartile among all disciplines</td>
</tr>
<tr>
<td align="center" valign="top">7</td>
<td align="center" valign="top">SJR</td>
<td align="center" valign="top">The SCImago Journal Rank</td>
</tr>
</tbody>
</table>
</table-wrap>
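<p>The quartile attributes in Table 2 amount to a one-hot encoding of the best quartile plus the derived score Q. A small sketch of that mapping:</p>

```python
def quartile_features(best_quartile: str) -> dict:
    """One-hot Q1..Q4 flags plus the derived Q score (1.00, 0.75, 0.50, 0.25)."""
    scores = {"Q1": 1.00, "Q2": 0.75, "Q3": 0.50, "Q4": 0.25}
    features = {q: int(q == best_quartile) for q in scores}
    features["Q"] = scores[best_quartile]
    return features

print(quartile_features("Q2"))  # {'Q1': 0, 'Q2': 1, 'Q3': 0, 'Q4': 0, 'Q': 0.75}
```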
<table-wrap id="T3">
<label>Table 3.</label>
<caption><p>Proposed attributes in paper information dataset (PI DB).</p></caption>
<table>
<thead>
<tr>
<th align="center" valign="top">Row</th>
<th align="center" valign="top">Attributes</th>
<th align="center" valign="top">Comments</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" valign="top">1</td>
<td align="center" valign="top"><inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="c3-ie11.jpg"/></td>
<td align="center" valign="top">The number of years the paper has been available online</td>
</tr>
<tr>
<td align="center" valign="top">2</td>
<td align="center" valign="top"><inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="c3-ie12.jpg"/></td>
<td align="center" valign="top">The total citations of the paper in the published year</td>
</tr>
<tr>
<td align="center" valign="top">3</td>
<td align="center" valign="top"><inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="c3-ie13.jpg"/></td>
<td align="center" valign="top">The total citations of the paper in the last year</td>
</tr>
<tr>
<td align="center" valign="top">4</td>
<td align="center" valign="top"><italic>N</italic><sup><italic>A</italic></sup></td>
<td align="center" valign="top">Number of authors in a paper</td>
</tr>
<tr>
<td align="center" valign="top">5</td>
<td align="center" valign="top"><italic>O</italic><italic>A</italic><italic>A</italic></td>
<td align="center" valign="top">Open Access Article</td>
</tr>
</tbody>
</table>
</table-wrap>
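<p>The PI DB attributes in Table 3 are simple functions of a paper&#x0027;s metadata. A minimal sketch, with argument names of our own choosing:</p>

```python
def paper_features(pub_year: int, current_year: int,
                   cit_pub_year: int, cit_last_year: int,
                   n_authors: int, open_access: bool) -> dict:
    """Derive the five paper-level attributes from basic paper metadata."""
    return {
        "years_online": current_year - pub_year,  # availability since publication
        "C_pub_year": cit_pub_year,               # citations in the publication year
        "C_last_year": cit_last_year,             # citations in the last year
        "N_A": n_authors,                         # number of authors
        "OAA": int(open_access),                  # open-access flag
    }

print(paper_features(2019, 2023, 2, 15, 4, True))
```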
<fig id="F2">
<label>Figure 2.</label>
<caption><p>An example of the author information in a Google Scholar profile.</p></caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="c3-fig2.jpg"><alt-text>none</alt-text></graphic>
</fig>
<fig id="F3">
<label>Figure 3.</label>
<caption><p>The histograms of some features in the JI DB dataset.</p></caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="c3-fig3.jpg"><alt-text>none</alt-text></graphic>
</fig>
<fig id="F4">
<label>Figure 4.</label>
<caption><p>The histograms of some features in the AI DB dataset.</p></caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="c3-fig4.jpg"><alt-text>none</alt-text></graphic>
</fig>
<sec id="sec3_1_1">
<title>Regression models</title>
<p>Regression Analysis (RA) is a set of statistical procedures for estimating the relationships between an output (dependent variable) and one or more inputs (independent variables) (<xref ref-type="bibr" rid="R28">Fox, 2015</xref>; <xref ref-type="bibr" rid="R54">Rostami et al., 2021</xref>). In this paper, we deploy several robust regression models to predict the number of citations of a paper.</p>
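<p>The regression models used in this study share scikit-learn&#x0027;s common estimator interface, so they can be compared side by side. The sketch below runs on synthetic data with default hyperparameters, not the tuned settings of our experiments:</p>

```python
import numpy as np
from sklearn.svm import SVR, NuSVR, LinearSVR
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import BayesianRidge, SGDRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(1)
X = rng.random((150, 5))
y = X @ np.array([3.0, 1.0, 0.0, 2.0, 0.5]) + rng.normal(0, 0.05, 150)

models = {
    "SVR": SVR(), "Nu-SVR": NuSVR(), "Linear SVR": LinearSVR(max_iter=10_000),
    "KNN": KNeighborsRegressor(), "DT": DecisionTreeRegressor(random_state=0),
    "Bayesian Ridge": BayesianRidge(), "SGD": SGDRegressor(random_state=0),
}
for name, model in models.items():
    pred = model.fit(X[:100], y[:100]).predict(X[100:])  # simple holdout split
    print(f"{name:15s} MAE = {mean_absolute_error(y[100:], pred):.3f}")
```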
</sec>
<sec id="sec3_1_2">
<title>SVR and linearSVR</title>
<p>Machine learning (ML) is a sub-field of Artificial Intelligence (AI) that enables systems to learn automatically, and as opposed to being explicitly programmed, they can improve their decision-making abilities by acquiring experience (<xref ref-type="bibr" rid="R22">Chen et al., 2024</xref>). Support Vector Regression (SVR) (<xref ref-type="bibr" rid="R36">Hearst et al., 1998</xref>; <xref ref-type="bibr" rid="R57">Smola &#x0026; Sch&#x00F6;lkopf, 2004</xref>) distinguishes itself by employing the Structural Risk Minimization (SRM) principle, a foundation rooted in statistical learning theory. SRM&#x0027;s core objective is to craft a hypothesis (h) that minimizes the true error when applied to unseen and randomly sampled testing data. Notably, SVR excels in handling outliers, a critical advantage in practical applications. In general, SVR estimation functions have the following form (<xref ref-type="bibr" rid="R10">Basak et al., 2007</xref>; <xref ref-type="bibr" rid="R57">Smola &#x0026; Sch&#x00F6;lkopf, 2004</xref>):</p>
<fig id="E1">
<label>(1)</label>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="c3-f1.jpg"><alt-text>none</alt-text></graphic>
</fig>
<p>Where <italic>b</italic> &#x2208; <italic>R</italic> is a bias term and <italic>&#x03D5;</italic> denotes a nonlinear mapping from <italic>R</italic><sup><italic>n</italic></sup> (the real coordinate space, or real coordinate n-space, of dimension n) to a high-dimensional feature space. The aim is to determine the weight vector <italic>w</italic>, whose value is obtained by minimizing the regression risk:</p>
<fig id="E2">
<label>(2)</label>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="c3-f2.jpg"><alt-text>none</alt-text></graphic>
</fig>
<p>Here, <italic>&#x0393;</italic>(.) is the cost function; in Support Vector Regression (SVR), it measures the discrepancy between the model&#x0027;s output and the actual values of the training data. C is a constant, and the vector w can be calculated as below:</p>
<fig id="E3">
<label>(3)</label>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="c3-f3.jpg"><alt-text>none</alt-text></graphic>
</fig>
<p>Substituting <xref ref-type="fig" rid="E3">Eq. (3)</xref> into <xref ref-type="fig" rid="E1">Eq. (1)</xref>, the general equation can be rewritten as <xref ref-type="fig" rid="E4">Eq. (4)</xref>:</p>
<fig id="E4">
<label>(4)</label>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="c3-f4.jpg"><alt-text>none</alt-text></graphic>
</fig>
<p>In <xref ref-type="fig" rid="E4">Eq. (4)</xref>, the dot product has been replaced with the function <italic>k</italic>(<italic>x</italic><sub><italic>i</italic></sub>, <italic>x</italic>), known as the kernel function. A kernel allows the dot product in a high-dimensional feature space to be evaluated from the low-dimensional input data alone, without explicitly performing the transformation. Every kernel function must satisfy Mercer&#x0027;s condition, that is, it must be equivalent to an inner product in some feature space. For regression, the Radial Basis Function (RBF) is used as the standard kernel; it is given in the following equation.</p>
<fig id="E5">
<label>(5)</label>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="c3-f5.jpg"><alt-text>none</alt-text></graphic>
</fig>
<p>A few standard kernels in SVR, namely Linear (<italic>x</italic> &#x00D7; <italic>y</italic>), Polynomial ([(<italic>x</italic> &#x00D7; <italic>x</italic><sub><italic>i</italic></sub>) + 1]<sup><italic>d</italic></sup>), and the Radial Basis Function (exp {&#x2013;<italic>&#x03B3;</italic> &#x2016; <italic>x</italic> &#x2013; <italic>x</italic><sub><italic>i</italic></sub> &#x2016; <sup>2</sup>}), are summarized in <xref ref-type="table" rid="T4">Table 4</xref>.</p>
<table-wrap id="T4">
<label>Table 4.</label>
<caption><p>Common kernel functions (<xref ref-type="bibr" rid="R10">Basak et al., 2007</xref>; <xref ref-type="bibr" rid="R57">Smola &#x0026; Sch&#x00F6;lkopf, 2004</xref>).</p></caption>
<table>
<thead>
<tr>
<th align="center" valign="top">Kernel</th>
<th align="center" valign="top">Characteristics</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" valign="top">Linear</td>
<td align="center" valign="top">Simple, faster, and lower accuracy for nonlinear data</td>
</tr>
<tr>
<td align="center" valign="top">Polynomial</td>
<td align="center" valign="top">Fast and more flexible</td>
</tr>
<tr>
<td align="center" valign="top">RBF</td>
<td align="center" valign="top">More flexible and higher accuracy</td>
</tr>
</tbody>
</table>
</table-wrap>
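As a concrete illustration of the formulation above, the following minimal sketch fits an epsilon-SVR with the RBF kernel of Eq. (5) using scikit-learn. The toy data and parameter values are illustrative assumptions only, not the settings used in this study.

```python
import numpy as np
from sklearn.svm import SVR

# Toy regression data: y = sin(x) plus noise (illustrative assumption)
rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 5, 80)).reshape(-1, 1)
y = np.sin(X).ravel() + 0.1 * rng.randn(80)

# epsilon-SVR with the RBF kernel k(x_i, x) = exp(-gamma * ||x - x_i||^2)
model = SVR(kernel="rbf", C=10.0, gamma=0.5, epsilon=0.05)
model.fit(X, y)

pred = model.predict(X)
# Only the support vectors contribute to the expansion in Eq. (4)
print(model.support_vectors_.shape)
```

Swapping `kernel="rbf"` for `"linear"` or `"poly"` selects the other kernels listed in Table 4.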
</sec>
<sec id="sec3_1_3">
<title>Nu-SVR</title>
<p>In Nu-SVR, the epsilon parameter controls the number of support vectors retained in the solution, while the nu parameter provides a further mechanism for managing support vectors by bounding their proportion relative to the total number of samples in the dataset. Given sample pairs of the dataset, {(<italic>x</italic><sub>1</sub>, <italic>y</italic><sub>1</sub>), (<italic>x</italic><sub>2</sub>, <italic>y</italic><sub>2</sub>), &#x2026;, (<italic>x</italic><sub><italic>n</italic></sub>, <italic>y</italic><sub><italic>n</italic></sub>)}, the Nu-SVR method approximates a nonlinear relationship while limiting overfitting by staying as close as possible to the target function (<xref ref-type="bibr" rid="R12">Bhatt et al., 2012</xref>; <xref ref-type="bibr" rid="R57">Smola &#x0026; Sch&#x00F6;lkopf, 2004</xref>). The kernels that can be used include the polynomial function, the Radial Basis Function (RBF), the sigmoid function, and the linear function.</p>
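A minimal Nu-SVR sketch in scikit-learn, showing how nu bounds the fraction of support vectors; the data and parameter values are illustrative assumptions, not the settings used in this study.

```python
import numpy as np
from sklearn.svm import NuSVR

# Illustrative nonlinear toy data
rng = np.random.RandomState(1)
X = rng.uniform(-2, 2, (60, 2))
y = X[:, 0] ** 2 - X[:, 1] + 0.1 * rng.randn(60)

# nu acts as a lower bound on the fraction of support vectors
model = NuSVR(kernel="rbf", C=10.0, nu=0.5)
model.fit(X, y)

frac_sv = len(model.support_) / len(X)
print(round(frac_sv, 2))
```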
</sec>
<sec id="sec3_1_4">
<title>KNN</title>
<p>The k-Nearest Neighbors (kNN) method (<xref ref-type="bibr" rid="R23">Cover &#x0026; Hart, 1967</xref>) is widely adopted in data mining and statistics for its simplicity and notable classification performance (<xref ref-type="bibr" rid="R24">Cunningham &#x0026; Delany, 2021</xref>; <xref ref-type="bibr" rid="R34">Halder et al., 2024</xref>). Despite its ease of implementation, kNN has demonstrated significant classification prowess and has been shown to approach the error rate of the Bayes-optimal classifier under mild conditions. Its versatility extends to various applications, including regression, classification, and missing-value imputation. However, the efficacy of the kNN method depends on factors such as the choice of the k value and the selection of the distance measure, and numerous machine learning techniques have been developed to optimize these choices. kNN is also a highly beneficial approach for solving classification problems (<xref ref-type="bibr" rid="R34">Halder et al., 2024</xref>; <xref ref-type="bibr" rid="R55">Sabry, 2023</xref>; <xref ref-type="bibr" rid="R59">Song et al., 2017</xref>). A distinctive property of kNN is that it requires no explicit training step beyond storing the training database. For a testing point <italic>x</italic><sub><italic>t</italic></sub>, kNN estimates the response as a weighted mean of the responses of the k nearest training points <italic>x</italic><sub>(1)</sub>,<italic>x</italic><sub>(2)</sub>,&#x2026;,<italic>x</italic><sub>(<italic>k</italic>)</sub> in the neighborhood of <italic>x</italic><sub><italic>t</italic></sub>. How near each training point <italic>x</italic><sub><italic>i</italic></sub> is to the testing point <italic>x</italic><sub><italic>t</italic></sub> can be measured using the weighted Euclidean distance, described as (<xref ref-type="bibr" rid="R34">Halder et al., 2024</xref>; <xref ref-type="bibr" rid="R55">Sabry, 2023</xref>):</p>
<fig id="E6">
<label>(6)</label>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="c3-f6.jpg"><alt-text>none</alt-text></graphic>
</fig>
<p>Then, we apply the regression kernel and calculate the following approximation of the response at <italic>x</italic><sub><italic>t</italic></sub> (<xref ref-type="bibr" rid="R65">Yao &#x0026; Ruzzo, 2006</xref>):</p>
<fig id="E7">
<label>(7)</label>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="c3-f7.jpg"><alt-text>none</alt-text></graphic>
</fig>
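The two steps above can be sketched in plain NumPy. The inverse-distance kernel below is one common choice for the weighting in Eq. (7), and the feature weights and toy data are illustrative assumptions.

```python
import numpy as np

def knn_regress(X_train, y_train, x_t, k=3, w=None, eps=1e-9):
    """Distance-weighted kNN regression (sketch).

    Uses the weighted Euclidean distance of Eq. (6) and an
    inverse-distance kernel as one common choice for Eq. (7).
    """
    w = np.ones(X_train.shape[1]) if w is None else w
    # Weighted Euclidean distance from x_t to every training point
    d = np.sqrt(((X_train - x_t) ** 2 * w).sum(axis=1))
    idx = np.argsort(d)[:k]                 # indices of the k nearest points
    kern = 1.0 / (d[idx] + eps)             # inverse-distance kernel weights
    return float((kern * y_train[idx]).sum() / kern.sum())

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 1.0, 2.0, 3.0])
print(knn_regress(X, y, np.array([1.1]), k=2))
```

Note that no model is fit beforehand: all computation happens at query time, which is the "no explicit training step" property discussed above.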
</sec>
<sec id="sec3_1_5">
<title>Decision tree</title>
<p>Decision trees are a widely used ML method for both classification and regression problems. A decision tree consists of internal nodes containing tests on features, branches representing the outcomes of those tests, and leaf nodes containing the predicted output values once the path from the root to the leaf has been established. The data is split recursively on the feature that yields the best split, according to criteria such as mean squared error or variance. Decision trees can handle continuous and categorical variables; for continuous variables, the algorithm seeks a threshold at which to perform the split. Their simplicity and interpretability, as well as their ability to highlight relevant features, make them widely utilized for feature selection (<xref ref-type="bibr" rid="R13">Bishop, 2006</xref>).</p>
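A minimal sketch of the recursive splitting and feature-highlighting behaviour described above, using scikit-learn; the synthetic data (only the first feature drives the target) is an illustrative assumption.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.rand(200, 3)
# Only feature 0 drives the target; features 1 and 2 are pure noise
y = 4.0 * X[:, 0] + 0.05 * rng.randn(200)

# Splits are chosen to minimize squared error, up to a depth of 4
tree = DecisionTreeRegressor(max_depth=4, random_state=0)
tree.fit(X, y)

# Impurity-based importances single out the relevant feature
print(np.round(tree.feature_importances_, 2))
```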
</sec>
<sec id="sec3_1_6">
<title>Bayesian Regression</title>
<p>Bayesian Regression uses Bayesian statistics to estimate the parameters of the regression model. Prior beliefs about the parameters are updated with the observed data through Bayes&#x2019; theorem to form a posterior distribution. Because it yields a full posterior distribution over the parameters rather than point estimates, Bayesian Regression is more robust and flexible than classical regression methods. It is, however, computationally expensive and requires an appropriate choice of prior distributions, which introduces some subjective influence. Despite these challenges, it can give better estimates and is widely used in fields such as finance, engineering, and the social sciences (<xref ref-type="bibr" rid="R13">Bishop, 2006</xref>; <xref ref-type="bibr" rid="R52">Pedregosa et al., 2011</xref>).</p>
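The posterior-over-parameters idea can be sketched with scikit-learn's BayesianRidge, which also returns a predictive standard deviation; the toy data below is an illustrative assumption.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

rng = np.random.RandomState(0)
X = rng.rand(100, 2)
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.randn(100)

# Gaussian priors over the weights are updated from the observed data
model = BayesianRidge()
model.fit(X, y)

# Posterior predictive mean and standard deviation, not just a point estimate
mean, std = model.predict(X[:5], return_std=True)
print(np.round(mean, 2), np.round(std, 2))
```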
</sec>
<sec id="sec3_1_7">
<title>SGDRegressor</title>
<p>SGDRegressor is a linear regression model in scikit-learn that uses a Stochastic Gradient Descent (SGD) optimizer to learn the optimal set of parameters. Whereas traditional gradient descent updates the weights only after the whole dataset has been processed, SGD performs an update for every single sample, which makes it more computationally efficient, particularly on large datasets. This model is well suited to high-dimensional data with sparse features (<xref ref-type="bibr" rid="R52">Pedregosa et al., 2011</xref>).</p>
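A minimal SGDRegressor sketch; the elastic-net penalty mirrors the grid-search result reported later, while the data and other parameter values are illustrative assumptions. Feature scaling is included because per-sample gradient steps are sensitive to feature magnitudes.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
X = rng.randn(500, 10)
y = X @ rng.randn(10) + 0.1 * rng.randn(500)

# Weights are updated after every sample; scaling stabilizes the steps
model = make_pipeline(
    StandardScaler(),
    SGDRegressor(penalty="elasticnet", alpha=1e-4, random_state=0),
)
model.fit(X, y)
print(round(model.score(X, y), 3))
```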
</sec>
<sec id="sec3_1_8">
<title>Dimension reduction</title>
<p>Dimension reduction (DR) is a prominent technique in machine learning and data analysis that simplifies a dataset by reducing the number of variables to those that are significant. Principal Component Analysis (PCA) is probably the best-known DR method: it converts the data into a new set of mutually orthogonal (uncorrelated) variables called principal components. By choosing the best components, PCA achieves dimensionality reduction with minimal information loss. However, the effectiveness of PCA depends closely on the data and model used; it can also eliminate noise, but this may sacrifice interpretability and, with it, model performance. Hence, models with and without PCA should be compared to check its effect (<xref ref-type="bibr" rid="R13">Bishop, 2006</xref>; <xref ref-type="bibr" rid="R52">Pedregosa et al., 2011</xref>). In our analysis, we applied both t-SNE (t-Distributed Stochastic Neighbor Embedding) and PCA for dimensionality reduction. t-SNE is a nonlinear dimensionality reduction technique widely used for visualizing high-dimensional data. Unlike linear methods such as PCA, it focuses on preserving local similarities between data points by modeling pairwise probabilities in both the original and reduced spaces. It employs a t-distribution in the low-dimensional space to mitigate crowding effects, making it particularly effective for revealing clusters or manifolds in complex datasets (e.g., images or biological data). However, t-SNE&#x2019;s results can be sensitive to hyperparameters (e.g., perplexity), and it is computationally intensive for large datasets (<xref ref-type="bibr" rid="R56">Skrodzki et al., 2024</xref>; <xref ref-type="bibr" rid="R39">Jung et al., 2024</xref>).</p>
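The PCA vs. t-SNE comparison can be sketched as follows with scikit-learn; the clustered toy data and the perplexity value are illustrative assumptions (perplexity must be smaller than the number of samples).

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.RandomState(0)
# Two illustrative clusters in a 10-dimensional space
X = np.vstack([rng.randn(30, 10), rng.randn(30, 10) + 4.0])

# PCA: linear projection onto orthogonal directions of maximal variance
X_pca = PCA(n_components=2).fit_transform(X)
print(X_pca.shape)

# t-SNE: nonlinear embedding preserving local neighbourhoods
X_tsne = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(X)
print(X_tsne.shape)
```

Unlike PCA, t-SNE has no `transform` for unseen points, which is one practical reason it is used mainly for visualization rather than as a preprocessing step for regression.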
</sec>
</sec>
<sec id="sec3_2">
<title>Performance metrics</title>
<p>Here we discuss the methods used to calculate the errors of the proposed CCP model. In our research, we used a variety of error measures: Mean Absolute Error (MAE), Mean Squared Error (MSE), Mean Absolute Percentage Error (MAPE), and R-squared (R2). An important loss function in regression analysis is the Mean Squared Error (MSE) (<xref ref-type="bibr" rid="R20">Cameron &#x0026; Windmeijer, 1997</xref>; <xref ref-type="bibr" rid="R48">Murphy, 1988</xref>; <xref ref-type="bibr" rid="R62">Wallisch et al., 2022</xref>; <xref ref-type="bibr" rid="R64">Willmott &#x0026; Matsuura, 2005</xref>). This loss function is the mean squared distance between the predicted and actual values, calculated as follows:</p>
<fig id="E8">
<label>(8)</label>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="c3-f8.jpg"><alt-text>none</alt-text></graphic>
</fig>
<p>Other errors that have been used in this paper are MAE, MAPE, and R2:</p>
<fig id="E9">
<label>(9)</label>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="c3-f9.jpg"><alt-text>none</alt-text></graphic>
</fig>
<fig id="E10">
<label>(10)</label>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="c3-f10.jpg"><alt-text>none</alt-text></graphic>
</fig>
<fig id="E11">
<label>(11)</label>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="c3-f11.jpg"><alt-text>none</alt-text></graphic>
</fig>
<fig id="E12">
<label>(12)</label>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="c3-f12.jpg"><alt-text>none</alt-text></graphic>
</fig>
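The four metrics can be sketched directly in NumPy; this is a minimal illustration under the assumption of non-zero targets for MAPE, not the evaluation code used in this study.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """MAE, MSE, MAPE and R2 as used in Eqs. (8)-(12) (sketch)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mae = np.mean(np.abs(err))                       # mean absolute error
    mse = np.mean(err ** 2)                          # mean squared error
    mape = np.mean(np.abs(err / y_true))             # assumes no zero targets
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot                       # coefficient of determination
    return mae, mse, mape, r2

mae, mse, mape, r2 = regression_metrics([1.0, 2.0, 4.0], [1.5, 2.0, 3.5])
print(mae, mse, mape, r2)
```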
</sec>
</sec>
<sec id="sec4">
<title>Results</title>
<p>In this paper, seven distinct regression methods, including SVR, Nu-SVR, Linear SVR, kNN, Decision Tree Regression, Bayesian Ridge, and SGD Regression, were trained to find the best citation count prediction (CCP) solution. We used several error metrics, namely Mean Absolute Error (MAE), Mean Squared Error (MSE), Mean Absolute Percentage Error (MAPE), and the Coefficient of Determination (R<sup>2</sup>), to measure algorithm performance separately for the author information database (AI DB), journal information database (JI DB), paper information database (PI DB), and the combined author &#x0026; paper &#x0026; journal information database (APJ DB). We also tested the APJ DB and AI DB in two scenarios: with dimension reduction and without it. All simulations were implemented in Python and run on a Lenovo device with a 2.5 GHz Intel Core i7 processor and 16 GB of DDR4 RAM. The results obtained from the different methods are presented in <xref ref-type="table" rid="T5">Table 5</xref> to Table 17.</p>
<p>In our experiment, grid search, a hyperparameter tuning technique, was used to systematically work through multiple combinations of parameter values and determine which combination yields the best model performance.</p>
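A minimal grid-search sketch with scikit-learn's GridSearchCV, cross-validating an SVR over C and gamma; the toy data, parameter grid, and scoring choice are illustrative assumptions rather than the search space reported in Table 5.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

rng = np.random.RandomState(0)
X = rng.rand(120, 4)
y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.1 * rng.randn(120)

# Every (C, gamma) combination is evaluated with 3-fold cross-validation
param_grid = {"C": [0.1, 1.0, 10.0], "gamma": [0.01, 0.1, 1.0]}
search = GridSearchCV(SVR(kernel="rbf"), param_grid, cv=3,
                      scoring="neg_mean_absolute_error")
search.fit(X, y)
print(search.best_params_)
```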
<table-wrap id="T5">
<label>Table 5.</label>
<caption><p>Primary grid search results for suggested models.</p></caption>
<table>
<thead>
<tr>
<th align="center" valign="top">Model</th>
<th align="center" valign="top">Best Parameters</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" valign="top">SVR</td>
<td align="center" valign="top">{&#x0027;C&#x0027;: 29.108, &#x0027;gamma&#x0027;: 0.00156, &#x0027;kernel&#x0027;: &#x0027;rbf&#x0027;}</td>
</tr>
<tr>
<td align="center" valign="top">Nu-SVR</td>
<td align="center" valign="top">{&#x0027;C&#x0027;: 122.519, &#x0027;gamma&#x0027;: 0.000535, &#x0027;kernel&#x0027;: &#x0027;rbf&#x0027;, &#x0027;nu&#x0027;: 1}</td>
</tr>
<tr>
<td align="center" valign="top">LinearSVR</td>
<td align="center" valign="top">{&#x0027;C&#x0027;: 0.01672, &#x0027;epsilon&#x0027;: 0.01682, &#x0027;loss&#x0027;: &#x0027;epsilon_insensitive&#x0027;}</td>
</tr>
<tr>
<td align="center" valign="top">KNeighbors Regressor</td>
<td align="center" valign="top">{&#x0027;n_neighbors&#x0027;: 1}</td>
</tr>
<tr>
<td align="center" valign="top">Decision Tree Regressor</td>
<td align="center" valign="top">{&#x0027;criterion&#x0027;: &#x0027;squared_error&#x0027;, &#x0027;max_depth&#x0027;: 24.42, &#x0027;splitter&#x0027;: &#x0027;random&#x0027;}</td>
</tr>
<tr>
<td align="center" valign="top">Bayesian Ridge</td>
<td align="center" valign="top">{&#x0027;alpha_1&#x0027;: 3.59e6, &#x0027;alpha_2&#x0027;: 10.0, &#x0027;lambda_1&#x0027;: 1e11, &#x0027;lambda_2&#x0027;: 21544.35, &#x0027;n_iter&#x0027;: 1}</td>
</tr>
<tr>
<td align="center" valign="top">SGD Regressor</td>
<td align="center" valign="top">{&#x0027;alpha&#x0027;: 0.07508, &#x0027;epsilon&#x0027;: 12.07867, &#x0027;penalty&#x0027;: &#x0027;elasticnet&#x0027;}</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>In <xref ref-type="table" rid="T5">Table 5</xref>, max_depth=24.42 for the Decision Tree Regressor suggests a deep tree, which may be harder to interpret than a shallower one; max_depth could therefore be constrained to a smaller value, around 5, during grid search. Furthermore, KNeighborsRegressor with n_neighbors=1 is highly prone to overfitting and lacks interpretability, so a higher value of 5 was selected. LinearSVR and SGDRegressor use regularization (C, alpha): the low C=0.01672 for LinearSVR implies strong L2 regularization, which can simplify the model by shrinking coefficients, while SGDRegressor uses penalty=&#x0027;elasticnet&#x0027;, which can promote sparsity and make feature importance clearer. Regarding kernel choices in the SVM models, both SVR and Nu-SVR use the &#x0027;rbf&#x0027; kernel, which is inherently less interpretable than a linear kernel; when interpretability is a priority, the search could be restricted to kernel=&#x0027;linear&#x0027;. As for Bayesian Ridge&#x2019;s complexity, the high values lambda_1=1e11 and alpha_1=3.59e6 indicate strong prior assumptions, but the model remains interpretable since it is a linear regression variant.</p>
<p><xref ref-type="table" rid="T6">Table 6</xref> presents the performance of various regression models on the APJ dataset without any dimensionality reduction. The best-performing models are SVR and Decision Tree Regression, both achieving an MSE of 0.50 and an R<sup>2</sup> of 0.48 (with MAEs of 0.17 and 0.16, respectively), indicating strong predictive accuracy and stability. Nu-SVR also performs well, with the lowest MAPE (0.36), suggesting better relative error control. In contrast, K-Neighbors Regression and Bayesian Ridge show slightly higher errors, while Linear SVR and SGDRegression exhibit the weakest R<sup>2</sup> scores (0.35 and 0.43, respectively). Overall, non-linear models (SVR, Decision Tree) outperform linear ones, likely due to their ability to capture complex relationships in the data.</p>
<table-wrap id="T6">
<label>Table 6.</label>
<caption><p>Regression results for the APJ DB and without dimension reduction.</p></caption>
<table>
<thead>
<tr>
<th align="center" valign="top">Regression models</th>
<th align="center" valign="top">MAE</th>
<th align="center" valign="top">MSE</th>
<th align="center" valign="top">MAPE</th>
<th align="center" valign="top">R2</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" valign="top">SVR</td>
<td align="center" valign="top">0.17</td>
<td align="center" valign="top">0.50</td>
<td align="center" valign="top">0.50</td>
<td align="center" valign="top">0.48</td>
</tr>
<tr>
<td align="center" valign="top">Nu-SVR</td>
<td align="center" valign="top">0.16</td>
<td align="center" valign="top">0.56</td>
<td align="center" valign="top">0.36</td>
<td align="center" valign="top">0.43</td>
</tr>
<tr>
<td align="center" valign="top">Linear SVR</td>
<td align="center" valign="top">0.17</td>
<td align="center" valign="top">0.63</td>
<td align="center" valign="top">0.43</td>
<td align="center" valign="top">0.35</td>
</tr>
<tr>
<td align="center" valign="top">K-Neighbors Regression</td>
<td align="center" valign="top">0.20</td>
<td align="center" valign="top">0.55</td>
<td align="center" valign="top">0.64</td>
<td align="center" valign="top">0.43</td>
</tr>
<tr>
<td align="center" valign="top">Decision tree regression</td>
<td align="center" valign="top">0.16</td>
<td align="center" valign="top">0.50</td>
<td align="center" valign="top">0.52</td>
<td align="center" valign="top">0.48</td>
</tr>
<tr>
<td align="center" valign="top">Bayesian Ridge</td>
<td align="center" valign="top">0.19</td>
<td align="center" valign="top">0.54</td>
<td align="center" valign="top">0.61</td>
<td align="center" valign="top">0.44</td>
</tr>
<tr>
<td align="center" valign="top">SGDRegression</td>
<td align="center" valign="top">0.19</td>
<td align="center" valign="top">0.55</td>
<td align="center" valign="top">0.57</td>
<td align="center" valign="top">0.43</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>According to <xref ref-type="table" rid="T7">Table 7</xref>, applying PCA-based dimension reduction generally worsened model performance, except for K-Neighbors Regression, which saw slight improvements (e.g., MSE dropping from 0.55 to 0.53). The degradation is most severe for Decision Tree Regression, where MSE nearly doubled (0.50 &#x2192; 0.92) and R<sup>2</sup> collapsed to 0.05, because PCA&#x2019;s linear transformations disrupt the tree-based feature splits. Similarly, SVR and Nu-SVR suffered, possibly due to lost non-linear feature interactions. The improvement for K-Neighbors suggests PCA may have removed noise, aiding its distance-based computations. The overall decline implies that PCA either discarded informative features or failed to preserve structures critical for regression, highlighting that blind dimensionality reduction can harm performance unless the model benefits from noise removal (as kNN does).</p>
<table-wrap id="T7">
<label>Table 7.</label>
<caption><p>Regression results for the APJ DB and with dimension reduction based on PCA.</p></caption>
<table>
<thead>
<tr>
<th align="center" valign="top">Regression models</th>
<th align="center" valign="top">MAE</th>
<th align="center" valign="top">MSE</th>
<th align="center" valign="top">MAPE</th>
<th align="center" valign="top">R2</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" valign="top">SVR</td>
<td align="center" valign="top">0.20</td>
<td align="center" valign="top">0.55</td>
<td align="center" valign="top">0.67</td>
<td align="center" valign="top">0.44</td>
</tr>
<tr>
<td align="center" valign="top">Nu-SVR</td>
<td align="center" valign="top">0.19</td>
<td align="center" valign="top">0.64</td>
<td align="center" valign="top">0.53</td>
<td align="center" valign="top">0.34</td>
</tr>
<tr>
<td align="center" valign="top">Linear SVR</td>
<td align="center" valign="top">0.21</td>
<td align="center" valign="top">0.57</td>
<td align="center" valign="top">0.82</td>
<td align="center" valign="top">0.41</td>
</tr>
<tr>
<td align="center" valign="top">K-Neighbors Regression</td>
<td align="center" valign="top">0.18</td>
<td align="center" valign="top">0.53</td>
<td align="center" valign="top">0.60</td>
<td align="center" valign="top">0.45</td>
</tr>
<tr>
<td align="center" valign="top">Decision tree regression</td>
<td align="center" valign="top">0.26</td>
<td align="center" valign="top">0.92</td>
<td align="center" valign="top">1.12</td>
<td align="center" valign="top">0.05</td>
</tr>
<tr>
<td align="center" valign="top">Bayesian Ridge</td>
<td align="center" valign="top">0.20</td>
<td align="center" valign="top">0.56</td>
<td align="center" valign="top">0.72</td>
<td align="center" valign="top">0.42</td>
</tr>
<tr>
<td align="center" valign="top">SGDRegression</td>
<td align="center" valign="top">0.21</td>
<td align="center" valign="top">0.55</td>
<td align="center" valign="top">0.76</td>
<td align="center" valign="top">0.42</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Comparing the two tables shows that PCA does not universally improve model performance and even degrades results for certain algorithms. Given this outcome, this study further evaluates t-SNE (t-Distributed Stochastic Neighbor Embedding) as an alternative and compares the results with PCA. <xref ref-type="table" rid="T8">Table 8</xref> presents MAE and MSE values for seven regression models, with performance measured after applying PCA and t-SNE for dimensionality reduction. While t-SNE excels at visualization, PCA remains superior for most models in this regression task owing to its stability, interpretability, and preservation of globally meaningful features. The results show that t-SNE cannot outperform PCA in regression settings when applied to the APJ DB.</p>
<table-wrap id="T8">
<label>Table 8.</label>
<caption><p>Regression results for APJ DB: PCA vs. t-SNE dimensionality reduction.</p></caption>
<table>
<thead>
<tr>
<th align="center" valign="top">Regression Model</th>
<th align="center" valign="top">Metric</th>
<th align="center" valign="top">PCA</th>
<th align="center" valign="top">t-SNE</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" valign="top">SVR</td>
<td align="center" valign="top">MAE</td>
<td align="center" valign="top">0.20</td>
<td align="center" valign="top">0.26</td>
</tr>
<tr>
<td align="center" valign="top">SVR</td>
<td align="center" valign="top">MSE</td>
<td align="center" valign="top">0.55</td>
<td align="center" valign="top">0.85</td>
</tr>
<tr>
<td align="center" valign="top">Nu-SVR</td>
<td align="center" valign="top">MAE</td>
<td align="center" valign="top">0.19</td>
<td align="center" valign="top">0.23</td>
</tr>
<tr>
<td align="center" valign="top">Nu-SVR</td>
<td align="center" valign="top">MSE</td>
<td align="center" valign="top">0.64</td>
<td align="center" valign="top">0.94</td>
</tr>
<tr>
<td align="center" valign="top">Linear SVR</td>
<td align="center" valign="top">MAE</td>
<td align="center" valign="top">0.21</td>
<td align="center" valign="top">0.28</td>
</tr>
<tr>
<td align="center" valign="top">Linear SVR</td>
<td align="center" valign="top">MSE</td>
<td align="center" valign="top">0.57</td>
<td align="center" valign="top">1.10</td>
</tr>
<tr>
<td align="center" valign="top">K-Neighbors</td>
<td align="center" valign="top">MAE</td>
<td align="center" valign="top">0.18</td>
<td align="center" valign="top">0.33</td>
</tr>
<tr>
<td align="center" valign="top">K-Neighbors</td>
<td align="center" valign="top">MSE</td>
<td align="center" valign="top">0.53</td>
<td align="center" valign="top">0.52</td>
</tr>
<tr>
<td align="center" valign="top">Decision Tree</td>
<td align="center" valign="top">MAE</td>
<td align="center" valign="top">0.26</td>
<td align="center" valign="top">0.25</td>
</tr>
<tr>
<td align="center" valign="top">Decision Tree</td>
<td align="center" valign="top">MSE</td>
<td align="center" valign="top">0.92</td>
<td align="center" valign="top">1.15</td>
</tr>
<tr>
<td align="center" valign="top">Bayesian Ridge</td>
<td align="center" valign="top">MAE</td>
<td align="center" valign="top">0.20</td>
<td align="center" valign="top">0.32</td>
</tr>
<tr>
<td align="center" valign="top">Bayesian Ridge</td>
<td align="center" valign="top">MSE</td>
<td align="center" valign="top">0.56</td>
<td align="center" valign="top">0.72</td>
</tr>
<tr>
<td align="center" valign="top">SGDRegression</td>
<td align="center" valign="top">MAE</td>
<td align="center" valign="top">0.21</td>
<td align="center" valign="top">0.25</td>
</tr>
<tr>
<td align="center" valign="top">SGDRegression</td>
<td align="center" valign="top">MSE</td>
<td align="center" valign="top">0.55</td>
<td align="center" valign="top">0.99</td>
</tr>
</tbody>
</table>
</table-wrap>
<p><xref ref-type="table" rid="T9">Table 9</xref> presents the regression results for the AI DB without dimension reduction. The Decision Tree model performs best in terms of MSE (0.86) and R<sup>2</sup> (0.12), while SVR follows closely with an MAE of 0.24 and R<sup>2</sup> of 0.1. Nu-SVR and Linear SVR achieve the lowest MAE (0.22), though Linear SVR has a slightly higher MSE (0.95). K-Neighbors Regression performs the worst, with an MSE of 1.08 and a negative R<sup>2</sup> (-0.11), indicating poor fit. Bayesian Ridge and SGDRegression show moderate performance, with MAPE values ranging from 0.65 to 0.96.</p>
<table-wrap id="T9">
<label>Table 9.</label>
<caption><p>Regression results for the AI DB and without dimension reduction.</p></caption>
<table>
<thead>
<tr>
<th align="center" valign="top">Regression models</th>
<th align="center" valign="top">MAE</th>
<th align="center" valign="top">MSE</th>
<th align="center" valign="top">MAPE</th>
<th align="center" valign="top">R2</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" valign="top">SVR</td>
<td align="center" valign="top">0.24</td>
<td align="center" valign="top">0.88</td>
<td align="center" valign="top">0.63</td>
<td align="center" valign="top">0.1</td>
</tr>
<tr>
<td align="center" valign="top">Nu-SVR</td>
<td align="center" valign="top">0.22</td>
<td align="center" valign="top">0.93</td>
<td align="center" valign="top">0.66</td>
<td align="center" valign="top">0.05</td>
</tr>
<tr>
<td align="center" valign="top">Linear SVR</td>
<td align="center" valign="top">0.22</td>
<td align="center" valign="top">0.95</td>
<td align="center" valign="top">0.61</td>
<td align="center" valign="top">0.03</td>
</tr>
<tr>
<td align="center" valign="top">K-Neighbors Regression</td>
<td align="center" valign="top">0.34</td>
<td align="center" valign="top">1.08</td>
<td align="center" valign="top">1.25</td>
<td align="center" valign="top">-0.11</td>
</tr>
<tr>
<td align="center" valign="top">Decision tree regression</td>
<td align="center" valign="top">0.22</td>
<td align="center" valign="top">0.86</td>
<td align="center" valign="top">0.79</td>
<td align="center" valign="top">0.12</td>
</tr>
<tr>
<td align="center" valign="top">Bayesian Ridge</td>
<td align="center" valign="top">0.29</td>
<td align="center" valign="top">0.93</td>
<td align="center" valign="top">0.96</td>
<td align="center" valign="top">0.05</td>
</tr>
<tr>
<td align="center" valign="top">SGDRegression</td>
<td align="center" valign="top">0.24</td>
<td align="center" valign="top">0.96</td>
<td align="center" valign="top">0.65</td>
<td align="center" valign="top">0.017</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>In <xref ref-type="table" rid="T10">Table 10</xref>, where PCA-based dimension reduction is applied, results vary. Nu-SVR improves slightly, achieving the lowest MAE (0.21) and MAPE (0.55), while its MSE remains stable (0.89). However, Linear SVR deteriorates significantly, with MSE increasing to 0.97 and R<sup>2</sup> dropping to -0.004. Decision Tree regression, which performed well without PCA, now shows a higher MSE (1.02) and a negative R<sup>2</sup> (-0.04). K-Neighbors Regression remains poor, with nearly identical metrics as in <xref ref-type="table" rid="T7">Table 7</xref>. Bayesian Ridge is largely unaffected by PCA, maintaining similar MAE, MSE, and R<sup>2</sup> values.</p>
<table-wrap id="T10">
<label>Table 10.</label>
<caption><p>Regression results for the AI DB and with dimension reduction based on PCA.</p></caption>
<table>
<thead>
<tr>
<th align="center" valign="top">Regression models</th>
<th align="center" valign="top">MAE</th>
<th align="center" valign="top">MSE</th>
<th align="center" valign="top">MAPE</th>
<th align="center" valign="top">R2</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" valign="top">SVR</td>
<td align="center" valign="top">0.24</td>
<td align="center" valign="top">0.89</td>
<td align="center" valign="top">0.63</td>
<td align="center" valign="top">0.08</td>
</tr>
<tr>
<td align="center" valign="top">Nu-SVR</td>
<td align="center" valign="top">0.21</td>
<td align="center" valign="top">0.89</td>
<td align="center" valign="top">0.55</td>
<td align="center" valign="top">0.08</td>
</tr>
<tr>
<td align="center" valign="top">Linear SVR</td>
<td align="center" valign="top">0.25</td>
<td align="center" valign="top">0.97</td>
<td align="center" valign="top">0.94</td>
<td align="center" valign="top">-0.004</td>
</tr>
<tr>
<td align="center" valign="top">K-Neighbors Regression</td>
<td align="center" valign="top">0.34</td>
<td align="center" valign="top">1.08</td>
<td align="center" valign="top">1.28</td>
<td align="center" valign="top">-0.11</td>
</tr>
<tr>
<td align="center" valign="top">Decision tree regression</td>
<td align="center" valign="top">0.23</td>
<td align="center" valign="top">1.02</td>
<td align="center" valign="top">0.73</td>
<td align="center" valign="top">-0.04</td>
</tr>
<tr>
<td align="center" valign="top">Bayesian Ridge</td>
<td align="center" valign="top">0.29</td>
<td align="center" valign="top">0.93</td>
<td align="center" valign="top">0.96</td>
<td align="center" valign="top">0.04</td>
</tr>
<tr>
<td align="center" valign="top">SGDRegression</td>
<td align="center" valign="top">0.24</td>
<td align="center" valign="top">0.97</td>
<td align="center" valign="top">0.59</td>
<td align="center" valign="top">-0.0004</td>
</tr>
</tbody>
</table>
</table-wrap>
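<p>As an illustrative sketch only (not the authors' implementation), the evaluation pipeline behind these tables can be reproduced with scikit-learn; synthetic data with twenty-three features stands in for the AI DB, and default Nu-SVR hyperparameters are assumed:</p>

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import (mean_absolute_error, mean_absolute_percentage_error,
                             mean_squared_error, r2_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import NuSVR

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 23))   # 23 features, as in the proposed datasets
y = 2.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.3, size=500)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def evaluate(model):
    """Fit on the training split and return the four reported metrics."""
    pred = model.fit(X_tr, y_tr).predict(X_te)
    return {"MAE": mean_absolute_error(y_te, pred),
            "MSE": mean_squared_error(y_te, pred),
            "MAPE": mean_absolute_percentage_error(y_te, pred),
            "R2": r2_score(y_te, pred)}

# Without dimension reduction, then with PCA-based dimension reduction.
no_dr = evaluate(make_pipeline(StandardScaler(), NuSVR()))
with_dr = evaluate(make_pipeline(StandardScaler(), PCA(n_components=10), NuSVR()))
print(no_dr)
print(with_dr)
```

<p>Because PCA discards variance that may itself be predictive, the scores with dimension reduction can come out either better or worse, mirroring the mixed pattern observed in Table 10.</p>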
<p>Tables 11 and 12 report the regression results for the JI DB and the PI DB, respectively, while Tables 13 to 16 briefly compare all of the proposed methods on four performance metrics: MAE, MSE, MAPE, and R2. In these tables, DR stands for dimension reduction. In our implementations, AJP DB and AI DB were each evaluated both with and without DR, whereas JI DB and PI DB were used without DR only.</p>
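<p>For reference, the four metrics follow their standard definitions, with <italic>y<sub>i</sub></italic> the true citation count, its hat-decorated counterpart the prediction, and the bar denoting the mean of the true values:</p>

```latex
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\lvert y_i - \hat{y}_i\rvert, \qquad
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2,
\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left\lvert \frac{y_i - \hat{y}_i}{y_i} \right\rvert, \qquad
R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}.
```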
<table-wrap id="T11">
<label>Table 11.</label>
<caption><p>Regression results for the JI DB without dimension reduction.</p></caption>
<table>
<thead>
<tr>
<th align="center" valign="top">Regression models</th>
<th align="center" valign="top">MAE</th>
<th align="center" valign="top">MSE</th>
<th align="center" valign="top">MAPE</th>
<th align="center" valign="top">R2</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" valign="top">SVR</td>
<td align="center" valign="top">0.26</td>
<td align="center" valign="top">0.99</td>
<td align="center" valign="top">0.71</td>
<td align="center" valign="top">-0.023</td>
</tr>
<tr>
<td align="center" valign="top">Nu-SVR</td>
<td align="center" valign="top">0.24</td>
<td align="center" valign="top">1.02</td>
<td align="center" valign="top">0.73</td>
<td align="center" valign="top">-0.05</td>
</tr>
<tr>
<td align="center" valign="top">Linear SVR</td>
<td align="center" valign="top">0.26</td>
<td align="center" valign="top">0.99</td>
<td align="center" valign="top">0.73</td>
<td align="center" valign="top">-0.02</td>
</tr>
<tr>
<td align="center" valign="top">K-Neighbors Regression</td>
<td align="center" valign="top">0.26</td>
<td align="center" valign="top">0.92</td>
<td align="center" valign="top">1.06</td>
<td align="center" valign="top">0.05</td>
</tr>
<tr>
<td align="center" valign="top">Decision tree regression</td>
<td align="center" valign="top">0.23</td>
<td align="center" valign="top">1.02</td>
<td align="center" valign="top">0.71</td>
<td align="center" valign="top">-0.05</td>
</tr>
<tr>
<td align="center" valign="top">Bayesian Ridge</td>
<td align="center" valign="top">0.31</td>
<td align="center" valign="top">0.97</td>
<td align="center" valign="top">0.94</td>
<td align="center" valign="top">-0.001</td>
</tr>
<tr>
<td align="center" valign="top">SGDRegression</td>
<td align="center" valign="top">0.32</td>
<td align="center" valign="top">0.97</td>
<td align="center" valign="top">0.98</td>
<td align="center" valign="top">-0.001</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="T12">
<label>Table 12.</label>
<caption><p>Regression results for the PI DB without dimension reduction.</p></caption>
<table>
<thead>
<tr>
<th align="center" valign="top">Regression models</th>
<th align="center" valign="top">MAE</th>
<th align="center" valign="top">MSE</th>
<th align="center" valign="top">MAPE</th>
<th align="center" valign="top">R2</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" valign="top">SVR</td>
<td align="center" valign="top">0.11</td>
<td align="center" valign="top">0.21</td>
<td align="center" valign="top">0.29</td>
<td align="center" valign="top">0.78</td>
</tr>
<tr>
<td align="center" valign="top">Nu-SVR</td>
<td align="center" valign="top">0.09</td>
<td align="center" valign="top">0.16</td>
<td align="center" valign="top">0.24</td>
<td align="center" valign="top">0.84</td>
</tr>
<tr>
<td align="center" valign="top">Linear SVR</td>
<td align="center" valign="top">0.20</td>
<td align="center" valign="top">0.54</td>
<td align="center" valign="top">0.60</td>
<td align="center" valign="top">0.44</td>
</tr>
<tr>
<td align="center" valign="top">K-Neighbors Regression</td>
<td align="center" valign="top">0.17</td>
<td align="center" valign="top">0.42</td>
<td align="center" valign="top">0.51</td>
<td align="center" valign="top">0.56</td>
</tr>
<tr>
<td align="center" valign="top">Decision tree regression</td>
<td align="center" valign="top">0.16</td>
<td align="center" valign="top">0.46</td>
<td align="center" valign="top">0.48</td>
<td align="center" valign="top">0.52</td>
</tr>
<tr>
<td align="center" valign="top">Bayesian Ridge</td>
<td align="center" valign="top">0.19</td>
<td align="center" valign="top">0.55</td>
<td align="center" valign="top">0.6</td>
<td align="center" valign="top">0.42</td>
</tr>
<tr>
<td align="center" valign="top">SGDRegression</td>
<td align="center" valign="top">0.19</td>
<td align="center" valign="top">0.56</td>
<td align="center" valign="top">0.58</td>
<td align="center" valign="top">0.42</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="T13">
<label>Table 13.</label>
<caption><p>Comparing all the methods based on MAE.</p></caption>
<table>
<thead>
<tr>
<th align="center" valign="top">Method</th>
<th align="center" valign="top">Min (MAE)</th>
<th align="center" valign="top">Algorithm</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" valign="top">AJP DB-without DR</td>
<td align="center" valign="top">0.1558</td>
<td align="center" valign="top">Nu-SVR</td>
</tr>
<tr>
<td align="center" valign="top">AJP DB-with DR</td>
<td align="center" valign="top">0.1835</td>
<td align="center" valign="top">KNN</td>
</tr>
<tr>
<td align="center" valign="top">AI DB-without DR</td>
<td align="center" valign="top">0.2150</td>
<td align="center" valign="top">LSVR</td>
</tr>
<tr>
<td align="center" valign="top">AI DB-with DR</td>
<td align="center" valign="top">0.2088</td>
<td align="center" valign="top">Nu-SVR</td>
</tr>
<tr>
<td align="center" valign="top">JI DB-without DR</td>
<td align="center" valign="top">0.2339</td>
<td align="center" valign="top">Decision Tree</td>
</tr>
<tr>
<td align="center" valign="top">PI DB-without DR</td>
<td align="center" valign="top">0.0930</td>
<td align="center" valign="top">Nu-SVR</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="T14">
<label>Table 14.</label>
<caption><p>Comparing all the methods based on MSE.</p></caption>
<table>
<thead>
<tr>
<th align="center" valign="top">Method</th>
<th align="center" valign="top">Min (MSE)</th>
<th align="center" valign="top">Algorithm</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" valign="top">AJP DB-without DR</td>
<td align="center" valign="top">0.5020</td>
<td align="center" valign="top">SVR</td>
</tr>
<tr>
<td align="center" valign="top">AJP DB-with DR</td>
<td align="center" valign="top">0.5300</td>
<td align="center" valign="top">KNN</td>
</tr>
<tr>
<td align="center" valign="top">AI DB-without DR</td>
<td align="center" valign="top">0.8570</td>
<td align="center" valign="top">Decision Tree</td>
</tr>
<tr>
<td align="center" valign="top">AI DB-with DR</td>
<td align="center" valign="top">0.8904</td>
<td align="center" valign="top">Nu-SVR</td>
</tr>
<tr>
<td align="center" valign="top">JI DB-without DR</td>
<td align="center" valign="top">0.9200</td>
<td align="center" valign="top">KNN</td>
</tr>
<tr>
<td align="center" valign="top">PI DB-without DR</td>
<td align="center" valign="top">0.1587</td>
<td align="center" valign="top">Nu-SVR</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="T15">
<label>Table 15.</label>
<caption><p>Comparing all the methods based on MAPE.</p></caption>
<table>
<thead>
<tr>
<th align="center" valign="top">Method</th>
<th align="center" valign="top">Min (MAPE)</th>
<th align="center" valign="top">Algorithm</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" valign="top">AJP DB-without DR</td>
<td align="center" valign="top">0.3618</td>
<td align="center" valign="top">Nu-SVR</td>
</tr>
<tr>
<td align="center" valign="top">AJP DB-with DR</td>
<td align="center" valign="top">0.5300</td>
<td align="center" valign="top">Nu-SVR</td>
</tr>
<tr>
<td align="center" valign="top">AI DB-without DR</td>
<td align="center" valign="top">0.6114</td>
<td align="center" valign="top">LSVR</td>
</tr>
<tr>
<td align="center" valign="top">AI DB-with DR</td>
<td align="center" valign="top">0.5517</td>
<td align="center" valign="top">Nu-SVR</td>
</tr>
<tr>
<td align="center" valign="top">JI DB-without DR</td>
<td align="center" valign="top">0.7158</td>
<td align="center" valign="top">Decision Tree</td>
</tr>
<tr>
<td align="center" valign="top">PI DB-without DR</td>
<td align="center" valign="top">0.2437</td>
<td align="center" valign="top">Nu-SVR</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Moreover, Tables 16 and 17 summarize all of this information to identify which algorithm yields the best result for each method. It can be clearly seen that the dominant algorithm is Nu-SVR, which outperformed all the others; in terms of methods, the features included in PI DB led to the lowest error rates and the highest R2.</p>
<table-wrap id="T16">
<label>Table 16.</label>
<caption><p>Comparing all the methods based on R2.</p></caption>
<table>
<thead>
<tr>
<th align="center" valign="top">Method</th>
<th align="center" valign="top">Max (R2)</th>
<th align="center" valign="top">Algorithm</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" valign="top">AJP DB-without DR</td>
<td align="center" valign="top">0.4828</td>
<td align="center" valign="top">SVR</td>
</tr>
<tr>
<td align="center" valign="top">AJP DB-with DR</td>
<td align="center" valign="top">0.4545</td>
<td align="center" valign="top">KNN</td>
</tr>
<tr>
<td align="center" valign="top">AI DB-without DR</td>
<td align="center" valign="top">0.1173</td>
<td align="center" valign="top">Decision Tree</td>
</tr>
<tr>
<td align="center" valign="top">AI DB-with DR</td>
<td align="center" valign="top">0.0838</td>
<td align="center" valign="top">Nu-SVR</td>
</tr>
<tr>
<td align="center" valign="top">JI DB-without DR</td>
<td align="center" valign="top">0.0533</td>
<td align="center" valign="top">KNN</td>
</tr>
<tr>
<td align="center" valign="top">PI DB-without DR</td>
<td align="center" valign="top">0.8366</td>
<td align="center" valign="top">Nu-SVR</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="T17">
<label>Table 17.</label>
<caption><p>Comparing best-achieved results among all methods along with different performance metrics.</p></caption>
<table>
<thead>
<tr>
<th align="center" valign="top">Performance Metric</th>
<th align="center" valign="top">Method</th>
<th align="center" valign="top">Algorithm</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" valign="top">MAE</td>
<td align="center" valign="top">PI DB-without DR</td>
<td align="center" valign="top">Nu-SVR</td>
</tr>
<tr>
<td align="center" valign="top">MSE</td>
<td align="center" valign="top">PI DB-without DR</td>
<td align="center" valign="top">Nu-SVR</td>
</tr>
<tr>
<td align="center" valign="top">MAPE</td>
<td align="center" valign="top">PI DB-without DR</td>
<td align="center" valign="top">Nu-SVR</td>
</tr>
<tr>
<td align="center" valign="top">R2</td>
<td align="center" valign="top">PI DB-without DR</td>
<td align="center" valign="top">Nu-SVR</td>
</tr>
</tbody>
</table>
</table-wrap>
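<p>The summaries in Tables 13 to 17 amount to a simple argmin/argmax over the per-method results. A minimal sketch (not the authors' code; the values are abridged from Tables 11 and 12) is:</p>

```python
# Pick the best algorithm per method: lowest error metric, or highest R2.
# Values abridged from the JI DB and PI DB result tables above.
results = {
    "PI DB-without DR": {"SVR": {"MAE": 0.11, "R2": 0.78},
                         "Nu-SVR": {"MAE": 0.09, "R2": 0.84},
                         "Linear SVR": {"MAE": 0.20, "R2": 0.44}},
    "JI DB-without DR": {"SVR": {"MAE": 0.26, "R2": -0.023},
                         "Decision Tree": {"MAE": 0.23, "R2": -0.05},
                         "KNN": {"MAE": 0.26, "R2": 0.05}},
}

def best(method, metric, maximize=False):
    """Return the algorithm with the extreme value of `metric` for a method."""
    algs = results[method]
    pick = max if maximize else min
    return pick(algs, key=lambda a: algs[a][metric])

print(best("PI DB-without DR", "MAE"))                # Nu-SVR, as in Table 13
print(best("JI DB-without DR", "R2", maximize=True))  # KNN, as in Table 16
```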
<p>Furthermore, <xref ref-type="fig" rid="F5">Figure 5</xref> and <xref ref-type="fig" rid="F6">Figure 6</xref> provide a broad comparison of all the proposed algorithms across our four datasets in terms of MAE and R2, respectively.</p>
<fig id="F5">
<label>Figure 5.</label>
<caption><p>A comparison among all proposed algorithms and datasets based on MAE.</p></caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="c3-fig5.jpg"><alt-text>none</alt-text></graphic>
</fig>
<fig id="F6">
<label>Figure 6.</label>
<caption><p>A comparison among all proposed algorithms and datasets based on R2.</p></caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="c3-fig6.jpg"><alt-text>none</alt-text></graphic>
</fig>
<p>Ultimately, we named our best model, Nu-SVR trained on PI DB without DR, PI-CCP. In Tables 18, 19, and 20, the simulation results achieved with the proposed PI-CCP model are compared with the available studies presented by (<xref ref-type="bibr" rid="R2">Abrishami et al., 2019</xref>; <xref ref-type="bibr" rid="R30">Gao et al., 2024</xref>; <xref ref-type="bibr" rid="R43">Li et al., 2015</xref>; <xref ref-type="bibr" rid="R44">Li et al., 2019</xref>). Table 18 shows that PI-CCP performs exceptionally well at minimizing errors, as evidenced by its remarkably low MAE. Notably, it outperforms established models such as (<xref ref-type="bibr" rid="R44">Li et al., 2019</xref>), demonstrating its effectiveness in generating precise forecasts, and the large gap with (<xref ref-type="bibr" rid="R30">Gao et al., 2024</xref>) underscores PI-CCP&#x0027;s superiority even further.</p>
<table-wrap id="T18">
<label>Table 18.</label>
<caption><p>Achievement comparison based on MAE.</p></caption>
<table>
<thead>
<tr>
<th align="center" valign="top">Model</th>
<th align="center" valign="top">MAE</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" valign="top">Proposed (PI-CCP)</td>
<td align="center" valign="top">0.0930</td>
</tr>
<tr>
<td align="center" valign="top">NIPS (<xref ref-type="bibr" rid="R44">S. Li et al., 2019</xref>)</td>
<td align="center" valign="top">0.1349</td>
</tr>
<tr>
<td align="center" valign="top">ICLR (<xref ref-type="bibr" rid="R44">S. Li et al., 2019</xref>)</td>
<td align="center" valign="top">0.1866</td>
</tr>
<tr>
<td align="center" valign="top">(<xref ref-type="bibr" rid="R30">Gao et al., 2024</xref>)</td>
<td align="center" valign="top">7.3000</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The R2 values demonstrate how well PI-CCP captures the variation in citation counts: surpassing both CCP and T-CCP (<xref ref-type="bibr" rid="R43">Li et al., 2015</xref>) and approaching NNCP (<xref ref-type="bibr" rid="R2">Abrishami &#x0026; Aliakbary, 2019</xref>), PI-CCP proves its validity as a predictor, underscoring its usefulness in assessing the influence of research papers (see Table 19). Table 20 shows that NNCP (<xref ref-type="bibr" rid="R2">Abrishami &#x0026; Aliakbary, 2019</xref>) attains a lower MSE, while PI-CCP still maintains remarkable precision in its citation count predictions, striking a balance between accuracy and reliability. Taken together, the examination across MAE, R2, and MSE consistently positions PI-CCP as a state-of-the-art, highly accurate model for citation count prediction.</p>
<table-wrap id="T19">
<label>Table 19.</label>
<caption><p>Achievement comparison based on R2.</p></caption>
<table>
<thead>
<tr>
<th align="center" valign="top">Model</th>
<th align="center" valign="top">R2</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" valign="top">Proposed (PI-CCP)</td>
<td align="center" valign="top">0.83</td>
</tr>
<tr>
<td align="center" valign="top">CCP (<xref ref-type="bibr" rid="R43">C.-T. Li et al., 2015</xref>)</td>
<td align="center" valign="top">0.53</td>
</tr>
<tr>
<td align="center" valign="top">T-CCP (<xref ref-type="bibr" rid="R43">C.-T. Li et al., 2015</xref>)</td>
<td align="center" valign="top">0.68</td>
</tr>
<tr>
<td align="center" valign="top">NNCP (<xref ref-type="bibr" rid="R2">Abrishami &#x0026; Aliakbary, 2019</xref>)</td>
<td align="center" valign="top">0.79</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="T20">
<label>Table 20.</label>
<caption><p>Achievement comparison based on MSE.</p></caption>
<table>
<thead>
<tr>
<th align="center" valign="top">Model</th>
<th align="center" valign="top">MSE</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" valign="top">Proposed (PI-CCP)</td>
<td align="center" valign="top">0.158</td>
</tr>
<tr>
<td align="center" valign="top">NNCP (<xref ref-type="bibr" rid="R2">Abrishami &#x0026; Aliakbary, 2019</xref>)</td>
<td align="center" valign="top">0.034</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="sec5">
<title>Conclusion and future work</title>
<p>Citation count serves as a crucial metric for evaluating the impact of scientific articles and researchers, playing a pivotal role in scholarly and academic endeavors. The purpose of our research was to implement a high-accuracy citation count prediction (CCP) model based on easily accessible public data. As a first step, we created four datasets (AJP DB, AI DB, JI DB, and PI DB) containing twenty-three proposed attributes. The data were collected from about 2000 GSPs in the fields of computer science and electrical engineering. The results obtained with the proposed model, called PI-CCP, allow us to conclude that the features suggested in the Paper Information Dataset (PI DB), including <inline-graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="c3-ie14.jpg"/> and OAA, are the most crucial variables in predicting the number of citations for scientific works. In addition, the Nu-SVR algorithm proved to be the most effective regression technique for forecasting citation count. Results such as an MAE of 0.0930 and an R2 of 0.8366 confirm that, among all the algorithms discussed, Nu-SVR is best able to manage the intricate, non-linear correlations between the input features and citation counts. Additionally, we examined AJP DB and AI DB in two configurations: one with PCA/t-SNE-based dimension reduction and one without. The results showed that neither PCA nor t-SNE consistently produces a lower error. Comparative analyses against existing models in the literature affirm the significant advances achieved by our proposed algorithm, which notably outperforms the others. Our research not only presents the robust PI-CCP model but also contributes methodologically by offering insights into parameter selection, dataset creation, and algorithmic choices. The identification of crucial variables and the superior performance of Nu-SVR underscore the novelty and significance of our work in the domain of citation count prediction.</p>
<p>Even though we introduced twenty-three novel features, there may still be other relevant variables that were not captured. Factors such as historical citation trends (time series), collaboration networks, social media presence, the impact of conference versus journal publications, and institutional and funding factors may also play significant roles in citation counts but were not included in our analysis. Moreover, our study focuses specifically on computer science and electrical engineering; the findings may not transfer directly to other fields, as citation behaviors can vary significantly across disciplines. Future research could explore the applicability of our model in different academic domains.</p>
<p><bold>Disclosure statement:</bold> The authors report there are no competing interests to declare.</p>
<p><bold>Funding statement:</bold> There is no funding resource for this study.</p>
<p><bold>Conflict of interest:</bold> There is no conflict of interest to declare.</p>
</sec>
<sec id="sec6">
<title>Copyright</title>
<p>Authors contributing to <italic>Information Research</italic> agree to publish their articles under a <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by-nc/4.0/"><underline>Creative Commons CC BY-NC 4.0 license,</underline></ext-link> which gives third parties the right to copy and redistribute the material in any medium or format. It also gives third parties the right to remix, transform and build upon the material for any purpose, except commercial, on the condition that clear acknowledgment is given to the author(s) of the work, that a link to the license is provided and that it is made clear if changes have been made to the work. This must be done in a reasonable manner, and must not imply that the licensor endorses the use of the work by third parties. The author(s) retain copyright to the work. You can also read more at: <ext-link ext-link-type="uri" xlink:href="https://publicera.kb.se/ir/openaccess"><underline>https://publicera.kb.se/ir/openaccess</underline></ext-link></p>
</sec>
</body>
<back>
<ref-list>
<title>References</title>
<ref id="R1"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Abramo</surname><given-names>G.</given-names></name><name><surname>D&#x2019;angelo</surname><given-names>C. A.</given-names></name><name><surname>Di Costa</surname><given-names>F.</given-names></name></person-group><year>2023</year><article-title>Correlating article citedness and journal impact: An empirical investigation by field on a large-scale dataset</article-title><source>Scientometrics</source><volume>128</volume><issue>3</issue><fpage>1877</fpage><lpage>1894</lpage><comment><ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1007/s11192-022-04622-0">https://doi.org/10.1007/s11192-022-04622-0</ext-link></comment></element-citation></ref>
<ref id="R2"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Abrishami</surname><given-names>A.</given-names></name><name><surname>Aliakbary</surname><given-names>S.</given-names></name></person-group><year>2019</year><article-title>Predicting citation counts based on deep neural network learning techniques</article-title><source>Journal of Informetrics</source><volume>13</volume><issue>2</issue><fpage>485</fpage><lpage>499</lpage><comment><ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/j.joi.2019.02.011">https://doi.org/10.1016/j.joi.2019.02.011</ext-link></comment></element-citation></ref>
<ref id="R3"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Aksnes</surname><given-names>D. W.</given-names></name><name><surname>Langfeldt</surname><given-names>L.</given-names></name><name><surname>Wouters</surname><given-names>P.</given-names></name></person-group><year>2019</year><article-title>Citations, citation indicators, and research quality: An overview of basic concepts and theories</article-title><source>Sage Open</source><volume>9</volume><issue>1</issue><comment><ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1177/215824401982957">https://doi.org/10.1177/215824401982957</ext-link></comment></element-citation></ref>
<ref id="R4"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Almas</surname><given-names>K.</given-names></name><name><surname>Ur Rehman</surname><given-names>S.</given-names></name><name><surname>Al-Harbi</surname><given-names>F.</given-names></name><name><surname>Qadir Khan</surname><given-names>S.</given-names></name><name><surname>Ahmed Farooqi</surname><given-names>F.</given-names></name><name><surname>Smith</surname><given-names>S.</given-names></name><name><surname>Ahmad</surname><given-names>S.</given-names></name></person-group><year>2021</year><article-title>Significance of variable contributing factors on impact factor of Clarivate analytics dental journals</article-title><source>Serials Review</source><volume>47</volume><issue>3&#x2013;4</issue><fpage>201</fpage><lpage>214</lpage><comment><ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1080/00987913.2021.2018225">https://doi.org/10.1080/00987913.2021.2018225</ext-link></comment></element-citation></ref>
<ref id="R5"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Amin</surname><given-names>M.</given-names></name><name><surname>Mabe</surname><given-names>M. A.</given-names></name></person-group><year>2003</year><article-title>Impact factors: Use and abuse</article-title><source>Medicina (Buenos Aires)</source><volume>63</volume><issue>4</issue><fpage>347</fpage><lpage>354</lpage><comment><ext-link ext-link-type="uri" xlink:href="https://medicinabuenosaires.com/revistas/vol63-03/4/Impact%20factors-use%20and%20abuse.pdf">https://medicinabuenosaires.com/revistas/vol63-03/4/Impact%20factors-use%20and%20abuse.pdf</ext-link></comment><comment>Archived at</comment><comment><ext-link ext-link-type="uri" xlink:href="https://web.archive.org/web/20250325222750/http://www.medicinabuenosaires.com/revistas/vol63-03/4/Impact%20factors-use%20and%20abuse.pdf">https://web.archive.org/web/20250325222750/http://www.medicinabuenosaires.com/revistas/vol63-03/4/Impact%20factors-use%20and%20abuse.pdf</ext-link></comment></element-citation></ref>
<ref id="R6"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Baas</surname><given-names>J.</given-names></name><name><surname>Schotten</surname><given-names>M.</given-names></name><name><surname>Plume</surname><given-names>A.</given-names></name><name><surname>C&#x00F4;t&#x00E9;</surname><given-names>G.</given-names></name><name><surname>Karimi</surname><given-names>R.</given-names></name></person-group><year>2020</year><article-title>Scopus as a curated, high-quality bibliometric data source for academic research in quantitative science studies</article-title><source>Quantitative Science Studies</source><volume>1</volume><issue>1</issue><fpage>377</fpage><lpage>386</lpage><comment><ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1162/qss_a_00019">https://doi.org/10.1162/qss_a_00019</ext-link></comment></element-citation></ref>
<ref id="R7"><element-citation publication-type="book"><person-group person-group-type="author"><name><surname>Bahaghighat</surname><given-names>Mahdi</given-names></name><name><surname>Jahani rad</surname><given-names>P.</given-names></name></person-group><year>2024</year><source>AoI2WoS: Mapping area of interest in Google Scholar profile to Web Of Science (WoS) scientific fields categories</source><publisher-name>Mendeley Data</publisher-name><comment><ext-link ext-link-type="doi" xlink:href="http://doi.org/10.17632/nr7zfdjm7f.1">http://doi.org/10.17632/nr7zfdjm7f.1</ext-link></comment></element-citation></ref>
<ref id="R8"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Rad</surname><given-names>P. J.</given-names></name><name><surname>Bahaghighat</surname><given-names>M.</given-names></name></person-group><year>2024</year><article-title>Hierarchical text classification for web of science scientific fields</article-title><source>Facta Universitatis, Series: Electronics and Energetics</source><volume>37</volume><issue>4</issue><fpage>703</fpage><lpage>732</lpage><comment><ext-link ext-link-type="doi" xlink:href="https://doi.org/10.2298/FUEE2404703J">https://doi.org/10.2298/FUEE2404703J</ext-link></comment></element-citation></ref>
<ref id="R9"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Bai</surname><given-names>X.</given-names></name><name><surname>Zhang</surname><given-names>F.</given-names></name><name><surname>Liu</surname><given-names>J.</given-names></name><name><surname>Wang</surname><given-names>X.</given-names></name><name><surname>Xia</surname><given-names>F.</given-names></name></person-group><year>2025</year><article-title>Revolutionizing scholarly impact: Advanced evaluations, predictive models, and future directions</article-title><source>Artificial Intelligence Review</source><volume>58</volume><issue>10</issue><comment><ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1007/S10462-025-11315-6">https://doi.org/10.1007/S10462-025-11315-6</ext-link></comment></element-citation></ref>
<ref id="R10"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Basak</surname><given-names>D.</given-names></name><name><surname>Pal</surname><given-names>S.</given-names></name><name><surname>Patranabis</surname><given-names>D. C.</given-names></name></person-group><year>2007</year><article-title>Support vector regression</article-title><source>Neural Information Processing-Letters and Reviews</source><volume>11</volume><issue>10</issue><fpage>203</fpage><lpage>224</lpage><comment><ext-link ext-link-type="uri" xlink:href="https://static.aminer.org/pdf/PDF/000/337/560/uncertainty_support_vector_method_for_ordinal_regression.pdf">https://static.aminer.org/pdf/PDF/000/337/560/uncertainty_support_vector_method_for_ordinal_regression.pdf</ext-link></comment><comment>Archived at</comment><comment><ext-link ext-link-type="uri" xlink:href="https://web.archive.org/web/20200709085151/https://static.aminer.org/pdf/PDF/000/337/560/uncertainty_support_vector_method_for_ordinal_regression.pdf">https://web.archive.org/web/20200709085151/https://static.aminer.org/pdf/PDF/000/337/560/uncertainty_support_vector_method_for_ordinal_regression.pdf</ext-link></comment></element-citation></ref>
<ref id="R11"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Belikov</surname><given-names>A. V</given-names></name><name><surname>Belikov</surname><given-names>V. V.</given-names></name></person-group><year>2015</year><article-title>A citation-based, author-and age-normalized, logarithmic index for evaluation of individual researchers independently of publication counts</article-title><source>F1000Research</source><volume>4</volume><comment><ext-link ext-link-type="doi" xlink:href="https://doi.org/10.12688/f1000research.7070.2">https://doi.org/10.12688/f1000research.7070.2</ext-link></comment></element-citation></ref>
<ref id="R12"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Bhatt</surname><given-names>D.</given-names></name><name><surname>Aggarwal</surname><given-names>P.</given-names></name><name><surname>Bhattacharya</surname><given-names>P.</given-names></name><name><surname>Devabhaktuni</surname><given-names>V.</given-names></name></person-group><year>2012</year><article-title>An enhanced mems error modeling approach based on nu-support vector regression</article-title><source>Sensors</source><volume>12</volume><issue>7</issue><fpage>9448</fpage><lpage>9466</lpage><comment><ext-link ext-link-type="doi" xlink:href="https://doi.org/10.3390/s120709448">https://doi.org/10.3390/s120709448</ext-link></comment></element-citation></ref>
<ref id="R13"><element-citation publication-type="book"><person-group person-group-type="author"><name><surname>Bishop</surname><given-names>C. M.</given-names></name></person-group><year>2006</year><source>Pattern recognition and machine learning</source><publisher-name>Springer</publisher-name></element-citation></ref>
<ref id="R14"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Bl&#x00FC;mel</surname><given-names>C.</given-names></name><name><surname>Schniedermann</surname><given-names>A.</given-names></name></person-group><year>2020</year><article-title>Studying review articles in scientometrics and beyond: A research agenda</article-title><source>Scientometrics</source><volume>124</volume><issue>1</issue><fpage>711</fpage><lpage>728</lpage><comment><ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1007/s11192-020-03431-7">https://doi.org/10.1007/s11192-020-03431-7</ext-link></comment></element-citation></ref>
<ref id="R15"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Bornmann</surname><given-names>L.</given-names></name><name><surname>Daniel</surname><given-names>H.</given-names></name></person-group><year>2008</year><article-title>What do citation counts measure? A review of studies on citing behavior</article-title><source>Journal of Documentation</source><volume>64</volume><issue>1</issue><fpage>45</fpage><lpage>80</lpage><comment><ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1108/00220410810844150">https://doi.org/10.1108/00220410810844150</ext-link></comment></element-citation></ref>
<ref id="R16"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Bornmann</surname><given-names>L.</given-names></name><name><surname>Daniel</surname><given-names>H.</given-names></name></person-group><year>2009</year><article-title>The state of h index research: Is the h index the ideal way to measure research performance?</article-title><source>EMBO Reports</source><volume>10</volume><issue>1</issue><fpage>2</fpage><lpage>6</lpage><comment><ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1038/embor.2008.233">https://doi.org/10.1038/embor.2008.233</ext-link></comment></element-citation></ref>
<ref id="R17"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Braun</surname><given-names>T.</given-names></name><name><surname>Gl&#x00E4;nzel</surname><given-names>W.</given-names></name><name><surname>Schubert</surname><given-names>A.</given-names></name></person-group><year>2006</year><article-title>A Hirsch-type index for journals</article-title><source>Scientometrics</source><volume>69</volume><fpage>169</fpage><lpage>173</lpage><comment><ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1007/s11192-006-0147-4">https://doi.org/10.1007/s11192-006-0147-4</ext-link></comment></element-citation></ref>
<ref id="R18"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Broadus</surname><given-names>R. N.</given-names></name></person-group><year>1987</year><article-title>Toward a definition of &#x201C;bibliometrics.&#x201D;</article-title><source>Scientometrics</source><volume>12</volume><issue>5&#x2013;6</issue><fpage>373</fpage><lpage>379</lpage><comment><ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1007/BF02016680">https://doi.org/10.1007/BF02016680</ext-link></comment></element-citation></ref>
<ref id="R19"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Butler</surname><given-names>L.</given-names></name><name><surname>Visser</surname><given-names>M. S.</given-names></name></person-group><year>2006</year><article-title>Extending citation analysis to non-source items</article-title><source>Scientometrics</source><volume>66</volume><issue>2</issue><fpage>327</fpage><lpage>343</lpage></element-citation></ref>
<ref id="R20"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Cameron</surname><given-names>A. C.</given-names></name><name><surname>Windmeijer</surname><given-names>F. A. G.</given-names></name></person-group><year>1997</year><article-title>An R-squared measure of goodness of fit for some common nonlinear regression models</article-title><source>Journal of Econometrics</source><volume>77</volume><issue>2</issue><fpage>329</fpage><lpage>342</lpage><comment><ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/S0304-4076(96)01818-0">https://doi.org/10.1016/S0304-4076(96)01818-0</ext-link></comment></element-citation></ref>
<ref id="R21"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Cao</surname><given-names>X.</given-names></name><name><surname>Chen</surname><given-names>Y.</given-names></name><name><surname>Liu</surname><given-names>K. J. R.</given-names></name></person-group><year>2016</year><article-title>A data analytic approach to quantifying scientific impact</article-title><source>Journal of Informetrics</source><volume>10</volume><issue>2</issue><fpage>471</fpage><lpage>484</lpage><comment><ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/j.joi.2016.02.006">https://doi.org/10.1016/j.joi.2016.02.006</ext-link></comment></element-citation></ref>
<ref id="R22"><element-citation publication-type="book"><person-group person-group-type="author"><name><surname>Chen</surname><given-names>J. T.</given-names></name><name><surname>Lee</surname><given-names>C.</given-names></name><name><surname>Chen</surname><given-names>L. Y.</given-names></name></person-group><year>2024</year><source>Statistical prediction and machine learning</source><publisher-name>CRC Press</publisher-name></element-citation></ref>
<ref id="R23"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Cover</surname><given-names>T.</given-names></name><name><surname>Hart</surname><given-names>P.</given-names></name></person-group><year>1967</year><article-title>Nearest neighbor pattern classification</article-title><source>IEEE Transactions on Information Theory</source><volume>13</volume><issue>1</issue><fpage>21</fpage><lpage>27</lpage><comment><ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1109/TIT.1967.1053964">https://doi.org/10.1109/TIT.1967.1053964</ext-link></comment></element-citation></ref>
<ref id="R24"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Cunningham</surname><given-names>P.</given-names></name><name><surname>Delany</surname><given-names>S. J.</given-names></name></person-group><year>2021</year><article-title>K-nearest neighbour classifiers-a tutorial</article-title><source>ACM Computing Surveys (CSUR)</source><volume>54</volume><issue>6</issue><fpage>1</fpage><lpage>25</lpage><comment><ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1145/3459665">https://doi.org/10.1145/3459665</ext-link></comment></element-citation></ref>
<ref id="R25"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Durieux</surname><given-names>V.</given-names></name><name><surname>Gevenois</surname><given-names>P. A.</given-names></name></person-group><year>2010</year><article-title>Bibliometric indicators: Quality measurements of scientific publication</article-title><source>Radiology</source><volume>255</volume><issue>2</issue><fpage>342</fpage><lpage>351</lpage><comment><ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1148/radiol.09090626">https://doi.org/10.1148/radiol.09090626</ext-link></comment></element-citation></ref>
<ref id="R26"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Enduri</surname><given-names>M. K.</given-names></name><name><surname>Sankar</surname><given-names>V. U.</given-names></name><name><surname>Hajarathaiah</surname><given-names>K.</given-names></name></person-group><year>2022</year><article-title>Empirical study on citation count prediction of research articles</article-title><source>Journal of Scientometric Research</source><volume>11</volume><issue>2</issue><fpage>155</fpage><lpage>163</lpage><comment><ext-link ext-link-type="doi" xlink:href="https://doi.org/10.5530/jscires.11.2.17">https://doi.org/10.5530/jscires.11.2.17</ext-link></comment></element-citation></ref>
<ref id="R27"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Fassin</surname><given-names>Y.</given-names></name></person-group><year>2020</year><article-title>The HF-rating as a universal complement to the h-index</article-title><source>Scientometrics</source><volume>125</volume><issue>2</issue><fpage>965</fpage><lpage>990</lpage><comment><ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1007/s11192-020-03611-5">https://doi.org/10.1007/s11192-020-03611-5</ext-link></comment></element-citation></ref>
<ref id="R28"><element-citation publication-type="book"><person-group person-group-type="author"><name><surname>Fox</surname><given-names>J.</given-names></name></person-group><year>2015</year><source>Applied regression analysis and generalized linear models</source><publisher-name>SAGE Publications, Inc</publisher-name></element-citation></ref>
<ref id="R29"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Furman</surname><given-names>J. L.</given-names></name><name><surname>Teodoridis</surname><given-names>F.</given-names></name></person-group><year>2020</year><article-title>Automation, research technology, and researchers&#x2019; trajectories: Evidence from computer science and electrical engineering</article-title><source>Organization Science</source><volume>31</volume><issue>2</issue><fpage>330</fpage><lpage>354</lpage><comment><ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1287/orsc.2019.1308">https://doi.org/10.1287/orsc.2019.1308</ext-link></comment></element-citation></ref>
<ref id="R30"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Gao</surname><given-names>T.</given-names></name><name><surname>Liu</surname><given-names>J.</given-names></name><name><surname>Pan</surname><given-names>R.</given-names></name><name><surname>Wang</surname><given-names>H.</given-names></name></person-group><year>2024</year><article-title>Citation counts prediction of statistical publications based on multi-layer academic networks via neural network model</article-title><source>Expert Systems with Applications</source><volume>238</volume><fpage>121634</fpage><comment><ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/j.eswa.2023.121634">https://doi.org/10.1016/j.eswa.2023.121634</ext-link></comment></element-citation></ref>
<ref id="R31"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Garfield</surname><given-names>E.</given-names></name></person-group><year>2006</year><article-title>The history and meaning of the journal impact factor</article-title><source>Jama</source><volume>295</volume><issue>1</issue><fpage>90</fpage><lpage>93</lpage><comment><ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1001/jama.295.1.90">https://doi.org/10.1001/jama.295.1.90</ext-link></comment></element-citation></ref>
<ref id="R32"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Gonz&#x00E1;lez-Betancor</surname><given-names>S. M.</given-names></name><name><surname>Dorta-Gonz&#x00E1;lez</surname><given-names>P.</given-names></name></person-group><year>2017</year><article-title>An indicator of the impact of journals based on the percentage of their highly cited publications</article-title><source>Online Information Review</source><volume>41</volume><issue>3</issue><fpage>398</fpage><lpage>411</lpage><comment><ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1108/OIR-01-2016-0008">https://doi.org/10.1108/OIR-01-2016-0008</ext-link></comment></element-citation></ref>
<ref id="R33"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Groos</surname><given-names>O. V</given-names></name><name><surname>Pritchard</surname><given-names>A.</given-names></name></person-group><year>1969</year><article-title>Documentation notes</article-title><source>Journal of Documentation</source><volume>25</volume><issue>4</issue><fpage>344</fpage><lpage>349</lpage><comment><ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1108/eb026482">https://doi.org/10.1108/eb026482</ext-link></comment></element-citation></ref>
<ref id="R34"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Halder</surname><given-names>R. K.</given-names></name><name><surname>Uddin</surname><given-names>M. N.</given-names></name><name><surname>Uddin</surname><given-names>M. A.</given-names></name><name><surname>Aryal</surname><given-names>S.</given-names></name><name><surname>Khraisat</surname><given-names>A.</given-names></name></person-group><year>2024</year><article-title>Enhancing K-nearest neighbor algorithm: A comprehensive review and performance analysis of modifications</article-title><source>Journal of Big Data</source><volume>11</volume><issue>1</issue><fpage>113</fpage><comment><ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1186/s40537-024-00973-y">https://doi.org/10.1186/s40537-024-00973-y</ext-link></comment></element-citation></ref>
<ref id="R35"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>He</surname><given-names>G.</given-names></name><name><surname>Gu</surname><given-names>S.</given-names></name><name><surname>Xue</surname><given-names>Z.</given-names></name><name><surname>Duan</surname><given-names>Y.</given-names></name><name><surname>Zhu</surname><given-names>X.</given-names></name></person-group><year>2025</year><article-title>Sequential citation counts prediction enhanced by dynamic contents</article-title><source>Journal of Informetrics</source><volume>19</volume><issue>2</issue><fpage>101645</fpage><comment><ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/j.joi.2025.101645">https://doi.org/10.1016/j.joi.2025.101645</ext-link></comment></element-citation></ref>
<ref id="R36"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Hearst</surname><given-names>M. A.</given-names></name><name><surname>Dumais</surname><given-names>S. T.</given-names></name><name><surname>Osuna</surname><given-names>E.</given-names></name><name><surname>Platt</surname><given-names>J.</given-names></name><name><surname>Scholkopf</surname><given-names>B.</given-names></name></person-group><year>1998</year><article-title>Support vector machines</article-title><source>IEEE Intelligent Systems and Their Applications</source><volume>13</volume><issue>4</issue><fpage>18</fpage><lpage>28</lpage><comment><ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1109/5254.708428">https://doi.org/10.1109/5254.708428</ext-link></comment></element-citation></ref>
<ref id="R37"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Hirsch</surname><given-names>J. E.</given-names></name></person-group><year>2010</year><article-title>An index to quantify an individual&#x2019;s scientific research output that takes into account the effect of multiple coauthorship</article-title><source>Scientometrics</source><volume>85</volume><issue>3</issue><fpage>741</fpage><lpage>754</lpage><comment><ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1007/s11192-010-0193-9">https://doi.org/10.1007/s11192-010-0193-9</ext-link></comment></element-citation></ref>
<ref id="R38"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Hutchins</surname><given-names>B. I.</given-names></name><name><surname>Yuan</surname><given-names>X.</given-names></name><name><surname>Anderson</surname><given-names>J. M.</given-names></name><name><surname>Santangelo</surname><given-names>G. M.</given-names></name></person-group><year>2016</year><article-title>Relative citation ratio (RCR): A new metric that uses citation rates to measure influence at the article level</article-title><source>PLoS Biology</source><volume>14</volume><issue>9</issue><comment><ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1371/journal.pbio.1002541">https://doi.org/10.1371/journal.pbio.1002541</ext-link></comment></element-citation></ref>
<ref id="R39"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Jung</surname><given-names>S.</given-names></name><name><surname>Dagobert</surname><given-names>T.</given-names></name><name><surname>Morel</surname><given-names>J.-M.</given-names></name><name><surname>Facciolo</surname><given-names>G.</given-names></name></person-group><year>2024</year><article-title>A review of t-SNE</article-title><source>Image Processing On Line</source><volume>14</volume><fpage>250</fpage><lpage>270</lpage><comment><ext-link ext-link-type="doi" xlink:href="https://doi.org/10.5201/ipol.2024.528">https://doi.org/10.5201/ipol.2024.528</ext-link></comment></element-citation></ref>
<ref id="R40"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Khokhlov</surname><given-names>A. N.</given-names></name></person-group><year>2020</year><article-title>How scientometrics became the most important science for researchers of all specialties</article-title><source>Moscow University Biological Sciences Bulletin</source><volume>75</volume><fpage>159</fpage><lpage>163</lpage><comment><ext-link ext-link-type="doi" xlink:href="https://doi.org/10.3103/s0096392520040057">https://doi.org/10.3103/s0096392520040057</ext-link></comment></element-citation></ref>
<ref id="R41"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Khurana</surname><given-names>P.</given-names></name><name><surname>Sharma</surname><given-names>K.</given-names></name></person-group><year>2022</year><article-title>Impact of h-index on author&#x2019;s rankings: An improvement to the h-index for lower-ranked authors</article-title><source>Scientometrics</source><volume>127</volume><issue>8</issue><fpage>4483</fpage><lpage>4498</lpage><comment><ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1007/s11192-022-04464-w">https://doi.org/10.1007/s11192-022-04464-w</ext-link></comment></element-citation></ref>
<ref id="R42"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Kosyakov</surname><given-names>D.</given-names></name><name><surname>Pislyakov</surname><given-names>V.</given-names></name></person-group><year>2024</year><article-title>&#x201C;I&#x2019;d like to publish in Q1, but there&#x2019;s no Q1 to be found&#x201D;: Study of journal quartile distributions across subject categories and topics</article-title><source>Journal of Informetrics</source><volume>18</volume><issue>1</issue><fpage>101494</fpage><comment><ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/j.joi.2024.101494">https://doi.org/10.1016/j.joi.2024.101494</ext-link></comment></element-citation></ref>
<ref id="R43"><element-citation publication-type="web"><person-group person-group-type="author"><name><surname>Li</surname><given-names>C.-T.</given-names></name><name><surname>Lin</surname><given-names>Y.-J.</given-names></name><name><surname>Yan</surname><given-names>R.</given-names></name><name><surname>Yeh</surname><given-names>M.-Y.</given-names></name></person-group><year>2015</year><article-title>Trend-based citation count prediction for research articles</article-title><source>Advances in Knowledge Discovery and Data Mining: 19th Pacific-Asia Conference, PAKDD 2015, Ho Chi Minh City, Vietnam, May 19-22, 2015, Proceedings, Part I 19</source><fpage>659</fpage><lpage>&#x2013;</lpage><comment><ext-link ext-link-type="doi" xlink:href="http://dx.doi.org/10.1007/978-3-319-18038-0_51">http://dx.doi.org/10.1007/978-3-319-18038-0_51</ext-link></comment></element-citation></ref>
<ref id="R44"><element-citation publication-type="web"><person-group person-group-type="author"><name><surname>Li</surname><given-names>S.</given-names></name><name><surname>Zhao</surname><given-names>W. X.</given-names></name><name><surname>Yin</surname><given-names>E. J.</given-names></name><name><surname>Wen</surname><given-names>J.-R.</given-names></name></person-group><year>2019</year><article-title>A neural citation count prediction model based on peer review text</article-title><source>Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)</source><fpage>4914</fpage><lpage>&#x2013;</lpage><comment><ext-link ext-link-type="doi" xlink:href="https://doi.org/10.18653/v1/D19-1497">https://doi.org/10.18653/v1/D19-1497</ext-link></comment></element-citation></ref>
<ref id="R45"><element-citation publication-type="book"><person-group person-group-type="author"><name><surname>Lundberg</surname><given-names>J.</given-names></name></person-group><year>2006</year><source>Bibliometrics as a research assessment tool: impact beyond the impact factor</source><publisher-name>Karolinska Institutet (Sweden)</publisher-name></element-citation></ref>
<ref id="R46"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Moed</surname><given-names>H. F.</given-names></name></person-group><year>2006</year><source>Citation analysis in research evaluation</source><comment>9</comment><publisher-name>Springer Science &amp; Business Media</publisher-name></element-citation></ref>
<ref id="R47"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Moussa</surname><given-names>S.</given-names></name></person-group><year>2023</year><article-title>A bibliometric investigation of the journals that were repeatedly suppressed from Clarivate&#x2019;s journal citation reports</article-title><source>Accountability in Research</source><volume>30</volume><issue>8</issue><fpage>592</fpage><lpage>612</lpage><comment><ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1080/08989621.2022.2071154">https://doi.org/10.1080/08989621.2022.2071154</ext-link></comment></element-citation></ref>
<ref id="R48"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Murphy</surname><given-names>A. H.</given-names></name></person-group><year>1988</year><article-title>Skill scores based on the mean square error and their relationships to the correlation coefficient</article-title><source>Monthly Weather Review</source><volume>116</volume><issue>12</issue><fpage>2417</fpage><lpage>2424</lpage><comment><ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1175/1520-0493(1988)116%3C2417:SSBOTM%3E2.0.CO;2">https://doi.org/10.1175/1520-0493(1988)116%3C2417:SSBOTM%3E2.0.CO;2</ext-link></comment></element-citation></ref>
<ref id="R49"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Nguyen</surname><given-names>B. T.</given-names></name><name><surname>Nguyen</surname><given-names>T. T.</given-names></name></person-group><year>2025</year><article-title>Forecasting scientific impact: A model for predicting citation counts</article-title><source>Statistics, Optimization &amp;amp; Information Computing</source><volume>13</volume><issue>6</issue><fpage>2601</fpage><lpage>2615</lpage><comment><ext-link ext-link-type="doi" xlink:href="https://doi.org/10.19139/soic-2310-5070-2524">https://doi.org/10.19139/soic-2310-5070-2524</ext-link></comment></element-citation></ref>
<ref id="R50"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Okagbue</surname><given-names>H. I.</given-names></name><name><surname>Akhmetshin</surname><given-names>E. M.</given-names></name><name><surname>Teixeira da Silva</surname><given-names>J. A.</given-names></name></person-group><year>2021</year><article-title>Distinct clusters of CiteScore and percentiles in top 1000 journals in Scopus</article-title><source>COLLNET Journal of Scientometrics and Information Management</source><volume>15</volume><issue>1</issue><fpage>133</fpage><lpage>143</lpage><comment><ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1080/09737766.2021.1934604">https://doi.org/10.1080/09737766.2021.1934604</ext-link></comment></element-citation></ref>
<ref id="R51"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Okagbue</surname><given-names>H. I.</given-names></name><name><surname>Bishop</surname><given-names>S. A.</given-names></name><name><surname>Adamu</surname><given-names>P. I.</given-names></name><name><surname>Opanuga</surname><given-names>A. A.</given-names></name><name><surname>Obasi</surname><given-names>E. C. M.</given-names></name></person-group><year>2020</year><article-title>Analysis of percentiles of computer science, theory and methods journals: CiteScore versus impact factor</article-title><source>DESIDOC Journal of Library &amp; Information Technology</source><volume>40</volume><issue>1</issue><fpage>359</fpage><lpage>365</lpage><comment><ext-link ext-link-type="doi" xlink:href="http://dx.doi.org/10.14429/djlit.40.1.14866">http://dx.doi.org/10.14429/djlit.40.1.14866</ext-link></comment></element-citation></ref>
<ref id="R52"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Pedregosa</surname><given-names>F.</given-names></name><name><surname>Varoquaux</surname><given-names>G.</given-names></name><name><surname>Gramfort</surname><given-names>A.</given-names></name><name><surname>Michel</surname><given-names>V.</given-names></name><name><surname>Thirion</surname><given-names>B.</given-names></name><name><surname>Grisel</surname><given-names>O.</given-names></name><name><surname>Blondel</surname><given-names>M.</given-names></name><name><surname>Prettenhofer</surname><given-names>P.</given-names></name><name><surname>Weiss</surname><given-names>R.</given-names></name><name><surname>Dubourg</surname><given-names>V.</given-names></name></person-group><year>2011</year><article-title>Scikit-learn: Machine learning in Python</article-title><source>The Journal of Machine Learning Research</source><volume>12</volume><fpage>2825</fpage><lpage>2830</lpage><comment><ext-link ext-link-type="doi" xlink:href="https://doi.org/10.48550/arXiv.1201.0490">https://doi.org/10.48550/arXiv.1201.0490</ext-link></comment></element-citation></ref>
<ref id="R53"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Pobiedina</surname><given-names>N.</given-names></name><name><surname>Ichise</surname><given-names>R.</given-names></name></person-group><year>2016</year><article-title>Citation count prediction as a link prediction problem</article-title><source>Applied Intelligence</source><volume>44</volume><fpage>252</fpage><lpage>268</lpage><comment><ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1007/s10489-015-0657-y">https://doi.org/10.1007/s10489-015-0657-y</ext-link></comment></element-citation></ref>
<ref id="R54"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Rostami</surname><given-names>M.</given-names></name><name><surname>Bahaghighat</surname><given-names>M.</given-names></name><name><surname>Zanjireh</surname><given-names>M. M.</given-names></name></person-group><year>2021</year><article-title>Bitcoin daily close price prediction using optimized grid search method</article-title><source>Acta Universitatis Sapientiae, Informatica</source><volume>13</volume><issue>2</issue><fpage>265</fpage><lpage>287</lpage><comment><ext-link ext-link-type="doi" xlink:href="https://doi.org/10.2478/ausi-2021-0012">https://doi.org/10.2478/ausi-2021-0012</ext-link></comment></element-citation></ref>
<ref id="R55"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Sabry</surname><given-names>F.</given-names></name></person-group><year>2023</year><source>K Nearest Neighbor algorithm: Fundamentals and applications</source><comment>28</comment><publisher-name>One Billion Knowledgeable</publisher-name></element-citation></ref>
<ref id="R56"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Skrodzki</surname><given-names>M.</given-names></name><name><surname>van Geffen</surname><given-names>H.</given-names></name><name><surname>Chaves-de-Plaza</surname><given-names>N. F.</given-names></name><name><surname>H&#x00F6;llt</surname><given-names>T.</given-names></name><name><surname>Eisemann</surname><given-names>E.</given-names></name><name><surname>Hildebrandt</surname><given-names>K.</given-names></name></person-group><year>2024</year><article-title>Accelerating hyperbolic t-SNE</article-title><source>IEEE Transactions on Visualization and Computer Graphics</source><volume>30</volume><issue>7</issue><fpage>4403</fpage><lpage>4415</lpage><comment><ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1109/TVCG.2024.3364841">https://doi.org/10.1109/TVCG.2024.3364841</ext-link></comment></element-citation></ref>
<ref id="R57"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Smola</surname><given-names>A. J.</given-names></name><name><surname>Sch&#x00F6;lkopf</surname><given-names>B.</given-names></name></person-group><year>2004</year><article-title>A tutorial on support vector regression</article-title><source>Statistics and Computing</source><volume>14</volume><fpage>199</fpage><lpage>222</lpage><comment><ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1023/B:STCO.0000035301.49549.88">https://doi.org/10.1023/B:STCO.0000035301.49549.88</ext-link></comment></element-citation></ref>
<ref id="R58"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Sohrabi</surname><given-names>B.</given-names></name><name><surname>Iraj</surname><given-names>H.</given-names></name></person-group><year>2017</year><article-title>The effect of keyword repetition in abstract and keyword frequency per journal in predicting citation counts</article-title><source>Scientometrics</source><volume>110</volume><fpage>243</fpage><lpage>251</lpage><comment><ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1007/s11192-016-2161-5">https://doi.org/10.1007/s11192-016-2161-5</ext-link></comment></element-citation></ref>
<ref id="R59"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Song</surname><given-names>Y.</given-names></name><name><surname>Liang</surname><given-names>J.</given-names></name><name><surname>Lu</surname><given-names>J.</given-names></name><name><surname>Zhao</surname><given-names>X.</given-names></name></person-group><year>2017</year><article-title>An efficient instance selection algorithm for k nearest neighbor regression</article-title><source>Neurocomputing</source><volume>251</volume><fpage>26</fpage><lpage>34</lpage><comment><ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/j.neucom.2017.04.018">https://doi.org/10.1016/j.neucom.2017.04.018</ext-link></comment></element-citation></ref>
<ref id="R60"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Teixeira da Silva</surname><given-names>J. A.</given-names></name></person-group><year>2020</year><article-title>CiteScore: Advances, evolution, applications, and limitations</article-title><source>Publishing Research Quarterly</source><volume>36</volume><issue>3</issue><fpage>459</fpage><lpage>468</lpage><comment><ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1007/s12109-020-09736-y">https://doi.org/10.1007/s12109-020-09736-y</ext-link></comment></element-citation></ref>
<ref id="R61"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Torres-Salinas</surname><given-names>D.</given-names></name><name><surname>Valderrama-Baca</surname><given-names>P.</given-names></name><name><surname>Arroyo-Machado</surname><given-names>W.</given-names></name></person-group><year>2022</year><article-title>Is there a need for a new journal metric? Correlations between JCR Impact Factor metrics and the Journal Citation Indicator&#x2014;JCI</article-title><source>Journal of Informetrics</source><volume>16</volume><issue>3</issue><fpage>101315</fpage><comment><ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1016/j.joi.2022.101315">https://doi.org/10.1016/j.joi.2022.101315</ext-link></comment></element-citation></ref>
<ref id="R62"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Wallisch</surname><given-names>C.</given-names></name><name><surname>Bach</surname><given-names>P.</given-names></name><name><surname>Hafermann</surname><given-names>L.</given-names></name><name><surname>Klein</surname><given-names>N.</given-names></name><name><surname>Sauerbrei</surname><given-names>W.</given-names></name><name><surname>Steyerberg</surname><given-names>E. W.</given-names></name><name><surname>Heinze</surname><given-names>G.</given-names></name><name><surname>Rauch</surname><given-names>G.</given-names></name><collab>on behalf of Topic Group 2 of the STRATOS initiative</collab></person-group><year>2022</year><article-title>Review of guidance papers on regression modeling in statistical series of medical journals</article-title><source>PLoS ONE</source><volume>17</volume><issue>1</issue><elocation-id>e0262918</elocation-id><comment><ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1371/journal.pone.0262918">https://doi.org/10.1371/journal.pone.0262918</ext-link></comment></element-citation></ref>
<ref id="R63"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname><given-names>B.</given-names></name><name><surname>Wu</surname><given-names>F.</given-names></name><name><surname>Shi</surname><given-names>L.</given-names></name></person-group><year>2023</year><article-title>AGSTA-NET: Adaptive graph spatiotemporal attention network for citation count prediction</article-title><source>Scientometrics</source><volume>128</volume><issue>1</issue><fpage>511</fpage><lpage>541</lpage><comment><ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1007/s11192-022-04541-0">https://doi.org/10.1007/s11192-022-04541-0</ext-link></comment></element-citation></ref>
<ref id="R64"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Willmott</surname><given-names>C. J.</given-names></name><name><surname>Matsuura</surname><given-names>K.</given-names></name></person-group><year>2005</year><article-title>Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance</article-title><source>Climate Research</source><volume>30</volume><issue>1</issue><fpage>79</fpage><lpage>82</lpage><comment><ext-link ext-link-type="doi" xlink:href="https://doi.org/10.3354/cr030079">https://doi.org/10.3354/cr030079</ext-link></comment></element-citation></ref>
<ref id="R65"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Yao</surname><given-names>Z.</given-names></name><name><surname>Ruzzo</surname><given-names>W. L.</given-names></name></person-group><year>2006</year><article-title>A regression-based K nearest neighbor algorithm for gene function prediction from heterogeneous data</article-title><source>BMC Bioinformatics</source><volume>7</volume><fpage>1</fpage><lpage>11</lpage><comment><ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1186/1471-2105-7-S1-S11">https://doi.org/10.1186/1471-2105-7-S1-S11</ext-link></comment></element-citation></ref>
<ref id="R66"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Yu</surname><given-names>T.</given-names></name><name><surname>Yu</surname><given-names>G.</given-names></name><name><surname>Li</surname><given-names>P.-Y.</given-names></name><name><surname>Wang</surname><given-names>L.</given-names></name></person-group><year>2014</year><article-title>Citation impact prediction for scientific papers using stepwise regression analysis</article-title><source>Scientometrics</source><volume>101</volume><fpage>1233</fpage><lpage>1252</lpage><comment><ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1007/s11192-014-1279-6">https://doi.org/10.1007/s11192-014-1279-6</ext-link></comment></element-citation></ref>
<ref id="R67"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Zafar</surname><given-names>L.</given-names></name><name><surname>Masood</surname><given-names>N.</given-names></name><name><surname>Hadi</surname><given-names>F.</given-names></name><name><surname>Ahmed</surname><given-names>S.</given-names></name></person-group><year>2024</year><article-title>Citation count prediction of scholarly articles</article-title><source>Journal of Computing &amp; Biomedical Informatics</source><volume>6</volume><issue>2</issue><comment><ext-link ext-link-type="doi" xlink:href="https://doi.org/10.56979/602/2024">https://doi.org/10.56979/602/2024</ext-link></comment></element-citation></ref>
<ref id="R68"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname><given-names>Z.</given-names></name><name><surname>Yu</surname><given-names>C.</given-names></name><name><surname>Wang</surname><given-names>J.</given-names></name><name><surname>An</surname><given-names>L.</given-names></name></person-group><year>2025</year><article-title>A temporal evolution and fine-grained information aggregation model for citation count prediction</article-title><source>Scientometrics</source><volume>130</volume><issue>4</issue><fpage>2069</fpage><lpage>2091</lpage><comment><ext-link ext-link-type="doi" xlink:href="https://doi.org/10.1007/s11192-025-05294-2">https://doi.org/10.1007/s11192-025-05294-2</ext-link></comment></element-citation></ref>
<ref id="R69"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Zhu</surname><given-names>J.</given-names></name><name><surname>Zhou</surname><given-names>J.</given-names></name><name><surname>Pan</surname><given-names>J.</given-names></name><name><surname>Gu</surname><given-names>F.</given-names></name><name><surname>Guo</surname><given-names>J.</given-names></name></person-group><year>2025</year><article-title>Ranking influential non-content factors on scientific papers&#x2019; citation impact: A multidomain comparative analysis</article-title><source>Big Data and Cognitive Computing</source><volume>9</volume><issue>2</issue><fpage>30</fpage><comment><ext-link ext-link-type="doi" xlink:href="https://doi.org/10.3390/bdcc9020030">https://doi.org/10.3390/bdcc9020030</ext-link></comment></element-citation></ref>
</ref-list>
</back>
</article>