<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "http://jats.nlm.nih.gov/publishing/1.0/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" article-type="research-article" xml:lang="en">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">IR</journal-id>
<journal-title-group>
<journal-title>Information Research</journal-title>
</journal-title-group>
<issn pub-type="epub">1368-1613</issn>
<publisher>
<publisher-name>University of Bor&#x00E5;s</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">ir30iConf47524</article-id>
<article-id pub-id-type="doi">10.47989/ir30iConf47524</article-id>
<article-categories>
<subj-group xml:lang="en">
<subject>Research article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>How do data authors perform in data-intensive research activities? Evidence from author contribution statement in data papers</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author"><name><surname>Heng</surname><given-names>Yang</given-names></name>
<xref ref-type="aff" rid="aff0001"/></contrib>
<contrib contrib-type="author"><name><surname>Yonglin</surname><given-names>Yu</given-names></name>
<xref ref-type="aff" rid="aff0002"/></contrib>
<contrib contrib-type="author"><name><surname>Fenghong</surname><given-names>Liu</given-names></name>
<xref ref-type="aff" rid="aff0003"/></contrib>
<aff id="aff0001"><bold>YANG Heng</bold> is a Ph.D. candidate at the National Science Library, Chinese Academy of Sciences. His research interests include scientific data management, data publishing and dissemination, FAIR principles, and related areas. He can be contacted at <email xlink:href="yangheng@mail.las.ac.cn">yangheng@mail.las.ac.cn</email></aff>
<aff id="aff0002"><bold>YU Yonglin</bold> is a master&#x2019;s candidate at the National Science Library, Chinese Academy of Sciences. Her research interests include semantic publishing, scientific information editing and dissemination, and related fields. She can be contacted at <email xlink:href="yuyonglin@mail.las.ac.cn">yuyonglin@mail.las.ac.cn</email></aff>
<aff id="aff0003"><bold>LIU Fenghong</bold> is a research librarian at the National Science Library, Chinese Academy of Sciences. Her research interests include data publishing, scientific data management, FAIR principles, semantic publishing, and related areas. She can be contacted at <email xlink:href="liufh@mail.las.ac.cn">liufh@mail.las.ac.cn</email></aff>
</contrib-group>
<pub-date pub-type="epub"><day>06</day><month>05</month><year>2025</year></pub-date>
<pub-date pub-type="collection"><year>2025</year></pub-date>
<volume>30</volume>
<issue>i</issue>
<fpage>219</fpage>
<lpage>239</lpage>
<permissions>
<copyright-year>2025</copyright-year>
<copyright-holder>&#x00A9; 2025 The Author(s).</copyright-holder>
<license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by-nc/4.0/">
<license-p>This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by-nc/4.0/">http://creativecommons.org/licenses/by-nc/4.0/</ext-link>), permitting all non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<abstract xml:lang="en">
<title>Abstract</title>
<p><bold>Introduction.</bold> Despite the increasing prevalence of data-intensive scientific research, the division of labor in these activities and the performance of data authors remain underexplored. By employing the Contributor Roles Taxonomy (CRediT), this study examines the division of scientific labor in data papers from <italic>Data in Brief.</italic></p>
<p><bold>Method and analysis.</bold> Utilizing methods of mathematical statistics and data visualization, we analysed the connections between the 14 CRediT roles within data papers. We also explored the relationship between the distribution of labor and the size and discipline of the authorial team, as well as the associations between key authors and their respective CRediT roles.</p>
<p><bold>Results.</bold> The results show that 1) data papers rarely make full use of the 14 CRediT roles to describe author contributions. 2) Team size and discipline have a significant impact on the labor division of data-intensive scientific research activities. 3) The need for data collection and analysis is the main reason for the expansion of team size, which is particularly evident in the natural sciences. 4) Corresponding authors and first authors continue to take on core roles. 5) Meanwhile, undertaking data analysis and processing-related tasks, such as <italic>&#x2018;Software&#x2019;</italic>, helps authors advance in the author order of data papers.</p>
<p><bold>Conclusion.</bold> This study provides insights into the division of labor in data-intensive scientific research and shows that CRediT has limitations in fully capturing the research workflow of data papers. We propose developing a taxonomy specific to data papers, such as DP - CRediT.</p>
</abstract>
</article-meta>
</front>
<body>
<sec id="sec1">
<title>Introduction</title>
<p>Scientific data, as a crucial research output, is now widely recognized for its academic and economic value (Greenberg, Wu, et al., 2023; <xref rid="R30" ref-type="bibr">Pasquetto et al., 2017</xref>; <xref rid="R48" ref-type="bibr">Wilson et al., 2014</xref>). Under the impetus of open science, the forms of data publication are becoming increasingly diverse and their content is continually being enriched (<xref rid="R21" ref-type="bibr">Landi et al., 2020</xref>; <xref rid="R49" ref-type="bibr">Wittenburg, 2021</xref>). As the infrastructure for data openness and sharing is progressively established and refined (<xref rid="R4" ref-type="bibr">Benhamed et al., 2023</xref>; Greenberg, McClellan, et al., 2023), how to incentivize researchers to open their data has become a new challenge for scientific data sharing in an open environment (<xref rid="R11" ref-type="bibr">Faniel &#x0026; Jacobsen, 2010</xref>; <xref rid="R42" ref-type="bibr">Tenopir et al., 2011</xref>; <xref rid="R44" ref-type="bibr">Treadway et al., 2016</xref>).</p>
<p>In recent years, it has been recognized that appropriate and meaningful incentives are essential to capitalize on the promise of data sharing (<xref rid="R25" ref-type="bibr">Lo &#x0026; DeMets, 2016</xref>) and that crediting data generators is key in this effort (<xref rid="R20" ref-type="bibr">Kalager et al., 2016</xref>). As stated by the International Council for Science (ICSU), <italic>&#x2018;Scientists should be recognized and given credit for the scientific contribution of the data sets that they produce as well as for the analysis of those data&#x2019;</italic> (ICSU). Consequently, Bierer et al. proposed the designation of <italic>&#x2018;data authors&#x2019;</italic> as an incentive for data sharing, with explicit identification in publications (<xref rid="R5" ref-type="bibr">Bierer et al., 2017</xref>). Notably, more than a decade ago, Wallis and Borgman&#x2019;s exploratory study found that for many participants, the term <italic>&#x2018;author&#x2019;</italic> in the context of data was not fitting, raising the question: according to scientific researchers, is data something that can be authored (<xref rid="R47" ref-type="bibr">Wallis &#x0026; Borgman, 2011</xref>)? We believe that this shift in perception is related to the development of data publication and the rise of data papers over the past decade. In the dissemination of data papers, to be cited as a data author, a person must have made substantial contributions to the original acquisition, quality control, and curation of the data, be accountable for all aspects of the accuracy and integrity of the data provided, and ensure that the available data set follows the FAIR Guiding Principles (<xref rid="R5" ref-type="bibr">Bierer et al., 2017</xref>). However, accurately defining data responsibilities to clarify the identity of data authors remains a challenge.</p>
<p>In data-intensive scientific research activities, establishing the identity of data authors requires that their contributions to the data are quantifiable and evaluable. With the proliferation of author contribution statements, various contributor role ontologies and taxonomies (CROTs) have emerged (Hosseini, Colomb, et al., 2023), offering standardized lists of roles or terms to designate individuals&#x2019; contributions to research. Among these, CRediT (NISO), as a standardized method for describing author contributions, has been vigorously promoted by numerous journal publishers and widely adopted since its introduction in 2014. It is also gradually being applied to data papers to describe the contributions of authors in the production and publication process of scientific data.</p>
<p>In this research, we employ CRediT to examine the division of scientific labor in a sample of data papers from <italic>Data in Brief</italic>, exploring how research contributions are allocated in data-intensive scientific activities. More specifically, we first explore the intercorrelations among the 14 CRediT roles to assess the utilization of CRediT in data papers. We also consider the relationship between the division of scientific labor and the size of the author team, as well as the discipline of the research. Finally, we investigate the correlation between key authors in data papers, such as corresponding authors and first authors, and the CRediT roles they undertake.</p>
</sec>
<sec id="sec2">
<title>Related work</title>
<sec id="sec2_1">
<title>Invisible labor in data-intensive science</title>
<p>As the fourth paradigm (<xref rid="R28" ref-type="bibr">Nielsen, 2009</xref>; <xref rid="R43" ref-type="bibr">Tolle et al., 2011</xref>), data-intensive science has been widely discussed in the academic community. Ramachandran et al. regard data-intensive science as a scientific discovery process that is driven by knowledge extracted from large volumes of data rather than the traditional hypothesis-driven discovery process, and they introduce the concept of <italic>&#x2018;data prospecting&#x2019;</italic> to address the challenges of data-intensive science (<xref rid="R33" ref-type="bibr">Ramachandran et al., 2013</xref>). Data prospecting requires more interdisciplinary collaboration. Cheruvelil and Soranno argue that data-intensive science will be most successful when used in combination with open science and team science (<xref rid="R7" ref-type="bibr">Cheruvelil &#x0026; Soranno, 2018</xref>).</p>
<p>The division of labor in data-intensive science diverges significantly from traditional research models (<xref rid="R31" ref-type="bibr">Pietsch, 2015</xref>), primarily due to the sheer scale and complexity of data management and analysis. Unlike research activities that may be more individualistic or limited in scope, data-intensive science often requires collaborative efforts across multiple disciplines and relies heavily on technological infrastructures (<xref rid="R24" ref-type="bibr">Lenhardt et al., 2016</xref>; <xref rid="R36" ref-type="bibr">Schultes et al., 2022</xref>). This collaborative aspect introduces a new dimension to the scientific process, where the labor is not just intellectual, but also deeply intertwined with the technical and logistical support systems that enable data collection, processing, and interpretation at massive scales.</p>
<p>Scroggins and Pasquetto applied the concept of invisible labor to data-intensive science (<xref rid="R38" ref-type="bibr">Scroggins &#x0026; Pasquetto, 2020</xref>). Drawing on a fifteen-year corpus of research into multiple domains of data-intensive science, they used a series of ethnographic vignettes to offer a snapshot of the varieties and valences of labor in data-intensive science. They concluded that a full and nuanced understanding of data-intensive science can only be obtained by starting with the in-situ work and labor of scientific practice in all its manifold forms. Their work underscores a critical perspective: that the comprehensive grasp of data-intensive science is not merely about acknowledging the presence of invisible labor, but also about deciphering its unique characteristics when compared to other scientific endeavours.</p>
<p>Moreover, the importance of invisible labor in data-intensive science cannot be overstated. It is the often-uncredited work of data cleaning, metadata creation, and algorithm development that forms the backbone of robust scientific inquiry (<xref rid="R34" ref-type="bibr">Resnik et al., 2017</xref>; <xref rid="R39" ref-type="bibr">Shamoo, 2013</xref>). These tasks, while critical, are frequently overshadowed by the final published research outputs, which tend to be the metrics by which scientific success is commonly judged (<xref rid="R8" ref-type="bibr">Dance, 2012</xref>). The labor behind data maintenance, ensuring the integrity and accessibility of datasets, is equally vital, yet it remains largely invisible to those outside the immediate circle of data-intensive scientific practice.</p>
<p>In contemporary scientific research, data has become the central element propelling innovation and discovery. With the rapid evolution of big data technologies, data-intensive science has emerged as a prominent field, bringing revolutionary changes to various academic disciplines. This research methodology relies on extensive datasets to drive experimentation, simulations, and analyses, on the premise that simpler models with a lot of data trump more elaborate models with less data (<xref rid="R15" ref-type="bibr">Halevy et al., 2009</xref>), thereby enabling the acquisition of profound insights and knowledge. However, there is a paucity of research on data-intensive scientific activities. Approaching the understanding of data-intensive scientific activities from the perspective of division of labor and collaboration can better reveal the patterns of the fourth paradigm.</p>
</sec>
<sec id="sec2_2">
<title>Evolution and adoption of CRediT</title>
<p>As the international community places increasing emphasis on research integrity, there is a growing call in the journal industry for the use of CRediT (Contributor Roles Taxonomy) to facilitate more transparent and granular descriptions of author contributions (Das &#x0026; <xref rid="R9" ref-type="bibr">Das, 2020</xref>; <xref rid="R32" ref-type="bibr">Rahman &#x0026; Verhagen, 2023</xref>; <xref rid="R45" ref-type="bibr">Udey, 2018</xref>). However, the forms and content of scientific research activities are evolving rapidly, and the current taxonomy may need to evolve as science changes and particular types of contributions become less or more important (<xref rid="R1" ref-type="bibr">Allen et al., 2019</xref>). Therefore, for CROTs (contributor role ontologies and taxonomies), maintaining an up-to-date list of roles is essential to meet the evolving needs of users and is one of the factors that promote their adoption (<xref rid="R46" ref-type="bibr">Vasilevsky et al., 2021</xref>). As emphasized by its developers, CRediT was initially designed as a contributor role taxonomy for the life and physical sciences and thus may not be suitable for all disciplines (<xref rid="R1" ref-type="bibr">Allen et al., 2019</xref>). More specific roles should be added to CRediT, and a more granular lexicon of contribution elements should be established (<xref rid="R41" ref-type="bibr">Steele et al., 2021</xref>).</p>
<p>Holcombe compared the author roles reflected by CRediT with the authorship criteria provided by the ICMJE (International Committee of Medical Journal Editors), elucidating the significance of the CRediT role taxonomy (<xref rid="R16" ref-type="bibr">Holcombe, 2019</xref>). Some scholars have discussed the boundaries of CRediT usage and proposed modifications. For instance, Alpi and Akers noted that CRediT does not currently describe all the roles played by librarians (<xref rid="R3" ref-type="bibr">Alpi &#x0026; Akers, 2021</xref>). Larivi&#x00E8;re et al. combined CRediT with data from PLOS journals to study the division of scientific labor in terms of author order, gender, and contribution combinations, highlighting the need for increased attention to labor division across different disciplines and research teams (<xref rid="R23" ref-type="bibr">Lariviere et al., 2021</xref>). Ding et al. proposed a new method for co-author credit allocation based on CRediT, demonstrating through empirical analysis that this approach can effectively prevent credit inflation and reasonably reflect author contributions, particularly mitigating the impact of the number of co-authors on the first author&#x2019;s credit (<xref rid="R10" ref-type="bibr">Ding et al., 2021</xref>).</p>
<p>Numerous scholars have taken CRediT as a foundation and, in light of domain research or specific scenarios, have added, removed, or modified roles to enhance its applicability. Some studies have pointed out the deficiencies of CRediT in the field of Randomized Controlled Trials (RCTs), with some attributing these to improper use by authors, such as not distinguishing between manuscript editing and drafting the initial manuscript (<xref rid="R41" ref-type="bibr">Steele et al., 2021</xref>). Others have identified inherent design flaws in CRediT, leading to the proposal of CRediT-RCT for researchers in this field (<xref rid="R29" ref-type="bibr">Zhang et al., 2019</xref>). Matarese and Shashok argued that CRediT overlooks some non-author contributions, including technical support, translation, and manuscript editing (<xref rid="R27" ref-type="bibr">Matarese &#x0026; Shashok, 2019</xref>). In response, they proposed specific recommendations to improve three roles within CRediT (Investigation; Writing - Original Draft; Writing - Review &#x0026; Editing) and to add two new roles (Technical support; Translating or editing the manuscript, as non-author). Alliez et al. emphasized the importance of software development, proposing nine more refined categories to better represent the tasks involved (design, debugging, maintenance, coding, architecture, documentation, testing, support, and management) (<xref rid="R2" ref-type="bibr">Alliez et al., 2020</xref>). Fitzgerald et al. summarized the shortcomings of CRediT in the social sciences and proposed improvements, suggesting 12 librarian author role classifications (literature synthesis, conceptualization, methodology, instruments, software, investigation, data curation, data analysis, interpretation, visualization, writing, editing) (<xref rid="R12" ref-type="bibr">Fitzgerald et al., 2020</xref>).</p>
<p>In summary, existing research provides empirical analyses of CRediT across various disciplines and application scenarios, along with recommendations for improvement. The data paper, as an emerging form of academic communication, represents a unique type of scholarly output generated from data-intensive research activities, and the description and attribution of author contributions within this context also warrant attention.</p>
</sec>
</sec>
<sec id="sec3">
<title>Dataset and methods</title>
<sec id="sec3_1">
<title>Data source</title>
<p>We selected data papers published in <italic>Data in Brief</italic> as our sample data source. We chose this source because <italic>Data in Brief</italic> is a purely data-focused interdisciplinary journal in which all published papers are descriptions of datasets and serve as records of data-intensive scientific research activities. Additionally, <italic>Data in Brief</italic> uses CRediT to describe authors&#x2019; contributions, which provides us with a consistent data source for analysis.</p>
<p>We used the web scraping tool Web Scraper (Scraper) to crawl and collect information on data papers in <italic>Data in Brief</italic>. The collected data included the disciplinary fields of the articles, titles, DOIs, author names, author order, corresponding authors, and CRediT Author Statements, from 10 volumes (V41-V50) of <italic>Data in Brief</italic> published between 2022 and 2023, totalling 1,724 articles. After excluding corrigenda, articles without author contribution statements, and those not fully utilizing the CRediT format, we ultimately included 1,513 data papers for analysis, which comprised 7,697 CRediT contribution entries.</p>
</sec>
<sec id="sec3_2">
<title>Research route</title>
<p>Scholars have conducted extensive research on author task division using CRediT (<xref rid="R23" ref-type="bibr">Lariviere et al., 2021</xref>; <xref rid="R29" ref-type="bibr">Zhang et al., 2019</xref>). Building upon their research approaches, we have formulated the research route for this paper, as detailed below:</p>
<list list-type="order">
<list-item><p>We initially present some basic information about the dataset, then employ Kendall&#x2019;s coefficient to examine the correlation between CRediT roles and conduct a statistical analysis of the symmetry in the roles undertaken by authors.</p></list-item>
<list-item><p>Then, we investigate the general division of labor characteristics of data-intensive scientific research activities from the perspective of the CRediT roles involved in data papers. First, we examine the usage of CRediT roles, including the number of CRediT roles used in data papers and the frequency of use of each role. Second, we examine the variation in CRediT roles with the number of authors per article, ranging from single-authored works to collaborative papers with up to 16 authors. Third, we design the CRediT Concentration Index (CCI) and use the Gini coefficient to investigate the preference for CRediT roles across different disciplinary fields. The formula for calculating CCI is as follows:
<disp-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" display="block"><mml:mrow><mml:mi>C</mml:mi><mml:mi>C</mml:mi><mml:mi>I</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mi>U</mml:mi><mml:mrow><mml:mi>max</mml:mi><mml:mo>&#x005F;</mml:mo><mml:mtext>role</mml:mtext></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo>&#x2211;</mml:mo><mml:msub><mml:mi>U</mml:mi><mml:mrow><mml:mtext>role</mml:mtext></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:mrow></mml:math></disp-formula></p>
<p>U<sub>max_role</sub> represents the number of times the most frequently used role appears in a particular academic discipline, and U<sub>role</sub> represents the usage count of each role within that discipline. It should be noted that to prevent excessive bias in the data, the calculation of CCI excludes the relevant data for the two writing roles, <italic>&#x2018;Writing &#x2013; original draft&#x2019;</italic> and <italic>&#x2018;Writing &#x2013; review and editing&#x2019;.</italic></p></list-item>
<list-item><p>Finally, we examine the detailed division of labor characteristics of data-intensive scientific research activities by considering the CRediT roles undertaken by key authors. Descriptive statistics were conducted based on whether an author was the corresponding author and whether they were the first author (without considering joint first authorship), and the chi-square test was used to examine if the differences between groups were significant. The number of roles undertaken by each author was treated as skewed data, and the Wilcoxon rank-sum test was employed to assess the significance of the differences between groups. All 14 CRediT roles were coded as binary variables. A logistic regression model was employed to examine the influence of individual roles on the status of the corresponding author and the first author. Due to the right-skewed distribution of the data, a generalized linear model with a Gamma distribution and a log link function was constructed to assess the impact of individual roles on the author order status.</p></list-item></list>
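<p>As an illustration of the second step, the CCI formula above and a Gini coefficient can be computed as in the following minimal Python sketch. The role counts are hypothetical and are not taken from the dataset. Recognising the two writing roles by a label that starts with &#x2018;Writing&#x2019;, and applying the Gini coefficient to the per-role usage counts within one discipline, are assumptions made for this sketch rather than details stated in the text.</p>
<preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
from typing import Dict, List


def cci(role_counts: Dict[str, int]) -> float:
    """CCI = U_max_role / sum(U_role), with the two writing roles excluded."""
    # Assumption: writing roles are recognised by a label starting with "Writing".
    kept = [n for role, n in role_counts.items()
            if not role.lower().startswith("writing")]
    total = sum(kept)
    if total == 0:
        raise ValueError("no non-writing role usage recorded")
    return max(kept) / total


def gini(values: List[float]) -> float:
    """Gini coefficient of non-negative values (0 means perfectly even use)."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive value")
    # Rank-weighted sum over the ascending values, ranks starting at 1.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical usage counts for a single discipline (not taken from the dataset).
example = {
    "Data curation": 40,
    "Investigation": 30,
    "Methodology": 20,
    "Software": 10,
    "Writing - original draft": 90,
}
print(round(cci(example), 3))
print(round(gini([40, 30, 20, 10]), 3))
```
]]></preformat>
<p>A higher CCI indicates that a single role dominates the non-writing contributions of a discipline, while a Gini coefficient near zero indicates that roles are used about equally often.</p>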
</sec>
</sec>
<sec id="sec4">
<title>Results</title>
<sec id="sec4_1">
<title>Overview</title>
<sec id="sec4_1_1">
<title>Demography of the dataset</title>
<p>As shown in Table 1, the dataset of data papers utilized in this study spans 24 disciplines, including Agricultural Sciences and Arts and Humanities, as designated by <italic>Data in Brief</italic>. The number of data papers varies across disciplines, with the highest count in Biological Sciences, totalling 205 papers, and the lowest in Mathematics, with only 3 papers. There is also considerable variation in the average number of authors per paper, ranging from 1.5 to 7.3 authors per paper. However, the average number of CRediT roles per paper and per author across different disciplines is relatively stable, with most papers in the dataset involving an average of 6 to 10 CRediT roles per paper, and each author undertaking 3 to 5 CRediT roles.</p>
<table-wrap id="T1">
<label>Table 1.</label>
<caption><p>Characteristics of CRediT roles from data papers in different disciplines</p></caption>
<table>
<thead>
<tr>
<th align="center" valign="top"><bold>Disciplines</bold></th>
<th align="center" valign="top"><bold>No. papers</bold></th>
<th align="center" valign="top"><bold>Mean No. authors per paper</bold></th>
<th align="center" valign="top"><bold>Mean No. roles per paper</bold></th>
<th align="center" valign="top"><bold>Mean No. roles per author</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" valign="top"><bold>Agricultural Sciences</bold></td>
<td align="center" valign="top">164</td>
<td align="center" valign="top">5.0</td>
<td align="center" valign="top">8.5</td>
<td align="center" valign="top">3.3</td>
</tr>
<tr>
<td align="center" valign="top"><bold>Arts and Humanities</bold></td>
<td align="center" valign="top">17</td>
<td align="center" valign="top">3.5</td>
<td align="center" valign="top">7.9</td>
<td align="center" valign="top">4.0</td>
</tr>
<tr>
<td align="center" valign="top"><bold>Biological Sciences</bold></td>
<td align="center" valign="top">205</td>
<td align="center" valign="top">5.6</td>
<td align="center" valign="top">9.1</td>
<td align="center" valign="top">3.4</td>
</tr>
<tr>
<td align="center" valign="top"><bold>Business, Management, and Decision Sciences</bold></td>
<td align="center" valign="top">64</td>
<td align="center" valign="top">3.3</td>
<td align="center" valign="top">8.5</td>
<td align="center" valign="top">4.1</td>
</tr>
<tr>
<td align="center" valign="top"><bold>Chemistry</bold></td>
<td align="center" valign="top">58</td>
<td align="center" valign="top">5.2</td>
<td align="center" valign="top">8.9</td>
<td align="center" valign="top">3.1</td>
</tr>
<tr>
<td align="center" valign="top"><bold>Computer Science</bold></td>
<td align="center" valign="top">166</td>
<td align="center" valign="top">4.5</td>
<td align="center" valign="top">9.3</td>
<td align="center" valign="top">3.7</td>
</tr>
<tr>
<td align="center" valign="top"><bold>Data Science</bold></td>
<td align="center" valign="top">105</td>
<td align="center" valign="top">4.5</td>
<td align="center" valign="top">9.1</td>
<td align="center" valign="top">3.8</td>
</tr>
<tr>
<td align="center" valign="top"><bold>Earth and Planetary Sciences</bold></td>
<td align="center" valign="top">90</td>
<td align="center" valign="top">6.0</td>
<td align="center" valign="top">9.2</td>
<td align="center" valign="top">3.5</td>
</tr>
<tr>
<td align="center" valign="top"><bold>Economics, Econometrics and Finance</bold></td>
<td align="center" valign="top">34</td>
<td align="center" valign="top">2.9</td>
<td align="center" valign="top">8.7</td>
<td align="center" valign="top">4.9</td>
</tr>
<tr>
<td align="center" valign="top"><bold>Energy</bold></td>
<td align="center" valign="top">43</td>
<td align="center" valign="top">5.6</td>
<td align="center" valign="top">9.2</td>
<td align="center" valign="top">3.4</td>
</tr>
<tr>
<td align="center" valign="top"><bold>Engineering</bold></td>
<td align="center" valign="top">108</td>
<td align="center" valign="top">4.5</td>
<td align="center" valign="top">9.0</td>
<td align="center" valign="top">4.0</td>
</tr>
<tr>
<td align="center" valign="top"><bold>Environmental Science</bold></td>
<td align="center" valign="top">100</td>
<td align="center" valign="top">5.5</td>
<td align="center" valign="top">8.8</td>
<td align="center" valign="top">3.3</td>
</tr>
<tr>
<td align="center" valign="top"><bold>Food Science</bold></td>
<td align="center" valign="top">10</td>
<td align="center" valign="top">6.4</td>
<td align="center" valign="top">10.7</td>
<td align="center" valign="top">3.7</td>
</tr>
<tr>
<td align="center" valign="top"><bold>Health and Medical Sciences</bold></td>
<td align="center" valign="top">103</td>
<td align="center" valign="top">7.3</td>
<td align="center" valign="top">9.2</td>
<td align="center" valign="top">3.4</td>
</tr>
<tr>
<td align="center" valign="top"><bold>Materials Science</bold></td>
<td align="center" valign="top">56</td>
<td align="center" valign="top">4.6</td>
<td align="center" valign="top">9.2</td>
<td align="center" valign="top">3.5</td>
</tr>
<tr>
<td align="center" valign="top"><bold>Mathematics</bold></td>
<td align="center" valign="top">3</td>
<td align="center" valign="top">2.3</td>
<td align="center" valign="top">11.7</td>
<td align="center" valign="top">7.3</td>
</tr>
<tr>
<td align="center" valign="top"><bold>Microbiology</bold></td>
<td align="center" valign="top">25</td>
<td align="center" valign="top">5.8</td>
<td align="center" valign="top">8.9</td>
<td align="center" valign="top">3.1</td>
</tr>
<tr>
<td align="center" valign="top"><bold>Neuroscience</bold></td>
<td align="center" valign="top">20</td>
<td align="center" valign="top">6.4</td>
<td align="center" valign="top">9.8</td>
<td align="center" valign="top">3.2</td>
</tr>
<tr>
<td align="center" valign="top"><bold>Omics</bold></td>
<td align="center" valign="top">34</td>
<td align="center" valign="top">5.1</td>
<td align="center" valign="top">8.6</td>
<td align="center" valign="top">3.3</td>
</tr>
<tr>
<td align="center" valign="top"><bold>Pharmaceutical Sciences</bold></td>
<td align="center" valign="top">6</td>
<td align="center" valign="top">7.2</td>
<td align="center" valign="top">7.5</td>
<td align="center" valign="top">2.4</td>
</tr>
<tr>
<td align="center" valign="top"><bold>Physical Sciences</bold></td>
<td align="center" valign="top">4</td>
<td align="center" valign="top">1.5</td>
<td align="center" valign="top">6.3</td>
<td align="center" valign="top">4.5</td>
</tr>
<tr>
<td align="center" valign="top"><bold>Plant Science</bold></td>
<td align="center" valign="top">10</td>
<td align="center" valign="top">5.8</td>
<td align="center" valign="top">8.9</td>
<td align="center" valign="top">2.9</td>
</tr>
<tr>
<td align="center" valign="top"><bold>Psychology</bold></td>
<td align="center" valign="top">30</td>
<td align="center" valign="top">4.3</td>
<td align="center" valign="top">8.1</td>
<td align="center" valign="top">3.8</td>
</tr>
<tr>
<td align="center" valign="top"><bold>Social Sciences</bold></td>
<td align="center" valign="top">57</td>
<td align="center" valign="top">4.4</td>
<td align="center" valign="top">8.0</td>
<td align="center" valign="top">3.5</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="sec4_1_2">
<title>Correlation between CRediT roles</title>
<p><xref ref-type="fig" rid="F1">Figure 1</xref> illustrates the correlations between various CRediT roles. Notably, the strongest correlation is observed between <italic>&#x2018;methodology&#x2019;</italic> and <italic>&#x2018;conceptualization&#x2019;</italic> (correlation coefficient = 0.39), followed by the correlation between <italic>&#x2018;funding acquisition&#x2019;</italic> and <italic>&#x2018;project administration&#x2019;</italic> (0.38), and the correlation between <italic>&#x2018;data curation&#x2019;</italic> and <italic>&#x2018;writing &#x2013; original draft&#x2019;</italic> (0.32).</p>
<p><xref ref-type="fig" rid="F2">Figure 2</xref> reflects the asymmetry in the associations between CRediT roles. More specifically, it shows the percentage of authors who performed contribution A who also performed contribution B (<xref rid="R23" ref-type="bibr">Lariviere et al., 2021</xref>). For instance, while 68.18% of authors who contributed to funding acquisition also reviewed and edited the manuscript, only 13.87% of authors who reviewed and edited the manuscript were involved in funding acquisition; this is the most asymmetric relationship. The next most pronounced asymmetry is between theoretical and practical work: a significant portion of authors who undertook hands-on roles such as <italic>&#x2018;formal analysis&#x2019;, &#x2018;software&#x2019;, &#x2018;validation&#x2019;,</italic> and <italic>&#x2018;visualization&#x2019;</italic> also took on theoretical roles such as <italic>&#x2018;conceptualization&#x2019;</italic> and <italic>&#x2018;methodology&#x2019;</italic>. However, among authors engaged in theoretical work, a relatively small fraction participated in these practical tasks.</p>
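<p>The directional percentages described above can be illustrated with a minimal sketch. The toy author records and the function name <monospace>conditional_share</monospace> are hypothetical and serve only to show why the share of A-authors who also did B can differ from the share of B-authors who also did A; they are not the study&#x2019;s data or code.</p>
<preformat preformat-type="code"><![CDATA[
```python
def conditional_share(authors, role_a, role_b):
    """Percentage of authors holding role_a who also hold role_b."""
    with_a = [roles for roles in authors if role_a in roles]
    if not with_a:
        return float("nan")  # undefined when nobody holds role_a
    both = sum(1 for roles in with_a if role_b in roles)
    return 100.0 * both / len(with_a)

# Hypothetical toy data: one set of CRediT roles per author (illustrative only).
authors = [
    {"funding acquisition", "writing - review & editing"},
    {"funding acquisition", "writing - review & editing", "supervision"},
    {"writing - review & editing", "investigation"},
    {"writing - review & editing"},
    {"investigation"},
]

funding_to_review = conditional_share(
    authors, "funding acquisition", "writing - review & editing")
review_to_funding = conditional_share(
    authors, "writing - review & editing", "funding acquisition")
print(funding_to_review, review_to_funding)
```
]]></preformat>
<p>In this toy example every funder also reviews the manuscript, but only half of the reviewers are funders, which is the kind of asymmetry that Figure 2 reports.</p>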
<fig id="F1">
<label>Figure 1.</label>
<caption><p>Heatmap showing the pairwise correlation of author roles defined in CRediT</p></caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="images\c19-fig1.jpg"><alt-text>none</alt-text></graphic>
</fig>
<fig id="F2">
<label>Figure 2.</label>
<caption><p>Percentage of authors who have performed contribution A who have also performed contribution B</p></caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="images\c19-fig2.jpg"><alt-text>none</alt-text></graphic>
</fig>
</sec>
</sec>
<sec id="sec4_2">
<title>CRediT roles involved in data papers</title>
<sec id="sec4_2_1">
<title>Usage of CRediT roles</title>
<p><xref ref-type="fig" rid="F3">Figure 3(a)</xref> presents the distribution of the number of CRediT roles involved across all articles. The majority of articles use approximately 9 CRediT roles to describe the research work. <xref ref-type="fig" rid="F3">Figure 3(b)</xref> shows the percentage of data papers that employ each CRediT role. The usage of roles in data papers is not uniform: the most frequently used roles are <italic>&#x2018;writing &#x2013; original draft&#x2019;, &#x2018;writing &#x2013; review &#x0026; editing&#x2019;,</italic> and <italic>&#x2018;conceptualization&#x2019;,</italic> each appearing in over 90% of papers. These are followed by <italic>&#x2018;methodology&#x2019;,</italic> which appears in 86.52% of the papers, and by <italic>&#x2018;data curation&#x2019;, &#x2018;supervision&#x2019;,</italic> and <italic>&#x2018;investigation&#x2019;,</italic> which appear in 77.79%, 74.03%, and 70.32% of the data papers, respectively. By contrast, <italic>&#x2018;project administration&#x2019;</italic> and <italic>&#x2018;resources&#x2019;</italic> are less commonly used in data papers.</p>
<fig id="F3">
<label>Figure 3.</label>
<caption><p>Statistical Overview of CRediT Role Usage</p></caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="images\c19-fig3.jpg"><alt-text>none</alt-text></graphic>
</fig>
</sec>
<sec id="sec4_2_2">
<title>The division of labor varies with the number of authors</title>
<p>The division of labor in scientific research varies with the number of authors involved. <xref ref-type="fig" rid="F4">Figure 4</xref> presents the percentage of authors who have performed a given task, for papers with between 1 and 16 authors (N=7296 papers, 94.8% of the data set). We assume that the sole author of a paper undertakes all 14 CRediT roles; as the number of authors increases, the distribution of tasks among authors begins to shift. The changes can be broadly categorized into three types. The first type of role is consistently carried out by a smaller proportion of the team (dashed line), such as <italic>&#x2018;project administration&#x2019;, &#x2018;funding acquisition&#x2019;</italic>, and <italic>&#x2018;resources&#x2019;,</italic> which are consistently undertaken by about one-fifth of the team members. The second type is consistently undertaken by a larger proportion of team members (thick line), such as <italic>&#x2018;writing &#x2013; review and editing&#x2019;</italic> and <italic>&#x2018;investigation&#x2019;,</italic> which are consistently handled by about one-third to one-half of the authors. The third type sees a significant decrease in the proportion of authors undertaking it as the number of authors increases (solid line), such as <italic>&#x2018;conceptualization&#x2019;, &#x2018;methodology&#x2019;,</italic> and <italic>&#x2018;writing &#x2013; original draft&#x2019;</italic>, which are undertaken by half or more of the authors when the team is small but by only about one-fifth of the authors when the team size exceeds 10.</p>
<p>It is noteworthy that as the size of the author team increases, particularly when it reaches more than 10 members, the proportion of authors assuming CRediT roles begins to fluctuate. However, roles associated with data collection or analysis tend to emerge with disproportionately high levels of involvement, such as <italic>&#x2018;validation&#x2019;</italic> and <italic>&#x2018;formal analysis&#x2019;</italic>. Notably, the role of <italic>&#x2018;investigation&#x2019;</italic> stands out; when the number of authors exceeds 11, the proportion of authors undertaking the <italic>&#x2018;investigation&#x2019;</italic> role surpasses that of roles like <italic>&#x2018;conceptualization&#x2019;, &#x2018;methodology&#x2019;,</italic> and <italic>&#x2018;data curation&#x2019;.</italic> Alongside <italic>&#x2018;writing &#x2013; review and editing&#x2019;, &#x2018;investigation&#x2019;</italic> becomes one of the roles with the highest proportion of authors involved.</p>
<fig id="F4">
<label>Figure 4.</label>
<caption><p>Statistical overview of CRediT role variation with the number of authors</p></caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="images\c19-fig4.jpg"><alt-text>none</alt-text></graphic>
</fig>
</sec>
<sec id="sec4_2_3">
<title>CRediT role concentration across different disciplines</title>
<p>Disciplinary differences exist in the concentration of CRediT roles, and <xref ref-type="fig" rid="F5">Figure 5</xref> reflects the concentration of CRediT roles involved in articles from the top 13 most represented fields in the sample dataset (N=6767, 87.9%).</p>
<p>Overall, the concentration of CRediT roles in data papers across various academic disciplines fluctuates between 10% and 20%. The highest concentration is observed in the social sciences field, at 19.18%, while the lowest is in the computer science field, at 13.44%. Relatively speaking, social science fields, such as social sciences, business, management, and decision sciences, exhibit significantly higher CRediT role concentration compared to some natural science fields, like biological sciences and energy. The Gini coefficient, a measure of inequality, further highlights the disparities in CRediT role distribution across disciplines. Higher values of the Gini coefficient indicate greater inequality in role distribution, which aligns with the observed higher concentration in social science fields (0.31). This metric underscores the need for a more balanced distribution of CRediT roles across all academic disciplines.</p>
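<p>For readers less familiar with the Gini coefficient, the following minimal sketch shows the standard rank-weighted calculation. The input vectors and the function name <monospace>gini</monospace> are illustrative assumptions; they are not the study&#x2019;s data, and the exact inputs behind Figure 5 are not reproduced here.</p>
<preformat preformat-type="code"><![CDATA[
```python
def gini(values):
    """Gini coefficient of non-negative values (0 means perfectly equal)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted form: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    # where i = 1..n indexes the values in ascending order.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n

# Hypothetical shares of work across four roles (illustrative only).
even_roles = [0.25, 0.25, 0.25, 0.25]
skewed_roles = [0.05, 0.05, 0.10, 0.80]
print(gini(even_roles), gini(skewed_roles))
```
]]></preformat>
<p>An evenly spread vector yields a coefficient of zero, and the coefficient grows as the work concentrates in fewer roles, which is the sense in which higher values indicate greater inequality.</p>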
<p>Furthermore, the specific CRediT roles that authors concentrate on vary by discipline. Upon closer examination, with the exception of the <italic>&#x2018;writing &#x2013; review and editing&#x2019;</italic> role, authors in social science fields predominantly focus on the <italic>&#x2018;conceptualization&#x2019;</italic> role, whereas authors in natural science fields are more inclined toward the <italic>&#x2018;investigation&#x2019;</italic> and <italic>&#x2018;methodology&#x2019;</italic> roles (<xref ref-type="fig" rid="F6">Figure 6</xref>).</p>
<fig id="F5">
<label>Figure 5.</label>
<caption><p>CRediT Concentration Index (CCI) and Gini coefficient across different disciplines</p></caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="images\c19-fig5.jpg"><alt-text>none</alt-text></graphic>
</fig>
<fig id="F6">
<label>Figure 6.</label>
<caption><p>Percentage of authors per CRediT Role across different disciplines</p></caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="images\c19-fig6.jpg"><alt-text>none</alt-text></graphic>
</fig>
</sec>
</sec>
<sec id="sec4_3">
<title>CRediT roles undertaken by key authors</title>
<sec id="sec4_3_1">
<title>Factors associated with the corresponding author</title>
<p>A statistical analysis of 7697 author contributions across 1513 articles (Table 2) revealed that, apart from the writing roles, authors were most often involved in <italic>&#x2018;methodology&#x2019;</italic> (38.9%) and <italic>&#x2018;conceptualization&#x2019;</italic> (37.8%), while the smallest proportion participated in <italic>&#x2018;project administration&#x2019;</italic> (10.6%). The median number of roles per author was 3 (IQR: 2-5).</p>
<p>Among corresponding authors, <italic>&#x2018;conceptualization&#x2019;</italic> is the most common role, held by 65.0%, and corresponding authors are more likely than non-corresponding authors to take on most of the 14 CRediT roles. Notably, nearly half of corresponding authors are also first authors (899 out of 1958, 45.9%). Corresponding authors take on a greater number of roles than non-corresponding authors [median (IQR): 5 (3-6) vs. 3 (2-4); P&#x003C;0.001], and they are ranked higher in author order [2 (1-4) vs. 4 (2-6); P&#x003C;0.001]. The exceptions are the <italic>&#x2018;investigation&#x2019;</italic> and <italic>&#x2018;resources&#x2019;</italic> roles, for which involvement does not differ significantly between corresponding and non-corresponding authors (P&#x003E;0.1).</p>
<table-wrap id="T2">
<label>Table 2.</label>
<caption><p>Comparison of CRediT roles between corresponding author and other co-authors</p></caption>
<table>
<thead>
<tr>
<th align="left" valign="top"><bold>Variables</bold></th>
<th align="left" valign="top"><bold>Overall (n=7697)</bold></th>
<th align="left" valign="top"><bold>Non-corresponding author (n=5739)</bold></th>
<th align="left" valign="top"><bold>Corresponding author (n=1958)</bold></th>
<th align="left" valign="top"><bold>P</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">Conceptualization, n (%)</td>
<td align="left" valign="top">2906 (37.8)</td>
<td align="left" valign="top">1633 (28.5)</td>
<td align="left" valign="top">1273 (65.0)</td>
<td align="left" valign="top">&#x003C;0.001</td>
</tr>
<tr>
<td align="left" valign="top">Data_curation, n (%)</td>
<td align="left" valign="top">2400 (31.2)</td>
<td align="left" valign="top">1623 (28.3)</td>
<td align="left" valign="top">777 (39.7)</td>
<td align="left" valign="top">&#x003C;0.001</td>
</tr>
<tr>
<td align="left" valign="top">Formal_Analysis, n (%)</td>
<td align="left" valign="top">1403 (18.2)</td>
<td align="left" valign="top">917 (16.0)</td>
<td align="left" valign="top">486 (24.8)</td>
<td align="left" valign="top">&#x003C;0.001</td>
</tr>
<tr>
<td align="left" valign="top">Funding_acquisition, n (%)</td>
<td align="left" valign="top">942 (12.2)</td>
<td align="left" valign="top">531 (9.3)</td>
<td align="left" valign="top">411 (21.0)</td>
<td align="left" valign="top">&#x003C;0.001</td>
</tr>
<tr>
<td align="left" valign="top">Investigation, n (%)</td>
<td align="left" valign="top">2633 (34.2)</td>
<td align="left" valign="top">1937 (33.8)</td>
<td align="left" valign="top">696 (35.5)</td>
<td align="left" valign="top">0.156</td>
</tr>
<tr>
<td align="left" valign="top">Methodology, n (%)</td>
<td align="left" valign="top">2994 (38.9)</td>
<td align="left" valign="top">1932 (33.7)</td>
<td align="left" valign="top">1062 (54.2)</td>
<td align="left" valign="top">&#x003C;0.001</td>
</tr>
<tr>
<td align="left" valign="top">Project_administration, n (%)</td>
<td align="left" valign="top">814 (10.6)</td>
<td align="left" valign="top">479 (8.3)</td>
<td align="left" valign="top">335 (17.1)</td>
<td align="left" valign="top">&#x003C;0.001</td>
</tr>
<tr>
<td align="left" valign="top">Resources, n (%)</td>
<td align="left" valign="top">1046 (13.6)</td>
<td align="left" valign="top">762 (13.3)</td>
<td align="left" valign="top">284 (14.5)</td>
<td align="left" valign="top">0.184</td>
</tr>
<tr>
<td align="left" valign="top">Software, n (%)</td>
<td align="left" valign="top">1222 (15.9)</td>
<td align="left" valign="top">799 (13.9)</td>
<td align="left" valign="top">423 (21.6)</td>
<td align="left" valign="top">&#x003C;0.001</td>
</tr>
<tr>
<td align="left" valign="top">Supervision, n (%)</td>
<td align="left" valign="top">1898 (24.7)</td>
<td align="left" valign="top">1199 (20.9)</td>
<td align="left" valign="top">699 (35.7)</td>
<td align="left" valign="top">&#x003C;0.001</td>
</tr>
<tr>
<td align="left" valign="top">Validation, n (%)</td>
<td align="left" valign="top">1323 (17.2)</td>
<td align="left" valign="top">909 (15.8)</td>
<td align="left" valign="top">414 (21.1)</td>
<td align="left" valign="top">&#x003C;0.001</td>
</tr>
<tr>
<td align="left" valign="top">Visualization, n (%)</td>
<td align="left" valign="top">1199 (15.6)</td>
<td align="left" valign="top">720 (12.5)</td>
<td align="left" valign="top">479 (24.5)</td>
<td align="left" valign="top">&#x003C;0.001</td>
</tr>
<tr>
<td align="left" valign="top">Writing-original draft, n (%)</td>
<td align="left" valign="top">2216 (28.8)</td>
<td align="left" valign="top">1174 (20.5)</td>
<td align="left" valign="top">1042 (53.2)</td>
<td align="left" valign="top">&#x003C;0.001</td>
</tr>
<tr>
<td align="left" valign="top">Writing-review &#x0026; editing, n (%)</td>
<td align="left" valign="top">4001 (52.0)</td>
<td align="left" valign="top">2886 (50.3)</td>
<td align="left" valign="top">1115 (56.9)</td>
<td align="left" valign="top">&#x003C;0.001</td>
</tr>
<tr>
<td align="left" valign="top">First_author, n (%)</td>
<td align="left" valign="top">1513 (19.7)</td>
<td align="left" valign="top">614 (10.7)</td>
<td align="left" valign="top">899 (45.9)</td>
<td align="left" valign="top">&#x003C;0.001</td>
</tr>
<tr>
<td align="left" valign="top">roles_number, median (IQR)</td>
<td align="left" valign="top">3 (2, 5)</td>
<td align="left" valign="top">3 (2, 4)</td>
<td align="left" valign="top">5 (3, 6)</td>
<td align="left" valign="top">&#x003C;0.001</td>
</tr>
<tr>
<td align="left" valign="top">Author_order, median (IQR)</td>
<td align="left" valign="top">3 (2, 5)</td>
<td align="left" valign="top">4 (2, 6)</td>
<td align="left" valign="top">2 (1, 4)</td>
<td align="left" valign="top">&#x003C;0.001</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn><p>IQR, interquartile range.</p></fn>
</table-wrap-foot>
</table-wrap>
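<p>As a rough check on how the P values in Table 2 may arise, the sketch below recomputes the comparison for the <italic>&#x2018;investigation&#x2019;</italic> row from the published counts. The choice of test is an assumption: the text above does not name the test, and a Pearson chi-square test with Yates continuity correction is used here only as a plausible candidate. The function name <monospace>chi2_yates_2x2</monospace> is likewise illustrative.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math

def chi2_yates_2x2(a, b, c, d):
    """Chi-square test (1 df) with Yates continuity correction for the
    2x2 table [[a, b], [c, d]]. Returns (statistic, p_value)."""
    n = a + b + c + d
    # Continuity-corrected numerator; clamp at 0 so the correction never overshoots.
    diff = max(abs(a * d - b * c) - n / 2.0, 0.0)
    stat = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of the chi-square distribution with 1 df:
    # P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Counts taken from Table 2, 'investigation' row.
corr_yes, corr_total = 696, 1958   # corresponding authors
non_yes, non_total = 1937, 5739    # non-corresponding authors

stat, p = chi2_yates_2x2(corr_yes, corr_total - corr_yes,
                         non_yes, non_total - non_yes)
print(f"chi2 = {stat:.3f}, P = {p:.3f}")
```
]]></preformat>
<p>Under these assumptions the result lands close to the P value of 0.156 reported for this row, which suggests, but does not confirm, that a test of this kind was applied.</p>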
<p>The logistic regression model indicates that authors who perform <italic>&#x2018;conceptualization&#x2019;</italic> have more than twice the odds of being corresponding authors (OR: 2.33; 95% CI: 2.04-2.67; P&#x003C;0.001). Furthermore, authors engaged in <italic>&#x2018;funding acquisition&#x2019;</italic> (OR: 1.91; 95% CI: 1.60-2.29; P&#x003C;0.001), <italic>&#x2018;supervision&#x2019;</italic> (OR: 2.26; 95% CI: 1.95-2.61; P&#x003C;0.001), <italic>&#x2018;writing - original draft&#x2019;</italic> (OR: 2.37; 95% CI: 2.03-2.77; P&#x003C;0.001), and those serving as first authors (OR: 4.65; 95% CI: 3.94-5.50; P&#x003C;0.001) also have higher odds of being corresponding authors (Table 3).</p>
<table-wrap id="T3">
<label>Table 3.</label>
<caption><p>Logistic regression model investigating the factors associated with the corresponding author role</p></caption>
<table>
<thead>
<tr>
<th align="left" valign="top"></th>
<th align="left" valign="top"><bold>OR (95% CI)</bold></th>
<th align="left" valign="top"><bold>P</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">Conceptualization</td>
<td align="left" valign="top">2.33 (2.04, 2.67)</td>
<td align="left" valign="top">&#x003C;0.001</td>
</tr>
<tr>
<td align="left" valign="top">Data_curation</td>
<td align="left" valign="top">1.12 (0.97, 1.29)</td>
<td align="left" valign="top">0.115</td>
</tr>
<tr>
<td align="left" valign="top">Formal_Analysis</td>
<td align="left" valign="top">1.00 (0.85, 1.17)</td>
<td align="left" valign="top">1</td>
</tr>
<tr>
<td align="left" valign="top">Funding_acquisition</td>
<td align="left" valign="top">1.91 (1.60, 2.29)</td>
<td align="left" valign="top">&#x003C;0.001</td>
</tr>
<tr>
<td align="left" valign="top">Investigation</td>
<td align="left" valign="top">0.75 (0.66, 0.86)</td>
<td align="left" valign="top">&#x003C;0.001</td>
</tr>
<tr>
<td align="left" valign="top">Methodology</td>
<td align="left" valign="top">0.95 (0.83, 1.09)</td>
<td align="left" valign="top">0.461</td>
</tr>
<tr>
<td align="left" valign="top">Project_administration</td>
<td align="left" valign="top">1.19 (0.98, 1.45)</td>
<td align="left" valign="top">0.078</td>
</tr>
<tr>
<td align="left" valign="top">Resources</td>
<td align="left" valign="top">1.05 (0.87, 1.26)</td>
<td align="left" valign="top">0.599</td>
</tr>
<tr>
<td align="left" valign="top">Software</td>
<td align="left" valign="top">1.04 (0.87, 1.23)</td>
<td align="left" valign="top">0.68</td>
</tr>
<tr>
<td align="left" valign="top">Supervision</td>
<td align="left" valign="top">2.26 (1.95, 2.61)</td>
<td align="left" valign="top">&#x003C;0.001</td>
</tr>
<tr>
<td align="left" valign="top">Validation</td>
<td align="left" valign="top">1.02 (0.87, 1.20)</td>
<td align="left" valign="top">0.819</td>
</tr>
<tr>
<td align="left" valign="top">Visualization</td>
<td align="left" valign="top">1.23 (1.03, 1.45)</td>
<td align="left" valign="top">0.018</td>
</tr>
<tr>
<td align="left" valign="top">Writing-original draft</td>
<td align="left" valign="top">2.37 (2.03, 2.77)</td>
<td align="left" valign="top">&#x003C;0.001</td>
</tr>
<tr>
<td align="left" valign="top">Writing-review &#x0026; editing</td>
<td align="left" valign="top">1.37 (1.21, 1.56)</td>
<td align="left" valign="top">&#x003C;0.001</td>
</tr>
<tr>
<td align="left" valign="top">First_author</td>
<td align="left" valign="top">4.65 (3.94, 5.50)</td>
<td align="left" valign="top">&#x003C;0.001</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn><p>OR, odds ratio; CI, confidence interval.</p></fn>
</table-wrap-foot>
</table-wrap>
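<p>To make the odds ratio scale concrete, the sketch below computes a crude (unadjusted) odds ratio for <italic>&#x2018;conceptualization&#x2019;</italic> directly from the counts in Table 2, with a Woolf (log-based) 95% confidence interval. This is not the model behind Table 3: the function name <monospace>crude_odds_ratio</monospace> and the choice of interval are illustrative assumptions, and the crude value is expected to differ from the adjusted OR of 2.33, presumably because the regression also accounts for the other roles and first-author status.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math

def crude_odds_ratio(role_corr, role_non, norole_corr, norole_non, z=1.96):
    """Unadjusted odds ratio of being a corresponding author for authors with
    a role versus without it, plus a Woolf (log-based) confidence interval."""
    odds_ratio = (role_corr * norole_non) / (role_non * norole_corr)
    se_log = math.sqrt(1 / role_corr + 1 / role_non
                       + 1 / norole_corr + 1 / norole_non)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# Counts taken from Table 2, 'conceptualization' row.
corr_total, non_total = 1958, 5739
role_corr, role_non = 1273, 1633

crude_or, ci_low, ci_high = crude_odds_ratio(
    role_corr, role_non, corr_total - role_corr, non_total - role_non)
print(f"crude OR = {crude_or:.2f} (95% CI: {ci_low:.2f}-{ci_high:.2f})")
```
]]></preformat>
<p>The crude ratio comes out at roughly 4.67, about twice the adjusted estimate, which illustrates how much of the raw association is shared with the other predictors in the model.</p>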
</sec>
<sec id="sec4_3_2">
<title>Factors associated with the first author</title>
<p>Due to a significant overlap between the first authors and corresponding authors, the roles undertaken by the first authors are similar to those of the corresponding authors. However, there are subtle differences. First authors do not differ significantly from non-first authors in <italic>&#x2018;project administration&#x2019;</italic> (10.2% vs. 10.7%; P=0.607), and they are slightly less likely to take on <italic>&#x2018;funding acquisition&#x2019;</italic> (9.7% vs. 12.9%; P=0.001) (Table 4).</p>
<table-wrap id="T4">
<label>Table 4.</label>
<caption><p>Comparison of CRediT roles between the first author and other co-authors</p></caption>
<table>
<thead>
<tr>
<th align="left" valign="top"><bold>Variables</bold></th>
<th align="left" valign="top"><bold>Overall (n=7697)</bold></th>
<th align="left" valign="top"><bold>Non-first author (n=6184)</bold></th>
<th align="left" valign="top"><bold>First author (n=1513)</bold></th>
<th align="left" valign="top"><bold>P</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">Conceptualization, n (%)</td>
<td align="left" valign="top">2906 (37.8)</td>
<td align="left" valign="top">1910 (30.9)</td>
<td align="left" valign="top">996 (65.8)</td>
<td align="left" valign="top">&#x003C;0.001</td>
</tr>
<tr>
<td align="left" valign="top">Data_curation, n (%)</td>
<td align="left" valign="top">2400 (31.2)</td>
<td align="left" valign="top">1541 (24.9)</td>
<td align="left" valign="top">859 (56.8)</td>
<td align="left" valign="top">&#x003C;0.001</td>
</tr>
<tr>
<td align="left" valign="top">Formal_Analysis, n (%)</td>
<td align="left" valign="top">1403 (18.2)</td>
<td align="left" valign="top">839 (13.6)</td>
<td align="left" valign="top">564 (37.3)</td>
<td align="left" valign="top">&#x003C;0.001</td>
</tr>
<tr>
<td align="left" valign="top">Funding_acquisition, n (%)</td>
<td align="left" valign="top">942 (12.2)</td>
<td align="left" valign="top">795 (12.9)</td>
<td align="left" valign="top">147 (9.7)</td>
<td align="left" valign="top">0.001</td>
</tr>
<tr>
<td align="left" valign="top">Investigation, n (%)</td>
<td align="left" valign="top">2633 (34.2)</td>
<td align="left" valign="top">1863 (30.1)</td>
<td align="left" valign="top">770 (50.9)</td>
<td align="left" valign="top">&#x003C;0.001</td>
</tr>
<tr>
<td align="left" valign="top">Methodology, n (%)</td>
<td align="left" valign="top">2994 (38.9)</td>
<td align="left" valign="top">1949 (31.5)</td>
<td align="left" valign="top">1045 (69.1)</td>
<td align="left" valign="top">&#x003C;0.001</td>
</tr>
<tr>
<td align="left" valign="top">Project_administration, n (%)</td>
<td align="left" valign="top">814 (10.6)</td>
<td align="left" valign="top">660 (10.7)</td>
<td align="left" valign="top">154 (10.2)</td>
<td align="left" valign="top">0.607</td>
</tr>
<tr>
<td align="left" valign="top">Resources, n (%)</td>
<td align="left" valign="top">1046 (13.6)</td>
<td align="left" valign="top">913 (14.8)</td>
<td align="left" valign="top">133 (8.8)</td>
<td align="left" valign="top">&#x003C;0.001</td>
</tr>
<tr>
<td align="left" valign="top">Software, n (%)</td>
<td align="left" valign="top">1222 (15.9)</td>
<td align="left" valign="top">728 (11.8)</td>
<td align="left" valign="top">494 (32.7)</td>
<td align="left" valign="top">&#x003C;0.001</td>
</tr>
<tr>
<td align="left" valign="top">Supervision, n (%)</td>
<td align="left" valign="top">1898 (24.7)</td>
<td align="left" valign="top">1686 (27.3)</td>
<td align="left" valign="top">212 (14.0)</td>
<td align="left" valign="top">&#x003C;0.001</td>
</tr>
<tr>
<td align="left" valign="top">Validation, n (%)</td>
<td align="left" valign="top">1323 (17.2)</td>
<td align="left" valign="top">968 (15.7)</td>
<td align="left" valign="top">355 (23.5)</td>
<td align="left" valign="top">&#x003C;0.001</td>
</tr>
<tr>
<td align="left" valign="top">Visualization, n (%)</td>
<td align="left" valign="top">1199 (15.6)</td>
<td align="left" valign="top">658 (10.6)</td>
<td align="left" valign="top">541 (35.8)</td>
<td align="left" valign="top">&#x003C;0.001</td>
</tr>
<tr>
<td align="left" valign="top">Writing-original draft, n (%)</td>
<td align="left" valign="top">2216 (28.8)</td>
<td align="left" valign="top">980 (15.8)</td>
<td align="left" valign="top">1236 (81.7)</td>
<td align="left" valign="top">&#x003C;0.001</td>
</tr>
<tr>
<td align="left" valign="top">Writing-review &#x0026; editing, n (%)</td>
<td align="left" valign="top">4001 (52.0)</td>
<td align="left" valign="top">3379 (54.6)</td>
<td align="left" valign="top">622 (41.1)</td>
<td align="left" valign="top">&#x003C;0.001</td>
</tr>
<tr>
<td align="left" valign="top">Author_corresponding, n (%)</td>
<td align="left" valign="top">1958 (25.4)</td>
<td align="left" valign="top">1059 (17.1)</td>
<td align="left" valign="top">899 (59.4)</td>
<td align="left" valign="top">&#x003C;0.001</td>
</tr>
<tr>
<td align="left" valign="top">roles_number, median (IQR)</td>
<td align="left" valign="top">3 (2, 5)</td>
<td align="left" valign="top">3 (2, 4)</td>
<td align="left" valign="top">5 (4, 7)</td>
<td align="left" valign="top">&#x003C;0.001</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn><p>IQR, interquartile range.</p></fn>
</table-wrap-foot>
</table-wrap>
<p>The logistic regression model indicates a strong association between the role of <italic>&#x2018;writing - original draft&#x2019;</italic> and being designated as the first author (OR: 11.23; 95% CI: 9.56-13.23; P&#x003C;0.001). In contrast, first authors are less likely to be involved in <italic>&#x2018;writing - review and editing&#x2019;</italic> (OR: 0.71; 95% CI: 0.61-0.84; P&#x003C;0.001), <italic>&#x2018;supervision&#x2019;</italic> (OR: 0.53; 95% CI: 0.42-0.66; P&#x003C;0.001), or <italic>&#x2018;resources&#x2019;</italic> (OR: 0.57; 95% CI: 0.43-0.74; P&#x003C;0.001) (Table 5).</p>
<table-wrap id="T5">
<label>Table 5.</label>
<caption><p>Logistic regression model investigating the factors associated with the first author designation</p></caption>
<table>
<thead>
<tr>
<th align="left" valign="top"></th>
<th align="left" valign="top"><bold>OR (95% CI)</bold></th>
<th align="left" valign="top"><bold>P</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">Conceptualization</td>
<td align="left" valign="top">3.50 (2.94, 4.17)</td>
<td align="left" valign="top">&#x003C;0.001</td>
</tr>
<tr>
<td align="left" valign="top">Data_curation</td>
<td align="left" valign="top">1.60 (1.36, 1.87)</td>
<td align="left" valign="top">&#x003C;0.001</td>
</tr>
<tr>
<td align="left" valign="top">Formal_Analysis</td>
<td align="left" valign="top">1.72 (1.44, 2.05)</td>
<td align="left" valign="top">&#x003C;0.001</td>
</tr>
<tr>
<td align="left" valign="top">Funding_acquisition</td>
<td align="left" valign="top">0.86 (0.65, 1.14)</td>
<td align="left" valign="top">0.304</td>
</tr>
<tr>
<td align="left" valign="top">Investigation</td>
<td align="left" valign="top">1.49 (1.27, 1.74)</td>
<td align="left" valign="top">&#x003C;0.001</td>
</tr>
<tr>
<td align="left" valign="top">Methodology</td>
<td align="left" valign="top">1.88 (1.59, 2.22)</td>
<td align="left" valign="top">&#x003C;0.001</td>
</tr>
<tr>
<td align="left" valign="top">Project_administration</td>
<td align="left" valign="top">0.89 (0.66, 1.19)</td>
<td align="left" valign="top">0.423</td>
</tr>
<tr>
<td align="left" valign="top">Resources</td>
<td align="left" valign="top">0.57 (0.43, 0.74)</td>
<td align="left" valign="top">&#x003C;0.001</td>
</tr>
<tr>
<td align="left" valign="top">Software</td>
<td align="left" valign="top">1.60 (1.33, 1.93)</td>
<td align="left" valign="top">&#x003C;0.001</td>
</tr>
<tr>
<td align="left" valign="top">Supervision</td>
<td align="left" valign="top">0.53 (0.42, 0.66)</td>
<td align="left" valign="top">&#x003C;0.001</td>
</tr>
<tr>
<td align="left" valign="top">Validation</td>
<td align="left" valign="top">0.87 (0.71, 1.06)</td>
<td align="left" valign="top">0.179</td>
</tr>
<tr>
<td align="left" valign="top">Visualization</td>
<td align="left" valign="top">1.79 (1.48, 2.15)</td>
<td align="left" valign="top">&#x003C;0.001</td>
</tr>
<tr>
<td align="left" valign="top">Writing-original draft</td>
<td align="left" valign="top">11.23 (9.56, 13.23)</td>
<td align="left" valign="top">&#x003C;0.001</td>
</tr>
<tr>
<td align="left" valign="top">Writing-review &#x0026; editing</td>
<td align="left" valign="top">0.71 (0.61, 0.84)</td>
<td align="left" valign="top">&#x003C;0.001</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn><p>OR, odds ratio; CI, confidence interval.</p></fn>
</table-wrap-foot>
</table-wrap>
</sec>
<sec id="sec4_3_3">
<title>Factors associated with the author order</title>
<p><xref ref-type="fig" rid="F7">Figure 7</xref> shows a significant, though weak, correlation between the position of each author in an article and the number of roles they undertake (R&#x00B2;=0.073, P&#x003C;0.001). The results of the generalized linear model analysis (Table 6) indicate that <italic>&#x2018;conceptualization&#x2019;, &#x2018;data curation&#x2019;, &#x2018;software&#x2019;,</italic> and <italic>&#x2018;visualization&#x2019;</italic> are significantly associated with the order of authorship. For instance, authors who contributed to <italic>&#x2018;software&#x2019;</italic> are likely to be ranked higher, that is, to appear earlier in the author list, than those who did not, with a coefficient of -0.28 (95% CI: -0.34 to -0.22).</p>
<fig id="F7">
<label>Figure 7.</label>
<caption><p>Scatter plot showing the correlation between the author order and the number of roles per author. There was a significant correlation (R<sup>2</sup>=0.073; P&#x003C;0.001) between the two variables.</p></caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="images\c19-fig7.jpg"><alt-text>none</alt-text></graphic>
</fig>
<table-wrap id="T6">
<label>Table 6.</label>
<caption><p>Generalized linear model investigating the factors associated with the author order</p></caption>
<table>
<thead>
<tr>
<th align="left" valign="top"></th>
<th align="left" valign="top"><bold>Coefficient (95% CI)</bold></th>
<th align="left" valign="top"><bold>P</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" valign="top">Conceptualization</td>
<td align="left" valign="top">-0.23 (-0.28, -0.18)</td>
<td align="left" valign="top">&#x003C;0.001</td>
</tr>
<tr>
<td align="left" valign="top">Data_curation</td>
<td align="left" valign="top">-0.15 (-0.19, -0.10)</td>
<td align="left" valign="top">&#x003C;0.001</td>
</tr>
<tr>
<td align="left" valign="top">Formal_Analysis</td>
<td align="left" valign="top">-0.10 (-0.16, -0.04)</td>
<td align="left" valign="top">&#x003C;0.05</td>
</tr>
<tr>
<td align="left" valign="top">Funding_acquisition</td>
<td align="left" valign="top">0.07 (-0.00, 0.14)</td>
<td align="left" valign="top">0.059</td>
</tr>
<tr>
<td align="left" valign="top">Investigation</td>
<td align="left" valign="top">-0.07 (-0.11, -0.03)</td>
<td align="left" valign="top">&#x003C;0.05</td>
</tr>
<tr>
<td align="left" valign="top">Methodology</td>
<td align="left" valign="top">-0.08 (-0.12, -0.03)</td>
<td align="left" valign="top">&#x003C;0.05</td>
</tr>
<tr>
<td align="left" valign="top">Project_administration</td>
<td align="left" valign="top">0.04 (-0.03, 0.12)</td>
<td align="left" valign="top">0.26</td>
</tr>
<tr>
<td align="left" valign="top">Resources</td>
<td align="left" valign="top">0.09 (0.03, 0.15)</td>
<td align="left" valign="top">&#x003C;0.05</td>
</tr>
<tr>
<td align="left" valign="top"><bold>Software</bold></td>
<td align="left" valign="top"><bold>-0.28 (-0.34, -0.22)</bold></td>
<td align="left" valign="top"><bold>&#x003C;0.001</bold></td>
</tr>
<tr>
<td align="left" valign="top">Supervision</td>
<td align="left" valign="top">0.04 (-0.01, 0.09)</td>
<td align="left" valign="top">0.142</td>
</tr>
<tr>
<td align="left" valign="top">Validation</td>
<td align="left" valign="top">-0.04 (-0.09, 0.02)</td>
<td align="left" valign="top">0.193</td>
</tr>
<tr>
<td align="left" valign="top">Visualization</td>
<td align="left" valign="top">-0.16 (-0.22, -0.10)</td>
<td align="left" valign="top">&#x003C;0.001</td>
</tr>
<tr>
<td align="left" valign="top">Writing-original draft</td>
<td align="left" valign="top">-0.59 (-0.64, -0.54)</td>
<td align="left" valign="top">&#x003C;0.001</td>
</tr>
<tr>
<td align="left" valign="top">Writing-review &#x0026; editing</td>
<td align="left" valign="top">-0.10 (-0.15, -0.06)</td>
<td align="left" valign="top">&#x003C;0.001</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
</sec>
</sec>
<sec id="sec5">
<title>Discussion</title>
<p>This study employs CRediT to explore the division of labor in data-intensive scientific research activities, as represented by data papers. According to our results, the pattern of labor division in data papers differs from that in traditional research papers in several ways. For instance, in data papers the core contribution of <italic>&#x2018;conceptualization&#x2019;</italic> is most strongly correlated with <italic>&#x2018;methodology&#x2019;</italic>, whereas in research articles in the RCT field it is most strongly correlated with <italic>&#x2018;funding acquisition&#x2019;</italic> (<xref rid="R29" ref-type="bibr">Zhang et al., 2019</xref>). However, the roles that play a significant part in data paper collaborations show no significant difference from those in research articles: almost all articles involve roles such as <italic>&#x2018;conceptualization&#x2019;, &#x2018;writing &#x2013; original draft&#x2019;,</italic> and <italic>&#x2018;writing &#x2013; review and editing&#x2019;</italic> (<xref rid="R23" ref-type="bibr">Lariviere et al., 2021</xref>). Furthermore, some roles are highly correlated with one another and lack distinct differentiation, suggesting that they could be considered for consolidation within the context of contributor roles in data papers.</p>
<p>Data-intensive science is a collaborative endeavour, and our research indicates that as the number of co-authors increases, there is a corresponding rise in the number of individuals taking on roles such as investigation, validation, and formal analysis. In contrast, the number of individuals assuming core roles like conceptualization, methodology, and writing &#x2013; original draft, or managerial roles such as project administration, funding acquisition, and resources, remains relatively stable. This suggests that the primary driver of team size expansion is the need for practical tasks such as data handling and investigation, while scientific leaders remain scarce (<xref rid="R35" ref-type="bibr">Robinson-Garcia et al., 2020</xref>). As Larivi&#x00E8;re et al. have posited, <italic>&#x2018;the bureaucratization of science can be considered as an inevitable consequence of the ubiquity of collaborative science&#x2019;</italic> (<xref rid="R22" ref-type="bibr">Larivi&#x00E8;re et al., 2015</xref>).</p>
<p>The publication of data papers reflects the work patterns and research cultures across various disciplines. Different academic fields exhibit varying degrees of concentration in the use of CRediT roles, with data papers in the social sciences generally showing higher concentrations of CRediT role usage compared to those in the natural sciences. We suggest that the differences in research methodologies, types of data, and the nature of research questions are the primary causes of this phenomenon. For instance, data papers in the social sciences more prominently represent the <italic>&#x2018;paper&#x2019;</italic> dimension, with authors tending to focus on the <italic>&#x2018;conceptualization&#x2019;</italic> role, while data papers in the natural sciences emphasize the <italic>&#x2018;data&#x2019;</italic> dimension more, with authors concentrating more on <italic>&#x2018;investigation&#x2019;</italic> and <italic>&#x2018;methodology&#x2019;.</italic></p>
<p>The attribution of academic credit is one of the key concerns for researchers, with a common understanding that corresponding authors and first authors make significant contributions and play major leadership roles (Bhandari et al., 2014; Perneger et al., 2017; Teixeira da Silva, 2021; Yang et al., 2017). The statistical analysis indicates that in data papers, corresponding authors often undertake more leadership and coordination tasks and are more likely to be involved in key roles such as <italic>&#x2018;conceptualization&#x2019;, &#x2018;funding acquisition&#x2019;</italic>, and <italic>&#x2018;supervision&#x2019;.</italic> These roles pertain not only to the initial design and theoretical construction of the research but also to the supervision of the research process and financial support. First authors, on the other hand, exhibit higher engagement in roles like <italic>&#x2018;writing &#x2013; original draft&#x2019;</italic> and <italic>&#x2018;data curation&#x2019;,</italic> which are directly related to the implementation of the research and the accuracy of the data. Corresponding authors are also more likely than first authors to take on any given role, implying that authors who act as versatile players within the team (<xref rid="R26" ref-type="bibr">Lu et al., 2020</xref>) are more likely to be the corresponding authors of a data paper. Furthermore, our research reveals that taking on more roles, and being involved in key data processing roles such as <italic>&#x2018;data curation&#x2019;, &#x2018;software&#x2019;,</italic> and <italic>&#x2018;visualization&#x2019;,</italic> helps an author rank higher in the by-line order.</p>
<p>Our research indicates that some technical contributions related to data processing are clearly very important for the attribution of credit to authors of data papers. However, in CRediT, roles such as <italic>&#x2018;data curation&#x2019;</italic> encompass both work requiring profound professional background knowledge, such as data annotation, and technical tasks like data cleaning and maintenance. The roles of technical contributions (<xref rid="R40" ref-type="bibr">Smith, 2023</xref>) and human intervention are not adequately differentiated and described. As a result, some authors&#x2019; significant contributions may not be recognized as they should be, while others may enjoy excessive credit due to their role designation. This situation could be more pronounced in data papers, ultimately affecting the fairness of academic credit attribution for data papers.</p>
<p>CRediT, as a standardized method for describing author contributions, aids in clarifying the specific contributions of each author and determining the data responsibilities of different authors in data papers. However, the content standards and organizational forms of data papers differ from those of traditional research articles (<xref rid="R6" ref-type="bibr">Callaghan et al., 2012</xref>). Data papers focus more on the collection, processing, and presentation of data, and some of these activities have no directly corresponding category among the 14 CRediT roles. The majority of data papers use only about nine roles to describe the work conducted, indicating that CRediT has limitations in fully capturing the research workflow. Some roles, such as <italic>&#x2018;validation&#x2019;</italic> and <italic>&#x2018;software&#x2019;,</italic> which ensure data quality and reuse value, are missing in most data papers. This leads to an underestimation or neglect of the actual significance and unique contributions of these roles in research work, affecting a comprehensive understanding of the entire research process. This uneven usage also implies that CRediT fails to accurately and consistently capture all important types of contributions when describing the research work in data papers (<xref rid="R2" ref-type="bibr">Alliez et al., 2020</xref>; <xref rid="R12" ref-type="bibr">Fitzgerald et al., 2020</xref>; <xref rid="R27" ref-type="bibr">Matarese &#x0026; Shashok, 2019</xref>).</p>
</sec>
<sec id="sec6">
<title>Conclusion</title>
<p>Over the last few decades, data-intensive scientific research has become increasingly prevalent, with some of its labor division characteristics reflected through the publication of data papers. Our study arrives at the following main conclusions:</p>
<list list-type="order">
<list-item><p>Data papers rarely make full use of the 14 CRediT roles to describe author contributions, with <italic>&#x2018;project administration&#x2019;</italic> and <italic>&#x2018;resources&#x2019;</italic> being unmentioned in over half of the data paper samples.</p></list-item>
<list-item><p>Team size and discipline have a significant impact on the labor division of data-intensive scientific research activities. The need for data collection and analysis is the main reason for the expansion of team size, which is particularly evident in the natural sciences, where authors&#x2019; roles are more concentrated on <italic>&#x2018;investigation&#x2019;</italic> and <italic>&#x2018;methodology&#x2019;</italic>.</p></list-item>
<list-item><p>Corresponding authors and first authors continue to take on core roles, such as <italic>&#x2018;methodology&#x2019;</italic> and <italic>&#x2018;conceptualization&#x2019;,</italic> but at the same time, undertaking data analysis and processing-related tasks, such as <italic>&#x2018;software&#x2019;,</italic> helps authors advance in the author order of data papers.</p></list-item>
</list>
<p>Data papers provide an excellent window into data-intensive scientific research activities, and CRediT offers a useful framework for characterizing data-centric scientific workflows, but it requires refinement to reflect the characteristics of data papers and the diversity of research work more comprehensively. Developing a taxonomy of contributor roles specific to data papers, DP-CRediT (Data paper-CRediT), could be a good option. For instance, it might be beneficial to add specific roles such as <italic>&#x2018;data collection&#x2019;</italic> and <italic>&#x2018;metadata management&#x2019;</italic> to accurately reflect the contributions of these key steps in data papers. Moreover, the importance of technical work such as software development and data processing is increasingly recognized. Alliez et al. argue that since software development in research involves <italic>&#x2018;significant innovation&#x2019;,</italic> there is a need for appropriate human intervention and qualitative information to ensure accurate reporting of contributions (<xref rid="R2" ref-type="bibr">Alliez et al., 2020</xref>). Building on the existing roles of <italic>&#x2018;software&#x2019;</italic> and <italic>&#x2018;formal analysis&#x2019;,</italic> technical contributions could be further refined by introducing roles such as <italic>&#x2018;algorithm development&#x2019;</italic> and <italic>&#x2018;data engineering&#x2019;</italic> to describe technical work more comprehensively. Adding roles like <italic>&#x2018;critical analysis&#x2019;</italic> or <italic>&#x2018;data interpretation&#x2019;</italic> could emphasize the role of human intervention in data analysis and interpretation, ensuring a balance between technical contributions and human judgment. While refining roles, it is also necessary to maintain consistency among existing CRediT roles. Some CRediT roles describe project-level tasks while others describe paper-level tasks (<xref rid="R18" ref-type="bibr">Hosseini, Gordijn, et al., 2023</xref>), which could lead to an imbalance in the allocation of academic credit.</p>
</sec>
</body>
<back>
<ack>
<title>Acknowledgements</title>
<p>This work was funded by the project "Publishing Model linked Research Paper and Scientific Data and its application in FAIR-compliant Manner" (23BXW097), supported by the National Social Science Foundation of China.</p>
</ack>
<ref-list>
<title>References</title>
<ref id="R1"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Allen</surname><given-names>L.</given-names></name><name><surname>O&#x2019;Connell</surname><given-names>A.</given-names></name><name><surname>Kiermer</surname><given-names>V.</given-names></name></person-group> <year>(2019)</year> <article-title>How can we ensure visibility and diversity in research contributions? How the Contributor Role Taxonomy (CRediT) is helping the shift from authorship to contributorship</article-title><source>Learned Publishing</source><volume>32</volume><issue>1</issue><fpage>71</fpage><lpage>74</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1002/leap.1210">https://doi.org/10.1002/leap.1210</ext-link></element-citation></ref>
<ref id="R2"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Alliez</surname><given-names>P.</given-names></name><name><surname>Di Cosmo</surname><given-names>R.</given-names></name><name><surname>Guedj</surname><given-names>B.</given-names></name><name><surname>Girault</surname><given-names>A.</given-names></name><name><surname>Hacid</surname><given-names>M.-S.</given-names></name><name><surname>Legrand</surname><given-names>A.</given-names></name><name><surname>Rougier</surname><given-names>N.</given-names></name></person-group> <year>(2020)</year> <article-title>Attributing and Referencing (Research) Software: Best Practices and Outlook From Inria</article-title><source>Computing in Science &#x0026; Engineering</source><volume>22</volume><issue>1</issue><fpage>39</fpage><lpage>51</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1109/mcse.2019.2949413">https://doi.org/10.1109/mcse.2019.2949413</ext-link></element-citation></ref>
<ref id="R3"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Alpi</surname><given-names>K. M.</given-names></name><name><surname>Akers</surname><given-names>K. G.</given-names></name></person-group> <year>(2021)</year> <article-title>CRediT for authors of articles published in the Journal of the Medical Library Association</article-title><source>Journal of the Medical Library Association</source><volume>109</volume><issue>3</issue><fpage>362</fpage><lpage>364</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5195/jmla.2021.1294">https://doi.org/10.5195/jmla.2021.1294</ext-link></element-citation></ref>
<ref id="R4"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Benhamed</surname><given-names>O. M.</given-names></name><name><surname>Burger</surname><given-names>K.</given-names></name><name><surname>Kaliyaperumal</surname><given-names>R.</given-names></name><name><surname>da Silva Santos</surname><given-names>L. O. B.</given-names></name><name><surname>Such&#x00E1;nek</surname><given-names>M.</given-names></name><name><surname>Slifka</surname><given-names>J.</given-names></name><name><surname>Wilkinson</surname><given-names>M. D.</given-names></name></person-group> <year>(2023)</year> <article-title>The FAIR Data Point: Interfaces and Tooling</article-title><source>Data Intelligence</source><volume>5</volume><issue>1</issue><fpage>184</fpage><lpage>201</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1162/dint_a_00161">https://doi.org/10.1162/dint_a_00161</ext-link></element-citation></ref>
<ref id="R5"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Bierer</surname><given-names>B. E.</given-names></name><name><surname>Crosas</surname><given-names>M.</given-names></name><name><surname>Pierce</surname><given-names>H. H.</given-names></name></person-group> <year>(2017)</year> <article-title>Data Authorship as an Incentive to Data Sharing</article-title><source>New England Journal of Medicine</source><volume>376</volume><issue>17</issue><fpage>1684</fpage><lpage>1687</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1056/NEJMsb1616595">https://doi.org/10.1056/NEJMsb1616595</ext-link></element-citation></ref>
<ref id="R6"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Callaghan</surname><given-names>S.</given-names></name><name><surname>Donegan</surname><given-names>S.</given-names></name><name><surname>Pepler</surname><given-names>S.</given-names></name><name><surname>Thorley</surname><given-names>M.</given-names></name><name><surname>Cunningham</surname><given-names>N.</given-names></name><name><surname>Kirsch</surname><given-names>P.</given-names></name><name><surname>Ault</surname><given-names>L.</given-names></name><name><surname>Bell</surname><given-names>P. J.</given-names></name><name><surname>Bowie</surname><given-names>R. C.</given-names></name><name><surname>Leadbetter</surname><given-names>A. M.</given-names></name><name><surname>Lowry</surname><given-names>R. K.</given-names></name><name><surname>Moncoiffe</surname><given-names>G.</given-names></name><name><surname>Harrison</surname><given-names>K.</given-names></name><name><surname>Smith-Haddon</surname><given-names>B.</given-names></name><name><surname>Weatherby</surname><given-names>A.</given-names></name><name><surname>Wright</surname><given-names>D. G.</given-names></name></person-group> <year>(2012)</year> <article-title>Making Data a First Class Scientific Output: Data Citation and Publication by NERC&#x2019;s Environmental Data Centres</article-title><source>Int. J. Digit. Curation</source><volume>7</volume><fpage>107</fpage><lpage>113</lpage></element-citation></ref>
<ref id="R7"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Cheruvelil</surname><given-names>K. S.</given-names></name><name><surname>Soranno</surname><given-names>P. A.</given-names></name></person-group> <year>(2018)</year> <article-title>Data-Intensive Ecological Research Is Catalyzed by Open Science and Team Science</article-title><source>BioScience</source><volume>68</volume><issue>10</issue><fpage>813</fpage><lpage>822</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1093/biosci/biy097">https://doi.org/10.1093/biosci/biy097</ext-link></element-citation></ref>
<ref id="R8"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Dance</surname><given-names>A.</given-names></name></person-group> <year>(2012)</year> <article-title>Authorship: Who&#x2019;s on first?</article-title><source>Nature</source><volume>489</volume><issue>7417</issue><fpage>591</fpage><lpage>593</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1038/nj7417-591a">https://doi.org/10.1038/nj7417-591a</ext-link></element-citation></ref>
<ref id="R9"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Das</surname><given-names>N.</given-names></name><name><surname>Das</surname><given-names>S.</given-names></name></person-group> <year>(2020)</year> <article-title>&#x2018;Author Contribution Details&#x2019; and not &#x2018;Authorship Sequence&#x2019; as a merit to determine credit: A need to relook at the current Indian practice</article-title><source>National Medical Journal of India</source><volume>33</volume><issue>1</issue><fpage>24</fpage><lpage>30</lpage><comment>Article Pmid 33565483</comment><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.4103/0970-258x.308238">https://doi.org/10.4103/0970-258x.308238</ext-link></element-citation></ref>
<ref id="R10"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ding</surname><given-names>J.</given-names></name><name><surname>Liu</surname><given-names>C.</given-names></name><name><surname>Zheng</surname><given-names>Q.</given-names></name><name><surname>Cai</surname><given-names>W.</given-names></name></person-group> <year>(2021)</year> <article-title>A new method of co-author credit allocation based on contributor roles taxonomy: proof of concept and evaluation using papers published in PLOS ONE</article-title><source>Scientometrics</source><volume>126</volume><issue>9</issue><fpage>7561</fpage><lpage>7581</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1007/s11192-021-04075-x">https://doi.org/10.1007/s11192-021-04075-x</ext-link></element-citation></ref>
<ref id="R11"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Faniel</surname><given-names>I. M.</given-names></name><name><surname>Jacobsen</surname><given-names>T. E.</given-names></name></person-group> <year>(2010)</year> <article-title>Reusing Scientific Data: How Earthquake Engineering Researchers Assess the Reusability of Colleagues&#x2019; Data</article-title><source>Computer Supported Cooperative Work (CSCW)</source><volume>19</volume><issue>3</issue><fpage>355</fpage><lpage>375</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1007/s10606-010-9117-8">https://doi.org/10.1007/s10606-010-9117-8</ext-link></element-citation></ref>
<ref id="R12"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Fitzgerald</surname><given-names>S.</given-names></name><name><surname>Budd</surname><given-names>J.</given-names></name><name><surname>Beile</surname><given-names>P.</given-names></name><name><surname>Kaspar</surname><given-names>W.</given-names></name></person-group> <year>(2020)</year> <article-title>Modeling Transparency in Roles: Moving from Authorship to Contributorship</article-title><volume>2020</volume><issue>7</issue><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5860/crl.81.7.1056">https://doi.org/10.5860/crl.81.7.1056</ext-link></element-citation></ref>
<ref id="R13"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Greenberg</surname><given-names>J.</given-names></name><name><surname>McClellan</surname><given-names>S.</given-names></name><name><surname>Rauch</surname><given-names>C.</given-names></name><name><surname>Zhao</surname><given-names>X.</given-names></name><name><surname>Kelly</surname><given-names>M.</given-names></name><name><surname>An</surname><given-names>Y.</given-names></name><name><surname>Kunze</surname><given-names>J.</given-names></name><name><surname>Orenstein</surname><given-names>R.</given-names></name><name><surname>Porter</surname><given-names>C.</given-names></name><name><surname>Meschke</surname><given-names>V.</given-names></name><name><surname>Toberer</surname><given-names>E.</given-names></name></person-group> <year>(2023)</year> <article-title>Building Community Consensus for Scientific Metadata with YAMZ</article-title><source>Data Intelligence</source><volume>5</volume><issue>1</issue><fpage>242</fpage><lpage>260</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1162/dint_a_00211">https://doi.org/10.1162/dint_a_00211</ext-link></element-citation></ref>
<ref id="R14"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Greenberg</surname><given-names>J.</given-names></name><name><surname>Wu</surname><given-names>M.</given-names></name><name><surname>Liu</surname><given-names>W.</given-names></name><name><surname>Liu</surname><given-names>F.</given-names></name></person-group> <year>(2023)</year> <article-title>Metadata as Data Intelligence</article-title><source>Data Intelligence</source><volume>5</volume><issue>1</issue><fpage>1</fpage><lpage>5</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1162/dint_e_00212">https://doi.org/10.1162/dint_e_00212</ext-link></element-citation></ref>
<ref id="R15"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Halevy</surname><given-names>A.</given-names></name><name><surname>Norvig</surname><given-names>P.</given-names></name><name><surname>Pereira</surname><given-names>F.</given-names></name></person-group> <year>(2009)</year> <article-title>The Unreasonable Effectiveness of Data</article-title><source>IEEE Intelligent Systems</source><volume>24</volume><issue>2</issue><fpage>8</fpage><lpage>12</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1109/MIS.2009.36">https://doi.org/10.1109/MIS.2009.36</ext-link></element-citation></ref>
<ref id="R16"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Holcombe</surname><given-names>A. O.</given-names></name></person-group> <year>(2019)</year> <article-title>Contributorship, Not Authorship: Use CRediT to Indicate Who Did What</article-title><source>Publications</source><volume>7</volume><issue>3</issue><comment>Article 48</comment><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3390/publications7030048">https://doi.org/10.3390/publications7030048</ext-link></element-citation></ref>
<ref id="R17"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Hosseini</surname><given-names>M.</given-names></name><name><surname>Colomb</surname><given-names>J.</given-names></name><name><surname>Holcombe</surname><given-names>A. O.</given-names></name><name><surname>Kern</surname><given-names>B.</given-names></name><name><surname>Vasilevsky</surname><given-names>N. A.</given-names></name><name><surname>Holmes</surname><given-names>K. L.</given-names></name></person-group> <year>(2023)</year> <article-title>Evolution and adoption of contributor role ontologies and taxonomies</article-title><source>Learned Publishing</source><volume>36</volume><issue>2</issue><fpage>275</fpage><lpage>284</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1002/leap.1496">https://doi.org/10.1002/leap.1496</ext-link></element-citation></ref>
<ref id="R18"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Hosseini</surname><given-names>M.</given-names></name><name><surname>Gordijn</surname><given-names>B.</given-names></name><name><surname>Wafford</surname><given-names>Q. E.</given-names></name><name><surname>Holmes</surname><given-names>K. L. L.</given-names></name></person-group> <year>(2023)</year> <article-title>A systematic scoping review of the ethics of Contributor Role Ontologies and Taxonomies</article-title><source>Accountability in Research-Ethics Integrity and Policy</source><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1080/08989621.2022.2161049">https://doi.org/10.1080/08989621.2022.2161049</ext-link></element-citation></ref>
<ref id="R19"><element-citation publication-type="other"><person-group person-group-type="author"><collab>ICSU. Priority Area Assessment on Scientific Data and Information</collab></person-group><comment>ICSU</comment><ext-link ext-link-type="uri" xlink:href="https://council.science/publications/priority-area-assessment-on-scientific-data-and-information/">https://council.science/publications/priority-area-assessment-on-scientific-data-and-information/</ext-link></element-citation></ref>
<ref id="R20"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Kalager</surname><given-names>M.</given-names></name><name><surname>Adami</surname><given-names>H.-O.</given-names></name><name><surname>Bretthauer</surname><given-names>M.</given-names></name></person-group> <year>(2016)</year> <article-title>Recognizing Data Generation</article-title><source>New England Journal of Medicine</source><volume>374</volume><issue>19</issue><fpage>1898</fpage><lpage>1898</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1056/NEJMc1603789">https://doi.org/10.1056/NEJMc1603789</ext-link></element-citation></ref>
<ref id="R21"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Landi</surname><given-names>A.</given-names></name><name><surname>Thompson</surname><given-names>M.</given-names></name><name><surname>Giannuzzi</surname><given-names>V.</given-names></name><name><surname>Bonifazi</surname><given-names>F.</given-names></name><name><surname>Labastida</surname><given-names>I.</given-names></name><name><surname>da Silva Santos</surname><given-names>L. O. B.</given-names></name><name><surname>Roos</surname><given-names>M.</given-names></name></person-group> <year>(2020)</year> <article-title>The &#x201C;A&#x201D; of FAIR &#x2013; As Open as Possible, as Closed as Necessary</article-title><source>Data Intelligence</source><volume>2</volume><issue>1-2</issue><fpage>47</fpage><lpage>55</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1162/dint_a_00027">https://doi.org/10.1162/dint_a_00027</ext-link></element-citation></ref>
<ref id="R22"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Larivi&#x00E8;re</surname><given-names>V.</given-names></name><name><surname>Gingras</surname><given-names>Y.</given-names></name><name><surname>Sugimoto</surname><given-names>C. R.</given-names></name><name><surname>Tsou</surname><given-names>A.</given-names></name></person-group> <year>(2015)</year> <article-title>Team size matters: Collaboration and scientific impact since 1900</article-title><source>Journal of the Association for Information Science and Technology</source><volume>66</volume><issue>7</issue><fpage>1323</fpage><lpage>1332</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1002/asi.23266">https://doi.org/10.1002/asi.23266</ext-link></element-citation></ref>
<ref id="R23"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Lariviere</surname><given-names>V.</given-names></name><name><surname>Pontille</surname><given-names>D.</given-names></name><name><surname>Sugimoto</surname><given-names>C. R.</given-names></name></person-group> <year>(2021)</year> <article-title>Investigating the division of scientific labor using the Contributor Roles Taxonomy (CRediT)</article-title><source>Quantitative Science Studies</source><volume>2</volume><issue>1</issue><fpage>111</fpage><lpage>128</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1162/qss_a_00097">https://doi.org/10.1162/qss_a_00097</ext-link></element-citation></ref>
<ref id="R24"><element-citation publication-type="book"><person-group person-group-type="author"><name><surname>Lenhardt</surname><given-names>W. C.</given-names></name><name><surname>Conway</surname><given-names>M.</given-names></name><name><surname>Scott</surname><given-names>E.</given-names></name><name><surname>Blanton</surname><given-names>B.</given-names></name><name><surname>Krishnamurthy</surname><given-names>A.</given-names></name><name><surname>Hadzikadic</surname><given-names>M.</given-names></name><name><surname>Vouk</surname><given-names>M.</given-names></name><name><surname>Wilson</surname><given-names>A.</given-names></name></person-group> <year>(2016)</year> <comment>2016 Sep 13-15</comment><chapter-title>Cross-Institutional Research Cyberinfrastructure for Data Intensive Science</chapter-title><source>IEEE High Performance Extreme Computing Conference (HPEC)</source><publisher-loc>Waltham, MA</publisher-loc></element-citation></ref>
<ref id="R25"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Lo</surname><given-names>B.</given-names></name><name><surname>DeMets</surname><given-names>D. L.</given-names></name></person-group> <year>(2016)</year> <article-title>Incentives for Clinical Trialists to Share Data</article-title><source>New England Journal of Medicine</source><volume>375</volume><issue>12</issue><fpage>1112</fpage><lpage>1115</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1056/NEJMp1608351">https://doi.org/10.1056/NEJMp1608351</ext-link></element-citation></ref>
<ref id="R26"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Lu</surname><given-names>C.</given-names></name><name><surname>Zhang</surname><given-names>Y.</given-names></name><name><surname>Ahn</surname><given-names>Y.-Y.</given-names></name><name><surname>Ding</surname><given-names>Y.</given-names></name><name><surname>Zhang</surname><given-names>C.</given-names></name><name><surname>Ma</surname><given-names>D.</given-names></name></person-group> <year>(2020)</year> <article-title>Co-contributorship Network and Division of Labor in Individual Scientific Collaborations</article-title><source>Journal of the Association for Information Science and Technology</source><volume>71</volume><issue>10</issue><fpage>1162</fpage><lpage>1178</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1002/asi.24321">https://doi.org/10.1002/asi.24321</ext-link></element-citation></ref>
<ref id="R27"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Matarese</surname><given-names>V.</given-names></name><name><surname>Shashok</surname><given-names>K.</given-names></name></person-group> <year>(2019)</year> <article-title>Transparent Attribution of Contributions to Research: Aligning Guidelines to Real-Life Practices</article-title><source>Publications</source><volume>7</volume><issue>2</issue><comment>Article 24</comment><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3390/publications7020024">https://doi.org/10.3390/publications7020024</ext-link></element-citation></ref>
<ref id="R28"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Nielsen</surname><given-names>M.</given-names></name></person-group> <year>(2009)</year> <article-title>The Fourth Paradigm: Data-Intensive Scientific Discovery</article-title><source>Nature</source><volume>462</volume><issue>7274</issue><fpage>722</fpage><lpage>723</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1038/462722a">https://doi.org/10.1038/462722a</ext-link></element-citation></ref>
<ref id="R29"><element-citation publication-type="other"><person-group person-group-type="author"><collab>NISO</collab></person-group><comment>Contributor Roles Taxonomy. NISO</comment><ext-link ext-link-type="uri" xlink:href="https://credit.niso.org/">https://credit.niso.org/</ext-link></element-citation></ref>
<ref id="R30"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Pasquetto</surname><given-names>I. V.</given-names></name><name><surname>Randles</surname><given-names>B. M.</given-names></name><name><surname>Borgman</surname><given-names>C. L.</given-names></name></person-group> <year>(2017)</year> <article-title>On the Reuse of Scientific Data</article-title><source>Data Science Journal</source><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5334/dsj-2017-008">https://doi.org/10.5334/dsj-2017-008</ext-link></element-citation></ref>
<ref id="R31"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Pietsch</surname><given-names>W.</given-names></name></person-group> <year>(2015)</year> <article-title>Aspects of Theory-Ladenness in Data-Intensive Science</article-title><source>Philosophy of Science</source><volume>82</volume><issue>5</issue><fpage>905</fpage><lpage>916</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1086/683328">https://doi.org/10.1086/683328</ext-link></element-citation></ref>
<ref id="R32"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Rahman</surname><given-names>M. T.</given-names></name><name><surname>Verhagen</surname><given-names>J. V.</given-names></name></person-group> <year>(2023)</year> <article-title>Implementing Quantitative Declarations of Authorship Contribution: A Call to Action</article-title><source>Journal of Scientometric Research</source><volume>12</volume><issue>2</issue><fpage>431</fpage><lpage>435</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5530/jscires.12.2.039">https://doi.org/10.5530/jscires.12.2.039</ext-link></element-citation></ref>
<ref id="R33"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ramachandran</surname><given-names>R.</given-names></name><name><surname>Rushing</surname><given-names>J.</given-names></name><name><surname>Lin</surname><given-names>A.</given-names></name><name><surname>Conover</surname><given-names>H.</given-names></name><name><surname>Li</surname><given-names>X.</given-names></name><name><surname>Graves</surname><given-names>S.</given-names></name><name><surname>Nair</surname><given-names>U. S.</given-names></name><name><surname>Kuo</surname><given-names>K. S.</given-names></name><name><surname>Smith</surname><given-names>D. K.</given-names></name></person-group> <year>(2013)</year> <article-title>Data Prospecting&#x2013;A Step Towards Data Intensive Science</article-title><source>IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing</source><volume>6</volume><issue>3</issue><fpage>1233</fpage><lpage>1241</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1109/JSTARS.2013.2248133">https://doi.org/10.1109/JSTARS.2013.2248133</ext-link></element-citation></ref>
<ref id="R34"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Resnik</surname><given-names>D. B.</given-names></name><name><surname>Elliott</surname><given-names>K. C.</given-names></name><name><surname>Soranno</surname><given-names>P. A.</given-names></name><name><surname>Smith</surname><given-names>E. M.</given-names></name></person-group> <year>(2017)</year> <article-title>Data-Intensive Science and Research Integrity</article-title><source>Accountability in Research-Ethics Integrity and Policy</source><volume>24</volume><issue>6</issue><fpage>344</fpage><lpage>358</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1080/08989621.2017.1327813">https://doi.org/10.1080/08989621.2017.1327813</ext-link></element-citation></ref>
<ref id="R35"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Robinson-Garcia</surname><given-names>N.</given-names></name><name><surname>Costas</surname><given-names>R.</given-names></name><name><surname>Sugimoto</surname><given-names>C. R.</given-names></name><name><surname>Larivi&#x00E8;re</surname><given-names>V.</given-names></name><name><surname>Nane</surname><given-names>G. F.</given-names></name></person-group> <year>(2020)</year> <article-title>Task specialization across research careers</article-title><source>eLife</source><volume>9</volume><fpage>e60586</fpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.7554/eLife.60586">https://doi.org/10.7554/eLife.60586</ext-link></element-citation></ref>
<ref id="R36"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Schultes</surname><given-names>E.</given-names></name><name><surname>Roos</surname><given-names>M.</given-names></name><name><surname>Bonino da Silva Santos</surname><given-names>L. O.</given-names></name><name><surname>Guizzardi</surname><given-names>G.</given-names></name><name><surname>Bouwman</surname><given-names>J.</given-names></name><name><surname>Hankemeier</surname><given-names>T.</given-names></name><name><surname>Baak</surname><given-names>A.</given-names></name><name><surname>Mons</surname><given-names>B.</given-names></name></person-group> <year>(2022)</year> <article-title>FAIR Digital Twins for Data-Intensive Research</article-title><source>Frontiers in Big Data</source><volume>5</volume><fpage>883341</fpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3389/fdata.2022.883341">https://doi.org/10.3389/fdata.2022.883341</ext-link></element-citation></ref>
<ref id="R37"><element-citation publication-type="other"><person-group person-group-type="author"><collab>Web Scraper</collab></person-group><comment>About us</comment><ext-link ext-link-type="uri" xlink:href="https://www.webscraper.io/about-us">https://www.webscraper.io/about-us</ext-link></element-citation></ref>
<ref id="R38"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Scroggins</surname><given-names>M. J.</given-names></name><name><surname>Pasquetto</surname><given-names>I. V.</given-names></name></person-group> <year>(2020)</year> <article-title>Labor Out of Place: On the Varieties and Valences of (In)visible Labor in Data-Intensive Science</article-title><source>Engaging Science Technology and Society</source><volume>6</volume><fpage>111</fpage><lpage>132</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.17351/ests2020.341">https://doi.org/10.17351/ests2020.341</ext-link></element-citation></ref>
<ref id="R39"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Shamoo</surname><given-names>A. E.</given-names></name></person-group> <year>(2013)</year> <article-title>Data Audit as a Way to Prevent/Contain Misconduct</article-title><source>Accountability in Research-Policies and Quality Assurance</source><volume>20</volume><issue>5-6</issue><fpage>369</fpage><lpage>379</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1080/08989621.2013.822259">https://doi.org/10.1080/08989621.2013.822259</ext-link></element-citation></ref>
<ref id="R40"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Smith</surname><given-names>E.</given-names></name></person-group> <year>(2023)</year> <article-title>"Technical" Contributors and Authorship Distribution in Health Science</article-title><source>Science and Engineering Ethics</source><volume>29</volume><issue>4</issue><comment>Article 22</comment><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1007/s11948-023-00445-1">https://doi.org/10.1007/s11948-023-00445-1</ext-link></element-citation></ref>
<ref id="R41"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Steele</surname><given-names>L.</given-names></name><name><surname>Lee</surname><given-names>H. L.</given-names></name><name><surname>Earp</surname><given-names>E.</given-names></name><name><surname>Hong</surname><given-names>A.</given-names></name><name><surname>Thomson</surname><given-names>J.</given-names></name></person-group> <year>(2021)</year> <article-title>Who writes dermatology randomized controlled trials? The need to specify the role of medical writers</article-title><source>Clinical and Experimental Dermatology</source><volume>46</volume><issue>6</issue><fpage>1086</fpage><lpage>1088</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1111/ced.14711">https://doi.org/10.1111/ced.14711</ext-link></element-citation></ref>
<ref id="R42"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Tenopir</surname><given-names>C.</given-names></name><name><surname>Allard</surname><given-names>S.</given-names></name><name><surname>Douglass</surname><given-names>K.</given-names></name><name><surname>Aydinoglu</surname><given-names>A. U.</given-names></name><name><surname>Wu</surname><given-names>L.</given-names></name><name><surname>Read</surname><given-names>E.</given-names></name><name><surname>Manoff</surname><given-names>M.</given-names></name><name><surname>Frame</surname><given-names>M.</given-names></name></person-group> <year>(2011)</year> <article-title>Data Sharing by Scientists: Practices and Perceptions</article-title><source>PLOS ONE</source><volume>6</volume><issue>6</issue><fpage>e21101</fpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1371/journal.pone.0021101">https://doi.org/10.1371/journal.pone.0021101</ext-link></element-citation></ref>
<ref id="R43"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Tolle</surname><given-names>K. M.</given-names></name><name><surname>Tansley</surname><given-names>D. S. W.</given-names></name><name><surname>Hey</surname><given-names>A. J. G.</given-names></name></person-group> <year>(2011)</year> <article-title>The Fourth Paradigm: Data-Intensive Scientific Discovery [Point of View]</article-title><source>Proceedings of the IEEE</source><volume>99</volume><issue>8</issue><fpage>1334</fpage><lpage>1337</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1109/JPROC.2011.2155130">https://doi.org/10.1109/JPROC.2011.2155130</ext-link></element-citation></ref>
<ref id="R44"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Treadway</surname><given-names>J.</given-names></name><name><surname>Hahnel</surname><given-names>M.</given-names></name><name><surname>Leonelli</surname><given-names>S.</given-names></name><name><surname>Penny</surname><given-names>D.</given-names></name><name><surname>Groenewegen</surname><given-names>D.</given-names></name><name><surname>Miyairi</surname><given-names>N.</given-names></name><name><surname>Hayashi</surname><given-names>K.</given-names></name><name><surname>O&#x2019;Donnell</surname><given-names>D.</given-names></name><collab>Digital Science</collab><name><surname>Hook</surname><given-names>D.</given-names></name></person-group> <year>(2016)</year> <article-title>The State of Open Data Report</article-title><source>Figshare</source><ext-link ext-link-type="uri" xlink:href="https://figshare.com/articles/report/The_State_of_Open_Data_Report/4036398?file=6558051">https://figshare.com/articles/report/The_State_of_Open_Data_Report/4036398?file=6558051</ext-link></element-citation></ref>
<ref id="R45"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Udey</surname><given-names>M. C.</given-names></name></person-group> <year>(2018)</year> <article-title>Giving Credit where Credit Is Due (and Assigning Individual Responsibilities)</article-title><source>Journal of Investigative Dermatology</source><volume>138</volume><issue>7</issue><fpage>1451</fpage><lpage>1452</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1016/j.jid.2018.05.010">https://doi.org/10.1016/j.jid.2018.05.010</ext-link></element-citation></ref>
<ref id="R46"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Vasilevsky</surname><given-names>N. A.</given-names></name><name><surname>Hosseini</surname><given-names>M.</given-names></name><name><surname>Teplitzky</surname><given-names>S.</given-names></name><name><surname>Ilik</surname><given-names>V.</given-names></name><name><surname>Mohammadi</surname><given-names>E.</given-names></name><name><surname>Schneider</surname><given-names>J.</given-names></name><name><surname>Kern</surname><given-names>B.</given-names></name><name><surname>Colomb</surname><given-names>J.</given-names></name><name><surname>Edmunds</surname><given-names>S. C.</given-names></name><name><surname>Gutzman</surname><given-names>K.</given-names></name><name><surname>Himmelstein</surname><given-names>D. S.</given-names></name><name><surname>White</surname><given-names>M.</given-names></name><name><surname>Smith</surname><given-names>B.</given-names></name><name><surname>O&#x2019;Keefe</surname><given-names>L.</given-names></name><name><surname>Haendel</surname><given-names>M.</given-names></name><name><surname>Holmes</surname><given-names>K. L.</given-names></name></person-group> <year>(2021)</year> <article-title>Is authorship sufficient for today&#x2019;s collaborative research? A call for contributor roles</article-title><source>Accountability in Research-Ethics Integrity and Policy</source><volume>28</volume><issue>1</issue><fpage>23</fpage><lpage>43</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1080/08989621.2020.1779591">https://doi.org/10.1080/08989621.2020.1779591</ext-link></element-citation></ref>
<ref id="R47"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Wallis</surname><given-names>J. C.</given-names></name><name><surname>Borgman</surname><given-names>C. L.</given-names></name></person-group> <year>(2011)</year> <article-title>Who is responsible for data? An exploratory study of data authorship, ownership, and responsibility</article-title><source>Proceedings of the American Society for Information Science and Technology</source><volume>48</volume><issue>1</issue><fpage>1</fpage><lpage>10</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1002/meet.2011.14504801188">https://doi.org/10.1002/meet.2011.14504801188</ext-link></element-citation></ref>
<ref id="R48"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Wilson</surname><given-names>A.</given-names></name><name><surname>Downs</surname><given-names>R. R.</given-names></name><name><surname>Lenhardt</surname><given-names>W. C.</given-names></name><name><surname>Meyer</surname><given-names>C.</given-names></name><name><surname>Michener</surname><given-names>W.</given-names></name><name><surname>Ramapriyan</surname><given-names>H.</given-names></name><name><surname>Robinson</surname><given-names>E.</given-names></name></person-group> <year>(2014)</year> <article-title>Realizing the Value of a National Asset: Scientific Data</article-title><source>Eos, Transactions American Geophysical Union</source><volume>95</volume><issue>50</issue><fpage>477</fpage><lpage>478</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1002/2014EO500006">https://doi.org/10.1002/2014EO500006</ext-link></element-citation></ref>
<ref id="R49"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Wittenburg</surname><given-names>P.</given-names></name></person-group> <year>(2021)</year> <article-title>Open Science and Data Science</article-title><source>Data Intelligence</source><volume>3</volume><issue>1</issue><fpage>95</fpage><lpage>105</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1162/dint_a_00082">https://doi.org/10.1162/dint_a_00082</ext-link></element-citation></ref>
<ref id="R50"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname><given-names>Z.</given-names></name><name><surname>Wang</surname><given-names>S. D.</given-names></name><name><surname>Li</surname><given-names>G. S.</given-names></name><name><surname>Kong</surname><given-names>G.</given-names></name><name><surname>Gu</surname><given-names>H.</given-names></name><name><surname>Alfon</surname><given-names>F.</given-names></name></person-group> <year>(2019)</year> <article-title>The contributor roles for randomized controlled trials and the proposal for a novel CRediT-RCT</article-title><source>Annals of Translational Medicine</source><volume>7</volume><issue>24</issue><fpage>812</fpage><comment>Article 812</comment><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.21037/atm.2019.12.96">https://doi.org/10.21037/atm.2019.12.96</ext-link></element-citation></ref>
</ref-list>
</back>
</article>