<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN" "http://jats.nlm.nih.gov/publishing/1.0/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" article-type="research-article" xml:lang="en">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">IR</journal-id>
<journal-title-group>
<journal-title>Information Research</journal-title>
</journal-title-group>
<issn pub-type="epub">1368-1613</issn>
<publisher>
<publisher-name>University of Bor&#x00E5;s</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">ir30iConf47338</article-id>
<article-id pub-id-type="doi">10.47989/ir30iConf47338</article-id>
<article-categories>
<subj-group xml:lang="en">
<subject>Research article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>An ensemble framework for sentiment-embedded event evolution in diaspora oral archives</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author"><name><surname>Zhou</surname><given-names>Jing</given-names></name>
<xref ref-type="aff" rid="aff0001"/></contrib>
<aff id="aff0001"><bold>Jing Zhou</bold> is a doctoral candidate in the School of Information Management, Wuhan University. Her research interests focus on digital humanities, knowledge organization and natural language processing. She can be contacted at <email xlink:href="zhoujingwinky@whu.edu.cn">zhoujingwinky@whu.edu.cn</email>.</aff>
</contrib-group>
<pub-date pub-type="epub"><day>06</day><month>05</month><year>2025</year></pub-date>
<pub-date pub-type="collection"><year>2025</year></pub-date>
<volume>30</volume>
<issue>i</issue>
<fpage>123</fpage>
<lpage>141</lpage>
<permissions>
<copyright-year>2025</copyright-year>
<copyright-holder>&#x00A9; 2025 The Author(s).</copyright-holder>
<license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by-nc/4.0/">
<license-p>This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (<ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by-nc/4.0/">http://creativecommons.org/licenses/by-nc/4.0/</ext-link>), permitting all non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<abstract xml:lang="en">
<title>Abstract</title>
<p><bold>Introduction.</bold> Diaspora oral archives should be displayed in the context of diversity and inclusiveness rather than being glossed over by the dominant, normative group; their voices need to be spread further.</p>
<p><bold>Method.</bold> This paper proposes an ensemble framework for sentiment-embedded event evolution in diaspora oral archives. It contains a knowledge representation model, an event evolutionary graph, and an event extraction workflow to extract entities, events, and relationships.</p>
<p><bold>Results.</bold> The South Asian Oral History Project is selected as the data source. The key events, entities, event types, sentiments and event relations are extracted with natural language processing techniques to construct a sentiment-embedded event evolutionary graph. Based on this graph, the event evolution, spatio-temporal and spatio-sentiment patterns are analysed.</p>
<p><bold>Conclusion.</bold> Such methods allow researchers and archivists to engage in research on machine-assisted oral archives to ensure reproducibility, reduce interpretative biases, and efficiently and swiftly amplify hidden voices of <italic>&#x2018;the other&#x2019;</italic>.</p>
</abstract>
</article-meta>
</front>
<body>
<sec id="sec1">
<title>Introduction</title>
<p>Diasporas&#x2019; voices and stories are ignored in the mainstream public consciousness. Oral archives may communicate hidden voices, revealing alternative narratives to those published in written histories (<xref rid="R3" ref-type="bibr">Brinkhurst, 2012</xref>). In terms of data types, oral archives are collections of audio/video recordings with field notes and transcriptions. The interviews can take various shapes: autobiographical narratives by the interviewee, such as lifestyle interviews; interviews with semi-structured, open-ended questions following a research agenda, such as thematic interviews; or collective information sessions, in which many people participate in the conversation (<xref rid="R21" ref-type="bibr">Thomson, 1998</xref>). Hence, diaspora oral archives always need to be read in relation to broader, dislocated contexts, which include accounts of events, personal affection, and so on. Nevertheless, two problems remain. On the one hand, existing oral archive practices concentrate on collection and preservation and neglect the semantic features and dynamic correlations in stories, so events in these projects are still barely connected. On the other hand, participant-generated stories are unstructured textual data that are neither predefined nor indexed, so it is hard to integrate or retrieve entities and events from them. This calls for new models and processes that enable intuitive access to event-related knowledge in diaspora oral archives. This study therefore focuses on events and sentiments in diaspora oral archives, together with related personal narrative elements. These events and their relations, as well as entities, are vital components of diasporas&#x2019; stories and constitute the basic knowledge units of diaspora oral archives.</p>
<p>Diaspora oral archives are usually highly personal and emotive stories. The sentiment in archives creates a new legibility of the individual narratives (<xref rid="R18" ref-type="bibr">Roeschley and Kim, 2019</xref>; <xref rid="R12" ref-type="bibr">Jones, 2019</xref>). This study introduces the concept of &#x201C;sentiment-embedded events&#x201D; to describe events within these narratives that are closely tied to emotional responses, particularly those evoked by specific locations and contexts. Using a sentiment analysis framework that categorizes emotions into positive, neutral, and negative, this research adopts a coarse-grained approach to analyse sentiments associated with places and events. While this study does not aim to capture the finer nuances of affective experiences, it provides an overarching perspective on the emotional patterns linked to immigration activities and events.</p>
<p>An ensemble framework for sentiment-embedded event evolution is employed in this study. It includes a knowledge representation model, an event evolutionary graph (EEG) and an automatic event extraction workflow based on natural language processing (NLP). Specifically, this research uses the sentiment-embedded event evolutionary graph (SEEGraph) to interpret and analyse these events and their relations, including the sentiment entailed in events. Summarizing diaspora oral archives through manual annotation can be very tedious, or nearly impossible for long texts or large corpora. Many such tasks have been successfully adapted to machine learning models in NLP, which can help humans extract and summarize information from text automatically while enhancing consistency and scalability in data analysis. Hence, events, their relations, and sentiments are extracted through NLP in this study.</p>
<p>Motivated by these ideas, we hope to answer the following questions in our research:
<list list-type="bullet">
<list-item><p>What knowledge representation model can capture sentiment-embedded events and their relationships in diaspora oral archives?</p></list-item>
<list-item><p>How can machine learning models be integrated effectively to extract and enrich information in the proposed knowledge representation model?</p></list-item>
<list-item><p>What meaningful patterns and insights emerge from sentiment trajectories and event evolution in diaspora narratives?</p></list-item>
</list></p>
</sec>
<sec id="sec2">
<title>Related works</title>
<p>Some researchers have begun to pay attention to the knowledge representation of diaspora oral archives through text mining and NLP. These technologies create unique possibilities for the analysis of oral history interviews (<xref rid="R16" ref-type="bibr">Pessanha and Salah, 2021</xref>), such as identifying key topics within the histories, including events and social or political issues (<xref rid="R17" ref-type="bibr">Rieping, 2022</xref>; <xref rid="R4" ref-type="bibr">Brown and Shackel, 2023</xref>).</p>
<p>Recently, sentiment in diaspora archives has drawn some researchers&#x2019; attention. For instance, the First Days Project (<xref rid="R5" ref-type="bibr">Caswell and Mallick, 2014</xref>), Harvest Moon Oral History and the Flin Flon Heritage Project provide opportunities to acquire, describe, and preserve affective records that recount emotions (<xref rid="R9" ref-type="bibr">Grant, 2020</xref>). As for the automatic identification of complicated sentiment in diaspora archives, sentiment analysis is a relatively new approach for retrieving affective information from diaspora oral histories and interviews (<xref rid="R7" ref-type="bibr">Domingu&#x00E8;s et al., 2019</xref>; <xref rid="R15" ref-type="bibr">Ozdemir and Bergler, 2015</xref>).</p>
<p>To further extract and store complete events and their relations, a conceptual model and the SEEGraph are introduced in this study to standardize knowledge representation and enhance affective narration. The EEG originates from the concept of the event knowledge graph, which emphasizes goals such as event logic mining and generalization. Event knowledge graphs have previously been employed in areas such as news articles (<xref rid="R19" ref-type="bibr">Rudnik et al., 2019</xref>) and contemporary and historical events (<xref rid="R8" ref-type="bibr">Gottschalk et al., 2023</xref>). The SEEGraph is proposed to provide a foundational descriptive framework for the event extraction results in NLP, while also incorporating sentiment within the events.</p>
</sec>
<sec id="sec3">
<title>Research method</title>
<sec id="sec3_1">
<title>Knowledge representation model</title>
<p>The classes, properties, and hierarchical structures in the <xref rid="R2" ref-type="bibr">CIDOC</xref> CRM ontology are reused to construct the ontology of diaspora oral archives. <xref rid="R2" ref-type="bibr">CIDOC</xref> CRM is a high-level ontology that can serve as the mediated schema for integrating cultural heritage data and correlating them with library and archive information (Bekiari et al., 2022). In <xref rid="R2" ref-type="bibr">CIDOC</xref> CRM, classes such as E53 Place, E2 Temporal Entity, E5 Event, E39 Actor, and E55 Type are always connected to and represented with an Event. The ontological framework of diaspora oral archives is shown in <xref ref-type="fig" rid="F1">Figure 1</xref>. Entities in the ontology include Event, Event Type, Action, Participant, Time, Location, and Sentiment. Relations between Events are either temporal or causal. As for relations between entities, this research reveals spatio-temporal relations and spatio-sentiment relations.</p>
<fig id="F1">
<label>Figure 1.</label>
<caption><p>The ontological framework of diaspora oral archives</p></caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="images\c28-fig1.jpg"><alt-text>none</alt-text></graphic>
</fig>
</sec>
<sec id="sec3_2">
<title>Event evolutionary graph</title>
<p>The EEG is event-centered and includes two types of nodes (events and entities) and three types of directed edges (between events, between events and entities, and between entities). The structure of the EEG is formally expressed as G, as follows:</p>
<disp-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" display="block"><mml:mrow><mml:mi>G</mml:mi><mml:mo>=</mml:mo><mml:mfenced close="}" open="{"><mml:mrow><mml:mfenced><mml:mrow><mml:mi>s</mml:mi><mml:mo>,</mml:mo><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>o</mml:mi></mml:mrow></mml:mfenced><mml:mo>&#x007C;</mml:mo><mml:mfenced close="}" open="{"><mml:mrow><mml:mi>s</mml:mi><mml:mo>,</mml:mo><mml:mi>o</mml:mi></mml:mrow></mml:mfenced><mml:mo>&#x2208;</mml:mo><mml:mi>N</mml:mi><mml:mo>,</mml:mo><mml:mi>r</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mi>P</mml:mi><mml:mo>,</mml:mo><mml:mi>N</mml:mi><mml:mo>=</mml:mo><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>e</mml:mi><mml:mi>v</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>&#x222A;</mml:mo><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>e</mml:mi><mml:mi>n</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mi>P</mml:mi><mml:mo>=</mml:mo><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>e</mml:mi><mml:mi>v</mml:mi><mml:mi>t</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mi>e</mml:mi><mml:mi>v</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>&#x222A;</mml:mo><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>e</mml:mi><mml:mi>v</mml:mi><mml:mi>t</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mi>e</mml:mi><mml:mi>n</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>&#x222A;</mml:mo><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>e</mml:mi><mml:mi>n</mml:mi><mml:mi>t</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mi>e</mml:mi><mml:mi>n</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfenced></mml:mrow></mml:math></disp-formula>
<p>Here, the nodes in the EEG are represented as N, comprising event nodes N<sub>evt</sub> and entity nodes N<sub>ent</sub>. The edges in the EEG are represented as P, comprising relations P<sub>evt-evt</sub> between events, relations P<sub>evt-ent</sub> between events and entities, and relations P<sub>ent-ent</sub> between entities. In the EEG, event nodes, entity nodes, and their relations are determined by the knowledge representation model.</p>
<p>(1) Events.</p>
<p>An event refers to an objective occurrence or state change consisting of one or more actions in which one or more arguments participate within a specific time period or region. Considering the ontological structure, the event is formally expressed as follows:</p>
<disp-formula><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" display="block"><mml:mrow><mml:mi>e</mml:mi><mml:mo>=</mml:mo><mml:mfenced><mml:mrow><mml:mi>A</mml:mi><mml:mo>,</mml:mo><mml:mi>P</mml:mi><mml:mo>,</mml:mo><mml:mi>T</mml:mi><mml:mo>,</mml:mo><mml:mi>L</mml:mi><mml:mo>,</mml:mo><mml:mi>S</mml:mi><mml:mo>,</mml:mo><mml:mi>E</mml:mi><mml:mi>T</mml:mi></mml:mrow></mml:mfenced></mml:mrow></mml:math></disp-formula>
<p>Here, &#x2018;e&#x2019;, &#x2018;A&#x2019;, &#x2018;P&#x2019;, &#x2018;T&#x2019;, &#x2018;L&#x2019;, &#x2018;S&#x2019; and &#x2018;ET&#x2019; represent Event, Action, Participant, Time, Location, Sentiment and Event Type, respectively.</p>
<p>(2) Event relations.</p>
<p>Causal and temporal relations are the most common relations between events (<xref rid="R6" ref-type="bibr">Caselli and Vossen, 2017</xref>). Temporal relations can be defined as before, after, simultaneous, begins, ends, etc. A causal relation can be stated with precise semantics or remain hidden in the logic of the text, in which case it has to be inferred (<xref rid="R14" ref-type="bibr">Liu et al., 2021</xref>).</p>
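<p>The two definitions above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the study&#x2019;s implementation: the class names (Event, Triple), the event-entity and entity-entity relation labels (hasAction, evokes, etc.) and all toy values are hypothetical; only happenedBefore is taken from the relation properties used in this paper.</p>
<preformat preformat-type="code">
```python
from dataclasses import dataclass
from typing import List, NamedTuple, Set


@dataclass(frozen=True)
class Event:
    """One event e = (A, P, T, L, S, ET); field names are illustrative."""
    event_id: str
    action: str
    participant: str
    time: str
    location: str
    sentiment: str   # 'positive', 'neutral' or 'negative'
    event_type: str  # e.g. a Major Life Events Taxonomy category


class Triple(NamedTuple):
    """One directed edge (s, r, o) of the graph G."""
    subject: str
    relation: str
    obj: str


def event_to_triples(e: Event) -> List[Triple]:
    """Expand an event into event-entity edges (hypothetical relation labels)."""
    return [
        Triple(e.event_id, "hasAction", e.action),
        Triple(e.event_id, "hasParticipant", e.participant),
        Triple(e.event_id, "hasTime", e.time),
        Triple(e.event_id, "hasLocation", e.location),
        Triple(e.event_id, "hasSentiment", e.sentiment),
        Triple(e.event_id, "hasEventType", e.event_type),
    ]


# Toy events with invented values (not taken from the corpus).
e1 = Event("evt1", "get job", "P1", "t1", "Seattle", "positive", "Career")
e2 = Event("evt2", "move", "P1", "t2", "Seattle", "neutral", "Relocation")

# G is a set of triples: event-entity edges plus the other two edge types.
graph: Set[Triple] = set(event_to_triples(e1)) | set(event_to_triples(e2))
graph.add(Triple("evt1", "happenedBefore", "evt2"))  # event-event edge
graph.add(Triple("Seattle", "evokes", "positive"))   # entity-entity edge (hypothetical label)

# N = N_evt united with N_ent, recovered from the triples.
event_nodes = {e1.event_id, e2.event_id}
all_nodes = {t.subject for t in graph} | {t.obj for t in graph}
entity_nodes = all_nodes - event_nodes
```
</preformat>
<p>Storing G as a set of triples keeps the three edge types in one structure, and the node sets can be derived from it rather than maintained separately.</p>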
</sec>
<sec id="sec3_3">
<title>Event extraction workflow</title>
<p>We build an event extraction (EE) workflow to populate the knowledge representation model. The EE workflow has four main components:</p>
<p>(1) Key events and entities extraction.</p>
<p>Real-world events have different granularities, from top-level themes to key events and then to event mentions corresponding to concrete actions (<xref rid="R23" ref-type="bibr">Zhang et al., 2022</xref>). In this context, we propose a new task, key event detection at the intermediate level, which aims to detect event blocks. Each archive is manually divided into several event blocks according to its content. Considering the spatial role that places play and the study&#x2019;s objectives, entities labeled as &#x2018;GPE&#x2019; (Geo-Political Entity) are extracted from each event block. Events associated with placenames are identified as key events. Other entities related to specific locations, including Participant, Action, and Time, are also extracted.</p>
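<p>The selection rule described above (an event block becomes a key event when it mentions at least one place) can be sketched as follows. The sketch assumes that entity mentions have already been tagged by a named-entity recognizer as (text, label) pairs; the function name select_key_events and the toy annotations are hypothetical.</p>
<preformat preformat-type="code">
```python
from typing import Dict, List, Tuple

Entity = Tuple[str, str]  # (surface text, NER label)


def select_key_events(blocks: Dict[str, List[Entity]]) -> Dict[str, List[str]]:
    """Keep event blocks that mention at least one GPE; return their place names."""
    key_events = {}
    for block_id, entities in blocks.items():
        places = [text for text, label in entities if label == "GPE"]
        if places:  # blocks without a place name are not key events
            key_events[block_id] = places
    return key_events


# Toy, invented annotations for two manually segmented event blocks.
blocks = {
    "block1": [("Seattle", "GPE"), ("Boeing", "ORG"), ("1968", "DATE")],
    "block2": [("my father", "PERSON")],
}
print(select_key_events(blocks))
```
</preformat>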
<p>(2) Event classification.</p>
<p>The event type classification task is based on a taxonomy and an automatic classifier. The Major Life Events Taxonomy is employed as the taxonomy of diaspora key events in this research. It is a U.S.-based list of major life changes that people experience (<xref rid="R11" ref-type="bibr">Haimson et al., 2021</xref>). In this taxonomy, <italic>&#x2018;life events&#x2019;</italic> is used as an umbrella term to encompass life experiences involving both moments and processes of change. It can be used to classify the key events in diasporas&#x2019; lives into 12 categories: Health, Financial, Relocation, Legal, Relationships, Family Relationships, Death, Career, Education, Lifestyle Change, Identity, and Societal.</p>
<p>(3) Sentiment analysis.</p>
<p>The connection between sentiment and space is particularly significant, as certain locations often evoke distinct emotional responses. Sentiment analysis concerning places can reveal diasporas&#x2019; intuitive feelings about their immigration activities and events. Thus, aspect-based sentiment analysis (ABSA) is applied to classify the sentiments (<italic>&#x2018;positive&#x2019;</italic>, <italic>&#x2018;negative&#x2019;</italic> and <italic>&#x2018;neutral&#x2019;</italic>) that diasporas express towards places, which serve as the aspects (<xref rid="R20" ref-type="bibr">Syamala and Nalini, 2019</xref>).</p>
<p>(4) Event relation extraction.</p>
<p>Event relation extraction identifies clauses and relation indicator words in the corpus using syntactic patterns and matching rules, and outputs tuples of the form &#x003C;event A, relation, event B&#x003E;. This study uses the common pattern-matching method: it compiles event relation words and sentence patterns with explicit conjunctions (<xref ref-type="table" rid="T1">Table 1</xref>) and uses a predefined rule base to match semantic relations.</p>
<table-wrap id="T1">
<label>Table 1.</label>
<caption><p>Syntactic patterns and conjunctions used for event relation extraction</p></caption>
<table>
<thead>
<tr>
<th align="center" valign="top"><bold>Relations</bold></th>
<th align="center" valign="top"><bold>Properties</bold></th>
<th align="center" valign="top"><bold>Syntactic pattern</bold></th>
<th align="center" valign="top"><bold>Conjunction</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" valign="top" rowspan="4">Causal relation</td>
<td align="center" valign="top" rowspan="4">resultedIn/ resultedFrom</td>
<td align="center" valign="top">{Event_effect} &#x003C;Conj&#x003E; {Event_cause}</td>
<td align="center" valign="top">because, because of, since, as, for, etc.</td>
</tr>
<tr>
<td align="center" valign="top">{Event_cause} &#x003C;Conj&#x003E; {Event_effect}</td>
<td align="center" valign="top">because of this, so, thus, therefore, consequently, etc.</td>
</tr>
<tr>
<td align="center" valign="top">&#x003C;Conj&#x003E; {Event_effect} {Event_cause}</td>
<td align="center" valign="top">another important factor/reason of</td>
</tr>
<tr>
<td align="center" valign="top">&#x003C;Conj&#x003E; {Event_cause} {Event_effect}</td>
<td align="center" valign="top">because, because of, since, as, for, etc.</td>
</tr>
<tr>
<td align="center" valign="top" rowspan="5">Temporal relation</td>
<td align="center" valign="top" rowspan="5">happenedBefore/ happenedAfter/ happenedSimultaneously</td>
<td align="center" valign="top">{Event_previous} &#x003C;Conj&#x003E; {Event_latter}</td>
<td align="center" valign="top">next, then, after that, later, etc.</td>
</tr>
<tr>
<td align="center" valign="top">{Event_latter} &#x003C;Conj&#x003E; {Event_previous}</td>
<td align="center" valign="top">previously, prior to this</td>
</tr>
<tr>
<td align="center" valign="top">{Event_simul1} &#x003C;Conj&#x003E; {Event_simul2}</td>
<td align="center" valign="top">simultaneously</td>
</tr>
<tr>
<td align="center" valign="top">&#x003C;Conj&#x003E; {Event_latter} {Event_previous}</td>
<td align="center" valign="top">before</td>
</tr>
<tr>
<td align="center" valign="top">&#x003C;Conj&#x003E; {Event_previous} {Event_latter}</td>
<td align="center" valign="top">after</td>
</tr>
</tbody>
</table>
</table-wrap>
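<p>As a rough illustration of how a few rows of Table 1 could be turned into matching rules, the following Python sketch uses regular expressions. It is a simplified sketch under assumptions, not the rule base used in this study: it covers only three mid-sentence patterns, the rule list and the function name extract_relation are hypothetical, and the example sentences are invented.</p>
<preformat preformat-type="code">
```python
import re
from typing import Optional, Tuple

# Each rule: (pattern with two clause groups, relation property, swap flag).
# swap=True means the second clause is the source event,
# as in "{Event_effect} because {Event_cause}".
RULES = [
    (re.compile(r"^(.+?),?\s+(?:because of this|so|thus|therefore|consequently),?\s+(.+)$", re.I),
     "resultedIn", False),
    (re.compile(r"^(.+?),?\s+(?:because of|because|since)\s+(.+)$", re.I),
     "resultedIn", True),
    (re.compile(r"^(.+?),?\s+(?:and\s+)?(?:then|after that|later|next),?\s+(.+)$", re.I),
     "happenedBefore", False),
]


def extract_relation(sentence: str) -> Optional[Tuple[str, str, str]]:
    """Return (event A, relation, event B) for the first matching rule, else None."""
    text = sentence.strip().rstrip(".")
    for pattern, relation, swap in RULES:
        match = pattern.match(text)
        if match:
            first = match.group(1).strip()
            second = match.group(2).strip()
            return (second, relation, first) if swap else (first, relation, second)
    return None


# Invented example sentences.
print(extract_relation("I got a job at Boeing, so I moved to Seattle."))
print(extract_relation("I moved to Seattle because I got a job at Boeing."))
print(extract_relation("I finished my degree, then I moved to Seattle."))
```
</preformat>
<p>Rules are tried in order and the first match wins, so longer conjunctions such as &#x2018;because of this&#x2019; are listed before shorter ones. Ambiguous conjunctions (for example &#x2018;since&#x2019; or &#x2018;as&#x2019;) would need additional syntactic checks in practice.</p>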
</sec>
</sec>
<sec id="sec4">
<title>Analysis and results</title>
<sec id="sec4_1">
<title>Data source</title>
<p>The author analyses 42 English transcripts of the South Asian Oral History Project (SAOHP) at the University of Washington Libraries. The SAOHP represents one of the first attempts in the U.S. to record pan-South Asian immigrant experiences in the Pacific Northwest using the medium of oral history. The SAOHP is marked by key historical events that drew South Asians to the United States. These interviews include important events in diasporas&#x2019; lives, reflect religious, linguistic, occupational, and gender diversity, and provide rich insight into the changing experiences of South Asians in the Pacific Northwest. The interviews are in English and have been digitized and transcribed. Each interview lasts more than 60 minutes, and each transcript contains over 20 pages of text. The transcripts were formatted as CSV files, with the text separated into chunks.</p>
</sec>
<sec id="sec4_2">
<title>Model evaluation</title>
<p>The spaCy library, DeBERTa and GPT-3.5 are selected to carry out the event extraction workflow: entity extraction, event type extraction, sentiment analysis, and event relation extraction. Specifically, spaCy and GPT-3.5 are applied to entity extraction and event relation extraction. Owing to the limitations of spaCy, only GPT-3.5 is applied to event type extraction and sentiment analysis. In addition, a DeBERTa-based ABSA classifier is used as a supervised machine learning method for sentiment analysis. About 900 manually annotated samples are used as the reference for the evaluation. The performance of these models is assessed using standard NLP metrics: precision, recall and F1 score. The evaluation results are shown in <xref ref-type="table" rid="T2">Table 2</xref>.</p>
<table-wrap id="T2">
<label>Table 2.</label>
<caption><p>The evaluation of chosen models</p></caption>
<table>
<thead>
<tr>
<th align="left" valign="top" rowspan="2"><bold>Tasks</bold></th>
<th align="left" valign="top" rowspan="2"><bold>Methods</bold></th>
<th align="left" valign="top" colspan="3"><bold>Evaluation metrics</bold></th>
</tr>
<tr>
<th align="left" valign="top"><bold>Precision</bold></th>
<th align="left" valign="top"><bold>Recall</bold></th>
<th align="left" valign="top"><bold>F1</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" valign="top" colspan="2">Manual annotation</td>
<td align="center" valign="top">100%</td>
<td align="center" valign="top">100%</td>
<td align="center" valign="top">100%</td>
</tr>
<tr>
<td align="center" valign="top" rowspan="2">Entity extraction</td>
<td align="center" valign="top">The spaCy library</td>
<td align="center" valign="top">42.37%</td>
<td align="center" valign="top">52.49%</td>
<td align="center" valign="top">46.89%</td>
</tr>
<tr>
<td align="center" valign="top">GPT-3.5</td>
<td align="center" valign="top">74.26%</td>
<td align="center" valign="top">86.80%</td>
<td align="center" valign="top">80.04%</td>
</tr>
<tr>
<td align="center" valign="top">Event type extraction</td>
<td align="center" valign="top">GPT-3.5</td>
<td align="center" valign="top">69.75%</td>
<td align="center" valign="top">77.49%</td>
<td align="center" valign="top">73.42%</td>
</tr>
<tr>
<td align="center" valign="top" rowspan="2">Sentiment analysis</td>
<td align="center" valign="top">GPT-3.5</td>
<td align="center" valign="top">89.38%</td>
<td align="center" valign="top">92.39%</td>
<td align="center" valign="top">90.86%</td>
</tr>
<tr>
<td align="center" valign="top">DeBERTa</td>
<td align="center" valign="top">93.54%</td>
<td align="center" valign="top">92.39%</td>
<td align="center" valign="top">92.96%</td>
</tr>
<tr>
<td align="center" valign="top" rowspan="2">Event relation extraction</td>
<td align="center" valign="top">The spaCy library</td>
<td align="center" valign="top">30.25%</td>
<td align="center" valign="top">35.25%</td>
<td align="center" valign="top">32.56%</td>
</tr>
<tr>
<td align="center" valign="top">GPT-3.5</td>
<td align="center" valign="top">12.42%</td>
<td align="center" valign="top">10.49%</td>
<td align="center" valign="top">11.37%</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The results show that GPT-3.5 achieves the highest performance in the entity extraction task (F1 of 80.04%) and also performs well in event type extraction (F1 of 73.42%). For sentiment analysis, the DeBERTa-based ABSA model delivers the best results (F1 of 92.96%). For event relation extraction, however, both methods perform poorly (F1 of 32.56% for spaCy and 11.37% for GPT-3.5), so manual validation and complementary methods remain necessary to ensure accuracy.</p>
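<p>The F1 column of Table 2 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short Python sketch below recomputes it from the precision and recall values reported in Table 2; the function name f1_score and the dictionary layout are illustrative choices, not part of the study&#x2019;s pipeline.</p>
<preformat preformat-type="code">
```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same unit as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) in percent, copied from Table 2.
table2 = {
    "Entity extraction / spaCy": (42.37, 52.49),
    "Entity extraction / GPT-3.5": (74.26, 86.80),
    "Event type extraction / GPT-3.5": (69.75, 77.49),
    "Sentiment analysis / GPT-3.5": (89.38, 92.39),
    "Sentiment analysis / DeBERTa": (93.54, 92.39),
    "Event relation extraction / spaCy": (30.25, 35.25),
    "Event relation extraction / GPT-3.5": (12.42, 10.49),
}

for name, (p, r) in table2.items():
    print(f"{name}: F1 = {f1_score(p, r):.2f}%")
```
</preformat>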
<p>For each task, the best-performing model is used to complete all extraction and classification processes, and the extracted data are then checked and supplemented through manual inspection. In total, 1533 key events, 1043 temporal relations and 632 causal relations are identified.</p>
</sec>
<sec id="sec4_3">
<title>SEEGraph construction</title>
<p>The proposed event extraction workflow is applied to extract the defined entities, events, and relations, which are imported into a Neo4j database to map the knowledge framework and ontology classes to properties and instances. In total, the graph contains 1533 event nodes N<sub>evt</sub>, 19435 entity nodes N<sub>ent</sub>, 1675 relations P<sub>evt-evt</sub> between events (1043 temporal and 632 causal), 25920 relations P<sub>evt-ent</sub> between events and entities, and 14825 relations P<sub>ent-ent</sub> between entities. The resulting SEEGraph provides a structured framework for representing and reasoning about diasporas&#x2019; events.</p>
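<p>One possible way to prepare event-event relations for import into Neo4j is to generate parameterized Cypher MERGE statements, as in the sketch below. The node label Event, the id property, and the function name event_relation_to_cypher are assumptions made for illustration; the paper does not specify the import procedure or the database schema. Because Cypher does not accept relationship types as query parameters, the relation name is checked against a fixed list before it is inserted into the query text.</p>
<preformat preformat-type="code">
```python
from typing import Dict, Tuple

# Relation properties listed in Table 1.
ALLOWED_RELATIONS = {
    "happenedBefore",
    "happenedAfter",
    "happenedSimultaneously",
    "resultedIn",
    "resultedFrom",
}


def event_relation_to_cypher(subject_id: str, relation: str,
                             object_id: str) -> Tuple[str, Dict[str, str]]:
    """Build a Cypher query and its parameters for one event-event edge."""
    if relation not in ALLOWED_RELATIONS:
        raise ValueError(f"unknown relation: {relation}")
    query = (
        "MERGE (a:Event {id: $s}) "      # hypothetical label and property
        "MERGE (b:Event {id: $o}) "
        f"MERGE (a)-[:{relation}]->(b)"  # type is whitelisted above
    )
    return query, {"s": subject_id, "o": object_id}


query, params = event_relation_to_cypher("evt1", "happenedBefore", "evt2")
print(query)
print(params)
```
</preformat>
<p>With the official Neo4j Python driver, such a pair could be passed to a session&#x2019;s run method; using MERGE rather than CREATE avoids duplicating nodes when the same event appears in several relations.</p>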
<sec id="sec4_3_1">
<title>Event evolution analysis</title>
<p>The SEEGraph traces the sequences, patterns, and trajectories of events influencing diaspora movements, transformations, and experiences over time. For example, Seattle had a large share of Indians, predominantly engineers, who worked for Boeing in the 1960s and 1970s. Job offers from Boeing became a key facilitator of immigration during that period. In the case of AM (<xref ref-type="fig" rid="F2">Figure 2</xref>), he moved to Seattle after getting a job at Boeing. He made friends with colleagues who were also members of the Indian diaspora and was admitted to an engineering college that collaborated with Boeing.</p>
<fig id="F2">
<label>Figure 2.</label>
<caption><p>The event evolutionary process related to job offers from Boeing</p></caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="images\c28-fig2.jpg"><alt-text>none</alt-text></graphic>
</fig>
</sec>
<sec id="sec4_3_2">
<title>Spatio-temporal analysis</title>
<p>The spatio-temporal analysis creates a multidimensional representation of spatial movements and temporal changes. The whole picture of interviewees&#x2019; immigration activities is shown in <xref ref-type="fig" rid="F3">Figure 3</xref>. The nodes are the places where interviewees stayed for more than one year, and the connecting curves show the flows and years of immigration. We identify four types of immigration modes: single-directional, round trip, multi-directional, and internal trip.</p>
<fig id="F3">
<label>Figure 3.</label>
<caption><p>Interviewees&#x2019; immigration routes and modes</p></caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="images\c28-fig3.jpg"><alt-text>none</alt-text></graphic>
</fig>
</sec>
<sec id="sec4_3_3">
<title>Spatio-sentiment analysis</title>
<p>The spatio-sentiment analysis creates a rich and context-aware representation of the diasporas&#x2019; sentiment towards geographical entities, especially their destination and homeland. <xref ref-type="fig" rid="F4">Figure 4</xref> shows their sentiment toward the most frequently mentioned locations. Their comments on the USA and American cities are one-sided: positive views far outnumber negative ones. Their feelings towards their homeland are more complicated and conflicted: an almost equal number of positive and negative opinions indicates a mixed attitude.</p>
<fig id="F4">
<label>Figure 4.</label>
<caption><p>The diasporas&#x2019; sentiment toward main locations</p></caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="images\c28-fig4.jpg"><alt-text>none</alt-text></graphic>
</fig>
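<p>The aggregation behind such a spatio-sentiment view can be sketched in a few lines of Python: sentiment labels are counted per location. The function name sentiment_by_location and the toy records are hypothetical and do not reproduce the study&#x2019;s data.</p>
<preformat preformat-type="code">
```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def sentiment_by_location(records: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Count sentiment labels for each location from (location, sentiment) pairs."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for location, sentiment in records:
        counts[location][sentiment] += 1
    return counts


# Invented toy records: (location, sentiment towards that location).
records = [
    ("Seattle", "positive"),
    ("Seattle", "positive"),
    ("Seattle", "negative"),
    ("India", "positive"),
    ("India", "negative"),
]

summary = sentiment_by_location(records)
for location, counter in summary.items():
    print(location, dict(counter))
```
</preformat>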
</sec>
</sec>
</sec>
<sec id="sec5">
<title>Discussion and conclusion</title>
<p>The analysis of diaspora oral archives serves as a pathway to represent diversified voices in repositories. In this research, the ensemble framework is introduced not only to realize the semantic, correlated, and structured expression of knowledge units such as events, entities, and attributes, but also to reveal clearly the dynamic evolution rules and patterns between historical and personal events. This study makes contributions in the following aspects:</p>
<p>Firstly, unlike previous event extraction work conducted on relatively high-quality data such as media articles (<xref rid="R13" ref-type="bibr">Li et al., 2018</xref>; <xref rid="R19" ref-type="bibr">Rudnik et al., 2019</xref>), the present study performs information extraction on diaspora oral archives that have been transformed into structured text. The dataset used in this study differs from such data in three respects: (1) the interview conversations provide longer and more complicated contexts of imbalanced length, (2) event content and relations are complex owing to interviewees&#x2019; different narratives, and (3) the oral narratives include many idiomatic expressions, slang, colloquialisms, and repeated expressions. To address this, the ontological model is proposed to normalize the domain of diasporas&#x2019; major life events. Instead of defining an event by event trigger words and event arguments as in previous research (<xref rid="R10" ref-type="bibr">Guan et al., 2022</xref>), this study refines it into action, participant, time, location, sentiment, and event type, so the information extracted for each event is more abundant and comprehensive. This fills a gap in event detection research, which has mainly focused on political events or news corpora. Besides, this is the first time that the EEG has been combined with NLP pipelines to analyze diaspora oral archives. The performance of spaCy, LLMs, etc. has been examined in each task. The ensemble framework with multiple models is expected to be used in the future to process other diaspora oral archives and low-quality textual transcripts of oral materials.</p>
<p>Secondly, sentiment is incorporated and is introduced for the first time into the EEG and diaspora oral archives. This study helps to reveal the reasons for and results of diasporas&#x2019; migration, how diasporas are influenced by and exert an influence on migration mobility, and how they build up a sense of belonging toward the destination country. These experiences involve intense confrontations, with both positive and negative emotions. Although the sentiment analysis in this research is coarse-grained, it can be seen as a first attempt to understand and respond appropriately to diasporas&#x2019; emotional reactions. With embedded sentiment, diasporas&#x2019; affective assessment of places and events deepens the understanding of their immigration activities.</p>
</sec>
</body>
<back>
<ack>
<title>Acknowledgements</title>
<p>I would like to express my gratitude to Prof. Charles Jeurgens for his careful reading of the manuscript and his valuable feedback and suggestions.</p>
<p>The oral history transcripts used in this study are part of the South Asian Oral History Project (SAOHP) collection at the University of Washington Library. While these materials are publicly accessible, their copyright remains with the creators (interviewees and interviewers). This study uses the data strictly for academic research purposes in compliance with the principles of <italic>&#x2018;Fair Use.&#x2019;</italic> The data has been anonymized, and no identifiable personal information is disclosed. Any results or findings are derived from analytical methods and do not reproduce or distribute the original content of the transcripts.</p>
</ack>
<ref-list>
<title>References</title>
<ref id="R1"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Bang</surname><given-names>Y.</given-names></name><name><surname>Cahyawijaya</surname><given-names>S.</given-names></name><name><surname>Lee</surname><given-names>N.</given-names></name><name><surname>Dai</surname><given-names>W.</given-names></name><name><surname>Su</surname><given-names>D.</given-names></name><name><surname>Wilie</surname><given-names>B.</given-names></name><name><surname>Fung</surname><given-names>P.</given-names></name></person-group><year>2023</year><article-title>A multitask, multilingual, multimodal evaluation of ChatGPT on reasoning, hallucination, and interactivity</article-title><source>arXiv preprint arXiv:2302.04023</source><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.48550/arXiv.2302.04023">https://doi.org/10.48550/arXiv.2302.04023</ext-link></element-citation></ref>
<ref id="R2"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Bekiari</surname><given-names>C.</given-names></name><name><surname>Bruseker</surname><given-names>G.</given-names></name><name><surname>Doerr</surname><given-names>M.</given-names></name><name><surname>Ore</surname><given-names>C. E.</given-names></name><name><surname>Stead</surname><given-names>S.</given-names></name><name><surname>Velios</surname><given-names>A.</given-names></name></person-group><year>2021</year><comment>April</comment><article-title>Volume A: Definition of the CIDOC conceptual reference model</article-title><comment>Version 7.1</comment><ext-link ext-link-type="uri" xlink:href="https://www.cidoc-crm.org/sites/default/files/cidoc_crm_version_7.1.2.pdf">https://www.cidoc-crm.org/sites/default/files/cidoc_crm_version_7.1.2.pdf</ext-link></element-citation></ref>
<ref id="R3"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Brinkhurst</surname><given-names>E.</given-names></name></person-group><year>2012</year><article-title>Archives and Access: Reaching Out to the Somali Community of London&#x2019;s King&#x2019;s Cross</article-title><source>Ethnomusicology Forum</source><volume>21</volume><issue>2</issue><fpage>243</fpage><lpage>258</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1080/17411912.2012.689470">https://doi.org/10.1080/17411912.2012.689470</ext-link></element-citation></ref>
<ref id="R4"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Brown</surname><given-names>M.</given-names></name><name><surname>Shackel</surname><given-names>P.</given-names></name></person-group><year>2023</year><article-title>Text Mining Oral Histories in Historical Archaeology</article-title><source>International Journal of Historical Archaeology</source><volume>27</volume><fpage>865</fpage><lpage>881</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1007/s10761-022-00680-5">https://doi.org/10.1007/s10761-022-00680-5</ext-link></element-citation></ref>
<ref id="R5"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Caswell</surname><given-names>M.</given-names></name><name><surname>Mallick</surname><given-names>S.</given-names></name></person-group><year>2014</year><article-title>Collecting the easily missed stories: digital participatory microhistory and the South Asian American Digital Archive</article-title><source>Archives and Manuscripts, Taylor &#x0026; Francis</source><volume>42</volume><issue>1</issue><fpage>73</fpage><lpage>86</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1080/01576895.2014.880931">https://doi.org/10.1080/01576895.2014.880931</ext-link></element-citation></ref>
<ref id="R6"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Caselli</surname><given-names>T.</given-names></name><name><surname>Vossen</surname><given-names>P.</given-names></name></person-group><year>2017</year><comment>August</comment><article-title>The event storyline corpus: A new benchmark for causal and temporal relation extraction</article-title><source>In Proceedings of the Events and Stories in the News Workshop</source><fpage>77</fpage><lpage>86</lpage><comment>Association for Computational Linguistics</comment><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.18653/v1/W17-2711">https://doi.org/10.18653/v1/W17-2711</ext-link></element-citation></ref>
<ref id="R7"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Domingu&#x00E8;s</surname><given-names>C.</given-names></name><name><surname>Jolivet</surname><given-names>L.</given-names></name><name><surname>Brando</surname><given-names>C.</given-names></name><name><surname>Cargill</surname><given-names>M.</given-names></name></person-group><year>2019</year><article-title>Place and Sentiment-based Life story Analysis. From the Spanish Republican Army to the French Resistance</article-title><source>Revue Fran&#x00E7;aise Des Sciences de l&#x2019;information et de La Communication</source><volume>17</volume><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.4000/rfsic.7228">https://doi.org/10.4000/rfsic.7228</ext-link></element-citation></ref>
<ref id="R8"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Gottschalk</surname><given-names>S.</given-names></name><name><surname>Kacupaj</surname><given-names>E.</given-names></name><name><surname>Abdollahi</surname><given-names>S.</given-names></name><name><surname>Alves</surname><given-names>D.</given-names></name><name><surname>Amaral</surname><given-names>G.</given-names></name><name><surname>Koutsiana</surname><given-names>E.</given-names></name><name><surname>Thakkar</surname><given-names>G.</given-names></name></person-group><year>2023</year><article-title>OEKG: The open event knowledge graph</article-title><source>arXiv preprint arXiv:2302.14688</source><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.48550/arXiv.2302.14688">https://doi.org/10.48550/arXiv.2302.14688</ext-link></element-citation></ref>
<ref id="R9"><element-citation publication-type="book"><person-group person-group-type="author"><name><surname>Grant</surname><given-names>K.A.</given-names></name></person-group><year>2020</year><source>Affective Collections: Exploring Care Practices in Digital Community Heritage Projects (Unpublished master thesis)</source><publisher-name>University of Alberta</publisher-name><publisher-loc>Alberta, Canada</publisher-loc></element-citation></ref>
<ref id="R10"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Guan</surname><given-names>S.</given-names></name><name><surname>Cheng</surname><given-names>X.</given-names></name><name><surname>Bai</surname><given-names>L.</given-names></name><name><surname>Zhang</surname><given-names>F.</given-names></name><name><surname>Li</surname><given-names>Z.</given-names></name><name><surname>Zeng</surname><given-names>Y.</given-names></name><name><surname>Guo</surname><given-names>J.</given-names></name></person-group><year>2022</year><article-title>What is event knowledge graph: A survey</article-title><source>IEEE Transactions on Knowledge and Data Engineering</source><volume>35</volume><issue>7</issue><fpage>7569</fpage><lpage>7589</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1109/TKDE.2022.3180362">https://doi.org/10.1109/TKDE.2022.3180362</ext-link></element-citation></ref>
<ref id="R11"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Haimson</surname><given-names>O.L.</given-names></name><name><surname>Carter</surname><given-names>A.J.</given-names></name><name><surname>Corvite</surname><given-names>S.</given-names></name><name><surname>Wheeler</surname><given-names>B.</given-names></name><name><surname>Wang</surname><given-names>L.</given-names></name><name><surname>Liu</surname><given-names>T.</given-names></name><name><surname>Lige</surname><given-names>A.</given-names></name></person-group><year>2021</year><article-title>The major life events taxonomy: Social readjustment, social media information sharing, and online network separation during times of life transition</article-title><source>Journal of the Association for Information Science and Technology</source><volume>72</volume><issue>7</issue><fpage>933</fpage><lpage>947</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1002/asi.24455">https://doi.org/10.1002/asi.24455</ext-link></element-citation></ref>
<ref id="R12"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Jones</surname><given-names>M.</given-names></name></person-group><year>2019</year><article-title>Archiving the trauma diaspora: Affective artifacts in the higher education arts classroom</article-title><source>Marilyn Zurmuehlen Work. Pap. Art Educ</source><comment>2019</comment><fpage>1</fpage><lpage>14</lpage></element-citation></ref>
<ref id="R13"><element-citation publication-type="book"><person-group person-group-type="author"><name><surname>Li</surname><given-names>Z.</given-names></name><name><surname>Ding</surname><given-names>X.</given-names></name><name><surname>Liu</surname><given-names>T.</given-names></name></person-group><year>2018</year><comment>July</comment><chapter-title>Constructing narrative event evolutionary graph for script event prediction</chapter-title><source>In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence</source><publisher-loc>Stockholm, Sweden</publisher-loc><fpage>4201</fpage><lpage>4207</lpage><publisher-name>AAAI Press</publisher-name></element-citation></ref>
<ref id="R14"><element-citation publication-type="book"><person-group person-group-type="author"><name><surname>Liu</surname><given-names>Y.</given-names></name><name><surname>Tian</surname><given-names>J.</given-names></name><name><surname>Zhang</surname><given-names>L.</given-names></name><name><surname>Feng</surname><given-names>Y.</given-names></name><name><surname>Fang</surname><given-names>H.</given-names></name></person-group><year>2021</year><chapter-title>A Survey on Event Relation Identification</chapter-title><source>In Knowledge Graph and Semantic Computing: Knowledge Graph and Cognitive Intelligence: 5th China Conference</source><comment>CCKS 2020</comment><publisher-loc>Nanchang, China</publisher-loc><comment>November 12&#x2013;15, 2020, Revised Selected Papers</comment><fpage>173</fpage><lpage>184</lpage><publisher-name>Springer Singapore</publisher-name><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1007/978-981-16-1964-9_14">https://doi.org/10.1007/978-981-16-1964-9_14</ext-link></element-citation></ref>
<ref id="R15"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Ozdemir</surname><given-names>C.</given-names></name><name><surname>Bergler</surname><given-names>S.</given-names></name></person-group><year>2015</year><article-title>CLaC-SentiPipe: SemEval2015 subtasks 10 B, E, and task 11</article-title><source>Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)</source><ext-link ext-link-type="uri" xlink:href="https://aclanthology.org/S15-2081.pdf">https://aclanthology.org/S15-2081.pdf</ext-link></element-citation></ref>
<ref id="R16"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Pessanha</surname><given-names>F.</given-names></name><name><surname>Salah</surname><given-names>A. A.</given-names></name></person-group><year>2021</year><article-title>A computational look at oral history archives</article-title><source>ACM Journal on Computing and Cultural Heritage (JOCCH)</source><volume>15</volume><issue>1</issue><fpage>1</fpage><lpage>16</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1145/3477605">https://doi.org/10.1145/3477605</ext-link></element-citation></ref>
<ref id="R17"><element-citation publication-type="book"><person-group person-group-type="author"><name><surname>Rieping</surname><given-names>H.A.</given-names></name></person-group><year>2022</year><source>Audio Segmenting and Natural Language Processing in Oral History Archiving</source><comment>(Unpublished doctoral dissertation)</comment><publisher-name>Massachusetts Institute of Technology</publisher-name><publisher-loc>Massachusetts, U.S.A</publisher-loc></element-citation></ref>
<ref id="R18"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Roeschley</surname><given-names>A.</given-names></name><name><surname>Kim</surname><given-names>J.</given-names></name></person-group><year>2019</year><article-title>&#x201C;Something that feels like a community&#x201D;: the role of personal stories in building community-based participatory archives</article-title><source>Archival Science</source><volume>19</volume><fpage>27</fpage><lpage>49</lpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1007/s10502-019-09302-2">https://doi.org/10.1007/s10502-019-09302-2</ext-link></element-citation></ref>
<ref id="R19"><element-citation publication-type="book"><person-group person-group-type="author"><name><surname>Rudnik</surname><given-names>C.</given-names></name><name><surname>Ehrhart</surname><given-names>T.</given-names></name><name><surname>Ferret</surname><given-names>O.</given-names></name><name><surname>Teyssou</surname><given-names>D.</given-names></name><name><surname>Troncy</surname><given-names>R.</given-names></name><name><surname>Tannier</surname><given-names>X.</given-names></name></person-group><year>2019</year><comment>May</comment><chapter-title>Searching news articles using an event knowledge graph leveraged by Wikidata</chapter-title><source>In Companion Proceedings of the 2019 World Wide Web Conference</source><fpage>1232</fpage><lpage>1239</lpage><publisher-name>Association for Computing Machinery</publisher-name></element-citation></ref>
<ref id="R20"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Syamala</surname><given-names>M.</given-names></name><name><surname>Nalini</surname><given-names>N. J.</given-names></name></person-group><year>2019</year><article-title>A deep analysis on aspect-based sentiment text classification approaches</article-title><source>International Journal of Advanced Trends in Computer Science and Engineering</source><volume>8</volume><issue>5</issue><fpage>1795</fpage><lpage>1801</lpage></element-citation></ref>
<ref id="R21"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Thomson</surname><given-names>A.</given-names></name></person-group><year>1998</year><article-title>Fifty years on: An international perspective on oral history</article-title><source>Journal of American History</source><volume>85</volume><issue>2</issue><fpage>581</fpage><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.2307/2567753">https://doi.org/10.2307/2567753</ext-link></element-citation></ref>
<ref id="R22"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Yang</surname><given-names>H.</given-names></name><name><surname>Zeng</surname><given-names>B.</given-names></name><name><surname>Xu</surname><given-names>M.</given-names></name><name><surname>Wang</surname><given-names>T.</given-names></name></person-group><year>2021</year><article-title>Back to Reality: Leveraging Pattern-driven Modeling to Enable Affordable Sentiment Dependency Learning</article-title><source>ArXiv Preprint</source><ext-link ext-link-type="uri" xlink:href="https://www.researchgate.net/profile/Heng-Yang-17/publication/355391949_Back_to_Reality_Leveraging_Pattern-driven_Modeling_to_Enable_Affordable_Sentiment_Dependency_Learning/links/6189682107be5f31b7590ae3/Back-to-Reality-Leveraging-Pattern-driven-Modeling-to-Enable-Affordable-Sentiment-Dependency-Learning.pdf">https://www.researchgate.net/profile/Heng-Yang-17/publication/355391949_Back_to_Reality_Leveraging_Pattern-driven_Modeling_to_Enable_Affordable_Sentiment_Dependency_Learning/links/6189682107be5f31b7590ae3/Back-to-Reality-Leveraging-Pattern-driven-Modeling-to-Enable-Affordable-Sentiment-Dependency-Learning.pdf</ext-link></element-citation></ref>
<ref id="R23"><element-citation publication-type="other"><person-group person-group-type="author"><name><surname>Zhang</surname><given-names>Y.</given-names></name><name><surname>Guo</surname><given-names>F.</given-names></name><name><surname>Shen</surname><given-names>J.</given-names></name><name><surname>Han</surname><given-names>J.</given-names></name></person-group><year>2022</year><comment>August</comment><article-title>Unsupervised key event detection from massive text corpora</article-title><source>In Proceedings of the 28th ACM SIGKDD conference on knowledge discovery and data mining</source><fpage>2535</fpage><lpage>2544</lpage><comment>Association for Computing Machinery</comment><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1145/3534678.3539395">https://doi.org/10.1145/3534678.3539395</ext-link></element-citation></ref>
</ref-list>
</back>
</article>