Inconsistency-driven approach for human-in-the-loop entity matching

Authors

DOI:

https://doi.org/10.47989/ir30iConf47140

Keywords:

Active learning, Human-in-the-Loop, Data integration

Abstract

Introduction. Entity matching is a fundamental operation in a wide range of information management applications, and a large number of methods have been proposed to address it. Human-in-the-loop entity matching is a human-AI collaborative approach that is effective when the data for entity matching is incomplete or when matching requires domain knowledge. A typical human-in-the-loop approach lets a machine-learning-based (ML-based) matcher ask humans to match entities when it cannot match them with high confidence. However, ML-based matchers cannot avoid the unknown-unknown problem, i.e., they can match entities incorrectly with high confidence.
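
The typical confidence-based approach described above can be sketched as follows. This is a minimal illustration only, not the matcher used in the paper: the function name `route_pairs`, the thresholds `low` and `high`, and the example scores are all hypothetical assumptions.

```python
def route_pairs(scored_pairs, low=0.3, high=0.7):
    """Split matcher output into automatic decisions and human questions.

    scored_pairs: iterable of (pair, probability_of_match).
    low/high: hypothetical confidence thresholds (assumptions, not from the paper).
    """
    auto, ask_human = [], []
    for pair, p in scored_pairs:
        if low < p < high:
            # The matcher is unsure, so the pair is sent to a human.
            ask_human.append(pair)
        else:
            # The matcher is confident, so its decision is accepted as is.
            auto.append((pair, p >= high))
    return auto, ask_human


# Hypothetical matcher scores for three candidate pairs.
scores = [(("a", "b"), 0.95), (("a", "c"), 0.5), (("b", "c"), 0.05)]
auto, ask_human = route_pairs(scores)
```

In this sketch only the pair scored 0.5 reaches a human. A pair that the matcher scores at 0.95 but that is in fact a non-match is never asked about, which is the unknown-unknown problem.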

Method. This paper presents an inconsistency-based method to deal with this problem. The method asks humans to resolve entities when an inconsistency is found in the transitivity property underlying entity matching. For example, if a matcher returns a positive result for only two of the three pairs among three entities, the result is inconsistent.
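
The transitivity check in the example above can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's implementation, which combines similarity-based blocking with Bayesian inference: the function `find_inconsistent_triples`, the `is_match` callback, and the entity identifiers are hypothetical.

```python
from itertools import combinations


def find_inconsistent_triples(entities, is_match):
    """Return triples in which exactly two of the three pairs are predicted matches.

    If a~b and b~c are matches, transitivity implies a~c, so a triple with
    exactly two positive pairs contains at least one wrong prediction.
    """
    inconsistent = []
    for a, b, c in combinations(entities, 3):
        positives = sum(is_match(x, y) for x, y in ((a, b), (b, c), (a, c)))
        if positives == 2:
            inconsistent.append((a, b, c))
    return inconsistent


# Hypothetical matcher output: the set of pairs predicted to match.
predicted = {frozenset(p) for p in [("e1", "e2"), ("e2", "e3"), ("e4", "e5")]}


def is_match(x, y):
    return frozenset((x, y)) in predicted


triples = find_inconsistent_triples(["e1", "e2", "e3", "e4", "e5"], is_match)
print(triples)  # [('e1', 'e2', 'e3')]
```

The pairs inside each returned triple are candidates for human review, regardless of how confident the matcher was about them. Enumerating all triples is cubic in the number of entities, so a practical version would restrict the search to candidate pairs that survive blocking.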

Analysis. This paper presents an implementation of our idea using a similarity-based blocking method and Bayesian inference, and reports the results of an extensive set of experiments that reveal how and when the method is effective.

Results. The results showed that inconsistency-based sampling selects very different entity pairs from those selected by other sampling strategies and that a simple hybrid strategy performs well in many practical situations.

Conclusion. The results indicate that our approach complements any existing matcher that is subject to the unknown-unknown problem in entity matching.

Published

2025-03-11

How to Cite

Ito, H., Koizumi, T., Yoshimoto, R., Fukushima, Y., Harada, T., & Morishima, A. (2025). Inconsistency-driven approach for human-in-the-loop entity matching. Information Research an International Electronic Journal, 30(iConf), 1024–1038. https://doi.org/10.47989/ir30iConf47140

Section

Peer-reviewed papers