The Five Generations of Entity Resolution on Web Data

Tutorial at 24th International Conference on Web Engineering (ICWE 2024)

Location: Tampere, Finland

Date: 17-20 June 2024


  • Konstantinos Nikoletos, University of Athens

  • Ekaterini Ioannou, Tilburg University

  • George Papadakis, University of Athens


Entity Resolution is a core data integration task, and a large body of work has aimed to improve its effectiveness and time efficiency. This tutorial provides a comprehensive overview of the field, grouping the relevant methods into five generations. The first generation targets Veracity in the context of structured data with a clean schema. The second generation additionally covers Volume, leveraging multi-core or massive parallelization to process large-scale datasets. The third generation addresses the additional challenge of Variety, targeting voluminous, noisy, semi-structured, and highly heterogeneous data from the Semantic Web. The fourth generation also tackles Velocity, processing data collections whose volume increases continuously. The latest works belong to the fifth generation: they involve pre-trained (large) language models, which rely heavily on external knowledge to address all four Vs with high effectiveness.


to be updated…

