Benchmarks

Dataset specifying the best configuration

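| Dataset | D1 Recall | D1 Precision | D1 F1 | D2 Recall | D2 Precision | D2 F1 | D3 Recall | D3 Precision | D3 F1 | D5 Recall | D5 Precision | D5 F1 | D8 Recall | D8 Precision | D8 F1 | D10 Recall | D10 Precision | D10 F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| D1 | 1.000 | 0.788 | 0.881 | 1.000 | 0.263 | 0.416 | 0.989 | 0.281 | 0.438 | 1.000 | 0.322 | 0.488 | 0.933 | 0.417 | 0.576 | 1.000 | 0.263 | 0.416 |
| D2 | 0.000 | 0.000 | 0.000 | 0.942 | 0.952 | 0.947 | 0.695 | 0.798 | 0.743 | 0.851 | 0.922 | 0.885 | 0.270 | 0.997 | 0.424 | 0.746 | 0.853 | 0.796 |
| D3 | 0.037 | 0.539 | 0.069 | 0.431 | 0.354 | 0.389 | 0.674 | 0.584 | 0.625 | 0.486 | 0.425 | 0.454 | 0.092 | 0.545 | 0.158 | 0.523 | 0.472 | 0.496 |
| D4 | 0.000 | 0.000 | 0.000 | 0.808 | 0.794 | 0.801 | 0.920 | 0.702 | 0.796 | 0.931 | 0.844 | 0.886 | 0.038 | 0.961 | 0.072 | 0.845 | 0.659 | 0.741 |
| D5 | 0.096 | 0.812 | 0.172 | 0.856 | 0.288 | 0.431 | 0.673 | 0.235 | 0.348 | 0.792 | 0.330 | 0.466 | 0.753 | 0.671 | 0.709 | 0.859 | 0.302 | 0.446 |
| D6 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.474 | 0.748 | 0.580 | 0.805 | 0.924 | 0.860 | 0.014 | 0.970 | 0.028 | 0.858 | 0.940 | 0.897 |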
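
The F1 values reported above are consistent with the usual harmonic mean of precision and recall. The following is a minimal sketch for re-deriving them, not part of the benchmark code: the function name `f1_score` and the tolerance of 0.002 are assumptions made here, the latter because the published recall and precision are already rounded to three decimals.

```python
def f1_score(recall: float, precision: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if recall + precision == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (recall, precision, reported F1) triples copied from the table above.
reported = [
    (1.000, 0.788, 0.881),  # dataset D1, first configuration column
    (0.942, 0.952, 0.947),  # dataset D2, second configuration column
    (0.858, 0.940, 0.897),  # dataset D6, last configuration column
    (0.000, 0.000, 0.000),  # a configuration that found no matches
]

# Assumed tolerance: the inputs are rounded, so allow a small deviation.
TOLERANCE = 0.002

for recall, precision, f1 in reported:
    recomputed = f1_score(recall, precision)
    status = "ok" if abs(recomputed - f1) <= TOLERANCE else "mismatch"
    print(f"R={recall:.3f} P={precision:.3f} reported={f1:.3f} "
          f"recomputed={recomputed:.3f} {status}")
```

Rows where recall, precision, and F1 are all 0.000 are handled explicitly, since the harmonic mean is undefined when both inputs are zero.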

Datasets specs

| Dataset | E1 | E2 | Entities E1 | Entities E2 | Duplicates |
|---|---|---|---|---|---|
| D1 | Restaurants 1 | Restaurants 2 | 339 | 2,256 | 89 |
| D2 | Abt | Buy | 1,076 | 1,076 | 1,076 |
| D3 | Amazon | Google Pr. | 1,354 | 3,039 | 1,104 |
| D4 | IMDb | TMDb | 5,118 | 6,056 | 1,968 |
| D5 | Walmart | Amazon | 2,554 | 22,074 | 853 |
| D6 | IMDb | DBpedia | 27,615 | 23,182 | 22,863 |
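
To put the entity counts above in perspective, the sketch below computes how many pairs an exhaustive comparison of E1 against E2 would examine, and what fraction of those pairs are duplicates. It assumes every entity of E1 is compared with every entity of E2 (a plain Cartesian product); the names `DATASETS` and `brute_force_comparisons` are illustrative only.

```python
# (entities in E1, entities in E2, duplicates), copied from the table above.
DATASETS = {
    "D1": (339, 2_256, 89),
    "D2": (1_076, 1_076, 1_076),
    "D3": (1_354, 3_039, 1_104),
    "D4": (5_118, 6_056, 1_968),
    "D5": (2_554, 22_074, 853),
    "D6": (27_615, 23_182, 22_863),
}


def brute_force_comparisons(entities_e1: int, entities_e2: int) -> int:
    """Pairs examined if every E1 entity is compared with every E2 entity."""
    return entities_e1 * entities_e2


for name, (n1, n2, duplicates) in DATASETS.items():
    pairs = brute_force_comparisons(n1, n2)
    # Share of all candidate pairs that are true duplicates.
    share = duplicates / pairs
    print(f"{name}: {pairs:,} pairs, {duplicates:,} duplicates ({share:.4%})")
```

For D1, for example, 339 × 2,256 gives 764,784 candidate pairs, of which only 89 are duplicates.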

Configurations specifics

| Dataset | Block Building | Block Cleaning: Method | Block Cleaning: Ratio | Comparison Cleaning: Pruning Algorithm | Comparison Cleaning: Weighting Scheme | Entity Matching: Algorithm | Entity Matching: Representation Model | Entity Matching: Similarity Function | Entity Clustering: Algorithm | Entity Clustering: Similarity Threshold |
|---|---|---|---|---|---|---|---|---|---|---|
| D1 | Standard Blocking | Block Filtering | 0.050 | BLAST | ARCS | Profile Matcher | CHARACTER_BIGRAMS | COSINE_SIMILARITY | Unique Mapping Clustering | 0.90 |
| D2 | Standard Blocking | Block Filtering | 0.900 | WEP | EJS | Profile Matcher | CHARACTER_TRIGRAMS_TF_IDF | ARCS_SIMILARITY | Unique Mapping Clustering | 0.90 |
| D3 | Standard Blocking | Block Filtering | 0.600 | WNP | ARCS | Profile Matcher | TOKEN_BIGRAMS_TF_IDF | COSINE_SIMILARITY | Unique Mapping Clustering | 0.05 |
| D4 | Standard Blocking | Block Filtering | 0.925 | CEP | ECBS | Profile Matcher | CHARACTER_FOURGRAMS_TF_IDF | ARCS_SIMILARITY | Unique Mapping Clustering | 0.85 |
| D5 | Standard Blocking | Block Filtering | 0.075 | WEP | ARCS | Profile Matcher | CHARACTER_BIGRAMS_TF_IDF | COSINE_SIMILARITY | Unique Mapping Clustering | 0.65 |
| D6 | Standard Blocking | Block Filtering | 0.575 | BLAST | X2 | Profile Matcher | TOKEN_UNIGRAMS_TF_IDF | ARCS_SIMILARITY | Unique Mapping Clustering | 0.25 |
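
One way to keep such per-dataset settings machine-readable is a small record type, sketched below for the D1 and D3 configurations listed above. This is an illustration only: `WorkflowConfig`, its field names, and the range checks in `validate` are hypothetical and do not correspond to any library API; the values are copied from the configuration listing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkflowConfig:
    """Hypothetical container for one configuration; not a library class."""
    block_building: str
    block_cleaning_method: str
    block_cleaning_ratio: float
    pruning_algorithm: str
    weighting_scheme: str
    matching_algorithm: str
    representation_model: str
    similarity_function: str
    clustering_algorithm: str
    similarity_threshold: float

    def validate(self) -> None:
        # Assumed ranges: ratio and threshold are treated as fractions.
        if not 0.0 < self.block_cleaning_ratio <= 1.0:
            raise ValueError("block_cleaning_ratio must be in (0, 1]")
        if not 0.0 <= self.similarity_threshold <= 1.0:
            raise ValueError("similarity_threshold must be in [0, 1]")


CONFIGS = {
    "D1": WorkflowConfig(
        "Standard Blocking", "Block Filtering", 0.050, "BLAST", "ARCS",
        "Profile Matcher", "CHARACTER_BIGRAMS", "COSINE_SIMILARITY",
        "Unique Mapping Clustering", 0.90,
    ),
    "D3": WorkflowConfig(
        "Standard Blocking", "Block Filtering", 0.600, "WNP", "ARCS",
        "Profile Matcher", "TOKEN_BIGRAMS_TF_IDF", "COSINE_SIMILARITY",
        "Unique Mapping Clustering", 0.05,
    ),
}

for name, config in CONFIGS.items():
    config.validate()
    print(name, config.pruning_algorithm, config.similarity_threshold)
```

Freezing the dataclass keeps a configuration from being modified after it has been recorded, which makes it safer to reuse the same record across runs.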