# Benchmarks
## Dataset specifying the best configuration
Each row is the dataset being resolved. Each group of three columns gives Recall, Precision, and F1 when the best configuration found for the dataset named in the group's header is applied to that row's dataset (see the configuration table below).

| Dataset | D1 Recall | D1 Precision | D1 F1 | D2 Recall | D2 Precision | D2 F1 | D3 Recall | D3 Precision | D3 F1 | D4 Recall | D4 Precision | D4 F1 | D5 Recall | D5 Precision | D5 F1 | D6 Recall | D6 Precision | D6 F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| D1 | 1.000 | 0.788 | 0.881 | 1.000 | 0.263 | 0.416 | 0.989 | 0.281 | 0.438 | 1.000 | 0.322 | 0.488 | 0.933 | 0.417 | 0.576 | 1.000 | 0.263 | 0.416 |
| D2 | 0.000 | 0.000 | 0.000 | 0.942 | 0.952 | 0.947 | 0.695 | 0.798 | 0.743 | 0.851 | 0.922 | 0.885 | 0.270 | 0.997 | 0.424 | 0.746 | 0.853 | 0.796 |
| D3 | 0.037 | 0.539 | 0.069 | 0.431 | 0.354 | 0.389 | 0.674 | 0.584 | 0.625 | 0.486 | 0.425 | 0.454 | 0.092 | 0.545 | 0.158 | 0.523 | 0.472 | 0.496 |
| D4 | 0.000 | 0.000 | 0.000 | 0.808 | 0.794 | 0.801 | 0.920 | 0.702 | 0.796 | 0.931 | 0.844 | 0.886 | 0.038 | 0.961 | 0.072 | 0.845 | 0.659 | 0.741 |
| D5 | 0.096 | 0.812 | 0.172 | 0.856 | 0.288 | 0.431 | 0.673 | 0.235 | 0.348 | 0.792 | 0.330 | 0.466 | 0.753 | 0.671 | 0.709 | 0.859 | 0.302 | 0.446 |
| D6 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.474 | 0.748 | 0.580 | 0.805 | 0.924 | 0.860 | 0.014 | 0.970 | 0.028 | 0.858 | 0.940 | 0.897 |
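Each F1 value in the table is the harmonic mean of the Recall and Precision next to it, F1 = 2PR / (P + R). The short check below recomputes it for the D2 row, using the column groups in the order they appear, and then picks the configuration with the highest F1 for that dataset. The function and variable names are illustrative only; a small tolerance is needed because the table rounds every value to three decimals.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; defined as 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Row D2 of the table: one (recall, precision, reported F1) triple per column group,
# in the order the column groups appear.
d2_row = [
    (0.000, 0.000, 0.000),
    (0.942, 0.952, 0.947),
    (0.695, 0.798, 0.743),
    (0.851, 0.922, 0.885),
    (0.270, 0.997, 0.424),
    (0.746, 0.853, 0.796),
]

# The published values are rounded, so compare with a small tolerance.
for recall, precision, reported in d2_row:
    assert abs(f1_score(precision, recall) - reported) < 0.002

# 0-based index of the column group with the highest reported F1 on D2.
best = max(range(len(d2_row)), key=lambda i: d2_row[i][2])
print(f"best column group for D2: #{best + 1}, F1 = {d2_row[best][2]:.3f}")
```

Running the same check on every row shows that each dataset scores its highest F1 with its own configuration, which is what the diagonal of the table reflects.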
## Dataset specifications
| Dataset | E1 | E2 | Entities in E1 | Entities in E2 | Duplicates |
|---|---|---|---|---|---|
| D1 | Restaurants 1 | Restaurants 2 | 339 | 2,256 | 89 |
| D2 | Abt | Buy | 1,076 | 1,076 | 1,076 |
| D3 | Amazon | Google Pr. | 1,354 | 3,039 | 1,104 |
| D4 | IMDb | TMDb | 5,118 | 6,056 | 1,968 |
| D5 | Walmart | Amazon | 2,554 | 22,074 | 853 |
| D6 | IMDb | DBpedia | 27,615 | 23,182 | 22,863 |
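The sizes above help explain why every configuration starts with a blocking step. A minimal sketch, assuming each task only compares entities of E1 against entities of E2 (as the two-source layout of the table suggests): an exhaustive approach performs |E1| × |E2| comparisons, and the true duplicates are a tiny fraction of them. The names `SPECS` and `brute_force_comparisons` are hypothetical.

```python
# Sizes copied from the table above: (entities in E1, entities in E2, duplicates).
SPECS = {
    "D1": (339, 2_256, 89),
    "D2": (1_076, 1_076, 1_076),
    "D3": (1_354, 3_039, 1_104),
    "D4": (5_118, 6_056, 1_968),
    "D5": (2_554, 22_074, 853),
    "D6": (27_615, 23_182, 22_863),
}


def brute_force_comparisons(n1: int, n2: int) -> int:
    """Number of pairs when every entity of E1 is compared with every entity of E2."""
    return n1 * n2


for name, (n1, n2, dups) in SPECS.items():
    total = brute_force_comparisons(n1, n2)
    # Share of all cross-source pairs that are true duplicates.
    match_rate = dups / total
    print(f"{name}: {total:>13,} comparisons, {match_rate:.2e} of them are duplicates")
```

For D6 alone this amounts to 640,170,930 candidate pairs, of which only 22,863 are duplicates.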
## Configuration specifics
| Dataset | Block Building: Method | Block Cleaning: Method | Block Cleaning: Ratio | Comparison Cleaning: Pruning Algorithm | Comparison Cleaning: Weighting Scheme | Entity Matching: Algorithm | Entity Matching: Representation Model | Entity Matching: Similarity Function | Entity Clustering: Algorithm | Entity Clustering: Similarity Threshold |
|---|---|---|---|---|---|---|---|---|---|---|
| D1 | Standard Blocking | Block Filtering | 0.050 | BLAST | ARCS | Profile Matcher | CHARACTER_BIGRAMS | COSINE_SIMILARITY | Unique Mapping Clustering | 0.90 |
| D2 | Standard Blocking | Block Filtering | 0.900 | WEP | EJS | Profile Matcher | CHARACTER_TRIGRAMS_TF_IDF | ARCS_SIMILARITY | Unique Mapping Clustering | 0.90 |
| D3 | Standard Blocking | Block Filtering | 0.600 | WNP | ARCS | Profile Matcher | TOKEN_BIGRAMS_TF_IDF | COSINE_SIMILARITY | Unique Mapping Clustering | 0.05 |
| D4 | Standard Blocking | Block Filtering | 0.925 | CEP | ECBS | Profile Matcher | CHARACTER_FOURGRAMS_TF_IDF | ARCS_SIMILARITY | Unique Mapping Clustering | 0.85 |
| D5 | Standard Blocking | Block Filtering | 0.075 | WEP | ARCS | Profile Matcher | CHARACTER_BIGRAMS_TF_IDF | COSINE_SIMILARITY | Unique Mapping Clustering | 0.65 |
| D6 | Standard Blocking | Block Filtering | 0.575 | BLAST | X2 | Profile Matcher | TOKEN_UNIGRAMS_TF_IDF | ARCS_SIMILARITY | Unique Mapping Clustering | 0.25 |
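To make the columns above more concrete, here is a minimal pure-Python sketch of a pipeline that loosely follows the D5 row: Standard Blocking (taken here to mean schema-agnostic token blocking), Block Filtering with a ratio, WEP pruning over ARCS edge weights, cosine similarity over character bigrams, and Unique Mapping Clustering with a similarity threshold. It is not the implementation used to produce these benchmarks. Every function name and the toy entities are hypothetical, the rounding rule in Block Filtering and the `>=` cut-off in WEP are assumptions, plain bigram counts stand in for TF-IDF weights to keep it short, and the ratio and threshold are chosen for the toy data rather than copied from the table.

```python
import math
from collections import Counter, defaultdict


def standard_blocking(e1, e2):
    """Schema-agnostic token blocking: one block per token found in both sources."""
    blocks = defaultdict(lambda: (set(), set()))
    for side, source in enumerate((e1, e2)):
        for eid, text in source.items():
            for token in dict.fromkeys(text.lower().split()):  # dedupe, keep order
                blocks[token][side].add(eid)
    return {k: b for k, b in blocks.items() if b[0] and b[1]}


def block_filtering(blocks, ratio):
    """Keep each entity only in the `ratio` share of its smallest blocks (at least one)."""
    size = {k: len(b[0]) * len(b[1]) for k, b in blocks.items()}
    kept = defaultdict(lambda: (set(), set()))
    for side in (0, 1):
        entity_blocks = defaultdict(list)
        for k, b in blocks.items():
            for eid in b[side]:
                entity_blocks[eid].append(k)
        for eid, keys in entity_blocks.items():
            limit = max(1, round(ratio * len(keys)))  # assumed rounding rule
            for k in sorted(keys, key=size.get)[:limit]:
                kept[k][side].add(eid)
    return {k: b for k, b in kept.items() if b[0] and b[1]}


def arcs_wep(blocks):
    """ARCS weights (sum of 1/comparisons over shared blocks), pruned with WEP."""
    weights = defaultdict(float)
    for b1, b2 in blocks.values():
        comparisons = len(b1) * len(b2)
        for i in b1:
            for j in b2:
                weights[(i, j)] += 1.0 / comparisons
    if not weights:
        return []
    mean = sum(weights.values()) / len(weights)
    # WEP keeps only the edges whose weight reaches the average edge weight.
    return [pair for pair, w in weights.items() if w >= mean]


def bigram_cosine(a, b):
    """Cosine similarity between character-bigram count vectors of two strings."""
    va = Counter(a[i:i + 2] for i in range(len(a) - 1))
    vb = Counter(b[i:i + 2] for i in range(len(b) - 1))
    dot = sum(va[g] * vb[g] for g in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def unique_mapping_clustering(scored_pairs, threshold):
    """Greedy 1-1 matching: accept pairs by descending similarity while both ends are free."""
    used1, used2, matches = set(), set(), []
    for (i, j), sim in sorted(scored_pairs.items(), key=lambda kv: kv[1], reverse=True):
        if sim < threshold:
            break
        if i not in used1 and j not in used2:
            used1.add(i)
            used2.add(j)
            matches.append((i, j))
    return matches


# Hypothetical toy sources: entity id -> concatenated attribute values.
E1 = {"a1": "Golden Dragon Chinese", "a2": "Blue Sea Sushi Bar"}
E2 = {"b1": "Golden Dragon Restaurant", "b2": "Blue Sea Sushi", "b3": "Red Dragon Grill"}

blocks = block_filtering(standard_blocking(E1, E2), ratio=0.8)
candidates = arcs_wep(blocks)
scores = {(i, j): bigram_cosine(E1[i].lower(), E2[j].lower()) for i, j in candidates}
matches = unique_mapping_clustering(scores, threshold=0.65)
print(matches)
```

On this toy input, Block Filtering drops the `sushi` block, WEP discards the weak `("a1", "b3")` edge that only shares the larger `dragon` block, and clustering keeps the two true pairs. The greedy one-to-one rule in `unique_mapping_clustering` reflects the assumption that each entity has at most one duplicate in the other source.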