摘要

Deterministic record linkage (RL) is frequently regarded as a rival to more sophisticated strategies like probabilistic RL. We investigate the erect of combining deterministic linkage with other linkage techniques. For this task, we use a simple deterministic linkage strategy as a preceding filter: a data pair is classified as 'match' if all values of attributes considered agree exactly, otherwise as 'nonmatch'. This strategy is separately combined with two probabilistic RL methods based on the Fellegi-Sunter model and with two classification tree methods (CART and Bagging). An empirical comparison was conducted on two real data sets. We used four different partitions into training data and test data to increase the validity of the results. In almost all cases, application of deterministic linkage as a preceding filter leads to better results compared to the omission of such a pre-filter, and overall classification trees exhibited best results. On all data sets, probabilistic RL only profited from deterministic linkage when the underlying probabilities were estimated before applying deterministic linkage. When using a pre-filter for subtracting definite cases, the underlying population of data pairs changes. It is crucial to take this into account for model-based probabilistic RL.

全文