Precision vs. Recall

Precision and Recall are two widely used measures for evaluating the quality of results in fields such as information retrieval and statistical classification.

Precision can be seen as a measure of exactness or fidelity, whereas Recall is a measure of completeness. Ideally one would maximize both, but in real-world situations this is often difficult to achieve.

In the context of surname matching, one would aim for an automated process that awards high scores to name pairs that manual inspection would judge equivalent. This presupposes that all surname spellings can be put into groups with well-defined boundaries. In practice, however, it may be difficult to say precisely where a particular spelling belongs. For instance, does the unusual variant Ellesander belong with Alexander? Since Wills and Wells are two separate surnames, should they be kept apart, or do they get mis-recorded so frequently that one would want to keep them together? And should Birkinshaw be grouped with Burtonshaw?


Precision can be defined as the number of true positives divided by the total number of elements labeled as belonging to the positive class (i.e. true positives plus false positives). However, it may be difficult to define exactly what constitutes the class of elements that belong, i.e. the set of surname spellings that are ‘sufficiently similar’ to the original surname. One way would be to look at the name pairs that exceed a score of, say, 75%, and then to count how many of those pairs carry the same standardized form. This assumes that the dataset already includes manually assigned standard forms, and of course the result is affected by the threshold chosen.
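The threshold-based estimate described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ScoredPair` record, the `precision_at` function, the scores, and the standard forms assigned to each spelling are all invented for the example and are not taken from any real dataset or scoring system.

```python
from typing import NamedTuple


class ScoredPair(NamedTuple):
    """A scored name pair with manually assigned standard forms (hypothetical layout)."""
    name_a: str
    name_b: str
    score: float       # similarity score in percent, 0-100
    standard_a: str    # standard form assigned to name_a by manual inspection
    standard_b: str    # standard form assigned to name_b by manual inspection


def precision_at(pairs: list[ScoredPair], threshold: float) -> float:
    """Fraction of pairs scoring above `threshold` whose standard forms agree."""
    retrieved = [p for p in pairs if p.score > threshold]
    if not retrieved:
        # Precision is undefined when nothing is retrieved; 0.0 is a convention chosen here.
        return 0.0
    true_positives = sum(p.standard_a == p.standard_b for p in retrieved)
    return true_positives / len(retrieved)


# Invented scores and standard forms, for illustration only.
pairs = [
    ScoredPair("Alexander", "Alexandre", 92.0, "ALEXANDER", "ALEXANDER"),
    ScoredPair("Alexander", "Ellesander", 78.0, "ALEXANDER", "ALEXANDER"),
    ScoredPair("Wills", "Wells", 80.0, "WILLS", "WELLS"),
    ScoredPair("Birkinshaw", "Burtonshaw", 60.0, "BIRKINSHAW", "BURTONSHAW"),
]

print(precision_at(pairs, 75.0))  # 2 of the 3 pairs above 75% share a standard form
print(precision_at(pairs, 50.0))  # 2 of the 4 pairs above 50% share a standard form
```

With this toy data, the figure changes as soon as the threshold moves, which is the dependence on the chosen threshold noted above.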


Recall can be defined as the number of true positives divided by the sum of true positives and false negatives. A perfect recall score of 1.0 means that all relevant items were retrieved by the search, but says nothing about how many irrelevant items were also retrieved. Thus, in the current context, by lowering the threshold to, say, 50% one might capture all of the surname spellings deemed to be "correct", but at the expense of including many that manual inspection would say don’t belong.
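A small Python sketch can make the trade-off concrete. It assumes a hypothetical set of candidate spellings scored against the query surname Alexander, together with a manually chosen set of "correct" variants; the function name `precision_recall`, the spellings, and the scores are invented for illustration.

```python
def precision_recall(scores: dict[str, float], relevant: set[str], threshold: float):
    """Return (precision, recall) for the spellings scoring above `threshold`."""
    retrieved = {name for name, score in scores.items() if score > threshold}
    true_positives = len(retrieved & relevant)
    false_negatives = len(relevant - retrieved)
    # Both ratios are undefined for empty denominators; 0.0 is a convention chosen here.
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / (true_positives + false_negatives) if relevant else 0.0
    return precision, recall


# Invented scores (percent) for candidate spellings against the query "Alexander".
scores = {
    "Alexandre": 90.0,
    "Allexander": 85.0,
    "Allender": 60.0,
    "Ellesander": 55.0,
    "Sanders": 52.0,
}
# Spellings that manual inspection is assumed to accept as variants of the query.
relevant = {"Alexandre", "Allexander", "Ellesander"}

for threshold in (75.0, 50.0):
    precision, recall = precision_recall(scores, relevant, threshold)
    print(f"threshold {threshold:.0f}%: precision={precision:.2f}, recall={recall:.2f}")
```

In this toy data the 75% threshold retrieves only accepted variants but misses Ellesander, while the 50% threshold recovers every accepted variant and also admits two spellings that do not belong.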

See also the Wikipedia article on precision and recall.

In the current context it’s probably impossible to design a system that achieves perfect Precision and Recall, except (arguably) by manually assigning a score to each name pair. This might work for a given test database, at the expense of many months of work. But in practice a new dataset of any size tends to throw up many fresh spellings, and hence much more work is needed to incorporate these.