Nominex

Overview of Existing Search Methods

Various methods have been developed over the years to deal with the problem of variation in surname spelling, some of which are applicable only to printed/static name indexes, others being more relevant to computerised databases.

Static Methods

Grouping similar-sounding surnames together is a solution often used in 'static' databases such as indexes in print or on microfiche. For instance, Davis and Davyes might be grouped under the general heading Davies. The approach can be described as a Name Bucket, in that every spelling is assigned to exactly one group. But it has serious problems when it comes to assigning doubtful names. Taking the example above, should Davy/Davey be included in the Davies group, or given its own group? And if the former, then what about Davison?

Soundex

Soundex is a type of Name Bucket approach whereby each surname is given a code, and every surname spelling is regarded as either belonging to a group, or not. The code (e.g. 'R163' for Roberts) keeps the initial letter and throws away all vowels after it. It has been used for many years and, despite its many flaws, it refuses to die, probably because it’s extremely quick and easy to implement: it doesn't rely on manually inspecting each spelling. However, as well as grouping together many spellings that don’t belong together at all, it allows no discretion in doubtful cases, since a spelling can’t belong to more than one group. Soundex (and improvements such as Metaphone) had some value in the days of manual systems where names had to be filed in a single sequence, such as a card index or printed index, but continuing to use such methods in computerised databases throws away the flexibility that computers can offer. Soundex is not described in any further detail on this website, but further information can be found on the web.
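To make the bucket behaviour concrete, here is a minimal sketch of the commonly published American Soundex rules. The text above does not say which Soundex variant is meant, so the choice of variant and the function name soundex are assumptions made for illustration only.

```python
# Minimal sketch of American Soundex (an assumed variant; the function name is illustrative).
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name: str) -> str:
    """Return the four-character Soundex code for a surname."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0]                      # the initial letter is always kept
    prev = CODES.get(letters[0], "")       # digit of the previous coded letter
    for ch in letters[1:]:
        if ch in "HW":
            continue                       # H and W do not separate equal codes
        digit = CODES.get(ch, "")          # vowels and Y have no digit
        if digit and digit != prev:
            code += digit                  # skip runs of the same digit
        prev = digit                       # a vowel resets the run
    return (code + "000")[:4]              # pad or truncate to four characters


for surname in ("Roberts", "Rupert", "Davies", "Davis", "Davy", "Davison"):
    print(surname, soundex(surname))
```

Under these rules Roberts and Rupert both receive R163, while Davies and Davis share D120 but Davy (D100) and Davison (D125) each land in a different bucket, with no way of expressing a doubtful case.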

FamilySearch

Despite the problems of the Name Bucket approach, the LDS Church generally employs it to deal with name variants. Their version is much more sophisticated than Soundex, as they (presumably) manually inspect spellings to decide which group each belongs to. It often gives reasonable results, and you'll find it used, for instance, in the IGI on their FamilySearch website. Presumably this is a legacy from the days when most of their material was available on microfiche or in print. There are two problems. Firstly, each new database involves a considerable amount of work to assign new low-frequency spellings to existing groups. It's clear from studying the data that in certain cases (e.g. the 1881 census) they didn't complete this work. As a result, many low-frequency names which should have been assigned to a group remained as singletons. This is perhaps unsurprising, since manually assigning hundreds of thousands of variant spellings to individual groups is extremely labour-intensive. Secondly, as with Soundex, the group boundaries are still rigid.

Dynamic Methods

These methods are dynamic in the sense that when used in a computerised database the variants returned depend upon the surname you start with.

NameX

NameX is a proprietary solution from Image Partners that bears a close resemblance to Nominex. A critique is given on the NameX page of this website.

Wildcards

Wildcards are not really a name-matching algorithm, but are mentioned here because they do provide a powerful way of returning results from a database. Wildcard searches of greater or lesser flexibility are offered by most on-line search systems. Some restrict where wildcards can be placed in the search pattern; for example, a wildcard is sometimes not allowed in the first character position, as in *OBSON. Such restrictions usually reflect limitations of the underlying database and/or design and performance issues.

One problem with wildcards is that a given search may return many spellings that are not relevant. For example, when searching for Bailey and its variants, one could use the wildcard string Ba*ley. This will successfully return Bailey, Bayley and Baley, but will miss Baily and Bayly while including (amongst many others) Bagley and Bardsley. Modifying the search string to Ba*l*y will capture the Baily/Bayly variants, but at the expense of increasing the number of irrelevant matches. And you would still miss spellings that do not end in y, such as Bayleye, Baylea, Baylee, Baylie and many others. Even by running multiple searches it can be difficult to think of all the spellings that might be of relevance, which is where a system like Nominex can help.
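The Bailey example can be checked with a short sketch. It uses Python's standard fnmatch module, in which * matches any run of characters (including none). This is assumed to be close to the wildcard behaviour of a typical on-line search system, though real systems differ. The name list and the helper wildcard_search are illustrative only.

```python
from fnmatch import fnmatchcase

# A small illustrative index of surnames (not real data).
NAMES = ["Bailey", "Bayley", "Baley", "Baily", "Bayly",
         "Bagley", "Bardsley", "Baylee", "Baylie"]


def wildcard_search(pattern: str, names: list[str]) -> list[str]:
    """Return the names matching a shell-style wildcard pattern, case-sensitively."""
    return [name for name in names if fnmatchcase(name, pattern)]


for pattern in ("Ba*ley", "Ba*l*y"):
    print(pattern, "->", wildcard_search(pattern, NAMES))
```

With this list, Ba*ley returns Bailey, Bayley, Baley, Bagley and Bardsley; Ba*l*y adds Baily and Bayly but still keeps Bagley and Bardsley, and neither pattern finds Baylee or Baylie.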