The test data used by the demo system is derived from the 1881 Census of England, Scotland & Wales. The census contains some 30 million entries, representing around 415,000 different surname spellings.
The demo system currently allows searches among the top 40,000 surnames, i.e. those that occur 30 or more times in the 1881 Census. The frequency of each variant is also given in the results table.
The data is stored on-line in a MySQL database as a large number of scored pairs. Each surname has around 200 variants on average.
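As a rough illustration of what a table of scored pairs might look like, the sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a database server. The table name, column names and the `lookup` helper are hypothetical, and the sample rows (including their scores and frequencies) are invented purely for the example; the demo system's actual schema is not described here.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL store (assumption).
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per scored (surname, variant) pair.
# score is the notional percentage; frequency is the variant's census count.
conn.execute(
    """
    CREATE TABLE variant_pairs (
        surname   TEXT    NOT NULL,
        variant   TEXT    NOT NULL,
        score     INTEGER NOT NULL,
        frequency INTEGER NOT NULL,
        PRIMARY KEY (surname, variant)
    )
    """
)

# Invented sample rows, for illustration only (not real census figures).
sample_rows = [
    ("SMITH", "SMITH", 100, 1000),
    ("SMITH", "SMYTH", 93, 120),
    ("SMITH", "SMITHE", 90, 45),
    ("SMITH", "SCHMIDT", 70, 60),
]
conn.executemany("INSERT INTO variant_pairs VALUES (?, ?, ?, ?)", sample_rows)


def lookup(connection, surname, min_score=85):
    """Return (variant, score, frequency) rows for a surname, best match first."""
    cursor = connection.execute(
        "SELECT variant, score, frequency FROM variant_pairs "
        "WHERE surname = ? AND score >= ? "
        "ORDER BY score DESC, frequency DESC",
        (surname, min_score),
    )
    return cursor.fetchall()


for row in lookup(conn, "SMITH"):
    print(row)
```

Keying the table on the (surname, variant) pair means a search becomes a single indexed lookup rather than a fresh comparison against every surname at query time.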
Results are presented in ranked order, from an exact match (100%) down to 85%. The percentage shown is a notional value calculated by the matching algorithms. The algorithms also produce scores below 85%, but matches at that level of similarity are less likely to be useful in practice.
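The ranking and cut-off described above can be sketched in a few lines of Python. This is a minimal sketch, not the demo system's code: the `Match` type and `rank_matches` function are hypothetical names, the scores and frequencies are invented, and breaking ties by frequency is an assumption made for the example.

```python
from typing import NamedTuple


class Match(NamedTuple):
    variant: str
    score: float      # notional percentage from the matching algorithms
    frequency: int    # occurrences of the variant in the census


def rank_matches(candidates, cutoff=85.0):
    """Drop candidates below the cut-off and sort the rest, best score first.

    Ties on score are broken by higher frequency, then alphabetically
    (an assumption for this sketch).
    """
    kept = [m for m in candidates if m.score >= cutoff]
    return sorted(kept, key=lambda m: (-m.score, -m.frequency, m.variant))


# Invented candidates, for illustration only.
candidates = [
    Match("SMYTH", 93, 120),
    Match("SMITH", 100, 1000),
    Match("SCHMIDT", 70, 60),
    Match("SMITHE", 90, 45),
]

for match in rank_matches(candidates):
    print(f"{match.variant:<10} {match.score:>5.0f}%  {match.frequency}")
```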
In a working system one might well offer the user a number of categories, such as 'Exact Match', 'Close Match' and 'All Matches'. These might be assigned ranges of percentages such as 100% only, 100%-90% and 100%-80% respectively.
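One possible way to implement such categories is to map each name to the lowest percentage it admits and filter the scored results accordingly. The sketch below assumes the example ranges given above; the `CATEGORY_FLOORS` mapping, the `filter_by_category` function and the sample scores are hypothetical.

```python
# Lowest score admitted by each category, using the example ranges above.
CATEGORY_FLOORS = {
    "Exact Match": 100,
    "Close Match": 90,
    "All Matches": 80,
}


def filter_by_category(matches, category):
    """Keep (variant, score) pairs whose score falls within the category."""
    floor = CATEGORY_FLOORS[category]
    return [(variant, score) for variant, score in matches if score >= floor]


# Invented scores, for illustration only.
matches = [("SMITH", 100), ("SMYTH", 93), ("SMITHE", 90), ("SMEETH", 82)]

for category in CATEGORY_FLOORS:
    print(category, filter_by_category(matches, category))
```

Because every range starts at 100%, each category is a superset of the one before it, so a single lower bound per category is enough.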