Nominex

IPA Conversion

One of the Derived Forms of each surname is its phonetic version. This uses the symbols of the IPA (International Phonetic Alphabet), or more precisely a modified form known as SAMPA, which represents IPA symbols with ordinary ASCII characters and is therefore much easier to handle on a computer.

Carney’s list of 225 spelling-to-pronunciation rules was used as the basis of these IPA versions. It was supplemented by additional rules written to handle his numerous exceptions, as well as the many other exceptions found in the corpus of British surnames. The full rule set currently contains around 2,400 rules.
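Carney’s rules themselves are not listed here, so the following is only a minimal sketch of how an ordered set of context-sensitive spelling-to-pronunciation rules might be applied. The `Rule` class, the `to_sampa` function and the toy rules are hypothetical and are not taken from Nominex; a real rule set of around 2,400 rules would need much richer context conditions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    """A hypothetical spelling-to-sound rule: GRAPHEME -> PHONEMES in context."""
    grapheme: str       # letters to match, e.g. "SH"
    phonemes: str       # SAMPA output ("" for a silent letter)
    left: str = ""      # required preceding letters; "^" = start of word; "" = any
    right: str = ""     # required following letters; "$" = end of word; "" = any

def _context_ok(word: str, start: int, end: int, rule: Rule) -> bool:
    """Check the rule's left and right context around word[start:end]."""
    if rule.left == "^":
        if start != 0:
            return False
    elif not word[:start].endswith(rule.left):
        return False
    if rule.right == "$":
        return end == len(word)
    return word[end:].startswith(rule.right)

def to_sampa(word: str, rules: list[Rule]) -> str:
    """Scan left to right, applying the first rule whose grapheme and context match.
    Longer graphemes are tried first; sorted() is stable, so rules of equal
    length keep their listed order."""
    word = word.upper()
    ordered = sorted(rules, key=lambda r: -len(r.grapheme))
    out, i = [], 0
    while i < len(word):
        for rule in ordered:
            end = i + len(rule.grapheme)
            if word.startswith(rule.grapheme, i) and _context_ok(word, i, end, rule):
                out.append(rule.phonemes)
                i = end
                break
        else:
            out.append(word[i].lower())  # no rule matched: pass the letter through
            i += 1
    return "".join(out)

# A handful of toy rules, far simpler than a real rule set.
RULES = [
    Rule("SH", "S"),
    Rule("PH", "f"),
    Rule("E", "", right="$"),     # word-final E is silent
    Rule("O", "@U", right="NE"),  # toy e-marking rule: O before NE (STONE)
    Rule("A", "eI", right="NE"),  # toy e-marking rule: A before NE (WANE)
    Rule("A", "{"),
    Rule("E", "e"),
    Rule("I", "I"),
    Rule("O", "Q"),
    Rule("U", "V"),
]
```

With these toy rules, `to_sampa("Stone", RULES)` gives `st@Un` and `to_sampa("Bishop", RULES)` gives `bISQp`. Trying longer graphemes first is one common way to let digraphs such as SH win over single letters; the order in which Nominex actually applies its rules is not described here.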

These are the steps involved in creating the IPA version of each surname:

1. Pre-process: Examines each surname and, where appropriate, creates a second or third version of the spelling, so that the output table contains all of them. The following cases are considered: (a) the surname is an abbreviated forename, e.g. Robts produces Roberts (currently c.45 entries in this table); (b) pairs such as Beauchamp/Beecham (currently around 20 of these); (c) an initial X- is expanded to CHRIST-; (d) sequences of three vowels are handled, e.g. by reduction to a two-vowel sequence (EAA -> EA) or by converting a vowel to a consonant (EUA -> EVA); (e) embedded apostrophes, e.g. O’NEIL and MC’, are handled where possible; (f) a double-barrelled name is split into two separate entries in the output table.
2. Word splitter: Some surnames of two or more syllables can cause problems for the IPA conversion engine because they contain a medial 'e', for example STONEHAM. The program needs to recognise not only that the 'e' in this surname is silent, but also that it has an ‘e-marking’ function, which changes the pronunciation of the 'o'. The program therefore aims to split the surname into two parts, STONE+HAM, and then perform IPA conversion on each part. Spotting a compound spelling with a medial 'e' is not trivial, however, and there are many exceptions, such as CAVENAGH and BACHELOR, where the 'e' IS pronounced and therefore does not affect the first syllable. The program incorporates a rudimentary ‘splitter’ based on Carney’s suggestions, in which he identifies a list of consonant onsets that may follow a medial E. This is supplemented by lists of known exception patterns (currently around 250), e.g. p*rcev* to capture Perceval and its variants.
3. IPA conversion: Performs IPA conversion on the surname (or on each part of a compound name separately, e.g. WANE and WRIGHT) and writes the IPA/SAMPA version to the output table. The current version of the system can create two different phonetic versions where alternative pronunciations of a surname are known or suspected. The program also generates a single-character 'revised SAMPA' code for every two-character SAMPA code, such as eI.
4. Syllable counting: The program estimates the number of syllables by counting the vowel phonemes in the IPA/SAMPA spelling. The syllable count is one of the metrics used in the matching step. Some problems remain, mainly associated with ‘free’ vowels; for example, EWELL vs. YULE is a problem (one syllable or two?).
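Steps 1 and 2 could be sketched roughly as below. This is a minimal illustration under stated assumptions, not the Nominex code: the lookup tables hold only a few hypothetical sample entries, apostrophes are simply dropped, X- is expanded only before a consonant (a hypothetical guard so that XAVIER is left alone), and the splitter uses a crude test (vowel, consonant, E, then one of a hypothetical list of onsets) in place of Carney’s full list. Exception patterns use shell-style wildcards via Python's `fnmatch` module, in the same notation as p*rcev*.

```python
import fnmatch
import re

# Hypothetical sample entries; the real tables are much larger.
ABBREVIATIONS = {"ROBTS": "ROBERTS", "WMS": "WILLIAMS"}  # (a) abbreviated forenames
CONVENTIONAL = {"BEAUCHAMP": "BEECHAM"}                   # (b) conventional pairs
TRIPLE_VOWELS = {"EAA": "EA", "EUA": "EVA"}               # (d) three-vowel sequences

# Hypothetical onsets that may follow a silent medial E, and exception patterns.
ONSETS = ("B", "BR", "F", "H", "L", "M", "N", "S", "T", "W", "WR")
EXCEPTIONS = ("P*RCEV*", "CAVENAGH", "BACHELOR")
VOWELS = "AEIOUY"

def preprocess(surname: str) -> list[str]:
    """Return the surname plus any extra spellings worth converting."""
    name = surname.upper().replace("'", "").replace("\u2019", "")  # (e) naive: drop apostrophes
    out = []
    for part in filter(None, re.split(r"[-\s]+", name)):          # (f) split double-barrelled names
        out.append(part)
        out.extend(v for v in (ABBREVIATIONS.get(part), CONVENTIONAL.get(part)) if v)
        if part.startswith("X") and len(part) > 1 and part[1] not in VOWELS:
            out.append("CHRIST" + part[1:])                        # (c) X- -> CHRIST-
        out.extend(part.replace(seq, rep)
                   for seq, rep in TRIPLE_VOWELS.items() if seq in part)
    return list(dict.fromkeys(out))  # drop duplicates, keep order

def split_medial_e(name: str) -> list[str]:
    """Split after a silent medial E (vowel + consonant + E + onset), unless excepted."""
    name = name.upper()
    if any(fnmatch.fnmatchcase(name, pat) for pat in EXCEPTIONS):
        return [name]
    for i in range(2, len(name) - 2):
        if (name[i] == "E" and name[i - 1] not in VOWELS
                and name[i - 2] in VOWELS and name[i + 1:].startswith(ONSETS)):
            return [name[:i + 1], name[i + 1:]]
    return [name]
```

Here `split_medial_e("Stoneham")` returns `['STONE', 'HAM']` and `split_medial_e("Wanewright")` returns `['WANE', 'WRIGHT']`, while CAVENAGH is left whole because it matches an exception pattern.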
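The 'revised SAMPA' codes and the syllable count described in steps 3 and 4 might look something like the sketch below. The single-character codes Nominex actually uses are not given, so the `REVISED` mapping and the `revise` and `count_syllables` functions are hypothetical; the mapping covers the British English diphthongs, long vowels and affricates, and the vowel set assumes standard SAMPA for British English.

```python
# Hypothetical single-character codes for two-character SAMPA symbols.
REVISED = {
    "eI": "1", "aI": "2", "OI": "3", "@U": "4",             # diphthongs
    "aU": "5", "I@": "6", "e@": "7", "U@": "8",             # diphthongs
    "i:": "y", "A:": "a", "O:": "o", "u:": "q", "3:": "x",  # long vowels
    "tS": "C", "dZ": "J",                                   # affricates (not vowels)
}
SHORT_VOWELS = set("Ie{QVU@")
VOWELS = SHORT_VOWELS | {code for sym, code in REVISED.items() if sym not in ("tS", "dZ")}

def revise(sampa: str) -> str:
    """Replace each two-character SAMPA symbol with its single-character code,
    scanning left to right so that e.g. @U is read as one diphthong."""
    out, i = [], 0
    while i < len(sampa):
        pair = sampa[i:i + 2]
        if pair in REVISED:
            out.append(REVISED[pair])
            i += 2
        else:
            out.append(sampa[i])
            i += 1
    return "".join(out)

def count_syllables(sampa: str) -> int:
    """Estimate syllables as the number of vowel phonemes (one code each)."""
    return sum(1 for ch in revise(sampa) if ch in VOWELS)
```

With this mapping, `count_syllables("ju:l")` (YULE) gives 1 and `count_syllables("ju:@l")` gives 2, so whether EWELL counts as one syllable or two depends entirely on which transcription the rules produce for it.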

Pronunciation Issues

Nominex deals with many of the spelling & pronunciation oddities that occur in the corpus of British surnames, amongst which are the following:

Tricky Letter Groups

Group | Pronunciation #1 examples | Pronunciation #2 examples | Notes
By- | Bye, Byatt, Bycroft, Byfield, Bywater | Byng, Bysh, Byshop | 'y' as a substitute for 'i'; occurs frequently in historical contexts
-ear | Bear, Pear | -dear, -fear, -gear, near |
-nger | Abinger, Challenger, Ginger, Grainger | Ringer, Stringer, Bellringer | Soft vs. hard 'g'
Ge-, Gi- | Geak, Gelling, Gilbert, Gibbard, Gingham, Gillard | Gee, Geeson, Geoffrey, Giles, Gillingham, Gillott, Gillard | Soft vs. hard 'g' (Gillard can be either)
-igha- | Brigham, Wigham | Higham, Meigham, Weigham | Hard 'g' vs. silent 'gh'
-ough | Bough, Gough, Hough | Rough, Tough | Rhymes -off or -uff
-ow- | Bowles, Knowles, Rowland | Cowley, Dowling, Fowler, Howlett, Ashdown |
-ull- | Bull, Pull, Fuller | Gull, Hull, Mull, Tull |
Wa- | Wackford, Waghorn, Wagstaff, Wax | Waddington, Wadham, Walpole, Wands, Wash, Watts | 'a' pronounced as 'a' or 'o'
-our- | Courcey, Jourdan, Yourk | Sourby, Flourday |
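Where a letter group in the table above has two possible pronunciations, one way to produce the (at most) two phonetic versions is to expand each ambiguous group into its alternatives and take the combinations. The sketch below assumes hypothetical SAMPA renderings for three of the groups and passes every other letter through unchanged as a placeholder; in practice those letters would go through the full rule set. The `AMBIGUOUS` table, `segment` and `pronunciations` are illustrative names only.

```python
from itertools import product

# Hypothetical alternatives for a few ambiguous letter groups.
AMBIGUOUS = {
    "OUGH": ("Qf", "Vf"),    # Gough ('off') vs. Rough ('uff')
    "NGER": ("ndZ@", "N@"),  # Ginger vs. Ringer
    "ULL": ("Ul", "Vl"),     # Bull vs. Gull
}

def segment(name: str) -> list[tuple[str, ...]]:
    """Split NAME into segments, each a tuple of alternative SAMPA strings.
    Letters outside AMBIGUOUS are passed through lower-cased (placeholder)."""
    name, i, segs = name.upper(), 0, []
    groups = sorted(AMBIGUOUS, key=len, reverse=True)  # try longer groups first
    while i < len(name):
        for group in groups:
            if name.startswith(group, i):
                segs.append(AMBIGUOUS[group])
                i += len(group)
                break
        else:
            segs.append((name[i].lower(),))
            i += 1
    return segs

def pronunciations(name: str, max_versions: int = 2) -> list[str]:
    """All combinations of the alternatives, de-duplicated and capped."""
    combos = ("".join(parts) for parts in product(*segment(name)))
    return list(dict.fromkeys(combos))[:max_versions]
```

For example, `pronunciations("Gough")` returns `['gQf', 'gVf']`. When a name contains more than one ambiguous group, this sketch simply keeps the first two combinations; a real system would presumably need some way of ranking them.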

Abbreviated Forms

Spelling | Matched to
Will'ms, W'ms, Wms | Williams
Ric'ds | Richards

Conventional Spellings

Spelling | Matched to
Beauchamp | Beecham
Fiennes | Fines
Cholmondley | Chumley
Cockburn | Coburn
Mainwaring | Mannering
Pepys | Peeps
Featherstonehaugh | Fanshaw
Marjoribanks | Marchbanks
Geoghegan | Gehegan

Examples of ambiguous or misinterpreted letters in historical documents

Letters | Spelling | Matched to
u/v | Dauis | Davis
u/v | Euans | Evans or Ewans
i/j | Maior | Major
u/v | Vnderwood | Underwood
i/j | Iacobs | Jacobs
f/s | Greengrafs | Greengrass
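Matching of this kind can be approached by generating every reading a spelling allows when the confusable letters are swapped, and keeping only the readings that are known surnames. The sketch below is an illustration under that assumption; the `CONFUSABLE` table, the small `KNOWN` set and the helper names are hypothetical, and reading u as w (to reach Ewans) would need a further entry.

```python
from itertools import product

# Hypothetical table: letters interchangeable in older scripts (u/v, i/j)
# or easily misread by a transcriber (a long s taken for f).
CONFUSABLE = {"U": "UV", "V": "VU", "I": "IJ", "J": "JI", "F": "FS"}

def readings(spelling: str) -> set[str]:
    """Every spelling obtainable by swapping confusable letters (2**n for n of them)."""
    options = [CONFUSABLE.get(ch, ch) for ch in spelling.upper()]
    return {"".join(combo) for combo in product(*options)}

def match(spelling: str, known: set[str]) -> list[str]:
    """Known surnames spelled exactly by one of the readings."""
    return sorted(readings(spelling) & known)

KNOWN = {"DAVIS", "EVANS", "MAJOR", "UNDERWOOD", "JACOBS", "GREENGRASS"}
```

For instance, `match("Greengrafs", KNOWN)` returns `['GREENGRASS']`. Because genuine letters are swapped as well (an F may become S), filtering against known surnames is what discards the spurious readings; the number of readings doubles with each confusable letter, which stays manageable for surname-length strings.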

Welsh 'w'

Letter | Spelling | Matched to
w | Ll[h]wyd | Lloyd

Greek initial 'chi'

Letter | Spelling | Matched to
x | Xmas | Christmas
x | Xfer | Christopher

Foreign Forms

Nominex also deals with many of the foreign forms that appear in historical contexts, e.g. French Louis, -cois, -eux; Italian -cci and -cce, -fiore; Germanic -baum-, -stien, -stein, -wein-, -muller, -jung-, Fuchs, Sachs, -meist-, Sch-, Zei-, Fleisch, Klein-, Rein- etc.