*Title: Grapheme-to-phoneme conversion adaptation using conditional 
random fields*

*Description:*
Grapheme-to-phoneme conversion consists of generating possible
pronunciations for an isolated word or for a sequence of words. More
formally, it transliterates a sequence of graphemes, i.e., letters,
into a sequence of phonemes, i.e., symbolic units that represent the
elementary sounds of a language. Grapheme-to-phoneme converters are
used in speech processing

- either to help automatic speech recognition systems to decode words
  from a speech signal

- or as a means of telling speech synthesizers how a written input
  should be pronounced.

A problem with such tools is that they are trained on large and varied
sets of aligned grapheme and phoneme sequences, which leads to generic
pronunciations of the words of a given language. As a consequence, they
are inadequate when one wants to recognize or synthesize specific
voices, for instance accented speech, stressed speech, or dictation as
opposed to casual conversation [1].
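
As an illustration of the aligned training data mentioned above, the
following Python sketch turns one aligned word into per-letter
labelling examples, the kind of input a sequence labeller consumes. The
alignment of "chat", the "_" symbol for a silent letter, the "#"
boundary symbol, and all function names are assumptions made for this
example; they do not describe the team's actual data format.

```python
# Hypothetical one-to-one alignment for the French word "chat" (/ʃa/).
# "_" is an assumed placeholder meaning "this letter produces no phoneme".
GRAPHEMES = ["c", "h", "a", "t"]
PHONEMES = ["ʃ", "_", "a", "_"]


def window_features(graphemes, i, size=2):
    """Describe position i by the letters within `size` positions of it."""
    feats = {}
    for offset in range(-size, size + 1):
        j = i + offset
        # Positions outside the word are padded with the boundary symbol "#".
        inside = 0 <= j < len(graphemes)
        feats[f"g[{offset:+d}]"] = graphemes[j] if inside else "#"
    return feats


def to_labelled_examples(graphemes, phonemes):
    """Pair the context features of each grapheme with its aligned phoneme."""
    if len(graphemes) != len(phonemes):
        raise ValueError("sequences must be aligned one-to-one")
    return [(window_features(graphemes, i), p) for i, p in enumerate(phonemes)]


def pronunciation(phonemes):
    """Drop the silent-letter placeholders to recover the pronunciation."""
    return "".join(p for p in phonemes if p != "_")


if __name__ == "__main__":
    for feats, label in to_labelled_examples(GRAPHEMES, PHONEMES):
        print(feats, "->", label)
    print(pronunciation(PHONEMES))
```

A model trained on many such examples from varied speakers learns one
dominant label per context, which is why its output tends towards a
generic pronunciation.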

While multiple methods have been proposed for grapheme-to-phoneme
conversion [2, 3], the primary goal of this internship is to propose a
method for building grapheme-to-phoneme models that can easily be
adapted to conditions specified by the user. More precisely, the use of
conditional random fields (CRF) will be studied to model the generic
French pronunciation and variants of it [4]. CRFs are state-of-the-art
statistical tools widely used for labelling problems in natural language
processing [5]. A further important goal is to automatically
characterize the distinctive pronunciation features of a specific
voice as compared to a generic voice. This means highlighting and
generalizing the differences observed between two sequences of
phonemes derived from the same sequence of graphemes.
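
One simple way to picture the comparison of two phoneme sequences is an
edit-based diff. The sketch below uses Python's standard `difflib`
module, which is a stand-in chosen for this illustration and not the
method to be developed during the internship. The two pronunciations of
"petite" (with and without the schwa) and the function name are
illustrative assumptions.

```python
from difflib import SequenceMatcher


def phoneme_differences(generic, specific):
    """List the edits that turn the generic sequence into the specific one."""
    matcher = SequenceMatcher(a=generic, b=specific, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Keep only the spans where the two pronunciations disagree.
        if tag != "equal":
            diffs.append((tag, generic[i1:i2], specific[j1:j2]))
    return diffs


# Assumed example: "petite" in a generic voice and in a casual voice
# that drops the schwa.
generic = ["p", "ə", "t", "i", "t"]
casual = ["p", "t", "i", "t"]

if __name__ == "__main__":
    print(phoneme_differences(generic, casual))
```

Collecting such edits over many words, together with the letters
around them, would give the raw material from which recurring
differences could be generalized.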

The results of this internship would be integrated into the team's
speech synthesis platform in order to simulate and imitate specific
voices easily and automatically.

*Technical skills:* C/C++ and a scripting language (e.g., Perl or
 Python)

*Keywords:* Natural language processing, speech processing, machine
 learning, statistical learning

*Contact:* Gwénolé Lecorvé (gwenole.lecorve@irisa.fr)

*References:*
[1] B. Hutchinson and J. Droppo. Learning non-parametric models of
    pronunciation. In Proceedings of ICASSP, 2011.
[2] M. Bisani and H. Ney. Joint-sequence models for grapheme-to-phoneme 
    conversion. In Speech Communication, 2008.
[3] S. Hahn, P. Lehnen, and H. Ney. Powerful extensions to CRFs for
    grapheme to phoneme conversion. In Proceedings of ICASSP, 2011.
[4] I. Illina, D. Fohr, and D. Jouvet. Multiple pronunciation
    generation using grapheme-to-phoneme conversion based on
    conditional random fields. In Proceedings of SPECOM, 2011.
[5] J. D. Lafferty, A. McCallum, and F. C. N. Pereira. Conditional
    random fields: probabilistic models for segmenting and labeling
    sequence data. In Proceedings of ICML, 2001.