The University of Bordeaux invites applications for a 2-year, full-time postdoctoral researcher position in Automatic Speech Recognition. The position is part of the FVLLMONTI project on efficient speech-to-speech translation for embedded autonomous devices, funded by the European Commission.

To apply, please send by email a single PDF file containing:
- a full CV (including publication list)
- a cover letter (describing your qualifications, research interests and motivation for applying)
- evidence of software development experience (an active GitHub/GitLab profile or similar)
- two of your key publications
- contact information for two referees
- academic certificates (PhD, Diploma/Master, Bachelor)

Details on the position are given below.

Job description: Post-doctoral position in Automatic Speech Recognition
Duration: 24 months
Starting date: as early as possible (from March 1st, 2021)
Project: European FETPROACT project FVLLMONTI (starts January 2021)
Location: Bordeaux Computer Science Lab (LaBRI, CNRS UMR 5800), Bordeaux, France (Image and Sound team)
Salary: from 2,086.45 EUR to 2,304.88 EUR/month (estimated net salary after taxes, depending on experience)
Contact: jean-luc.rouas@labri.fr

Short description:
The applicant will be in charge of developing state-of-the-art Automatic Speech Recognition systems for English and French, as well as the related Machine Translation systems, using deep neural networks. The objective is to provide exact specifications of the designed systems to the other project partners, who specialize in hardware. Adjustments will have to be made to take the hardware constraints into account (memory and energy budgets that limit the number of parameters, computation time, etc.) while keeping an eye on performance metrics (WER and BLEU scores). Once a satisfactory trade-off is reached, more exploratory work will be carried out on using emotion/attitude/affect recognition on the speech samples to supply additional information to the translation system.
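To make this trade-off concrete, the sketch below shows how the parameter count and weight memory of a candidate model can be estimated with PyTorch. It is purely illustrative: the layer count and dimensions are assumptions made for the example, not the project's actual specification.

    # Illustrative only: sizing a Transformer encoder to check it against a
    # hardware memory budget. All hyperparameters here are made up.
    import torch.nn as nn

    layer = nn.TransformerEncoderLayer(d_model=256, nhead=4, dim_feedforward=2048)
    encoder = nn.TransformerEncoder(layer, num_layers=12)

    n_params = sum(p.numel() for p in encoder.parameters())
    print(f"parameters: {n_params:,}")
    # Each float32 parameter occupies 4 bytes; float16 halves the footprint.
    print(f"weight memory: {n_params * 4 / 2**20:.1f} MiB (float32), "
          f"{n_params * 2 / 2**20:.1f} MiB (float16)")

Shrinking d_model, the number of layers, or the weight precision trades memory and energy against recognition accuracy; this is the loop the post-doc will iterate with the hardware partners.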
Context of the project:
The aim of the FVLLMONTI project is to build a lightweight autonomous in-ear device for speech-to-speech translation. Today's pocket translation devices are IoT products that require internet connectivity, which is generally energy-inefficient. While machine translation (MT) and Natural Language Processing (NLP) performance has greatly improved, embedded, lightweight, energy-efficient hardware remains elusive. Existing solutions based on artificial neural networks (NNs) are computation-intensive and energy-hungry, requiring server-based implementations, which also raises data protection and privacy concerns. Today's 2D electronic architectures suffer from "unscalable" interconnect and are thus still far from competing with biological neural systems in real-time information-processing capability at comparable energy consumption. Recent advances in materials science, device technology and synaptic architectures have the potential to fill this gap with novel disruptive technologies that go beyond conventional CMOS. A promising solution comes from vertical nanowire field-effect transistors (VNWFETs), which can unlock the full potential of truly unconventional 3D circuit density and performance.

Role:
The tasks assigned to the Computer Science lab are the design of the Automatic Speech Recognition (for French and English) and Machine Translation (English to French and French to English) systems. Speech synthesis will not be explored in the project, but an open-source implementation will be used for demonstration purposes. Both the ASR and MT tasks benefit from Transformer architectures over convolutional (CNN) or recurrent (RNN) neural network architectures (see Karita et al. below). The role of the applicant will therefore be to design and implement state-of-the-art ASR systems using Transformer networks (e.g. with the ESPnet toolkit) and to assist another post-doctoral researcher with the MT systems. Once the performance of these baseline systems is satisfactory, details of the networks (e.g. number of layers, parameter values) will be passed on to our hardware design partners. Based on their feedback, the networks will be adjusted to fit the hardware constraints while degrading performance as little as possible.

The second part of the project will focus on keeping up with the latest innovations and translating them into hardware specifications. For example, recent research suggests that adding convolutional layers to the Transformer architecture (the "Conformer" network, Gulati et al. below) can reduce the number of parameters of the model, which is critical for the memory usage of the hardware system. A minimal sketch of the Conformer's convolution module is given after the references.

Finally, more exploratory work will be carried out on the detection of social affects (i.e. the vocal expression of the intent of the speaker: 'politeness', 'irony', etc.). The additional information gathered by this detection will be supplied to the translation system for potential use in a future speech synthesis system.

Required skills:
- PhD in Automatic Speech Recognition (preferred) or Machine Translation using deep neural networks
- Knowledge of the most widely used toolboxes/frameworks (e.g. TensorFlow, PyTorch, ESPnet)
- Good programming skills (Python)
- Good communication skills (frequent interactions with hardware specialists)
- Interest in hardware design is a plus

Selected references:
S. Karita et al., "A Comparative Study on Transformer vs RNN in Speech Applications," Proc. IEEE ASRU, Singapore, 2019, pp. 449-456, doi: 10.1109/ASRU46091.2019.9003750.
A. Gulati et al., "Conformer: Convolution-augmented Transformer for Speech Recognition," arXiv preprint arXiv:2005.08100, 2020.
J.-L. Rouas et al., "Categorisation of spoken social affects in Japanese: human vs. machine," Proc. ICPhS, 2019.
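For orientation, here is a minimal sketch (assuming PyTorch) of the convolution module that distinguishes a Conformer block from a plain Transformer block, following the description in Gulati et al. above; the hyperparameters are illustrative only.

    # Conformer convolution module: pointwise conv + GLU, depthwise conv,
    # BatchNorm, Swish, pointwise conv, wrapped in a residual connection.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ConformerConvModule(nn.Module):
        def __init__(self, d_model=256, kernel_size=31, dropout=0.1):
            super().__init__()
            self.norm = nn.LayerNorm(d_model)
            self.pointwise1 = nn.Conv1d(d_model, 2 * d_model, 1)  # doubled; GLU halves it back
            self.depthwise = nn.Conv1d(d_model, d_model, kernel_size,
                                       padding=(kernel_size - 1) // 2,
                                       groups=d_model)  # one filter per channel
            self.batch_norm = nn.BatchNorm1d(d_model)
            self.pointwise2 = nn.Conv1d(d_model, d_model, 1)
            self.dropout = nn.Dropout(dropout)

        def forward(self, x):
            # x: (batch, time, d_model); Conv1d expects (batch, channels, time)
            residual = x
            x = self.norm(x).transpose(1, 2)
            x = F.glu(self.pointwise1(x), dim=1)
            x = F.silu(self.batch_norm(self.depthwise(x)))  # Swish == SiLU
            x = self.dropout(self.pointwise2(x)).transpose(1, 2)
            return x + residual

    x = torch.randn(8, 100, 256)            # a batch of 8 utterances, 100 frames each
    print(ConformerConvModule()(x).shape)   # torch.Size([8, 100, 256])

The depthwise/pointwise split is what keeps the parameter count low: the depthwise kernel adds d_model * kernel_size weights instead of the d_model^2 * kernel_size of a full convolution, which is the property that makes the Conformer attractive under the project's memory constraints.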