The World Intellectual Property Organization (WIPO) is a specialized agency of the United Nations dedicated to developing a balanced and accessible international intellectual property (IP) system. As part of the Patent Cooperation Treaty (PCT), WIPO disseminates patent applications through the public PATENTSCOPE search engine and provides English and French translations for the bibliographic data of every patent filed in the PCT system. To make the patent documents more accessible, WIPO has also developed its own machine translation (MT) toolkit, WIPO Translate, which automatically translates whole patent documents. WIPO Translate is available for 13 languages: Arabic, Chinese, English, Finnish, French, German, Italian, Japanese, Korean, Polish, Portuguese, Russian, and Spanish.

At WIPO we use machine learning (ML) techniques to create the best possible MT models out of our collection of world-class data, with the goal of providing the most accurate translation possible in the patent domain while ensuring translation is quick enough for all users. WIPO Translate also provides models better suited to translating non-patent documents.

The Advanced Technology Applications Center (ATAC), located in the WIPO Global Databases Division, is looking for a research and development fellow who will mainly work on improving WIPO Translate. ATAC is looking for candidates who are motivated to carry out and improve the different steps required to train MT models, such as data preparation, training, evaluation, and deployment to production, using state-of-the-art MT frameworks such as the Marian NMT toolkit. In addition, the candidate will assist the team with other natural language processing (NLP) tasks such as classification, summarization, and text generation.

Project

ATAC is a research and development center focused on prototyping, developing, and providing a range of ML-powered services and tools such as MT, speech-to-text, and multimodal classification and similarity search. The tools created by ATAC are used by both internal and external clients for different purposes. ATAC is looking for candidates who could contribute to WIPO Translate, and potentially assist with other tools as well. The main tasks associated with the position are:

Neural Machine Translation (NMT)
- Make the best use of state-of-the-art NMT frameworks such as Marian NMT
- Run, maintain and improve pipelines for data processing, data augmentation, domain modelling, model training and evaluation
- Use language- and domain-aware techniques such as data augmentation, use of placeholders and factored models to annotate, pre-process, and post-process our data
- Leverage both industry-standard and state-of-the-art tools, frameworks, and architectures to create the best possible NMT models

MT quality estimation
- Compare different MT models using state-of-the-art metrics
- Use feedback from human evaluation to further improve our engines, and introduce new automatic metrics to capture the feedback
- Help in developing further quality estimation metrics for our MT models

Integration and deployment of MT and NLP tools
- Work on different tools and interfaces to consume MT: graphical user interfaces and APIs, targeting segment, paragraph, and document level translation, and supporting different formats such as plain text, XML and HTML
- Deploy WIPO Translate on different environments, such as local servers, cloud servers and containers

Develop methods to collect and clean training data
- Have knowledge of techniques and frameworks to filter, clean and align documents, both for patent documents and other kinds of documents such as meeting proceedings
- Work on data augmentation and combining various sources of data, such as in- and out-of-domain corpora, synthetic data, and human translations
- Define workflows for updating NMT models with newly published documents, using techniques such as incremental training, online training, and domain adaptation

Required skills
- Hold an MSc or PhD in computer science in the fields of MT, computational linguistics, or related fields
- Demonstrate extensive knowledge of deep learning architectures focused on text processing, both sequence-to-sequence and regression, such as RNN and transformer, along with popular implementations such as Marian NMT or the transformers library
- Strong programming skills, preferably in Python and Java
- Familiarity with Unix/Linux environments

Desirable skills
- ML techniques: Neural Networks, Naive Bayes, SVM, k-NN, EM, ANNs, etc.
- Programming and scripting languages: Python, Java, bash, Perl
- Web technologies: Tomcat, JavaScript, Angular, gunicorn, nginx
- Databases and data storage: MySQL, Redis, Lucene/Solr, ElasticSearch
- Administration of Debian and RHEL setups
- DevOps: version control systems (git, svn), build tooling, and deployment strategies both in local and cloud servers and containers
- Excellent writing skills for technical documentation, administration guides and user guides in English
- Scientific publications or significant contributions to open-source projects will be an advantage

Language skills

The candidate should have either excellent knowledge of written and spoken English, or excellent knowledge of written and spoken French and good knowledge of English. Knowledge of the other official languages of the United Nations (Arabic, Chinese, Spanish, Russian and French), or the official languages of WIPO (Japanese, Korean and German) will be considered an asset.

Salary and location

The fellow should relocate to Geneva, Switzerland, where the WIPO headquarters are located. However, teleworking is possible up to three days a week from the nearby area.
The fellowship position offers compensation for travel expenses on engagement and separation from service, and an attractive monthly stipend set in accordance with the level of qualifications and experience of the Fellow (the stipend starts from 6,000 CHF monthly and may vary up to a maximum of 7,000 CHF monthly). Details are available on request. This is not a regular employment position within the WIPO Secretariat, and the position does not lead to any employment rights and entitlements beyond the terms of the fellowship.

Additional information

Expected starting date: September 15, 2023
Duration: up to 12 months, with the possibility of extension one or more times, provided the maximum cumulative length does not exceed three years.

How do I register my interest?

Expressions of interest, formulated through a brief statement by the candidate addressing each of the requirements set out above, accompanied only by a full curriculum vitae (résumé), should be sent by email to Daniel Torregrosa (daniel.torregrosarivero@wipo.int) by July 16, 2023, with the subject "NMT Fellowship 2023".

Note for PhD students: WIPO cannot provide academic support for supervising a PhD. However, an agreement with a local supervisor can be obtained.