========================================================================
Internship on neural networks for natural language processing
========================================================================

* Short description
-----------------------

LATTICE, in collaboration with IRIT, is offering a Master's-level (M2)
internship on the analysis of the parameters of a neural network model
applied to the acquisition of selectional restrictions. A detailed
description is given below.

The internship will take place at the LATTICE laboratory in Montrouge,
near Paris (10 minutes from the Mairie de Montrouge metro station and
5 minutes from the "Jean Moulin" stop on tram line 3). It will be
co-supervised by Thierry Poibeau and Marco Dinarelli at LATTICE and by
Tim van de Cruys at IRIT (exchanges with Toulouse will take place mainly
via Skype).

The internship is planned to last 6 months, starting in March or April
2015. An internship agreement (convention de stage) must be signed, and
the internship will be paid according to the regulations in force.

* Candidate profile
-----------------------

- Background in computer science or natural language processing (M2,
  engineering school, or possibly M1 with solid programming
  experience)
- Good knowledge of Python or, failing that, Perl
- Interest in natural language processing
- Good level of English (written / spoken)
- Knowledge of neural networks would of course be a plus

To apply: send an email with a CV and a cover letter to
thierry.poibeau@ens.fr


* Detailed description
-----------------------


An Exploration of a Neural Network Model's Parameters for Selectional
Preference Acquisition

Predicates often have a semantically motivated preference for particular
arguments [1]. Compare, for example, the sentences in (1) and (2).

(1) The vocalist sings a ballad.
(2) The exception sings a tomato.

While both sentences are grammatically correct, the second is clearly
semantically ill-formed. This preference of a verb for particular arguments
is known as the verb's selectional preference. Recently, a neural
network approach has been shown to perform well on the modeling of
selectional preferences [2]. However, many parameters remain to be
investigated. First of all, a neural network's parameters may be
initialized in a number of different ways. For example, the parameters
might be initialized randomly, or they may be initialized using
previously constructed word embeddings. Secondly, the neural network's
architecture leaves ample room for experimentation. The network might
be made deeper or shallower, the size of the
network's layers may be varied, and certain parameters within the
network might be shared.
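
The design choices listed above can be made concrete with a minimal
sketch in plain Python. It is not the model of [2]: the function names
(init_embeddings, init_layers, score), the value ranges, the tanh
non-linearity and the omission of bias terms are all assumptions made
for illustration. Training is left out, so the scores it produces carry
no meaning yet; the point is only to show where initialization, depth
and layer size enter.

```python
import math
import random


def init_embeddings(vocab, dim, pretrained=None, seed=0):
    """Return {word: vector}. A word found in `pretrained` keeps that
    vector (assumed to have length `dim`); any other word is drawn at random."""
    rng = random.Random(seed)
    pretrained = pretrained or {}
    return {
        w: list(pretrained[w]) if w in pretrained
        else [rng.uniform(-0.1, 0.1) for _ in range(dim)]
        for w in vocab
    }


def init_layers(input_dim, hidden_sizes, seed=0):
    """Build one weight matrix per hidden layer, plus a final layer
    that maps to a single score. More entries in `hidden_sizes` give a
    deeper network; larger entries give wider layers."""
    rng = random.Random(seed)
    sizes = [input_dim] + list(hidden_sizes) + [1]
    return [
        [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        for n_in, n_out in zip(sizes, sizes[1:])
    ]


def score(verb, obj, embeddings, layers):
    """Score a (verb, object) pair: concatenate the two embeddings,
    apply tanh hidden layers, then a linear output layer."""
    x = embeddings[verb] + embeddings[obj]  # list concatenation
    for i, weights in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) for row in weights]
        if i < len(layers) - 1:  # no non-linearity on the output layer
            x = [math.tanh(v) for v in x]
    return x[0]


# Illustrative usage with a toy vocabulary and a made-up pretrained vector.
vocab = ["sings", "ballad", "tomato"]
emb = init_embeddings(vocab, dim=4, pretrained={"ballad": [0.5, 0.1, -0.2, 0.3]})
shallow = init_layers(input_dim=8, hidden_sizes=[6])
deep = init_layers(input_dim=8, hidden_sizes=[6, 6, 6])
print(score("sings", "ballad", emb, shallow))
print(score("sings", "tomato", emb, deep))
```

Parameter sharing is not shown here; in a sketch like this one it would
amount to reusing the same weight matrix object at several positions in
the list of layers.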

This internship will investigate the influence of different network
parameters on the performance of a neural network for the modeling of
selectional preferences. The student will adapt and train an existing
neural network implementation for selectional preference acquisition,
and examine the effect of various model parameters on the network's
performance.
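
One simple way to organize such an investigation is to enumerate every
combination of the settings under study and train one model per
combination. The sketch below only generates the combinations; the
parameter names and candidate values are hypothetical and are not taken
from the existing implementation.

```python
from itertools import product

# Hypothetical search space: names and values are illustrative only.
search_space = {
    "init": ["random", "pretrained"],
    "n_hidden_layers": [1, 2, 3],
    "hidden_size": [50, 100],
    "share_parameters": [False, True],
}


def configurations(space):
    """Yield one dict per combination of parameter values."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


configs = list(configurations(search_space))
print(len(configs))  # 2 * 3 * 2 * 2 = 24 configurations
```

An exhaustive grid grows multiplicatively with each added parameter, so
in practice one might vary a single factor at a time around a baseline
configuration instead.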

References:

[1] Van de Cruys, Tim; Rimell, Laura; Poibeau, Thierry; and Korhonen,
Anna. 2012. Multi-way Tensor Factorization for Unsupervised Lexical
Acquisition. In Proceedings of the 24th International Conference on
Computational Linguistics (COLING), Mumbai, India.

[2] Van de Cruys, Tim. 2014. A Neural Network Approach to Selectional
Preference Acquisition. In Proceedings of the 2014 Conference on
Empirical Methods in Natural Language Processing (EMNLP), pp. 26-35,
Doha, Qatar. Association for Computational Linguistics.