*Keywords*: Large Language Models, Morphology, Natural Language Processing

*Research Context and Questions*

Hofmann et al. (2025) studied how LLMs model the competition between the nominal suffixes -ity (available -> availability) and -ness (selfish -> selfishness). They find that LLMs model this competition fairly well. However, they did not study the competition between prefixes (e.g., un- and non-). This is an important research question because studying the morphological competence of LLMs allows us to measure their generalization ability (Weissweiler et al., 2023; Weller-Di Marco and Fraser, 2024; Lerner and Yvon, 2025b). Indeed, the lexicon is not a list of words that is known a priori and immutable (Corbin, 2012; Štekauer et al., 2005).

However, LLMs are probabilistic models trained to maximize the likelihood of their training data, and they model a probability distribution over a finite token vocabulary. While infrequent words in a corpus were typically filtered out in traditional approaches (Eisenstein, 2019), modern models (OpenAI, 2023; Llama Team, 2024; Gemma Team, 2024) all rely on BPE (Byte Pair Encoding) segmentation, which splits rare words into subwords by optimizing a data compression criterion (Gage, 1994; Sennrich et al., 2016; Beinborn and Pinter, 2023). Models are therefore theoretically capable of deriving or inflecting words in forms absent from their training corpus, but the reality is more complex (Hofmann et al., 2020; Weissweiler et al., 2023; Lerner and Yvon, 2025b). Morphologically competent LLMs would be useful for a wide range of NLP applications, notably Machine Translation (Ataman et al., 2019; Marco et al., 2022; Lerner and Yvon, 2025a), and more generally Natural Language Generation.

Our previous work is limited to fairly simple concatenative phenomena (e.g., the prefixation of pré+entraînement). However, several affixes can be in competition, i.e., synonymous (Corbin, 2012), for example pré- and anté-, which raises the following question: given the same definition, could we produce antéentraînement rather than préentraînement? If not, why? Is it because of a phonological constraint (e.g., the number of syllables (Plénat, 2009; Lindsay and Aronoff, 2013) or euphony (Lignon and Plénat, 2009)), lexical consistency (e.g., analogy with prétraitement), or simply historical chance (e.g., the influence of English's pretraining (Lignon and Plénat, 2009; Holeš, 2023))?

*Objectives*

These questions will be assessed by comparing the probability that LLMs assign to different affixes for pseudo-words (e.g., generated using UniPseudo (New et al., 2024)) to that of a cognitively plausible model, the Generalized Context Model (GCM; Nosofsky, 1990); rough sketches of both kinds of comparison are given below. If these results are not conclusive, we will conduct a survey with native speakers to collect acceptability judgments (comparing, e.g., "unwug" vs. "nonwug"), in the same fashion as Hofmann et al. (2025) and Copot and Bonami (2024).

Other phenomena in derivational morphology raise similar questions (Corbin, 2012), notably allomorphy, where different variants of the same morpheme are used according to morphophonological constraints (e.g., *indétruisable vs. indestructible or, conversely, traduisible vs. *traductible). These questions will be studied by comparing BPE-based LLMs with byte-based LLMs, an emerging alternative to BPE (Wang et al., 2024; Zuo et al., 2024).
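To make the first kind of comparison concrete, here is a minimal sketch that scores two competing prefixations of a pseudo-word under an off-the-shelf causal LM via the Hugging Face transformers library. The model name, carrier sentence, and pseudo-words are illustrative placeholders, not the project's actual experimental setup.

```python
# Minimal sketch: compare the log-probability a causal LM assigns to two
# competing prefixations of the same pseudo-word (illustrative placeholders).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def sequence_log_prob(text: str) -> float:
    """Sum of token log-probabilities of `text` under the causal LM."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Each position predicts the next token, so shift logits and targets.
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    targets = ids[:, 1:]
    return log_probs.gather(-1, targets.unsqueeze(-1)).sum().item()

carrier = "Their decision was completely {}."
for candidate in ["unwuggish", "nonwuggish"]:  # competing prefixes un- / non-
    sentence = carrier.format(candidate)
    # Also print the BPE segmentation, which need not align with morphemes.
    print(candidate, tokenizer.tokenize(" " + candidate), sequence_log_prob(sentence))
```

Repeating this over many pseudo-word bases (and, if relevant, over shared definitions rather than carrier sentences) yields the distribution of affix preferences to be compared with the baseline below.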
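The GCM baseline could look roughly like the following sketch, under the illustrative assumptions that bases are compared through character-bigram overlap and that the exemplar lexicon and sensitivity parameter c are given; these choices are placeholders, not prescribed by the project.

```python
# Rough sketch of the Generalized Context Model (GCM; Nosofsky, 1990) for
# affix competition: the probability of an affix is proportional to the summed
# similarity of the new base to the known bases taking that affix.
# The character-bigram distance and the sensitivity parameter c are
# illustrative assumptions.
import math
from collections import Counter

def bigrams(word: str) -> Counter:
    padded = f"#{word}#"
    return Counter(padded[i:i + 2] for i in range(len(padded) - 1))

def distance(a: str, b: str) -> float:
    """Simple dissimilarity between two bases: 1 - Dice overlap of bigrams."""
    ba, bb = bigrams(a), bigrams(b)
    overlap = sum((ba & bb).values())
    return 1.0 - 2 * overlap / (sum(ba.values()) + sum(bb.values()))

def gcm_probabilities(base: str, exemplars: dict, c: float = 5.0) -> dict:
    """P(affix | base), with similarity exp(-c * distance) to each exemplar."""
    scores = {
        affix: sum(math.exp(-c * distance(base, known)) for known in bases)
        for affix, bases in exemplars.items()
    }
    total = sum(scores.values())
    return {affix: score / total for affix, score in scores.items()}

# Toy exemplar lexicon for the un- / non- competition (purely illustrative).
exemplars = {
    "un-": ["happy", "fair", "usual", "stable"],
    "non-": ["verbal", "linear", "standard", "native"],
}
print(gcm_probabilities("wuggish", exemplars))
```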
Byte-based models, however, must process much longer sequences (since a word is typically segmented into many characters or bytes), which limits the use of Transformers, whose complexity is quadratic with respect to sequence length (Vaswani et al., 2017). This comparison will allow us to understand why BPE-based LLMs sometimes fail or succeed at deriving new lexemes. Weller-Di Marco and Fraser (2024) found that, for inflection, the most important criterion was the consistency of the tokenization across all inflections of a given lexeme.

*Internship conditions*

The internship will be supervised by Paul Lerner (https://paullerner.github.io/), postdoctoral researcher, Leonie Weissweiler (https://leonieweissweiler.github.io/), postdoctoral researcher, and François Yvon (https://fyvo.github.io/), senior researcher. The internship may lead to a PhD thesis, provided funding is available.

The internship will take place at ISIR in the MLIA team (https://www.isir.upmc.fr/teams/mlia/presentation-mlia/?lang=en). ISIR is under the dual supervision of Sorbonne University, a world-class multidisciplinary university, and the French National Centre for Scientific Research (CNRS), one of the most important research institutions in the world. ISIR comprises 6 research teams and 226 people. The intern will be located at 4, place Jussieu, 75005 Paris.

- Remuneration: around 600€, along with reimbursement of 75% of the Navigo (public transport) card.
- Starting date: the internship is expected to start in February or March 2025.
- Duration: 5-6 months

*Requirements*

We are looking for a second-year Master's student with a strong background in Natural Language Processing/Computational Linguistics. The intern is expected to be proficient in programming, especially in Python, and to have already worked under Linux. They should also have experience with a deep learning framework, preferably PyTorch.

*Application*

Please send a resume, a cover letter (in French or English), and grade transcripts for the last two years to Paul Lerner at lerner@isir.upmc.fr. A list of pointers to example projects (e.g., via GitHub) or a letter of recommendation is a plus.