Temporal Reasoning in Large Language Models

Members and Partners
- Laboratory: IRISA
- Research team: Linkmedia
- Supervisors: Caio CORRO and Laurent AMSALEG
- Stipend: according to the regulations in force at the time of the internship
- Contact: caio.corro@irisa.fr and laurent.amsaleg@irisa.fr

Topic Description

Many facts are only valid during a specific period of time. However, most Large Language Models (LLMs) are trained on snapshots of data collected at a specific moment in time, and it is unclear whether they learn the appropriate temporal scope of the facts they encode. A few studies have analyzed the limited capacity of LLMs to correctly deal with time [4, 5]. Building on these observations, other studies have investigated various techniques to encode the temporal context associated with facts [2, 1]. Effectively modeling time and its complexities in LLMs remains an open challenge, and narrowing down the problem helps make progress toward a better understanding of it.

Kougia et al. [3] propose to focus on the capacity of LLMs to extract temporal relations between pairs of events in a zero-shot setting. To this end, they process a series of biomedical texts that contain temporal information: clinical notes record the temporal context of symptoms, treatments, drug prescriptions, tests and other actions, which together build the medical history of a patient. Their study demonstrates that recent LLMs (including GPT-3.5, Mixtral and Llama 2) struggle to maintain the temporal consistency of such events in the answers they return when prompted. This is particularly true when temporal transitivity or symmetry (if A is before B, then B is after A) is at stake; a minimal illustration of such consistency checks is sketched after the references. Why is that? How are time contexts represented? What is poorly captured? How can temporal consistency be improved? Is it possible to build bridges with Allen's interval algebra?

The intern will first reproduce some of the results reported in the referenced studies, relying on the datasets proposed in these works. From there, weaknesses will be identified and analyzed. Then, we will focus on methods to mitigate the current limitations of temporal reasoning with LLMs.

The intern must have a strong interest in deep learning. Implementation will be done in Python using the PyTorch library.

References

[1] Piyush Bagad, Makarand Tapaswi, and Cees G. M. Snoek. Test of time: Instilling video-language models with a sense of time. In CVPR, 2023.
[2] Bhuwan Dhingra et al. Time-aware language models as temporal knowledge bases. Transactions of the Association for Computational Linguistics, 10, 2022.
[3] Vasiliki Kougia et al. Analysing zero-shot temporal relation extraction on clinical notes using temporal consistency. In BioNLP@ACL, 2024.
[4] Angeliki Lazaridou et al. Mind the gap: Assessing temporal generalization in neural language models. In NeurIPS, 2021.
[5] Paul Röttger and Janet B. Pierrehumbert. Temporal adaptation of BERT and performance on downstream document classification: Insights from social media. In EMNLP (Findings), 2021.
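To make the notion of temporal consistency concrete, the following is a minimal Python sketch of the kind of check alluded to above: given pairwise temporal relations predicted for events (e.g., by an LLM), it flags violations of symmetry and of transitivity. The three coarse labels ("before", "after", "overlap"), the event names and the dictionary format are illustrative assumptions, not the schema or the code of the cited studies; a full treatment could instead use the thirteen relations of Allen's interval algebra.

```python
# Hypothetical sketch of symmetry and transitivity checks over predicted
# temporal relations. Labels, event names and data format are assumptions
# made for illustration only.

INVERSE = {"before": "after", "after": "before", "overlap": "overlap"}


def symmetry_violations(relations):
    """relations maps ordered event pairs (a, b) to a predicted label."""
    violations = []
    for (a, b), rel in relations.items():
        rev = relations.get((b, a))
        if rev is not None and rev != INVERSE[rel]:
            violations.append(((a, b), rel, rev))
    return violations


def transitivity_violations(relations):
    """Flags cases where A is before B and B is before C, but A is not predicted before C."""
    violations = []
    for (a, b), r1 in relations.items():
        if r1 != "before":
            continue
        for (b2, c), r2 in relations.items():
            if b2 != b or r2 != "before":
                continue
            r3 = relations.get((a, c))
            if r3 is not None and r3 != "before":
                violations.append((a, b, c, r3))
    return violations


if __name__ == "__main__":
    # Toy predictions for three clinical events; the third entry breaks
    # transitivity (and symmetry with nothing, since (discharge, admission)
    # is not predicted here).
    preds = {
        ("admission", "treatment"): "before",
        ("treatment", "discharge"): "before",
        ("admission", "discharge"): "after",   # inconsistent prediction
        ("treatment", "admission"): "after",   # consistent with symmetry
    }
    print(symmetry_violations(preds))
    print(transitivity_violations(preds))
```

Counting such violations over a set of model answers gives a simple consistency score, which is one way to quantify the weaknesses the internship aims to analyze and mitigate.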