Temporal Reasoning in Large Language Models

Members and Partners
- Laboratory: IRISA
- Research team: Linkmedia
- Supervisors: Caio CORRO and Laurent AMSALEG
- Stipend: according to the regulations in force at the time of the internship
- Contact: caio.corro@irisa.fr and laurent.amsaleg@irisa.fr

Topic Description

Many facts are only valid during a specific period of time. However, most Large Language Models (LLMs) are trained on snapshots of data collected at a specific moment in time, and it is unclear whether they learn the appropriate temporal scope of the facts they encode. A few studies have analyzed the limited capacity of LLMs to correctly deal with time [4, 5]. Building on these observations, other studies have investigated various techniques to encode the temporal context associated with facts [2, 1]. Effectively modeling time and its complexities in LLMs remains an open challenge, and narrowing down the problem helps make progress toward a better understanding of it.

Kougia et al. [3] propose to focus on the capacity of LLMs to extract temporal relations between pairs of events in a zero-shot setting. To this end, they process a series of biomedical texts that contain temporal information: clinical notes record the temporal context of symptoms, treatments, drug prescriptions, tests and other actions, which together build the medical history of a patient. Their study demonstrates that recent LLMs (including GPT-3.5, Mixtral and Llama 2) struggle to maintain the temporal consistency of such events in the answers they return when prompted. This is particularly true when temporal transitivity or symmetry (if A is before B, then B is after A) is at stake; a minimal illustration of such consistency checks is sketched after the references. Why is that? How are time contexts represented? What is poorly captured? How can temporal consistency be improved? Is it possible to build bridges with Allen's interval algebra?

The intern will first reproduce some of the results reported in the referenced studies, relying on the datasets proposed in these works. From there, weaknesses will be identified and analyzed. Then, we will focus on methods to mitigate the current limitations of temporal reasoning with LLMs.

The intern must have a strong interest in deep learning. Implementation will be done in Python using the PyTorch library.

References

[1] Piyush Bagad, Makarand Tapaswi, and Cees G. M. Snoek. Test of time: Instilling video-language models with a sense of time. In CVPR, 2023.
[2] Bhuwan Dhingra et al. Time-aware language models as temporal knowledge bases. Transactions of the Association for Computational Linguistics, 10, 2022.
[3] Vasiliki Kougia et al. Analysing zero-shot temporal relation extraction on clinical notes using temporal consistency. In BioNLP@ACL, 2024.
[4] Angeliki Lazaridou et al. Mind the gap: Assessing temporal generalization in neural language models. In NeurIPS, 2021.
[5] Paul Röttger and Janet B. Pierrehumbert. Temporal adaptation of BERT and performance on downstream document classification: Insights from social media. In EMNLP (Findings), 2021.
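To make the notion of temporal consistency concrete, the following is a minimal Python sketch of the kind of check alluded to above: given pairwise temporal relations predicted for events (e.g., by an LLM), it flags violations of symmetry and of transitivity. The three coarse labels ("before", "after", "overlap"), the event names and the dictionary format are illustrative assumptions, not the schema or the code of the cited studies; a full treatment could instead use the thirteen relations of Allen's interval algebra.

```python
# Hypothetical sketch of symmetry and transitivity checks over predicted
# temporal relations. Labels, event names and data format are assumptions
# made for illustration only.

INVERSE = {"before": "after", "after": "before", "overlap": "overlap"}


def symmetry_violations(relations):
    """relations maps ordered event pairs (a, b) to a predicted label."""
    violations = []
    for (a, b), rel in relations.items():
        rev = relations.get((b, a))
        if rev is not None and rev != INVERSE[rel]:
            violations.append(((a, b), rel, rev))
    return violations


def transitivity_violations(relations):
    """Flags cases where A is before B and B is before C, but A is not predicted before C."""
    violations = []
    for (a, b), r1 in relations.items():
        if r1 != "before":
            continue
        for (b2, c), r2 in relations.items():
            if b2 != b or r2 != "before":
                continue
            r3 = relations.get((a, c))
            if r3 is not None and r3 != "before":
                violations.append((a, b, c, r3))
    return violations


if __name__ == "__main__":
    # Toy predictions for three clinical events; the third entry breaks
    # transitivity (and symmetry with nothing, since (discharge, admission)
    # is not predicted here).
    preds = {
        ("admission", "treatment"): "before",
        ("treatment", "discharge"): "before",
        ("admission", "discharge"): "after",   # inconsistent prediction
        ("treatment", "admission"): "after",   # consistent with symmetry
    }
    print(symmetry_violations(preds))
    print(transitivity_violations(preds))
```

Counting such violations over a set of model answers gives a simple consistency score, which is one way to quantify the weaknesses the internship aims to analyze and mitigate.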