Context

The objective of the NanoBubbles ERC Synergy project (https://nanobubbles.hypotheses.org) is to understand how, when and why science fails to correct itself. The project focuses on claims made within the field of nanobiology. Project members combine approaches from the natural sciences, computer science, and the social sciences and humanities (Science and Technology Studies) to understand how error correction in science works and what obstacles it faces. To this end, we aim to trace claims and corrections through various channels of scientific communication (journals, social media, advertisements, conference programs, etc.) using both qualitative and digital methods.

Internship objectives

Scientific articles are now discussed in a variety of media. The social network Twitter is particularly favored by professionals such as journalists and scientists as a way to stay up to date with recent developments in their field, to discuss their work publicly with distant colleagues, and to engage outside parties in their discoveries. Citing a scientific article on Twitter is easy thanks to the sharing links provided by publishers. Several studies have examined the use of social networks by scientists (Costas 2017, 2020), the propagation of scientific information (Mohammadi 2018, Wührl 2021, Hou 2022), and how the use of Twitter may in turn influence research (Ortega 2017). These studies rely heavily on the hyperlinks present in Twitter posts, or on tools that provide data on the use of research in social networks, such as PlumX (Champieux 2015). However, a scientific article can also be cited in a tweet as a 'fuzzy mention' (e.g. "I have read in a paper written by AUTHOR in 20XX that ..."). Such fuzzy mentions are hard to detect, and they must be linked back to the article they refer to before they can be taken into account. The intern's first task will be to collect a corpus of tweets containing such 'fuzzy mentions' of scientific articles.
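To make the notion of a fuzzy mention more concrete, here is a minimal Python sketch, not the project's method. It uses a hand-written regular expression (a deliberately simple stand-in for the trained extraction models a real pipeline would use) to flag tweets that mention a paper by author and year, then naively links the extracted (author, year) pair to a toy "bibliographic database" made of three entries from the reference list of this offer. The pattern, the names `FUZZY_MENTION`, `extract_mention` and `link_mention`, and the example tweets are all hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical toy "bibliographic database": three entries copied from the
# reference list of this offer. A real system would query an external database.
BIBLIOGRAPHY = [
    {"first_author": "Ortega", "year": 2017,
     "title": "The presence of academic journals on Twitter and its relationship "
              "with dissemination (tweets) and research impact (citations)"},
    {"first_author": "Wührl", "year": 2021,
     "title": "Claim detection in biomedical Twitter posts"},
    {"first_author": "Mohammadi", "year": 2018,
     "title": "Academic information on Twitter: A user survey"},
]

# Hypothetical heuristic for phrases such as "a paper written by Ortega in 2017"
# or "a study by Wührl et al. (2021)". It misses most real-world phrasings,
# which is why learned models are needed in practice.
FUZZY_MENTION = re.compile(
    r"\b(?:paper|article|study|preprint)\b[^.]*?\bby\s+"
    r"(?P<author>[A-Z][\w'-]+)(?:\s+et\s+al\.?)?"
    r"[^0-9]{0,20}?(?P<year>(?:19|20)\d{2})"
)


class Mention(NamedTuple):
    author: str
    year: int


def extract_mention(tweet: str) -> Optional[Mention]:
    """Return the (author, year) of the first fuzzy mention found, or None."""
    match = FUZZY_MENTION.search(tweet)
    if match is None:
        return None
    return Mention(match.group("author"), int(match.group("year")))


def link_mention(mention: Mention) -> list[dict]:
    """Naive linking: case-insensitive match on first-author surname plus year."""
    return [
        entry for entry in BIBLIOGRAPHY
        if entry["first_author"].lower() == mention.author.lower()
        and entry["year"] == mention.year
    ]


if __name__ == "__main__":
    # Invented example tweets, for illustration only.
    tweets = [
        "I have read in a paper written by Ortega in 2017 about journals on Twitter.",
        "Great talk today on nanoparticles!",
        "A study by Wührl et al. (2021) on claims in biomedical tweets.",
    ]
    for tweet in tweets:
        mention = extract_mention(tweet)
        if mention is None:
            print("no mention:", tweet)
        else:
            titles = [entry["title"] for entry in link_mention(mention)]
            print(mention, "->", titles)
```

Exact matching on surname and year is the simplest possible linking rule; in practice the same (author, year) pair can match several records, so a real linker would need additional signals from the tweet (topic words, journal names) to rank candidates.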
Afterwards, the intern will apply existing extraction techniques and models, mainly Named Entity Recognition, to extract the information needed to (1) determine whether a Twitter post mentions an article and (2) link that article to a bibliographic database.

Skills

- Enrolled in a Master's program in Natural Language Processing, computer science or data science.
- Good programming skills in Python, including experience with natural language processing tools and methods, and knowledge of machine learning methods and deep learning models.
- Curiosity about scientometrics.
- The ability to communicate and write in English is a plus.

Scientific environment

The work will be conducted within the Sigma team of the LIG laboratory (http://sigma.imag.fr). The recruited intern will join the team, which offers a stimulating, multinational and pleasant working environment.

Instructions for applying

Applications must contain a CV, a letter or message of motivation, Master's grades, and one or more letters of recommendation (or the names of potential referees), and must be addressed to Cyril Labbé (cyril.labbe@imag.fr) and Martin Lentschat (martin.lentschat@univ-grenoble-alpes.fr). Applications will be considered on a rolling basis, so it is advisable to apply as soon as possible.

References

- Champieux, R. (2015). PlumX. Journal of the Medical Library Association: JMLA, 103(1), 63.
- Costas, R., Mongeon, P., Ferreira, M. R., van Honk, J., & Franssen, T. (2020). Large-scale identification and characterization of scholars on Twitter. Quantitative Science Studies, 1(2), 771-791.
- Costas, R., van Honk, J., & Franssen, T. (2017). Scholars on Twitter: Who and how many are they? arXiv preprint arXiv:1712.05667.
- Hou, J., Wang, Y., Zhang, Y., & Wang, D. (2022). How do scholars and non-scholars participate in dataset dissemination on Twitter. Journal of Informetrics, 16(1), 101223.
- Mohammadi, E., Thelwall, M., Kwasny, M., & Holmes, K. L. (2018). Academic information on Twitter: A user survey. PLoS ONE, 13(5), e0197265.
- Ortega, J. L. (2017). The presence of academic journals on Twitter and its relationship with dissemination (tweets) and research impact (citations). Aslib Journal of Information Management, 69(6), 674-687.
- Wührl, A., & Klinger, R. (2021). Claim detection in biomedical Twitter posts. arXiv preprint arXiv:2104.11639.