Project Title: Unsupervised Document Data Understanding and Structuration
https://careers.slb.com/jobaddetail.aspx?id=72948

6-MONTH INTERNSHIP

ABOUT US

Come and join SLB’s AI Lab in Paris. We are currently offering internships to bright minds specialized in Data Science and Artificial Intelligence. Discover a multinational company. We have brought a little bit of Silicon Valley to Paris. Experience working in a team of young, fun, and passionate data scientists, tackling real business challenges in tandem with business experts who sit right alongside you.

The Artificial Intelligence & Machine Learning Data Scientist helps develop software and processes that can be used for robotics, artificial intelligence programs, and applications. In close collaboration with the business and its domain experts, the data scientist proposes mathematical and statistical models built from the collected data to augment, improve, or speed up human decisions within the Oil & Gas sector.

With SLB you will be given the opportunity to apply your expertise and deploy deep learning solutions at scale on real-world problems, supporting many areas of SLB’s business. SLB is the first Oil & Gas service company to move its processes and workflows to the cloud. This gives the Embedded AI Lab the perfect opportunity to leverage these innovative technologies and resources.

As part of the Embedded AI Lab, you will be able to test, experiment, and do research in a bleeding-edge environment with petabits upon petabits of data. You will be in charge of applied research and the delivery of proof-of-concept solutions that respond to clear and specific business needs.

LOCATION: Clamart, France

SRPC is the largest SLB technology and development center in Europe. Around 1200 scientists, engineers, and technicians, of more than 50 nationalities, design and manufacture equipment and systems for our energy services worldwide.
Based in Clamart, close to Paris, SRPC teams form a center of excellence for research and development of breakthrough technologies.

SLB is recognized globally for its expertise in:
- Artificial intelligence
- High-temperature electronics
- Mechanical systems for extreme conditions
- Physics of sensors and measurements
- Software development
- Applied mathematics
- Geology and Petrophysics

Our strength comes from our passion for innovation and our multicultural population.

DESCRIPTION AND SCOPE

A substantial volume of valuable information is hidden in documents of various formats, such as PDF, Microsoft Word, and images, often buried in multi-level folders. With the advent of NLP (Natural Language Processing) and Document AI techniques, we can locate specific files or folders through heuristic-based matching. We can also extract and structure information from document content using information retrieval methods. However, this process demands considerable domain knowledge and manual effort. Specifically, someone has to understand the hierarchy of folder levels, and either define rules or label extensive datasets, before document information extraction works well.

For this internship, our objective is to go a step further and reduce the reliance on manual effort, such as rule definition and data labeling, moving towards a (nearly) unsupervised approach to information extraction and structuration. We aim to develop a robust, general solution that, given a large dataset, can discern the structure of folders and files and identify the key elements and their corresponding values within the content. Rather than relying on human heuristics, we aim to identify patterns directly within a large dataset by comparing folder structures, file content, layouts, and so forth.
For instance, by analyzing files located at identical folder positions and applying a TF-IDF-like analysis to their text or visual layouts, we might be able to discern which parts are consistent keys and which parts are the corresponding values. Furthermore, instead of requiring a large volume of labeled data, we will explore whether the system can infer potential key-value pairs from only a handful of seed (labeled) documents. If this works, we will explore whether the system can apply this knowledge to other analogous documents (e.g., similar documents provided by different companies), even if layouts or keywords differ.

DELIVERABLES

The intern will receive guidance from AI research engineers at the SLB AI Lab and will be fully integrated into the NLP team, collaborating with subject matter experts. The internship will begin with a review of the state of the art, followed by an understanding of the business context, before moving on to the research, prototype building, and testing phases. The intern will be able to engage in high-level theoretical and applied research in this field. The intern will also contribute to developing new products and collaborate with project experts and data scientists within the AI Lab and from various SLB teams around the globe.

REQUIRED SKILLS

- Master's degree (penultimate or final year)
- NLP, applied mathematics, probability & statistics, deep learning
- Programming: Python, with PyTorch or Keras/TensorFlow
- Oral and written communication skills in English
- Good motivation, autonomy, teamwork, and ingenuity

HOW TO APPLY

Send your CV to zzhang54@slb.com and apply on the SLB Careers website: https://careers.slb.com/jobaddetail.aspx?id=72948
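
To give candidates a flavor of the TF-IDF-like idea mentioned in the description, here is a minimal Python sketch. It is only an illustration under stated assumptions, not the method used by the AI Lab: it keeps just the document-frequency half of TF-IDF, the function names, the 0.8 threshold, and the sample documents are all invented for this example, and it ignores layout entirely. Tokens that appear in almost every file found at the same folder position are treated as candidate keys, and the remaining tokens as candidate values.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a document and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_keys_and_values(documents, key_threshold=0.8):
    """Separate candidate key tokens from candidate value tokens.

    A token is a candidate key when it appears in at least
    `key_threshold` of the documents (high document frequency);
    every other token is a candidate value. The threshold and the
    binary split are illustrative assumptions.
    """
    if not documents:
        return set(), set()
    doc_freq = Counter()
    for doc in documents:
        # Count each token at most once per document.
        doc_freq.update(set(tokenize(doc)))
    n_docs = len(documents)
    keys = {tok for tok, df in doc_freq.items() if df / n_docs >= key_threshold}
    values = set(doc_freq) - keys
    return keys, values


# Invented sample text standing in for files at the same folder position.
docs = [
    "Well name: Alpha-1 Depth: 3200 m Operator: Acme",
    "Well name: Bravo-7 Depth: 2875 m Operator: Borealis",
    "Well name: Charlie-3 Depth: 4100 m Operator: Cobalt",
]
keys, values = split_keys_and_values(docs)
print("candidate keys:", sorted(keys))
print("candidate values:", sorted(values))
```

On this toy input, the recurring field labels end up among the candidate keys while the per-file names and numbers end up among the candidate values. Real documents would call for something far more robust (layout features, multi-word keys, noisy OCR), which is precisely the scope of the internship.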