PhD Position F/M Foundation Models and Natural Language Interaction for Human-Robot Collaboration

Location: Villers les Nancy, LORRAINE
Job Type: FullTime
Deadline: 18 Apr 2026

19 Mar 2026
Job Information
Organisation/Company

Inria, the French national research institute for the digital sciences
Research Field

Computer science
Researcher Profile

First Stage Researcher (R1)
Application Deadline

18 Apr 2026 - 00:00 (UTC)
Country

France
Type of Contract

Temporary
Job Status

Full-time
Hours Per Week

38.5
Offer Starting Date

1 Sep 2026
Is the job funded through the EU Research Framework Programme?

Not funded by an EU programme
Reference Number

2026-09879
Is the Job related to staff position within a Research Infrastructure?

No

Offer Description

The HUCEBOT team is dedicated to advancing algorithms for human-centered robots: robots that do not work autonomously in isolation, but instead react, interact, collaborate, and assist humans. To do so, these robots need to intertwine a multi-contact whole-body controller, a digital simulation of the interacting humans, and machine learning models to predict and respond to human movements and intentions. In a crescendo of complexity, the team tackles scenarios that involve collaboration with cobots, assistance with exoskeletons, and collaboration with humanoid robots. The application domains span from industrial robotics to space teleoperation.

The main robots of the team are the Tiago++ bimanual mobile manipulator, the Unitree G1 humanoid, and the Talos humanoid robot. The team also works with Franka cobots and exoskeletons.

The team currently consists of about 25 members, including permanent researchers, PhD students, and postdoctoral researchers.

Serena Ivaldi, head of HUCEBOT, holds the chair in Robotics and AI of the Cluster IA ENACT project (https://cluster-ia-enact.ai/), which funds this PhD thesis. Within the chair, she aims to advance research on natural language for assisting humans in different scenarios of collaboration with robots, where safety is paramount. The ambition is to create a foundation model that bridges natural language commands into interpretable commands for the robot, leading to robot actions that are contextualized and intrinsically safe.

Most work on VLMs/LLMs for robotics has focused on generating sequences of actions and plans from high-level goals, offline, targeting only autonomous robots isolated from humans. A critical obstacle to deploying VLMs/LLMs on robots that collaborate with humans is using them online, in a human-in-the-loop scenario, to generate suitable motions and "safe" robot policies.

Here, we use VLMs/LLMs to generate a robot's motions online in collaborative scenarios where safety is critical: active exoskeletons and mobile manipulators assisting humans in object manipulation. The human vocally commands the robot interactively, online, to control the generation of its motion at the low level: start, stop, direct, and change its low-level parametrization (e.g., compliant behavior, velocity, maximal torque assistance).
Extending these paradigms, comparing them with existing Vision-Language-Action models (VLAs), and fine-tuning VLAs are also considered, as this is part of the team's ongoing research.

The first objective is to design the robot's controller with the natural language interaction feature in mind: the human's commands, corrections, and Approximate Numerical Expressions must be translated into meaningful quantities, coherent with the physics of the problem. What do "faster", "a bit higher", "a little to the right", and "more assistance" mean?
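To illustrate what such grounding could look like, the sketch below maps a few Approximate Numerical Expressions to bounded updates of controller parameters. The phrase table, parameter names, step sizes, and bounds are all hypothetical assumptions for illustration, not part of this offer.

```python
# Hypothetical sketch: grounding Approximate Numerical Expressions (ANEs)
# into bounded controller-parameter updates. All names and values below
# (parameters, phrases, step sizes, bounds) are illustrative assumptions.

# Physically meaningful limits for each controller parameter.
BOUNDS = {"velocity_scale": (0.1, 1.5), "height_offset_m": (-0.30, 0.30)}

# Each vague phrase maps to (parameter, relative change).
ANE_TABLE = {
    "faster":       ("velocity_scale", +0.2),
    "slower":       ("velocity_scale", -0.2),
    "a bit higher": ("height_offset_m", +0.05),
    "a bit lower":  ("height_offset_m", -0.05),
}

def apply_ane(params: dict, utterance: str) -> dict:
    """Translate a vague spoken correction into a clamped parameter update."""
    updated = dict(params)
    text = utterance.lower()
    for phrase, (name, delta) in ANE_TABLE.items():
        if phrase in text:
            lo, hi = BOUNDS[name]
            # Clamp so repeated corrections saturate at the safety bound
            # instead of drifting into unsafe values.
            updated[name] = min(hi, max(lo, updated[name] + delta))
    return updated
```

With this scheme, repeated "faster" commands raise the velocity scale only up to its bound, so the vague correction stays coherent with the physics of the task.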

The second objective is to design new multimodal models fusing VLMs/LLMs and multimodal pipelines to predict the human's intent and minimize the need for corrections. Natural language instructions may be incomplete or unclear, but cameras and microphones (or other sensors) can provide sufficient contextual information to generate an appropriate motion. For example, "take that" can easily be translated into "grasp the bottle" if the bottle is the only item in front of the robot. "Move a bit to the right" requires clarification, as well as estimation of physical quantities that are context-dependent.
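A minimal sketch of such context-based grounding, assuming a perception module that already names the visible objects (the function, phrases, and return strings are hypothetical):

```python
def ground_command(utterance: str, visible_objects: list) -> str:
    """Resolve a deictic reference ('take that') using scene context.

    Returns a concrete action when the context disambiguates the command,
    and a clarification request otherwise. Purely illustrative.
    """
    text = utterance.lower()
    if "take that" in text or "grab that" in text:
        if len(visible_objects) == 1:
            # Only one candidate in view: the reference is unambiguous.
            return f"grasp the {visible_objects[0]}"
        # Several candidates: ask the human rather than guess.
        return "clarify: which object?"
    return utterance  # Nothing to resolve; pass the command through.
```

The same pattern extends to richer sensors: the more contextual evidence the pipeline can fuse, the fewer clarification requests the human has to answer.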

The third objective is to detect emergency commands, leveraging both LLMs and audio-processing models for nonverbal communication, and to generate suitable reactive robot behaviors. Humans are often unable to speak clearly when they interact with a robot: sometimes fear takes over and they do not speak at all, or they mumble or scream when they could simply say a clear "stop". Detecting emergency commands is critical for deploying robots in the real world. For example, "Watch out" and "Attention!" are difficult to translate into precise motions, and require one-shot evaluations because of the urgent nature of the command.
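One way to make such detection one-shot is a fast rule-based safety path that runs before any slower LLM call, combining a keyword check with nonverbal audio cues. The phrases, score names, and thresholds below are illustrative assumptions only:

```python
# Hypothetical fast-path emergency detector, evaluated before any slower
# LLM call. Phrases, score names, and thresholds are assumptions.
EMERGENCY_PHRASES = ("stop", "watch out", "attention", "careful")

def is_emergency(transcript: str, loudness_db: float, scream_score: float) -> bool:
    """Return True if the robot should trigger its reactive safety behavior."""
    text = transcript.lower()
    # Verbal cue: any emergency phrase in the speech transcript.
    if any(phrase in text for phrase in EMERGENCY_PHRASES):
        return True
    # Nonverbal cues: a scream-like sound or a very loud utterance triggers
    # a stop even when no clear word was recognized (mumbling, screaming).
    return scream_score > 0.8 or loudness_db > 85.0
```

Keeping this path free of heavyweight models is the design point: an urgent "Watch out!" must stop the robot immediately, not after a round-trip through a language model.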

The PhD student will carry out research on the aforementioned objectives, and will benefit from our collaborations with E. Zibetti (Paris 8, SHS), an expert in Approximate Numerical Expressions in psychology, and D. Sadigh (Stanford University), a leading researcher on LLMs for robot actions.

Real-world demonstrations with real robots and real humans interacting with the robots are mandatory in this PhD.

Main activities: implement, test, and develop novel algorithms for real robots that use language models and foundation models; write papers and present them at conferences; write, test, validate, and document the associated software. Experiments with real robots are mandatory.

The PhD student will also be involved in the activities organized by the Cluster IA ENACT project, which may include dissemination actions, meetings, and presentations to relevant stakeholders (Europe, France, industry, etc.).


Where to apply
Website
https://jobs.inria.fr/public/classic/en/offres/2026-09879

Requirements
Skills/Qualifications

Good skills in Python (PyTorch). Ideally, prior experience with LLMs, VLMs, and foundation models.

Good knowledge of robotics.

Languages: English (English is the official language of the team, and many members do not speak French).

Proactivity, curiosity, daily communication, and the ability to work in a team are fundamental.


Specific Requirements

The ideal candidate is fascinated by the recent developments in artificial intelligence and robotics, especially foundation models, LLMs, VLMs, and OpenVLA. They want to experiment with these new techniques, develop their skills, and work with state-of-the-art robots.

IMPORTANT: candidates must upload their CV, motivation letter, and all documents listed on this page: https://team.inria.fr/hucebot/job-offers/
Applications that do not contain these documents will not be considered.


Languages
FRENCH
Level
Basic

Languages
ENGLISH
Level
Good

Additional Information
Benefits
  • Subsidized meals
  • Partial reimbursement of public transport costs
  • Leave: 7 weeks of annual leave + 10 extra days off due to RTT (statutory reduction in working hours) + possibility of exceptional leave (sick children, moving home, etc.)
  • Possibility of teleworking (after 6 months of employment) and flexible organization of working hours
  • Professional equipment available (videoconferencing, loan of computer equipment, etc.)
  • Social, cultural and sports events and activities
  • Access to vocational training
  • Social security coverage

€2300 gross/month


Selection process
Website for additional job details

https://jobs.inria.fr/public/classic/en/offres/2026-09879

Work Location(s)
Number of offers available
1
Company/Institute
Inria
Country
France
City
Villers lès Nancy

Contact
City

LE CHESNAY CEDEX
Website

http://www.inria.fr
Street

Domaine de Voluceau - Rocquencourt
Postal Code

78153

STATUS: EXPIRED
