We conduct three types of activities: education, research and development, and advocacy.
Our projects aim to mitigate the risks of frontier AI development.

R&D

Our research program aims to identify and correct the problematic behaviors of AI models by developing monitoring tools and relevant solutions. Our model analyses and evaluations will help inform the decision-making of their designers, and promote the innovation and industrialization of advanced AI safety techniques.

We aim to highlight current risks, but also to explore the challenges that future AI models could pose, so that our research facilitates responsible progress in the field.

Our first project aims to develop a scalable supervision system for agents based on LLMs. Current LLM agents are already showing various types of failure modes. By learning to detect them now using less advanced models, we can start iterating to create robust and scalable surveillance systems for future agents.

The stages of the project are as follows:
  • Dataset design

Creation of a comprehensive dataset of LLM agent traces containing unexpected behaviors in these agents, such as prompt injection, deception, and excessive autonomy. The dataset is divided to leave out behavioral classes intended for testing supervision systems.

  • Scalable supervision

Experimentation with different supervision architectures to monitor advanced LLM agents using less advanced models, in order to detect hidden anomalies without explicit prior knowledge of these behaviors.

  • Collection of real examples

Creation of an open-source tool that easily integrates into existing LLM agent architectures, facilitating feedback loops and testing robustness in real conditions through the collection of data from the community.

Education

Despite massive investments in AI in recent years, the opportunities to train in AI safety remain very inadequate given the scale of the challenges we face. To fill this gap, we offer programs in various formats aimed at training researchers and engineers in the latest advances in the field.

Accredited courses in ENS

We teach at ENS Paris and the Master MVA at ENS Paris-Saclay a course on AI safety, named ”Turing seminar”. This accredited course includes presenting articles, completing research projects, and occasionally organizing debates and discussions. Sessions are designed to enrich the educational experience and encourage dynamic and in-depth interaction with the topics.

ML4Good Bootcamps

These intensive 10-day bootcamps are designed for students very talented in math and computer science, from France and elsewhere, in order to strengthen their skills in machine learning and AI safety.

The objective is to make them aware of these themes through presentations and readings, to engage them in projects related to AI safety, and to encourage them to continue their careers in this essential but neglected field.

Advocacy

The capabilities of artificial intelligence are improving quickly, but safety aspects are lagging behind. It is therefore crucial to highlight the need for trustworthy AI research, since the state of the art is already insufficient for industrialization in many fields (health care, transport, defense, etc.), and the risks of future models are even more concerning. This is why, along our research and education work, we are raising awareness and disseminating information to the general public and AI-relevant actors.

Events

At the interface between awareness and research, we organize hackathons focused on AI safety challenges. These events allow us to explore this field and develop solutions for safer AI. These hackathons come in various formats, some introductory, others more advanced and focused on research.

We also organize a variety of events (talks, round tables, workshops) addressing issues related to the progress of AI. The topics cover, among other things, current and future challenges, technical challenges, and governance.

Publications

We publish articles, reports, and summaries to inform researchers, decision-makers, and citizens on the evolution of AI.

Sign up for our newsletter