Generative artificial intelligence systems rely on models trained on enormous volumes of data—often gathered through web scraping—to produce text, images, music, or code. These datasets frequently include protected works. This reality raises a central question: to what extent is it lawful to use copyrighted content to train an AI?

To address this, the European Union has introduced legal adjustments within the framework of the Copyright in the Digital Single Market (CDSM) Directive. The directive permits Text and Data Mining (TDM) under specific conditions and allows rights holders to object through explicit, machine-readable opt-outs.

The European Regulation on Artificial Intelligence (AI Act) goes further: it requires developers of general-purpose AI models to publish summaries of the training data used and to ensure the traceability of generated content, which must be automatically detectable.

Several legal and technical mechanisms for rights reservation are under consideration, including:

  • reservations declared on websites themselves, for example in their terms of use,
  • TDM protocols,
  • certification initiatives such as C2PA.
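
As an illustration of what checking a machine-readable reservation might look like in practice, here is a minimal Python sketch. It is not a statement of what the law requires. It assumes two conventions: a robots.txt file that names a crawler, and a `tdm-reservation` response header whose value `1` signals reserved rights, as in the TDM Reservation Protocol. The crawler name `ExampleAIBot`, the example URL, and the function names are hypothetical.

```python
from typing import Mapping
from urllib.robotparser import RobotFileParser


def robots_allows(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def tdm_reserved(headers: Mapping[str, str]) -> bool:
    """Return True if the response headers carry a TDM reservation.

    Assumption: a `tdm-reservation` header with value "1" means that
    text-and-data-mining rights are reserved for the resource.
    """
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return normalized.get("tdm-reservation") == "1"


def may_mine(robots_txt: str, headers: Mapping[str, str],
             user_agent: str, url: str) -> bool:
    """Conservative rule: mine only if neither signal expresses an opt-out."""
    return robots_allows(robots_txt, user_agent, url) and not tdm_reserved(headers)


# Hypothetical robots.txt: the AI crawler is excluded, everyone else is allowed.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""

if __name__ == "__main__":
    url = "https://example.org/article"
    print(may_mine(ROBOTS_TXT, {}, "ExampleAIBot", url))
    print(may_mine(ROBOTS_TXT, {"TDM-Reservation": "1"}, "OtherBot", url))
    print(may_mine(ROBOTS_TXT, {}, "OtherBot", url))
```

The sketch treats either signal as sufficient to block mining. Whether a given signal actually counts as a valid reservation is a legal question that code of this kind cannot settle.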

These challenges also create new opportunities, particularly in direct licensing. Experiments are underway in the press and scientific publishing sectors. Yet these initiatives depend on effective and reliable rights-reservation mechanisms, as well as greater legal certainty for the stakeholders involved.

The variety of approaches and the advent of technologies such as Retrieval-Augmented Generation (RAG), which combines generated content with external sources retrieved in real time, add further layers of complexity.

Against this backdrop, the priorities are to:

  • provide creators with practical tools to manage their rights in the face of AI;
  • promote transparency and traceability of generated content;
  • support the conclusion of suitable licensing agreements;
  • reflect on possible protection for AI-generated works.

May 2025

Guillaume Mortreux, Santarelli, Trademark Counsel

Partner | European Trademark & Design Attorney