The remarkable development and democratization of Generative Artificial Intelligence (GAI) present significant opportunities across numerous sectors, while also raising concerns related to potential risks, especially in intellectual property matters.

The question of copyright protection arises at two stages: upstream, during the training phase of data used by the GAI, and downstream, when the GAI generates content based on this data.

Generative Artificial Intelligence (“GAI”) refers to a specific category of AI systems capable of autonomously creating new data, images, texts, music, and videos using machine learning models and instructions from a human user.

It currently has several applications, particularly in business, for professional content creation, graphic design, operations optimization through predictive models, and customer support via chatbots.

The generation of this sought-after content is made possible by the parameterized use of massive archived data, collected in particular through online data indexing and extraction techniques.

The enthusiasm for GAI matches the concerns it raises:

  • Not only regarding manipulation risks and threats to liberties, which European Regulation 2024/1689 on artificial intelligence, effective as of August 1, 2024, aims to address;
  • But more specifically, regarding the risks of copyright infringement, relating both to the extensive data used during GAI training and to the use of the new content generated.

Among the numerous challenges raised by the emergence of GAI, two issues are particularly prominent: the legality and control of the copyright-protected “input” data used during GAI training (I), and the possible copyright protection of the “output” content generated by GAI (II).

1. The Legality of Using Training Data: The Fragile Balance Between Copyright Compliance and Support for GAI Development

Among the massive data used during GAI training to generate new content based on user “prompts” or requests, some are protected by copyright, particularly images, texts, sounds, and music that display an “original” character, reflecting the unique, free, and creative choices of their authors.

In principle, the mere reproduction—even partially—of “input” data protected by copyright for the generation of “output” content by GAI would require the prior authorization of the author of the input data. Without this authorization, the author could file an infringement claim against the GAI provider or user.

In practice, identifying GAI’s use of training data protected by copyright is challenging, given the opacity of most of these systems for the public. This is even more difficult when the content generated by GAI, visible only to the user, does not reproduce the characteristics of the training data under copyright protection.

To preserve the competitiveness of innovative European companies operating in the GAI sector, while striking a fair balance with respect for authors’ rights, the AI Regulation legitimizes the application to GAI of the “text and data mining” exception, which authorizes, without financial compensation, the collection and reproduction of training data that is accessible online and protected by copyright.

This exception allows GAI providers to dispense with any authorization, provided that the author or their successors in title have not exercised their right to opt out.

In practice, exercising the opt-out or defending one’s copyright proves difficult, since it is very complicated for an author to verify how their works are being used.

The application of the “text and data mining” exception is also strongly criticized by authors and rights holders, since GAI had not been specifically contemplated when the exception was introduced by EU Directive 2019/790 of April 19, 2019. According to its critics, applying this exception to GAI would not comply with the “three-step test” imposed by international treaties and European regulation, under which the exception may apply only in “certain special cases” that do not conflict with “the normal exploitation of the work” and do not cause “unjustified prejudice to the legitimate interests of rights holders.”

The massive volumes of low-cost content generated by GAI would indeed compete with authors’ works and interfere with their normal exploitation, depriving authors of expected revenue while causing them unjustified harm without any compensatory mechanism.

In response to these concerns, the AI Regulation imposes a transparency obligation on GAI developers and providers to inform users about the origin and nature of the data used, and to enable authors to identify the use of their works.

Under this requirement, GAI providers must disclose a sufficiently detailed summary of the training data used by their system, though the specifics of this requirement are yet to be defined.

In France, the Higher Council for Literary and Artistic Property (CSPLA) was tasked in April 2024 with establishing a list of the information that GAI providers must disclose, depending on the cultural sectors involved, to allow authors and neighboring rights holders to exercise their rights.

The details of the information obligation for AI model providers are expected to be clarified soon, along with the timeline for implementing such an obligation, considering that many GAIs have already been trained on massive online datasets.

The CSPLA has also been tasked with proposing legal mechanisms ensuring fair remuneration for rights holders by sector.

In the United States, the use of pre-existing works by GAI has given rise to at least 20 ongoing lawsuits against GAI providers, in which the application of “fair use,” a copyright exception, is also being debated. In Germany, a decision issued by the Hamburg District Court on September 27, 2024 confirmed the application of the “text and data mining” exception to training data, while further emphasizing the need for transparency toward authors regarding the use of such data.

Alternative means of ensuring respect for copyright on training data are also being considered and proposed at the European level, including technical measures, the establishment of a pre-certification mechanism for GAI providers targeting the European market, and the marking of AI-generated content to make it identifiable, such as via a “tag.”

2. Copyright Protection for GAI-Generated Content

The generation of content by GAI, based on the processing of input data during training, raises the question of whether that content can itself be protected by copyright.

According to the personalist conception of French copyright law, a creation entirely generated by GAI—by nature devoid of personality—without the “free and creative” choice of a human individual, could not benefit from copyright protection. This view is shared by other legal cultures, as evidenced by some—albeit rare—decisions in the United States, despite evident disparities in copyright approaches across different jurisdictions.

As a result, neither GAI itself nor the GAI provider—although potentially holding rights related to the GAI software—could be eligible for copyright protection in France on the productions generated through this system.

However, if GAI is used as a tool to assist in creating a work that reflects the personal choices of the human author, and their respective contributions are identifiable, the recognition of copyright for the GAI user on this work is theoretically possible. A parallel can be drawn with a camera, a technical tool enabling the creation of works that can be protected by copyright.

Nevertheless, to claim such protection in France, the personal contribution of the GAI user would, in theory, have to go beyond the mere formulation of a request (“prompt”), no matter how detailed, and involve “downstream” control and an original contribution to the final generated content, which must be the result of “free and creative choices.”

At this stage, GAI-generated content still remains the product of random choices and uncontrollable algorithmic calculations, with the user’s role often limited to providing an idea guiding the GAI system, which is not protectable in itself.

No ruling has yet been issued in France, but this approach tends to be upheld in the United States, where the personal contribution of the human author is assessed at every stage of content production by GAI, including at the moment of “output” of the generated data (texts, images, videos, sounds). In contrast, China has been more open to protecting GAI-generated content as long as significant human input is observed, even if only at the level of the input data and the request.

The evolution of case law and legislation on these matters should clarify the legal solutions to adopt and provide a more secure environment for GAI providers, users, and authors of intellectual works.

November 2024

Lorraine Bazin, Attorney-at-law, SANTARELLI

