Tweaks to EU copyright requirements in the AI Act Code of Practice will maintain the AI growth momentum, for now – but will also create new challenges.
As part of the implementation of the European Union’s Artificial Intelligence Act (Regulation (EU) 2024/1689), the European Commission has published several implementation guidelines, including the final Code of Practice (CoP) for the Implementation of the AI Act, issued in July 2025. It has also published detailed guidelines for reporting on data used for AI model training and obligations for providers of general-purpose AI models.
The CoP covers three issues: transparency about the construction of AI models, how providers should deal with model safety and security, and how to comply with copyright in AI training data.
Compliance with the first two is easy. Frontier AI models can by now fill in their own transparency forms and run regular safety and security checks. This will not increase regulatory compliance costs beyond what AI modelers are already doing.
Copyright is a thornier issue, however. For frontier AI models, more training data improves performance (Kaplan et al, 2023). Copyright obligations reduce the quantity of available data and, through licensing requirements, increase the price of model training data. Data access conditions also affect the EU’s global competitiveness in AI.
While some CoP copyright provisions make sense – a prohibition on reproducing copyright-protected content in model outputs; training on lawfully accessible data only – others are problematic. The problematic provisions include transparency requirements about the origins of datasets used for model training, and the obligation to manage the rapidly rising number of copyright holders who have withdrawn (‘opted out’) their data from model training (Longpre et al, 2024) in accordance with the EU Copyright Directive (Directive (EU) 2019/790).
EU regulators are caught between EU copyright law and global competition between national AI regulations. They cannot modify the law in the short run to accommodate the data needs of AI models. But full application of the law would endanger EU access to the best AI models and services and erode competitiveness. The CoP is the latest attempt to square that circle.
The illusion of full transparency
Together with the AI Act, the CoP is intended to help improve transparency about the use of model training data. It assumes that this will facilitate licensing of copyright-protected data, thereby giving copyright holders a fair share of AI revenues. That wishful thinking ignores the high transaction costs that negotiating licensing deals with millions of online rightsholders would generate for AI developers. For the multitude of small web publishers, transaction costs may exceed the value of licensing fees.
The model used for online advertising is not easily transferable to AI licensing. Nor does collective licensing offer a solution (Geiger and Iaia, 2024): it replaces individual pricing with single collective prices determined by intermediaries. Shifting the problem from the licensee to an intermediary copyright management organisation does not solve the issue of tracking and assigning value to content use.
Setting the overall collective license price for hundreds of AI models would also fragment the EU AI regulatory landscape, as national copyright management organisations in EU countries would propose their own rules and pricing; Spain and Germany, for example, have already tried. In an ideal world with a global copyright authority that had access to all AI training data inputs, model outputs and revenues, an AI model might solve this vast computational problem – assuming an agreement on how much AI revenue should be redistributed to training data producers (Martens, 2024). High transaction costs exist precisely because no such copyright authority exists, and those costs provide a justification for granting copyright exceptions (Posner, 2004). This reasoning could have been applied to AI training data.
EU policymakers found another way out of this conundrum. The CoP guidelines for reporting training data require only a summary list of the most relevant online domains from which training data was scraped. Small online publishers, for which transaction costs are more likely to exceed licensing fees, can be omitted. This twist is in line with the European Commission’s goal to “improve the copyright framework to address new challenges”. But it also creates new challenges.
It may result in biased training datasets – especially penalising smaller EU language and cultural communities. Leading AI models are already biased in favour of large English-language communities and lack cultural diversity (Zhang et al, 2025). These CoP copyright provisions undermine the AI Act’s aim to reduce bias in AI models.
Stretching the CoP beyond copyright law
However, this approach still leaves some issues unresolved. The CoP defines AI training data very broadly as all data used for pre-training, fine-tuning and reinforcement learning, regardless of whether the data is protected by intellectual property rights. This includes personal data protected by privacy rights, synthetic data generated by the model developer and data ‘distilled’ or extracted from other AI models. Models themselves are not protected by copyright (Henderson and Lemley, 2025). Developers appear to be increasingly secretive about the production of synthetic training data, often extracted from other AI models, because it has become an important factor in their competitiveness. Again, a vague CoP formulation rescues developers: only a brief narrative of data sources is required. That gives considerable discretion to the AI Office, which is responsible for the implementation of the EU AI Act.
The rapid evolution of AI models is diminishing the importance of copyright-protected model training data. The latest generation of ‘reasoning’ models relies more on reinforcement learning with synthetic data, often extracted from other AI models. Copyright, however, remains important at the other end of the AI model lifecycle, beyond training, when models retrieve data in real-time from external sources to respond to user queries. Yet this data is not covered, because the copyright provisions in the AI Act and CoP are limited to data used for AI training, not data collected after training.
Unequal treatment of training and post-training data led to another CoP provision that stretches beyond EU copyright law. The CoP instructs AI model developers to ensure that compliance with data mining opt-outs does not negatively affect the findability of that opted-out content via search engines (Kim et al, 2025). In other words, data that is out of bounds for AI bots and models should still be collectable by search bots operated by the same company, to maintain the flow of search-engine traffic, online sales and advertising revenue to publishers’ websites.
This is intended to help web publishers push back against the fall in traffic to their sites induced by AI answer engines. However, it discriminates between machine learning and human learning. Machines will be able to retrieve less information than human users of search engines. This reduces the quality of AI answers and forces human users to invest more time and cognitive effort in constructing their own replies by clicking and reading relevant pages in search engines, rather than obtaining ready-made answers from AI models. That keeps human learning costs artificially high, compared to what they could be with a level playing field in data access between search and answer engines. Ultimately, this reduces the efficiency of human learning and slows down innovation in society.
Rapid convergence between search engines and AI answer technologies may soon make this CoP provision obsolete. Data collected by search bots and AI bots will converge on the same page. Google, Microsoft and OpenAI already offer search and answer services jointly. Web publishers and e-commerce platforms realise that the shift from search to answer is inevitable in the AI age, especially with the fast rise in AI agents, and are looking for other ways to generate revenue.
The EU’s AI copyright regime can only stand if AI models trained under more liberal regimes in other jurisdictions can be kept out of the EU market. For that purpose, the AI Act includes an extra-territoriality clause, a very contentious provision because copyright law is essentially territorial law (Quintais, 2025).
Countries with less-restrictive copyright regimes tend to do better in AI innovation (Peukert, 2025). Japan, for example, applies copyright to use of media data ‘for enjoyment’ by consumers, but allows a copyright exception for intermediate use for AI training. In US copyright law, transformative use of media content for purposes other than originally intended may constitute an acceptable exception to copyright protection. US courts seem to be gradually moving in that direction for AI training data (Martens, 2025), and President Trump certainly supports the trend.
In July, the US administration published its AI Action Plan (The White House, 2025). It recognises that US-China geopolitics and America’s wish to achieve global dominance are the main drivers of the AI race. It wants to remove all barriers, including regulatory barriers, that stand in the way of achieving that goal. Should US courts accept AI models as transformative use of training data and grant an exception to copyright protection for these data, the EU would need to align with the US on copyright (the alternative in extremis would be for the EU to drop out of AI altogether). Developers would not re-train models with data acceptable in the EU market, reinforcing a situation in which hardly any leading AI models have been trained in the EU so far.
Attempts to change EU copyright law may also backfire, potentially leading to even more detailed transparency requirements for all AI training inputs and a presumption in the absence of full transparency that relevant works have been used for AI training purposes, which could trigger infringement procedures.
Policy conclusions
The subtle weakening of copyright enforcement in the CoP, which limits it to the most relevant data used for AI training, has enabled most major AI developers to sign up to the CoP, with the exceptions of Meta and xAI. Signatories will have been comforted by vague formulations that leave considerable margin for interpretation by the AI Office. AI regulators are caught between EU copyright law and AI regime competition with other countries. Attempts to reform that law may take many years and might backfire. Muddling through may be the only feasible option in the short run.
A more satisfactory policy would require a debate on the role of AI in enhancing learning, research and innovation (Senftleben et al, 2025), and the conditions under which it can access data to feed that process. Copyright is the wrong starting point for that debate. It was originally designed to promote creativity and innovation. But in the AI era it has become a reactionary force that diminishes innovation potential, controlled by media industries that represent less than 4 percent of GDP. Staying on top in the global AI competition requires an efficient pro-innovation regime.
A better solution would be for copyright law, or at least its application to AI models, to take some inspiration from the pro-innovation features of patent law. The innovative content of patents is publicly available, and anyone can learn from it and build new innovations around or on top of it (Murray and Stern, 2007). But only the patent holder has the right to commercial reproduction of the original invention. That accelerates the spread and accumulation of knowledge while still protecting original innovators.
Applying this approach to data for AI model training, and to post-training data retrieval, would permit AI models and users to learn from all legally accessible content and data. This could be achieved by widening the interpretation and application of the copyright protection exemption for digital data processing in Art 4§3 of the EU Copyright in the Digital Single Market Directive (Directive (EU) 2019/790). How close AI model outputs may come to the original is an AI model output question, not a data-inputs question. Moving the policy debate to permissible AI outputs would at least clear the way for unrestricted learning and transformative use on the inputs side.
Source: Bruegel