Tech versus Humans? AI-powered language models and IP Battles – GDE 29 Report

The emergence of AI-powered language models (one of the first being ChatGPT, rapidly followed by other models) profoundly affects the relationship between AI and Intellectual Property. GDE 29 tackled the resulting uncertainties.

20 October 2023
Data, Economy, GDE Report, Global Digital Encounters, Intellectual Property, Fide Summaries and Proposals

Date: September 15th 2023

Speakers

Dr. Carmelo FONTANA, Academic Fellow at Bocconi University
Prof. Dr. Thomas MARGONI, Research Professor of Intellectual Property Law, Centre for IT&IP Law (CiTiP), Faculty of Law, KU Leuven
Prof. Javier FERNÁNDEZ-LASQUETTY, Partner at Elzaburu, Professor of IP at IE Law School, Member of Fide’s Academic Council, Academic Coordinator

Objectives of the 29th Global Digital Encounter:

In the Transatlantic Area, and all over the world, the emergence of AI-powered language models (one of the first being ChatGPT, rapidly followed by other models) profoundly affects the relationship between AI and Intellectual Property. Public authorities, businesses and consumers face challenging questions about redefining the role of IP in this innovative context, and further digitalization of society is expected to spur new developments. What kind of consensus may exist in various parts of the world and in international fora for such changes, and what kinds of IP transformations and battles are connected to this matter? For this session of our Global Digital Encounters, senior speakers who specialize in this cutting-edge topic, holding a variety of opinions, will provide a worldwide picture of the probable future of IP protection and of the battles connected to AI-powered language models.


Report

In his upbeat welcoming speech, Prof. Laurent Manderieux stressed that Artificial Intelligence (AI) is a hot topic at the heart of our present and future. For this reason, Fide and the Transatlantic IP Academy focused the discussion at this 29th Global Digital Encounter on how AI, and generative AI in particular, has completely changed the landscape of intellectual property issues and far beyond.

There is considerable scope for improving understanding of this emerging topic. International digital forums have a responsibility to address this paramount issue and to bring together some of the foremost experts in the field from around the world. Professors Carmelo Fontana of Bocconi University and Thomas Margoni of KU Leuven both have extensive expertise in this topic, having engaged with it in numerous capacities and formats.

Moderator Prof. Javier FERNÁNDEZ-LASQUETTY noted that today’s discussion focuses on the evolving dynamics of AI and its implications for intellectual property. AI-related questions have been examined for about seven to eight years; in 2019, a significant congress titled “A Dialogue Between AI and IP”, held in Alicante (Spain), produced notable insights.

It is evident that AI has evolved significantly since around 2015-2016. The introduction of LLMs (Large Language Models) has profoundly transformed this domain, raising concerns in diverse areas such as IP, personality rights and data protection. The focus of this event, however, is IP, specifically in the context of LLMs. Given the vastness of potential topics, we aim to engage in a fluid conversation around the salient issues raised by this emerging technology.

A pressing concern is the copyright status of LLM outputs, especially in light of recent decisions by the US Copyright Office. Its recent decision on the copyright protection of LLM-generated works seems consistent with prior rulings: it suggests that some such material may be protectable, although ambiguities remain.

Question 1: Given the necessity of human intervention in these systems, can LLM outputs be copyright protected?

Speaker Prof. Dr. Thomas MARGONI pointed out that the foundational legal thinking on computer-generated works was developed decades ago: Prof. Pamela Samuelson, in her seminal work on the allocation of ownership rights in computer-generated works (1985), was among the first to identify the issue.

If a work is purely computer-generated (i.e., not computer-assisted), the space for copyright protection is very limited, if not completely absent. Historically, copyright law emerged as a form of protection for human ingenuity, and the various theories underpinning it (Locke and fairness; Kant and personality; Bentham and utilitarianism) have always, in different ways and perhaps with different goals, centred on the role of human creation. The reasons for this stance can be both philosophical and historical, in the sense that the space for computer-generated or AI-generated works in, say, 1710 was obviously very limited! Throughout the history of copyright, there has always been an implicit emphasis on human creativity.

However, when AI plays only a supplementary role in computer-assisted creations, the traditional copyright principles hold. If there is a discernible element of human creativity, that element can be, and usually is, protected by copyright, while the AI’s contribution normally is not. Distinguishing the two may be difficult, but that is a different type of question. The recent US decisions seem broadly to align with this understanding. It is conceivable that in the future we, as a society, may want to recalibrate our legal frameworks if a compelling reason arises to grant copyright to entirely computer-generated works. Such a shift would rest on normative considerations and on our vision for society, and a clear case for it would need to be made. At present, it is far from clear that there is any need to protect AI-generated works.

Speaker Dr. Carmelo FONTANA responded that, ultimately, the question of copyright protection for AI-assisted works hinges on the degree of human intervention and how technology is employed. To elucidate this, one can draw a parallel to the “monkey selfie” case involving photographer David Slater. If human intent can be attributed to a calculated process, be it in the photographer’s mind or theoretically within a machine’s parameters, then the resultant work may warrant copyright protection.

For instance, while a simple translation performed by an AI may not raise IP issues, the situation becomes more complex when AI is tasked with more intricate transformations, such as converting bullet points into full documents. The decisive element here is the degree of human intent and guidance provided to the AI. The crux of the matter, however, is whether such AI contributions align with copyright’s primary objectives: stimulating creativity and preserving culture. The answer will largely depend on how the use of the technology evolves and on its eventual commercial implications.

Moderator Prof. Javier FERNÁNDEZ-LASQUETTY shared these views but expressed concern about recent decisions of the US Copyright Office. He highlighted the evolving nature of technology and its implications for copyright. Recalling a study from around 2018 on copyrightable subject matter in AI, he noted three potential points of human involvement: data selection, machine operation, and decisions on the output.

However, with modern systems like LLMs, the data is pre-loaded and the machine operates autonomously, so human intervention is largely limited to selecting the output. Current debates focus on the length and depth of the prompts given to the AI. The recent US Copyright Office ruling was stringent, suggesting that mere instructions are not enough to establish significant human intervention.

Question 2: Does copyright law need to be adapted to address evolving technology?

Speaker Prof. Dr. Thomas MARGONI contended that the challenge lies in the purpose and goals of regulating AI in copyright: whether it is for market regulation, guiding the tech sector, aiding creators, or establishing governance models for AI through property rights. These are big questions that go far beyond “simple” copyright issues and extend to the ways a legal system intends to regulate a technology such as AI, which, perhaps more than others in the past, has the potential to become an essential tool for any type of future business, technology, or human relation.

Staying within the copyright field, however, it is important to develop a full understanding of how current copyright rules apply to AI, LLMs and other forms of technology. For instance, if the prompt a user enters in an AI app interface (when you, as a user, “ask” ChatGPT something) is the author’s own intellectual creation, it may very well be protected by copyright under current rules. The output could then be viewed as a partial copy, or perhaps even a derivative, of that input, depending on the applicable law and on the specific technological processes involved. This is an important point: AI is a complex ecosystem in which different approaches, algorithms and implementations operate differently, and what holds for one does not necessarily hold for another.

However, most interactions with AI tend to be basic requests. The real challenge will be proving the human element and comparing the input with the output to determine similarities or derivations. The concept of “human causality” (Hugenholtz & Quintais 2021) might assist in discerning whether there has been adequate human intervention to grant authorship. The practical application and proof will likely pose challenges in courts, but theoretically, the principles are somewhat clear.

Speaker Dr. Carmelo FONTANA expressed skepticism about establishing rigid baselines, such as specific lengths or structures, to determine what is copyright-protected or counts as a work of art. Rather than focusing solely on technologies such as LLMs or generative adversarial networks (GANs), the emphasis should be on the creator’s intention to create a specific output.

Drawing an analogy from the world of art, he referred to the works of the Italian painter Lucio Fontana, who created celebrated masterpieces with simple cuts in canvases, emphasizing intentionality over the method of creation. As technology advances and extends human capabilities, the primary focus should be on discerning the original intention of the author or artist when evaluating whether a creation is protectable. Ultimately, this will be a task for the courts.

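Moderator Prof. Javier FERNÁNDEZ-LASQUETTY suggested that current attempts to classify new types of AI creations, such as DABUS, have created a binary situation. Historically, copyright has granted rights to non-traditional creators.

Question 3: Is there a need to establish a new category within copyright laws specifically for creations that utilize artificial intelligence and related tools?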

Speaker Prof. Dr. Thomas MARGONI highlighted the debate around assigning property rights to AI-generated works. He questioned the intent behind such allocations, noting the different conceptions of copyright in continental Europe and in common law countries: while continental Europe views copyright as protecting the author’s personality, common law countries emphasize a utilitarian approach. Some jurisdictions, like the UK, have specific provisions for computer-generated works, allocating rights to the person who makes the arrangements necessary for their creation. Such a framework might offer a way to address AI-generated outputs.

A recent study carried out within the «reCreating Europe» project found that these UK provisions have seldom been used. The limited uptake might be due to the UK’s former EU membership, which could have made such provisions incompatible with CJEU case law. If the UK’s experience is indicative, there has been little urgency to protect AI-generated content, which seems to confirm the theoretical framework discussed earlier.

Speaker Dr. Carmelo FONTANA pointed out that it is essential to understand the evolving AI value chain and the relationships between the different players: service providers, programmers, deployers and end users all interact with and derive value from these technologies. At the same time, the economics of the market remain uncertain. How platforms will sustain themselves is unclear, though a subscription model might be viable. As these platforms become cost-effective, transferring exclusive rights over creations to users may support our core aims: fostering creation, safeguarding culture and distributing societal benefits equitably.

Compared with the emergence of the printing press and of copyright, AI technology is progressing far more quickly. Its global scale and rapid pace introduce unique challenges and economic considerations, suggesting that traditional copyright, or even a tailored right, might not be the right solution.

Moderator Prof. Javier FERNÁNDEZ-LASQUETTY argued that the historical evolution of copyright, from the invention of the printing press to the 19th century and figures such as Victor Hugo and Balzac, shifted the narrative towards copyright as grounded in human creativity. This has shaped our current understanding, as well as the differences in solutions across jurisdictions such as the UK. The immediate issue at hand is infringement by large language models (LLMs): there are concerns that LLMs infringe third-party rights during training and use, and the crux of the issue is determining who is liable.

Question 4: Do LLMs infringe on the IP rights of third parties?

Prof. Dr. Thomas MARGONI pointed out that preserving the human creative process, and the business models of the creative industries built on it, is vital in our AI-driven era. The key may lie not in regulating the final product or output of AI but in better understanding the role of training data, which forms the beginning of the AI value chain. In Europe, there are two new text and data mining (TDM) exceptions (Articles 3 and 4 of the CDSM Directive). Although imperfect, they provide a degree of clarity in an otherwise uncertain field. Such clarity is especially beneficial for research organizations, as it prevents them from being deterred by legal ambiguities. As discussed in a recent paper, individuals, SMEs and other commercial players can also benefit from these provisions when the right holders have not “reserved” their works (see https://academic.oup.com/grurint/article/71/8/685/6650009). Big corporations, like Google or Microsoft, can navigate such uncertainties thanks to their vast data and financial resources.

One important aspect of these exceptions is the provision allowing copies to be retained for verification, which aligns more with public policy goals (transparency and accountability) and research needs (replicability) than with traditional copyright aims. By carving out these exceptions, however, we implicitly affirm that outside these boundaries there is an economic right to train models. For many, training a model equates to text and data mining, which the exception defines broadly enough to cover most general AI uses. We should nonetheless be mindful of the distinction between using TDM for data analysis, such as predicting epidemics, which should not fall under copyright, and using AI to produce market substitutes for copyrighted works. TDM is a basic research technique and as such should not be regulated by copyright, whereas the use of generative AI to compete in creative and cultural markets is a more regular element of copyright discourse. Article 4 of the CDSM Directive, the general-purpose text and data mining exception, is crucial because it gives authors the choice to permit or prohibit the use of their works for training AI. Commercial practice seems to be developing a TDM.txt or AI.txt standard, not too different from the robots.txt files that, in the good old days, governed the indexing of web pages by Internet search engines.

Current technological developments could provide this control, ensuring authors’ rights and their relationship with their work are respected. Some may seek remuneration, while others might focus on the personal and emotional connection to their creations. Ironically, while the text and data mining exceptions might not perfectly cater to text and data mining, they offer intriguing possibilities for regulating generative AI uses.
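The machine-readable opt-out described above can be illustrated with the long-established robots.txt convention that the emerging TDM.txt and AI.txt files are said to resemble. The report does not give the syntax of those proposed files, so what follows is only a minimal sketch, assuming a site expresses its reservation through ordinary robots.txt rules addressed to a crawler name. The crawler names "ExampleAITrainingBot" and "SomeSearchEngineBot", the example URL and the helper may_mine are hypothetical; the parsing relies on Python's standard urllib.robotparser module. Whether such a file amounts to an effective reservation of rights under Article 4 of the CDSM Directive is a legal question the code does not answer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the site allows ordinary indexing by any crawler
# but reserves all of its content against a (hypothetical) AI-training crawler.
ROBOTS_TXT = """\
User-agent: ExampleAITrainingBot
Disallow: /

User-agent: *
Allow: /
"""


def may_mine(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules permit user_agent to fetch url.

    Hypothetical helper: it only reads the published rules and says nothing
    about whether the reservation is legally effective.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules from text, no network access
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.org/articles/essay.html"
    print(may_mine(ROBOTS_TXT, "ExampleAITrainingBot", page))  # False
    print(may_mine(ROBOTS_TXT, "SomeSearchEngineBot", page))   # True
```

Because the rules are addressed to crawler names, a rights holder can block a training crawler while still allowing search-engine indexing. Compliance, however, depends on the crawler choosing to read and honour the file.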

Speaker Dr. Carmelo FONTANA asserted that the text and data mining provision has fortuitously turned out to be technology-agnostic, proving future-proof by covering applications beyond its initial scope. Once data is mined, it is often operationalized within systems. A significant challenge lies in standardizing opt-out mechanisms: if they were managed through individual communications, the process could become overly complex and burdensome for platforms. Platforms are actively working on effective opt-out tools that benefit rights holders. Another complex issue arises after data mining: once texts have been used under the exception and combined with other data, they become part of an intricate system or algorithm that has undergone learning.

Extracting this “learned wisdom” is challenging: asking a technology to “unlearn” is akin to asking someone to forget a joke they found funny. Effectively removing data following an opt-out would be costly and potentially ineffective. It may require resetting the algorithm or introducing manual removal, neither of which may achieve the outcome the rights holder intends. This is a challenge that large platforms will need to address in the future.

Prof. Dr. Thomas MARGONI described a hypothetical scenario in which an AI model is found to infringe copyright because it has used a certain database or a protected work. Apart from damages, the difficulty with injunctive relief would be the need to retrain the whole system on training data that does not contain that specific protected work. This would have major economic as well as environmental implications.

It is well known that these LLMs require not only time and economic resources but also large amounts of computing power and energy, which may have a significant environmental impact. This probably also helps explain why there are currently only a handful of highly advanced LLMs, usually offered by vertically integrated platforms.

Moderator Prof. Javier FERNÁNDEZ-LASQUETTY pointed out that in the end, this is a question of proportionality, a very central question in the field of IP in general.

Speaker Dr. Carmelo FONTANA agreed that, in this case, it is all about proportionality. There might be damages, with a remunerative award granted to the author. If injunctive relief were granted, however, the injunction might also affect the interests of third parties: other parties may actually want the AI platform to be trained with their data, and if the injunction is enforced, that training data may no longer be recoverable, harming their interests.

Governing the ultimate effects of technology in this way would not be in the best interest of society. It is also debatable whether the mere inclusion of a copyright-protected piece of information in the training data amounts to a use so relevant and significant as to be tantamount to copyright infringement. Ultimately, this is a complicated question. As the technology matures, it is conceivable that legislation will strike the right balance and that the rights involved will be respected by the other stakeholders.

Moderator Prof. Javier FERNÁNDEZ-LASQUETTY summarized that proportionality is a crucial theme and that there is a certain tension between different interests. From one point of view, it could also be problematic to conclude that an author’s IP rights have been infringed while at the same time relying on environmental impact as a reason not to enforce those rights.

Prof. Dr. Thomas MARGONI added that it is interesting how, for some reason, we accept that copyright alone should regulate AI. His most recent work argues that this is something copyright cannot do, or cannot do alone. As other initiatives such as the AI Act proposal show, AI governance is largely external to copyright. What is missing is the development of proper interfaces between copyright and AI regulation.

Economic and regulatory theory suggests that copyright and AI governance are driven by different incentives and policy objectives. Accordingly, using one to achieve the goals of the other would be a logical and procedural mistake. For example, the proposed Art. 28b of the Parliament version of the AI Act requires developers of generative AI models to document and make publicly available a summary of the training material that is covered by copyright. It is questionable whether this is a way to guarantee more transparency in AI. Why is the requirement imposed only on copyright-protected material? One could also think of extending the disclosure requirement to personal data or other types of protected content. If the objective is not to enhance transparency but only to operationalize the right holders’ opt-out, then the objective of a fair, transparent and replicable AI environment is lost. There is an increasing mismatch between these objectives, and it is unclear why it is occurring.

Speaker Dr. Carmelo FONTANA added that Art. 28 of the proposed AI Act makes sense in a closed environment: if the training data comes from a library or a closed database, the model proposed in Art. 28 could be a way to operationalize the opt-out and also increase transparency. When the model operates on a larger scale, however, for example when the source for data mining is the entire internet, the due diligence burden of identifying whether each web page contains protected works, who the authors of those works are, whether there is a licensee and whether the licence has expired would become practically impossible to meet.

On top of that, there are important trade secret issues that should not be disregarded. If the developer of an AI model is required to ensure transparency by disclosing training data and the algorithm, they might end up disclosing more information to their competitors than what they really would like to disclose.

Dr. FONTANA was openly critical of the proposed Art. 28, which was included in neither the Commission’s nor the Council’s proposal. It is easy to imagine why Art. 28 ended up in the Parliament version, but it is difficult to see how it would all work in practice, unless it concerns a bespoke platform trained on licensed data in a licensed library. If the TDM exception is invoked but the protected works must nonetheless be identified, Art. 28 could frustrate the whole purpose of having the TDM exception.

Speaker Dr. Carmelo FONTANA drew an analogy with the ordinary hosting scenario, in which a rightsholder approaches a hosting service provider (e.g. a platform), claims to own the rights to a certain work and provides an extract or sample of it. The provider must then check whether the protected work appears on its services and, if there is a match, act expeditiously to take the content down.

Question 5: How is the Data Act being used to create communities of data?

Prof. Dr. Thomas MARGONI explained that, in the Commission’s language, these are the so-called Common European Data Spaces, which are the subject of strategic investment and many initiatives by the current European Commission. There are already around ten sectoral (vertical) data spaces, in areas such as mobility, the circular economy, cultural heritage, open science, smart cities and health. The creation of these data spaces rests on legislation such as the Data Act, the Data Governance Act and the Open Data Directive, and even on older legislation such as the Regulation on the free flow of non-personal data. This is a very ambitious project to create a market and the infrastructural and regulatory conditions for a data ecosystem to emerge, and to embed the EU’s core values into it.

If we assume that modern AI can only exist when it has access to data, then regulating data is essentially equivalent to regulating AI. This may also affect the incentives created by property rights. One of Prof. MARGONI’s papers argues that a shift is emerging from a property-based to a governance-based approach in data law. The Data Act thus creates rights over non-personal data that previously existed only for personal data. For example, a right to data portability requires the manufacturer of an IoT product not only to disclose the data it collects but also to transfer it, even to a potential competitor, for purposes such as the right to repair, consumer freedom and the avoidance of consumer lock-in.

The Open Data Directive, for example, creates almost a taxonomy of public bodies required to make data usable in various ways. One of the reasons given in its recitals is to make high-quality data publicly available. For AI training, open access to high-quality public data is of the utmost importance. If access to high-quality data requires payment, the large online platforms gain a competitive advantage, as they already have access to large amounts of data and huge economic resources. It is interesting that many recent EU legislative initiatives, such as the DSA or the CDSM Directive, seem almost aimed at “punishing” these very large online platforms for their dominant market position, while, through the misuse of copyright law, the problem returns through the back door.

Questions from the audience

Several questions were raised in the chat. One asked whether an AI model acting as a simultaneous interpreter could infringe moral rights, since its translation might not reflect what the speaker actually said.

Also on the question of proportionality, one participant pointed out that balance is a key issue and that copyright, as a property right, is not without limits. In Spain, for example, the principle of «ius usus inocui» establishes that property rights are not absolute. In the Mario vs. Google case (STS nº 172/2012 of 3 April), the Spanish Supreme Court discussed the U.S. fair use doctrine but also referred to the principles of Spanish law, concluding that there are limits to rightsholders’ proprietary rights.

The third question centered around drawing an analogy between AI outputs and the protection of photographs through copyright or related rights.

Prof. Manuel DESANTES REAL closed the 29th Global Digital Encounter by noting that the discussion on this topic will certainly continue. He expressed concern that, if solutions to these problems are sought outside the framework of the Berne Convention, national differences are very likely to arise, since property laws in general are less harmonized than IP rules.

Report written for the Global Digital Encounters by GDE Support Team members BAIYANG XIAO and EETU HUHTA

