Photo Credit: Justin Sullivan/Getty Images

A U.S. federal judge has ruled that OpenAI must disclose internal communications and documents in a significant copyright infringement case brought by authors and publishers against the artificial intelligence company. The ruling is a major setback for OpenAI in the discovery phase of the litigation.

The lawsuit alleges that the ChatGPT-maker used copyrighted literary works without authorization to train its chatbot’s language models. The consolidated class action was brought on behalf of prominent authors including John Grisham, Jodi Picoult, and Douglas Preston, alongside The Authors Guild, all seeking redress for copyright infringement under the U.S. Copyright Act.

The case also includes copyright infringement claims from The New York Times (NYT) against OpenAI, which argues that the tech giant unlawfully used NYT content to train its ChatGPT language models, harming the newspaper’s business and journalism.

In light of these accusations, Judge Ona T. Wang, the federal judge overseeing the case, ordered OpenAI to turn over internal Slack messages and emails discussing the deletion of two large datasets known as “Books1” and “Books2,” which allegedly contained pirated material from sources like Library Genesis and were used to train ChatGPT.

OpenAI had initially claimed these communications were protected by attorney-client privilege, but the judge rejected that argument, ruling that most of the communications are subject to discovery.

The judge also determined that OpenAI could not simultaneously claim that the datasets were deleted for non-use while asserting that the reasons for their deletion are privileged. This contradiction, Judge Wang noted, undermines OpenAI’s privilege assertion.

In her ruling, Judge Wang wrote that “OpenAI has waived privilege by making a moving target of its privilege assertions,” shedding light on the company’s inconsistent legal positions. 

Additional Disclosure Orders in the OpenAI Copyright Cases

Beyond the internal communications about the deleted datasets, OpenAI must also produce approximately 20 million anonymized ChatGPT user logs to The New York Times, as demanded by the news organization.

This order responds to a separate copyright claim the news organization brought against OpenAI, alleging that many ChatGPT users rely on the chatbot to get around the NYT’s paywall, again putting a dent in its business and journalism efforts.

In response, OpenAI argues that releasing 20 million ChatGPT user logs poses significant privacy risks and that over 90% of the transcripts are unrelated to the infringement claims.

“This demand disregards long-standing privacy protections, breaks with common-sense security practices, and would force us to turn over tens of millions of highly personal conversations from people who have no connection to the Times’ baseless lawsuit against OpenAI,” Dane Stuckey, Chief Information Security Officer at OpenAI, wrote in a statement.

However, Judge Wang ruled that the chat logs are essential to determining whether ChatGPT outputs have reproduced copyrighted material from the NYT and other news organizations. She also dismissed OpenAI’s privacy concerns, finding that the company’s de-identification protocols adequately protect user privacy.

The Bigger Fight Over AI Training Data

This isn’t OpenAI’s first rodeo. Similar copyright cases from music labels, news outlets, and others abound, all questioning whether AI giants can really vacuum up the internet for free, with no repercussions, while claiming it’s only for “research and development” purposes.

While OpenAI maintains that its use of this data is fair and transformative, arguing that AI doesn’t copy but merely learns patterns, the authors counter that without their work, ChatGPT wouldn’t exist.

The significance of Judge Wang’s discovery ruling lies in its potential to establish willful copyright infringement. If the evidence demonstrates that OpenAI knowingly deleted datasets in anticipation of litigation, its damages exposure could increase dramatically.

Under U.S. copyright law, willful infringement can result in statutory damages of up to $150,000 per work infringed. Given that OpenAI allegedly trained its models on hundreds of thousands of books, potential liability could reach into the billions of dollars.
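To see how quickly those numbers compound, here is a back-of-the-envelope sketch. The $150,000 figure is the statutory maximum for willful infringement; the works count below is a hypothetical round number for illustration, not a figure established in the case.

```python
# Illustrative statutory-damages estimate. The per-work cap comes from
# U.S. copyright law (willful infringement); the works count is a
# hypothetical assumption, not a number from the litigation.
STATUTORY_MAX_WILLFUL = 150_000  # dollars per work infringed


def max_exposure(works_infringed: int) -> int:
    """Worst-case liability if every work drew the willful maximum."""
    return works_infringed * STATUTORY_MAX_WILLFUL


# e.g. a hypothetical 300,000 books at the willful cap:
print(f"${max_exposure(300_000):,}")  # → $45,000,000,000
```

Even at a fraction of the statutory maximum, a six-figure count of books puts the worst case comfortably in the billions.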

Additionally, if the court finds that OpenAI destroyed evidence in anticipation of litigation, juries in future trials could be instructed to presume that the destroyed evidence would have been unfavorable to OpenAI’s defense, a precedent that could shape future lawsuits over AI training data.


I’m Precious Amusat, Phronews’ Content Writer. I conduct in-depth research and write on the latest developments in the tech industry, including trends in big tech, startups, cybersecurity, artificial intelligence and their global impacts. When I’m off the clock, you’ll find me cheering on women’s footy, curled up with a romance novel, or binge-watching crime thrillers.

