OpenAI Counters New York Times Lawsuit Claims on Use of Paywalled Content

AI company says it is working to fix the regurgitation of content but claims the newspaper isn’t telling the full story
OpenAI CEO Sam Altman during a meeting at the Station F in Paris, on May 26, 2023. (Joel Saget/AFP via Getty Images)
Matt McGregor
1/12/2024
Updated: 1/12/2024

OpenAI responded to a copyright infringement lawsuit filed by The New York Times by describing its use of the media outlet’s content as negligible.

“Just as humans obtain a broad education to learn how to solve new problems, we want our AI models to observe the range of the world’s information, including from every language, culture, and industry,” the company said in a press release.

“Because models learn from the enormous aggregate of human knowledge, any one sector—including news—is a tiny slice of overall training data, and any single data source—including The New York Times—is not significant for the model’s intended learning.”

The company argued that using content available on the internet to train its models constitutes fair use “as supported by long-standing and widely accepted precedents.”

“We view this principle as fair to the creators, necessary for innovators, and critical for US competitiveness,” the company said.

In December 2023, The New York Times filed a lawsuit against Microsoft and OpenAI arguing that the technology firms illegally drew from the newspaper company’s content to build their large language models (LLMs).

“While defendants engaged in wide-scale copying from many sources, they gave Times content particular emphasis when building their LLMs, revealing a preference that recognizes the value of those works,” the lawsuit states.

However, according to OpenAI, multiple entities and academic institutions have submitted comments to the US Copyright Office arguing for fair use.

“Other regions and countries, including the European Union, Japan, Singapore, and Israel also have laws that permit training models on copyrighted content, an advantage for AI innovation, advancement, and investment,” OpenAI said.

Legal rights, however, are less important than being “a good citizen,” OpenAI said, which is why it provided an option for companies to opt out of allowing the GPTBot to access their content.

Regurgitation: ‘A Rare Failure’

OpenAI called its plagiaristic “regurgitation” of paywalled content “a rare failure” that it’s working to fix.

“So we have measures in place to limit inadvertent memorization and prevent regurgitation in model outputs,” OpenAI said. “We also expect our users to act responsibly; intentionally manipulating our models to regurgitate is not an appropriate use of our technology and is against our terms of use.”

The lawsuit argued that there are numerous examples of how the AI programs copied The New York Times’s content verbatim, in addition to attributing incorrect information to the media source.

“Using the valuable intellectual property of others in these ways without paying for it has been extremely lucrative for defendants,” the lawsuit states. “Microsoft’s deployment of Times-trained LLMs throughout its product line helped boost its market capitalization by a trillion dollars in the past year alone.”

But OpenAI said The New York Times isn’t “telling the full story.”

The two sides last communicated on Dec. 19, OpenAI said, after a series of negotiations focusing on “a high-valued partnership.”

“We had explained to The New York Times that, like any single source, their content didn’t meaningfully contribute to the training of our existing models and also wouldn’t be sufficiently impactful for future training,” OpenAI said. “Their lawsuit on December 27—which we learned about by reading The New York Times—came as a surprise and disappointment to us.”

The media publication informed OpenAI of regurgitations of content—which the company said it takes seriously—but didn’t provide examples, OpenAI said.

“Interestingly, the regurgitations The New York Times induced appear to be from years-old articles that have proliferated on multiple third-party websites,” OpenAI said. “It seems they intentionally manipulated prompts, often including lengthy excerpts of articles, in order to get our model to regurgitate.

“Even when using such prompts, our models don’t typically behave the way The New York Times insinuates, which suggests they either instructed the model to regurgitate or cherry-picked their examples from many attempts.”

‘Without Merit’

Overall, OpenAI stated that the complaint is “without merit,” but said it still hopes for a productive partnership with the publication.

The company said it respected the newspaper, pointing to it being the first to report on “neural networks” in a July 13, 1958, article titled “Electronic ‘Brain’ Teaches Itself.”

The article describes a Navy computer named Perceptron that was being designed to “perceive, recognize and identify its surroundings without human training or control.”

Now, over 60 years later, AI has become a reality, while simultaneously bringing about ethical dilemmas and legal issues surrounding its use.

According to the BakerHostetler law firm, there has been “a flurry of copyright litigations” since the rise of AI, with 10 lawsuits currently filed and more expected.

“Generative AI raises challenging (and sometimes existential) questions about copyright protection, liability, and enforcement,” the firm said. “Content creators, generative AI developers, and end users are monitoring how these issues play out in courts and trying to adapt their own conduct to minimize risk without unnecessarily forgoing the benefits of this technology.”

In response to The Epoch Times’ request for comment, Ian Crosby, partner with Susman Godfrey and lead counsel for The New York Times, said: “The blog concedes that OpenAI used The Times’s work, along with the work of many others, to build ChatGPT.

“As The Times’s complaint states, ‘Through Microsoft’s Bing Chat (recently rebranded as ‘Copilot’) and OpenAI’s ChatGPT, defendants seek to free-ride on The Times’s massive investment in its journalism by using it to build substitutive products without permission or payment.’ That’s not fair use by any measure.”