TL;DR: The New York Times's lawsuit against AI firms is a defining battle over intellectual property and training-data rights. The conflict is pushing content creators worldwide to re-evaluate how they license their work and protect their archives from unauthorized AI scraping.

The collision between generative artificial intelligence and high-quality journalism has escalated from a technical debate into a restructuring of the media industry. Legacy publishers are asserting control over their proprietary data and setting boundaries for how tech companies use original reporting. In his address at the WAN-IFRA World News Media Congress, New York Times Publisher A.G. Sulzberger warned that AI companies are strip-mining news websites without permission. The rapid expansion of models from OpenAI, Anthropic, Google, and Meta threatens to erode the economic foundations of original, first-hand reporting.

Why is The New York Times suing OpenAI and Microsoft?

The New York Times filed a copyright infringement lawsuit against OpenAI and Microsoft to prevent these technology companies from training artificial intelligence models on its copyrighted journalism without authorization or financial compensation.

The publisher argues that AI developers built their commercial products by copying millions of Times articles. By training models like GPT-4 on this data, tech companies create synthetic competitors that answer user queries using information gathered by Times journalists, which siphons off direct traffic and subscription revenue. Sulzberger described the practice as a brazen theft of intellectual property. The stakes extend beyond news: creative industries globally employ over 50 million people and generate $12 trillion in annual economic value, and the Times seeks to establish a legal precedent that protects these creators. As of 2026, the outcome of this litigation is likely to determine whether AI developers must purchase licenses for copyrighted training inputs. The complaint argues that "fair use" does not cover commercial AI training; if courts agree, tech companies will have to realign how they source their training data. The lawsuit also highlights the risk of "hallucinations," where AI models output false information and attribute it to The New York Times, damaging the publisher's brand reputation.

Tech companies are siphoning traffic through synthetic search summaries

Generative search engines reduce traffic to publisher websites by summarizing original reporting directly on search result pages, bypassing the need for users to click source links.

Search engines now use large language models to generate direct answers to complex queries. Instead of directing searchers to the journalists who uncovered the facts, these tools present the information as their own. This dynamic eliminates the referral traffic that sustains ad-supported digital business models. A.G. Sulzberger warned that this siphoning of audiences threatens the future of original investigative journalism. Gathering news requires significant capital and physical risk, and when search engines stop sending users to publisher websites, the revenue that funds this reporting collapses. Creators cannot compete with synthetic systems that repackage stolen goods. To counter this, publishers are forming alliances to demand transparent revenue-sharing agreements and are blocking web crawlers to protect their intellectual property. Industry data shows that zero-click searches, where users find answers directly on the search page, already account for over 50% of web searches, a figure expected to rise as search platforms deploy more advanced conversational agents.

How can content creators protect their intellectual property from AI training?

Content creators can protect their intellectual property by updating their websites' robots.txt files to block AI web crawlers, pursuing licensing agreements, and joining legal coalitions to enforce copyright laws.

Disallowing user-agent tokens such as GPTBot or Google-Extended in robots.txt tells compliant AI crawlers not to collect new content for model training. The protocol is voluntary, however, so it does not stop scrapers that ignore it, and it does not address the historical archives already ingested during earlier training runs. Publishers are therefore negotiating direct licensing deals. Axel Springer and the Associated Press, for example, have signed licensing agreements with OpenAI. These deals establish a dual revenue stream in which publishers license historical content for training while allowing limited real-time retrieval. Smaller creators must document their copyrighted materials and monitor LLM outputs for direct plagiarism. Joining industry trade groups helps independent publishers gain leverage in collective licensing negotiations with tech conglomerates, so they are not left behind as AI consumption scales up in 2026. B2B content creators should also update their Terms of Service to explicitly forbid scraping for AI training, which gives them a clearer legal basis for future cease-and-desist actions.
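As a minimal sketch of the robots.txt step, the Python below builds a robots.txt body that disallows the two tokens named above and then checks it the way a compliant crawler would, using the standard-library urllib.robotparser module. The helper names build_robots_txt and is_allowed, and the example.com URL, are illustrative assumptions and not part of any publisher's actual setup. Token names should be confirmed against each vendor's current documentation before deployment.

```python
from urllib.robotparser import RobotFileParser

# Tokens taken from the article text; extend this list with other AI crawler
# tokens after checking each vendor's documentation (assumption: site-wide block).
AI_TOKENS = ["GPTBot", "Google-Extended"]


def build_robots_txt(tokens):
    """Return a robots.txt body that disallows the whole site for each token."""
    groups = [f"User-agent: {token}\nDisallow: /" for token in tokens]
    # All other crawlers, such as ordinary search indexing, remain allowed.
    groups.append("User-agent: *\nAllow: /")
    return "\n\n".join(groups) + "\n"


def is_allowed(robots_txt, user_agent, url):
    """Evaluate a robots.txt body for one user agent and URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


robots_txt = build_robots_txt(AI_TOKENS)
print(robots_txt)

# Hypothetical article URL used only for the check.
article = "https://example.com/2024/investigation.html"
for agent in ["GPTBot", "Google-Extended", "Googlebot"]:
    print(agent, "allowed:", is_allowed(robots_txt, agent, article))
```

The generated text is what would be served at the site root as /robots.txt. Because the protocol only binds crawlers that choose to honor it, a check like this confirms the file says what was intended; it does not confirm that any particular crawler obeys it.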

Key Takeaways

  • The New York Times's lawsuit challenges "fair use" defenses for AI training, aiming to establish licensing as the mandatory legal standard.
  • Generative search summaries threaten the publisher business model by siphoning search traffic through zero-click answers.
  • Content creators must actively deploy technical blocks like robots.txt and update terms of service to safeguard intellectual property.