
How is OpenAI Faring in the Delhi High Court Against ANI? A Case Analysis

Jun. 26, 2025   •   Suraj

Student's Pen  

In what may be the first lawsuit of its kind in India, Asian News International (ANI) has sued OpenAI for using its data to train Large Language Models (LLMs) without permission or a licence.

The copyright infringement suit alleged that OpenAI had scraped the news agency’s publicly available, but copyrighted, content even though OpenAI had turned down ANI’s offer of a licensing agreement similar to those OpenAI has with publishers such as News Corp and The Guardian. Aggrieved, ANI knocked on the doors of the Hon’ble Delhi High Court in November 2024.

By its order of 19th November 2024, the Court framed the following preliminary issues[1]:

I. Whether the storage by OpenAI (defendant) of news content, which ANI (plaintiff) claims is protected under the Copyright Act, 1957, for the purpose of training its software, i.e., ChatGPT, amounts to infringement of the plaintiff’s copyright.

II. Whether using the stored data to generate responses for its users amounts to infringement of the plaintiff’s copyright.

III. Whether the defendant can claim the defence of ‘fair use’ under Section 52[2] of the Copyright Act, 1957 (the Act).

IV. Whether courts in India have jurisdiction to entertain the suit, given that OpenAI’s servers are located in the United States of America.

(Issues paraphrased)

Under the Act, news content presented with originality and expression (such as editorials, investigative articles, or distinctive reporting) can be protected under Section 13(1)(a), which grants copyright in original literary works. On this view, if ChatGPT reproduces ANI’s content verbatim or in substantially similar form, for instance when a user asks the chatbot for ANI’s reporting, the conditions for infringement would be met.

To better understand the technicalities, the High Court appointed two amici curiae (independent experts who assist the court) and asked them to submit reports. In January 2025, both amici, Dr. Arul George Scaria, a professor at NLSIU Bengaluru, and Adarsh Ramanujan, an expert litigator, submitted their reports, which are summarised below.

REPORTS BY AMICI CURIAE

Adarsh Ramanujan’s Report

Adarsh Ramanujan concluded that storing copyrighted material amounts to infringement, since collecting public data inherently involves reproduction under Section 14(a)(i) of the Copyright Act, 1957, which covers storing a work in any medium by electronic means. Walking through the process in detail, he asserted that although the training of LLMs includes a stage of non-expressive use, specifically tokenisation and vectorisation, the stages before and after it involve expressive use, which does qualify as infringement.

Ramanujan laid out a detailed framework of the AI development process:

(1) data collection

(2) tokenisation and vectorisation, and

(3) model training

Tokenisation and Vectorisation

In brief, tokenisation is the process by which collected text is broken down into smaller units called tokens (whole words, parts of words, or punctuation marks). Each token is then mapped to a vector, that is, a list of numbers; this step is called vectorisation. Because ChatGPT ultimately processes language in this mathematical form, the model learns statistical relationships between tokens and generates text by predicting, one token at a time, which token is most likely to come next.
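To make the two steps concrete, here is a deliberately simplified sketch in Python. It illustrates the general idea only and is not how OpenAI’s systems work. Several assumptions are built in: the tokeniser just splits text into words and punctuation (real LLMs use subword tokenisers), the vectors are one-hot vectors rather than the dense embeddings a model learns during training, and the "model" is a simple bigram counter that estimates the probability of the next token from the current one, in place of a neural network. The function names and the sample sentence are hypothetical.

```python
import re
from collections import Counter, defaultdict


def tokenise(text: str) -> list[str]:
    """Hypothetical toy tokeniser (not OpenAI's): lower-cased words and
    individual punctuation marks become separate tokens."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def build_vocab(tokens: list[str]) -> dict[str, int]:
    """Give each distinct token an integer ID in order of first appearance."""
    vocab: dict[str, int] = {}
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))
    return vocab


def one_hot(token_id: int, size: int) -> list[float]:
    """Simplest possible vectorisation: all zeros except a single 1.0.
    Assumption for illustration; real LLMs learn dense embedding vectors."""
    vec = [0.0] * size
    vec[token_id] = 1.0
    return vec


def next_token_probabilities(tokens: list[str]) -> dict[str, dict[str, float]]:
    """Bigram stand-in for a language model: estimate
    P(next token | current token) from how often each pair occurs."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1
    return {
        current: {nxt: n / sum(ctr.values()) for nxt, n in ctr.items()}
        for current, ctr in counts.items()
    }


text = "The court heard the case. The court reserved its order."
tokens = tokenise(text)               # text -> tokens
vocab = build_vocab(tokens)           # token -> integer ID
ids = [vocab[t] for t in tokens]      # the text as a sequence of numbers
probs = next_token_probabilities(tokens)

print(ids)             # [0, 1, 2, 0, 3, 4, 0, 1, 5, 6, 7, 4]
print(probs["court"])  # {'heard': 0.5, 'reserved': 0.5}
```

Even in this toy version, the pattern described above is visible: text becomes tokens, tokens become numbers, and "learning" amounts to estimating how likely one token is to follow another. Production models replace each of these steps with far more sophisticated, learned components.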

OpenAI contends that it uses only the facts in ANI’s content, a non-expressive use of the data that is not covered by the Copyright Act. ANI, however, maintains that its creative expression, and not merely the underlying facts, is being reproduced verbatim.

Dr. Arul George Scaria’s Report

Dr. Arul George Scaria offered a contrasting view, emphasising that the training of LLMs primarily involves non-expressive use of copyrighted material, which does not constitute infringement. He acknowledged that there may occasionally be expressive use resulting in output substantially similar to the original, but maintained that such instances are rare. He further submitted that temporary or permanent storage of copyrighted content for the purpose of machine learning is permissible, as long as it is not for public dissemination or expressive reproduction.

Scaria also addressed the issue of fair use under Section 52. He emphasised that developing LLMs is practically unviable without access to copyrighted material, suggesting that overly strict copyright enforcement could hamper technological progress.

WHERE DO BOTH PARTIES STAND

OpenAI faces over two dozen similar lawsuits around the world. In February 2025, in a comparable case, a U.S. federal court ruled in favour of Thomson Reuters (owner of Westlaw)[3], holding that Ross Intelligence, an AI legal-research startup, had infringed Thomson Reuters’ copyright by using Westlaw material to train its AI tool, and rejecting Ross’s fair use defence. That ruling may serve as a reference point for the present case.

India is a large market for ChatGPT. Millions of dollars are at stake in the case itself, to say nothing of the billions of dollars in market value that depend on its outcome. OpenAI and other startups have advanced some compelling arguments in their defence.

Facts And News Have “Thin Copyright”

Representing the defendants, Mr. Amit Sibal argued that the appearance of ANI’s content in ChatGPT’s search results did not amount to infringement, because using publicly available data for non-expressive purposes, such as training AI models, does not infringe copyright. He submitted that news content enjoys only a “thin copyright”: freely available news cannot be monopolised in this way, and discovering or reporting a fact does not give anyone a monopoly over that fact. Even if two write-ups contain similar facts, that alone does not mean infringement has occurred.

Non-Expressive Use of Data

OpenAI claimed that it used ANI’s data only in a non-expressive manner, meaning that the original expression was never directly reproduced. Through tokenisation and vectorisation, it said, the data is converted into numerical representations; this process is purely mathematical and does not copy any original content, and hence does not amount to copyright infringement.

The Cost of Innovation

On the issue of false attribution, that is, ChatGPT responses wrongly attributing content to ANI, Mr. Sibal suggested that such misleading responses may result from deliberately manipulated queries. He added that OpenAI removes such misattributions as they come to light.

Mr. Sibal argued that forcing AI companies to pay for using public online data would make their operations costly and slow innovation, and that these costs would eventually be passed on to users. It was also argued that limiting access to public data would lower the quality of AI models by shrinking their learning base. Echoing the concerns of smaller and newer startups, it was argued that small Indian companies would not survive if they had to pay licensing fees, which could hinder India’s ability to compete with international AI firms.

The plaintiff has, in turn, raised crucial points that the Court must weigh. Additionally, on 17th February 2025, major industry players such as T-Series, Saregama, and Sony sought to intervene, raising concerns over the unauthorised use of copyrighted music and recordings in AI training. Although the Court refused to broaden the scope of the case, the final judgment will undoubtedly shape the future of AI litigation, investment, and innovation in the country.

Unauthorised Use of ANI’s Content

Representing the plaintiff, Advocate Sidhant Kumar argued that OpenAI violated copyright law by using ANI’s news content to train its AI models without permission. He submitted that the mere availability of content online does not mean it can be freely used without consent.

Lack of Human Creativity in AI Output

Mr. Kumar asserted that the output generated by ChatGPT lacks originality, as it involves no human input: no skill, no judgment, and no purposeful rearrangement. Since it is the product of a purely automated process, AI-generated content cannot be considered original or protected by copyright. Citing Eastern Book Company v. D.B. Modak[4], he said that any adaptation lacking independent originality would infringe the original author’s rights.

Expression Is Protected, Not Facts

Countering the defendants’ claim that tokenisation involves no expressive use, counsel for the plaintiff explained that AI does not merely process facts; it studies their expression in the form of language and words. He argued that OpenAI’s tokenisation and vectorisation processes are themselves adaptations of ANI’s content, and that these processes lack the creative input needed to qualify as independent derivative works under copyright law. Citing R.G. Anand v. Delux Films[5], he clarified that while facts cannot be copyrighted, their expression can. ANI’s journalists and editors create unique expressions of facts, and that expressive narration is protected. To emphasise the protection granted to journalists, Mr. Kumar referred to the UK case Walter v. Lane, in which reporters’ verbatim reports of speeches, prepared from their shorthand notes, were granted copyright protection.

WHAT DOES THE FUTURE HOLD

As mentioned earlier, this is the first case of its kind in India. AI now touches almost every sphere of life, and what we are seeing in this case is only a fraction of the technology that needs to be regulated. The use of news data to train LLMs is not the whole picture, but this case will pave the way for future litigation on AI and music, AI and art, AI and literature, and much more.

What makes this case important, apart from being the first of its kind, is the responsibility it carries to set the right precedent. At present, ChatGPT offers most of its services at little or no cost, which has been possible thanks to the billions of dollars OpenAI has raised to fund its costs and operations. If reports are to be believed, the company, founded in 2015, plans to take the for-profit route down the line. Such a shift could increase costs and burdens across the entire AI industry.

How should the judiciary and policymakers tread this path? I would be interested to hear your thoughts.

Here’s the case name and citation: ANI MEDIA PVT LTD v. OPEN AI INC & ANR. CS(COMM) 1028/2024, Delhi High Court.


REFERENCES

[1] ANI MEDIA PVT LTD v. OPEN AI INC & ANR. CS(COMM) 1028/2024
[2] Section 52: Certain acts not to be infringement of copyright

(1) The following acts shall not constitute an infringement of copyright, namely:

[(a) a fair dealing with any work, not being a computer programme, for the purpose of--

(i) private or personal use, including research;

(ii) criticism or review, whether of that work or of any other work;

(iii) the reporting of current events and current affairs, including the reporting of a lecture delivered in public;

Explanation.-- The storing of any work in any electronic medium for the purposes mentioned in this clause, including the incidental storage of any computer programme which is not itself an infringing copy for the said purposes, shall not constitute infringement of copyright.]
[3] Thomson Reuters Enterprise Centre GmbH and West Publishing Corp. v. Ross Intelligence Inc.
[4] Eastern Book Company v. D.B. Modak, Civil Appeal No. 6472 of 2004
[5] R.G. Anand v. Delux Films, 1978 AIR 1613

