


Last week, five of Canada’s most prominent news media outlets launched a lawsuit against OpenAI for copyright infringement, demanding what could amount to billions in damages. The suit follows similar cases brought earlier this year against the creator of ChatGPT by The New York Times and other media companies in the United States.
The lawsuits claim that OpenAI “scraped” large amounts of content from media sites, copying it without permission, and that the company is profiting from that content without compensating the original creators.
OpenAI has yet to respond to the Canadian lawsuit formally but insists that using news material to train its chatbot is “fair dealing” under copyright law and not an infringement.
A closer look at how chatbots are trained suggests that OpenAI may be right that “scraping” isn’t copying. But it may not be “fair dealing” either.
To be clear, the five media companies (Torstar, Postmedia, The Globe and Mail Inc., The Canadian Press and CBC/Radio-Canada) are also making two further claims: that OpenAI thwarted the protective measures the news sites employ to block tools used to scrape their websites, and that by doing so it breached the sites’ terms of service.
The news companies bringing the lawsuit rely on tools to “prevent unauthorized scraping of data” from their websites. An example is the Robots Exclusion Protocol, typically implemented as a robots.txt file, which tells software like bots and web crawlers which parts of a site they may access. These tools, along with paywalls and account restrictions, are meant to safeguard against unauthorized uses of their material.
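To make the mechanism concrete, here is a minimal Python sketch of how a well-behaved crawler might consult such rules before fetching a page. The robots.txt contents, the crawler name ExampleAIBot and the news.example address are hypothetical and are not taken from any plaintiff’s site; the check itself uses Python’s standard urllib.robotparser module.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only: it bars one named
# crawler from the whole site and bars everyone from a subscriber area.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /subscribers/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a rule set that can be queried."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)

# The named crawler is barred everywhere; other agents are barred
# only from the subscriber area.
print(may_fetch(parser, "ExampleAIBot", "https://news.example/story/1"))
print(may_fetch(parser, "OtherBot", "https://news.example/story/1"))
print(may_fetch(parser, "OtherBot", "https://news.example/subscribers/archive"))
```

The protocol is advisory: the file states the publisher’s wishes, and it is up to each crawler to run a check like this and honour the answer.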
The plaintiffs say that by reading their content online, site visitors accept the terms of use found somewhere in the background, and that since 2015, the terms have made clear that news material is for “personal, non-commercial use of individual users only.”
The crux of all three claims in the Canadian lawsuit is that by using their material — scraping content — OpenAI is copying their work and making unauthorized use of it for profit.
Copyright law in Canada and the U.S. allows for unauthorized copying or use of a protected work in some cases under the fair dealing or fair use exception. Courts consider a series of factors, including the purpose of the copying (commercial or educational), the extent of the copying, and its impact on the original work.
Soon after The New York Times launched its lawsuit, OpenAI argued that training its chatbot on news material found on the web does not involve unlawful copying. The company said such training falls under fair use, and it pointed to various legal experts and civil society groups that agree.