“We hereby find the defendant, John Doe, guilty of copyright infringement.” Those are the words that made Doe’s heart sink. Doe is a journalist who used artificial intelligence, or AI, to help write news articles for his company. Unfortunately for Doe, the AI took information from a local newspaper and copied it directly into Doe’s stories, committing copyright infringement. This story, although hypothetical, could become reality for journalists who begin using AI to generate their news. It raises the question: who is legally responsible for AI-produced content?
AI writing applications like ChatGPT, developed by OpenAI, use prompts and data to generate articles and cohesive paragraphs. Some news publications are using AI in the newsroom to help with data gathering, content generation, transcription and more. The U.S. Copyright Office is looking into how to extend copyright protection to materials created with AI, and into possible infringement resulting from AI-generated output.
The Associated Press, or AP, was one of the first news organizations to incorporate AI into the newsroom, using it to automate its core news report, according to the AP website. The organization also created initiatives, such as Local News AI, to test AI readiness in more than 200 newsrooms.
“AP also launched a Local News AI initiative, funded by the Knight Foundation, to help U.S. local newsrooms identify and adopt artificial intelligence-based solutions,” wrote Nicole Meir, AP’s media relations manager, in an email.
As AI expands to other newsrooms, there is potential for copyright infringement if AI produces content from copyrighted material, said Ronald Coleman, a partner at Dhillon Law Group based in Newark, New Jersey. Coleman has practiced law for 30 years and specializes in intellectual property rights.
“A copyright is a creative work that is reduced to some sort of tangible expression … there does have to be what the [U.S.] Copyright Office calls the ‘modicum of creativity,’” Coleman said.
The modicum of creativity is not a high bar, said Andrew Foglia, deputy director of policy and international affairs at the U.S. Copyright Office. For example, if someone writes a poem full of cliches, it could still be copyright protected, Foglia said.
“A classic example of something that doesn’t have the required level of creativity would be a phone book,” Foglia said.
Another requirement for copyright protection is human authorship, Coleman said. While that requirement still stands, the U.S. Copyright Office announced on March 16 that it is creating a new AI initiative. The initiative will address the scope of copyright for works created with AI tools and whether copyrighted materials may be used to train AI, according to the office’s website. Through new registration guidance, the initiative also explains how applicants should disclose the use of AI tools when submitting work, Foglia said.
“The Copyright Office issued registration guidance, a policy statement explaining how copyright law applies to work generated using artificial intelligence and trying to explain to applicants how they can register their works and what they should disclose,” Foglia said. “Our registration guidance says that if you are registering a work that includes AI-generated content, you should notify the office about that in your claim. You should disclaim the generated content, and you should disclose it to us.”
Among Americans surveyed about their thoughts on AI-generated news articles, only 16% considered it a major advance and 28% considered it a minor advance, according to the Pew Research Center. While the development may not seem significant to every American, news organizations such as the AP use AI in the newsroom to manage information flows, produce audio-to-text transcriptions, help with text-to-text translations, write stories based on data sets, localize content and more, according to Meir’s email.
“AP started automatically generating corporate earnings stories in 2014 and has added automated game previews and recaps of some sports,” Meir wrote in the email.
Other uses for AI in the newsroom include data gathering, automated recommendations and commercial applications such as pay models, according to Statista. In the survey, 233 individuals from 33 countries were asked to rate five categories of AI use in the newsroom as “very important,” “somewhat important” or “not important.” Automated recommendations ranked highest, with 40% of respondents saying they were very important.
“Additionally, utilizing AI for commercial uses such as targeting potential subscribers and optimizing paywalls was considered necessary for future business operations, and there was also a push for automation in the newsroom; for example the use of AI for assisted subbing or tagging,” the Statista survey reads.
Facts in and of themselves are not copyrightable; only the original expression of facts can potentially be copyright protected, Foglia said.
Coleman said certain criteria must be met before an individual can be accused of copyright infringement, including the requirement that the material be registered with the U.S. Copyright Office. This also means not every written work or published piece is subject to copyright protection, so the creator may not be able to sue for infringement. A copyright owner has full authority over how the work is used, Coleman added.
“[The] copyright owner has the exclusive right to reproduce, distribute, perform, publicly display or make a derivative of work from his own creative work, and anyone who does any of those things without permission may be infringing the copyright,” Coleman said.
However, if the information is not copyright protected, the user of the information can’t be held liable for copyright infringement, and thus there are some ways AI can use outside information without infringing.
“Every AI-chat application is given a set of instructions and presumably those instructions will include, ‘don’t just go to New York Times and copy and paste,’” Coleman said. “It’s my understanding that AI chat will presumably go to public domain materials.”
This would include material that is in the public domain and not owned by individuals. AI relies on prompts, data and program code to produce content, and it can produce original work depending on how advanced and well trained the program is, said Theodora Chaspari, assistant professor of computer science and engineering at Texas A&M. These algorithms can perform well and produce original work depending on how they are trained, Chaspari said, but AI can also be unreliable if it hasn’t been programmed and trained thoroughly.
“This is new technology, so the reliability may not have been truly examined. It’s important to understand how the content is generated,” Chaspari said.
AI-generated content is produced using large language models, said Ruihong Huang, Ph.D., associate professor of computer science and engineering at A&M.
“Those large language models will train with lots of texts, including texts from different sources like Wikipedia documents and news articles,” Huang said.
After the text is gathered from the internet, some filtering and cleaning takes place to remove low-quality data, Huang said.
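To illustrate the kind of filtering and cleaning Huang describes, the following is a minimal, hypothetical Python sketch of how web-scraped text might be screened before it is used for training. The function names, thresholds and rules are illustrative assumptions, not any organization’s actual pipeline.

# Hypothetical sketch: screen raw web text before language-model training.
# Thresholds and rules are illustrative assumptions only.

def is_high_quality(text: str, min_words: int = 50, max_symbol_ratio: float = 0.3) -> bool:
    """Keep documents that are long enough and mostly natural language."""
    words = text.split()
    if len(words) < min_words:  # drop very short fragments
        return False
    symbols = sum(1 for ch in text if not (ch.isalnum() or ch.isspace()))
    if symbols / max(len(text), 1) > max_symbol_ratio:  # drop markup-heavy pages
        return False
    return True


def clean_corpus(documents: list[str]) -> list[str]:
    """Remove exact duplicates and keep only documents that pass the filter."""
    seen = set()
    kept = []
    for doc in documents:
        normalized = " ".join(doc.split()).lower()
        if normalized in seen:  # skip verbatim copies
            continue
        seen.add(normalized)
        if is_high_quality(doc):
            kept.append(doc)
    return kept


if __name__ == "__main__":
    raw = ["<div>menu login share</div>", "A long news article about local policy. " * 20]
    print(len(clean_corpus(raw)), "documents kept for training")

In this toy example, the navigation-menu fragment is discarded while the longer article-like text is kept, mirroring the goal of removing low-quality data before training.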
When it comes to AI and copyright infringement, Foglia said the office tends to talk about three categories: authorship, training and output infringement. The first, authorship, asks whether work generated using AI can be copyrighted.
“So copyright law has a human authorship requirement,” Foglia said. “This is rooted in the Constitution and in the text of the Copyright Act.”
The second concerns training and whether a programmer needs permission to train AI using copyrighted material.
“When you train an AI, it generally ingests an awful lot of data,” Foglia said. “Oftentimes, that data involves books or movies or images, and sometimes those are under copyright. So the ingestion question has to do with what rules apply to that circumstance.”
Lastly, the U.S. Copyright Office looks at output infringement, Foglia said. If an individual publishes a work that looks similar to an existing copyrighted work, does that create liability?
Ultimately, it’s the journalist’s responsibility to ensure they are not publishing somebody else’s copyrighted material, Coleman said, and they will be the ones held liable for copyright infringement.
“If your AI-generated article infringes, that’s going to be your problem,” Coleman said. “So, as a reporter I would, and as an editor, I would want to make sure my reporter did this. I would run down the quotes, run down the sentences and see where they came from.”
There is a common defense against copyright infringement known as fair use, which allows copyrighted content to be used in limited amounts for purposes such as news reporting, criticism and scholarly work, Coleman said. However, there are still limitations on how journalists can use copyrighted material.
“If the AI’s output were itself infringing some other work, then the journalist might not be free to use the output — because she might be infringing the same rights that the AI infringed,” Foglia wrote in a follow-up email.
As for the future of copyright protection and infringement surrounding AI-produced content, Foglia said the office is still discussing the extent to which materials created with AI tools should be copyright protected.
“Some of the questions about liability for the output of AI technologies are still sort of in development,” Foglia said. “Those are questions we’re hoping to explore as part of the Copyright Office AI initiative.”
Kenzie Finch is a journalism senior and contributed this article from the course JOUR 490, Journalism as a Profession, to The Battalion.