In a notable development in the copyright lawsuit between the New York Times and OpenAI, concerns have arisen over the preservation of evidence related to artificial intelligence training data. The New York Times accused OpenAI of deleting portions of the ChatGPT training data linked to content suspected of copyright infringement. The accusation emerged after lawyers for the New York Times and the Daily News reportedly spent more than 150 hours searching for evidence in the dataset OpenAI had provided.
A letter submitted to the U.S. District Court for the Southern District of New York on November 14 stated that OpenAI's engineers had deleted data from one of the two virtual machines the company had provided for the review. OpenAI characterized the incident as an unintentional technical error; the New York Times and the other plaintiffs countered that the deletion severely hampered their ability to gather evidence.
OpenAI had initially agreed to let the plaintiffs examine the ChatGPT training data in a secure environment, an essential step for publishers attempting to prove that their content was used without authorization to train AI models. The deletion, however, cost the lawyers critical information.
Recent updates suggest that while the deleted data has been restored, it is no longer in a legally usable form, complicating the New York Times's efforts to present evidence in court. The incident raises serious questions about OpenAI's capability and commitment to maintaining data integrity, particularly in copyright infringement cases.
Reactions from both sides underscore the seriousness of the situation. The New York Times says the deletion significantly disrupted its inspection process and could signal a lack of transparency. OpenAI, for its part, maintains that the incident was purely a technical issue, unrelated to any intent to interfere with evidence gathering.
The lawsuit could set significant precedents for how AI companies manage data and meet their legal obligations. In response, publishers are weighing further measures to safeguard their interests, while OpenAI may face mounting pressure to strengthen its data retention and auditing practices to prevent similar incidents.