Thought experiment: AI steals content indirectly

by: Artur Dziedziczak

I recently started thinking about the current sources of data fed to AI systems, and about the claim that "AI like GPT does not reproduce samples of the data it was trained on, but instead creates new content based on context".

So let’s start the experiment. Some person A scrapes websites like IMDb for movie reviews and later feeds them to his AI. Next, he defines the AI’s output: it should produce new reviews informed by the context of the movies it has learned. Context here means whether a review is positive or negative. So when you ask this AI to generate a review of Scott Pilgrim vs The World, it produces text that is completely different from every review written on IMDb, yet the context of those reviews is preserved. Importantly, this context is limited to the data sources.

So this AI can generate reviews for every movie on IMDb, and the reviews are different each time. The catch is that you ask the AI to write a review based on some parameter, say, the movie’s overall rating. The AI is aware of this rating, so it always generates positive reviews for Scott Pilgrim vs The World.
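The mechanism described above can be sketched in a few lines. This is a toy stand-in, not a real language model: the phrase lists, function names, and the rating threshold are all hypothetical, chosen only to show the key property that the output wording varies while the sentiment is fixed by the aggregate rating, so no single IMDb review is ever copied.

```python
import random

# Hypothetical phrase banks standing in for what a trained model "knows".
POSITIVE_PHRASES = ["a joyful ride", "endlessly inventive", "a visual treat"]
NEGATIVE_PHRASES = ["a tedious slog", "stylish but hollow", "hard to sit through"]

def generate_review(title: str, rating: float, seed: int = 0) -> str:
    """Generate a fresh review whose sentiment tracks the overall rating.

    The wording is randomized (different each call with a different seed),
    but the positive/negative framing is determined entirely by `rating`,
    mirroring how the AI in the experiment repeats context, not samples.
    """
    rng = random.Random(seed)
    phrases = POSITIVE_PHRASES if rating >= 7.0 else NEGATIVE_PHRASES
    return f"{title} is {rng.choice(phrases)}."

# The same well-rated movie always gets a positive review, but the text varies:
print(generate_review("Scott Pilgrim vs The World", 7.5, seed=1))
print(generate_review("Scott Pilgrim vs The World", 7.5, seed=2))
```

Even in this crude sketch, the interesting property holds: you can diff the output against every "source review" and find no overlap, yet the aggregated opinion of the reviewers is exactly what the generator reproduces.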

Is it right for person A to do this? The AI does not repeat content as "samples", but it does repeat the context of the data. It reproduces the general opinion of people, which is the intellectual content of IMDb.