Is ChatGPT's use of people’s data even legal?

Emma McGowan 1 Feb 2023

In the world of AI and machine learning, large language models are a hot topic. They can be used in a variety of applications, but as with any technology, there are also potential drawbacks and concerns.

In the world of AI and machine learning, the sudden, massive popularity of large language models is a hot topic. These tools, the most popular of which is currently probably ChatGPT, can answer specific questions and even generate code. They can be used in a variety of applications, such as chatbots, language translation, and text summarization. However, as with any technology, there are also potential drawbacks and concerns.

Privacy and ChatGPT 

One of the main concerns with these models is privacy: it can be difficult for people to know whether their data has been used to train a machine learning model. GPT-3, for example, is a large language model that has been trained on a vast amount of internet data, including personal websites and social media content. This has led to concerns that the model may use people's data without their permission and that it may be difficult to control or delete data that has been used to train the model.

Another concern is the "right to be forgotten." As GPT models and other machine learning models become more widespread, people may want the ability to erase their data from a model.

“People are furious that data is being used without their permission,” Sadia Afroz, AI researcher with Avast, says. “Sometimes, some people have deleted the data but since the language model has already used them, the data is there forever. They don’t know how to delete the data.” 

Currently, there is no widely accepted method for individuals to request the removal of their data from a machine learning model once it has been used for training. Some researchers and companies are working on methods to allow for the removal or "forgetting" of specific data points or user information, but these methods are still in the early stages of development and it's not yet clear how feasible or effective they will be. There are also technical challenges: a trained model doesn't store its training data as separate records that can simply be deleted, and removing the influence of that data may cause the model to lose accuracy.

Is ChatGPT legal? 

The legality of using personal data to train machine learning models such as GPT-3 varies with the laws and regulations of each country or region. In the European Union, for example, the General Data Protection Regulation (GDPR) regulates the use of personal data and requires that data be collected and used only for specific, lawful purposes.

“GDPR is so much around purpose restriction,” Afroz says. “So you must use the data for the purpose you collected it for. If you want to use it for something else, you have to get permission. But language models are the opposite of that—the data can be used for any purpose. How can GDPR enforce this restriction?” 

Under GDPR, organizations need a lawful basis, such as an individual's consent, before collecting and using personal data. The regulation makes special provision for processing personal data for scientific and historical research, but the controller must still comply with GDPR's principles and rights, such as the right to be informed, right of access, right to rectification, right to erasure, right to object and right to data portability. It would appear, then, that large language models don't comply with GDPR, which could become a major barrier to growth in the future.

In the United States, there is no federal law that specifically regulates the use of personal data to train machine learning models. However, organizations are generally required to comply with laws such as the Health Insurance Portability and Accountability Act (HIPAA) and the Children's Online Privacy Protection Act (COPPA) if they collect and use personal data in certain sensitive categories, such as health information or data from children under 13. And in California, where many big tech companies are based, companies are required to follow the California Consumer Privacy Act (CCPA), which has some privacy requirements similar to GDPR's.

With all that said, the development of AI models such as GPT-3 is a constantly evolving field. As such, laws and regulations surrounding the use of personal data in AI are likely to change in the future, making it important to stay updated on the latest legal developments in this area.

Is ChatGPT accurate? 

Another big concern about GPT models is misinformation and the lack of verification. It’s been widely reported that many large language models present information confidently but inaccurately. That lack of fact-checking could increase the spread of false information, which is especially dangerous in sensitive areas like news and politics. Google, for example, is planning to use large language models to better serve customers, but it's not yet clear how they’ll handle the fact-checking element.

While large language models have the potential to revolutionize the way we interact with technology and automate certain tasks, it's important to also consider the potential drawbacks and concerns. As the use of these models becomes more widespread, it's crucial to address the privacy concerns and find solutions for the "right to be forgotten" issue.
