ChatGPT needs compliance brakes

2023-02-06 22:00

Original title: "ChatGPT, Surging in Popularity, Urgently Needs a 'Compliance Brake'" (《热度「狂飙」的 ChatGPT，亟待「合规刹车」》)

Original article author: Xiao Sa's legal team

Key points:

In the short term, ChatGPT and other chat AI built on natural language processing technology need to solve three major legal compliance problems:

First, the intellectual property status of the replies that chat AI provides. The key compliance questions are whether those replies give rise to intellectual property rights and whether intellectual property licensing is required.

Second, whether chat AI must obtain intellectual property authorization when it mines and trains on the massive body of natural language text (commonly called a corpus).

Third, one of the mechanisms by which ChatGPT and other chat AI produce answers is a statistical language model, obtained by compiling statistics over a large volume of existing natural language text. This mechanism makes chat AI prone to "talking nonsense in a serious manner", which creates the legal risk of spreading false information. Given this technical background, how can the risk of chat AI spreading false information be minimized?

In general, China's legislation on artificial intelligence is still at the preliminary research stage: there is no formal legislative plan or draft bill, and the relevant departments are particularly cautious about regulating artificial intelligence. As artificial intelligence develops, the corresponding legal compliance problems will only increase.

1. ChatGPT is not an "epoch-making AI technology"

ChatGPT is a product of developments in natural language processing; at bottom, it is still just a language model.

At the beginning of 2023, global tech giant Microsoft announced a major investment in OpenAI, the company behind ChatGPT, making ChatGPT one of the hottest topics in the tech sector. As the ChatGPT concept surged in the capital market, many domestic technology companies also began to position themselves in this field. While the capital market chases the concept, we as legal practitioners cannot help but ask what legal and security risks ChatGPT itself may bring, and what its path to legal compliance looks like.

Before discussing ChatGPT's legal risks and path to compliance, we should first look at its technical principles: does ChatGPT, as the news suggests, give questioners whatever answer they want?

In the eyes of Sa's team, ChatGPT is far less miraculous than some news reports claim. In short, ChatGPT is an integration of natural language processing technologies such as Transformer and GPT. In essence, ChatGPT is still a language model based on neural networks, rather than an "epoch-making advance".

As mentioned above, ChatGPT is a product of the development of natural language processing technology. That technology has passed through roughly three stages: grammar-based language models, statistics-based language models, and neural network-based language models. ChatGPT belongs to the third stage. To understand more directly how ChatGPT works and the legal risks it may pose, it helps to first explain how the statistical language model, the forerunner of neural network-based language models, works.

In the statistics-based stage, AI engineers compile statistics over a large volume of natural language text to determine the probability that one word follows another. When a person asks a question, the AI analyzes which words are most likely to be combined in the context formed by the words of the question, splices those high-probability words together, and returns a statistically based answer. This principle has run through the development of natural language processing ever since it appeared. In a sense, even the neural network-based language model is a refinement of the statistics-based language model.
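The word-linking statistics described above can be sketched with a toy bigram model. This is only an illustration under stated assumptions: the corpus sentences, the function names, and the greedy "pick the most frequent next word" rule are all invented for this example, and real systems such as ChatGPT use neural networks rather than raw counts.

```python
from collections import Counter, defaultdict

# Toy corpus: hypothetical sentences invented purely for illustration.
CORPUS = [
    "the park has a lake",
    "the park has a fountain",
    "the park has a lake",
    "the square has a statue",
]

def train_bigrams(sentences):
    """Count how often each word directly follows another word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def next_word_probs(counts, prev):
    """Turn raw follow-up counts into conditional probabilities P(next | prev)."""
    total = sum(counts[prev].values())
    return {word: n / total for word, n in counts[prev].items()}

def generate(counts, start, max_words=10):
    """Greedily append the most frequent next word until the end marker."""
    words = [start]
    while len(words) < max_words:
        followers = counts[words[-1]]
        if not followers:
            break
        best = max(followers, key=followers.get)
        if best == "</s>":
            break
        words.append(best)
    return " ".join(words)

MODEL = train_bigrams(CORPUS)
print(next_word_probs(MODEL, "a"))  # {'lake': 0.5, 'fountain': 0.25, 'statue': 0.25}
print(generate(MODEL, "square"))    # square has a lake
```

In this toy corpus the only statement about the square is that it has a statue, yet the model outputs "square has a lake" because "lake" is the most frequent word after "a". The output is fluent and statistically plausible, but nothing in the corpus supports it.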

To take an easy-to-understand example, Sa's team typed the question "What are the tourist attractions in Dalian?" into the ChatGPT input box, as shown in the picture below:

[Image: ChatGPT's reply to the question]

The AI first analyzes the basic morphemes in the question ("Dalian", "which", "tourist", "attraction"), then locates the set of natural language texts in its existing corpus that contain these morphemes, searches that set for the collocations with the highest probability of occurrence, and finally combines those collocations into the answer. For example, the AI finds that "Zhongshan Park" appears with high probability alongside "Dalian", "tourist" and "attraction" in the corpus, so it returns "Zhongshan Park". Likewise, the word "park" collocates most often with words such as garden, lake, fountain and statue, so the AI goes on to return "This is a park with a long history. There are beautiful gardens, lakes, fountains and statues."

In other words, the whole process rests on probability statistics over the natural language text (the corpus) behind the AI, so the answers it returns are "statistical results". This is why ChatGPT "talks nonsense in a serious manner" on many questions. Take the reply to "What are the tourist attractions in Dalian?": Dalian does have a Zhongshan Park, but that park has no lake, fountain or statue. Dalian did once have a "Stalin Square", but Stalin Square was never a commercial square, nor did it have shopping centers, restaurants or entertainment venues. The information ChatGPT returned is plainly false.

2. The most suitable application scenarios for ChatGPT as a language model

Although the previous part set out the drawbacks of the statistics-based language model, ChatGPT is, after all, a neural network-based language model that improves greatly on its statistical predecessor. Its technical foundations, Transformer and GPT, are both latest-generation language models. ChatGPT is essentially a very deep model of natural language built by combining massive amounts of data with the highly expressive Transformer architecture. Its replies are sometimes "nonsense", but at first glance they read like replies written by a human. The technology therefore has wide application in scenarios that require large volumes of human-computer interaction.

At present, there are three main application scenarios:

First, search engines;

Second, human-computer interaction in banks, law firms, intermediary institutions, shopping malls, hospitals and government service platforms, such as their customer complaint systems, wayfinding and navigation, and government affairs consultation systems;

Third, the interaction mechanism of smart cars and smart homes (such as smart speakers and smart lights).

Search engines combined with AI chat technologies such as ChatGPT are likely to take the form of a traditional search engine as the core, supplemented by a neural network-based language model. Traditional search giants such as Google and Baidu have already built up deep expertise in neural network-based language models. Google, for example, has Sparrow and LaMDA, which are comparable to ChatGPT. With the support of these language models, search engines will feel more "human".

Applying ChatGPT and other AI chat technologies to customer complaint systems, wayfinding in hospitals and shopping malls, and the consultation systems of government agencies will greatly reduce labor costs and save communication time for the organizations concerned. The problem is that statistically based answers may be completely wrong, and the resulting risks need further assessment.

Compared with the two scenarios above, using ChatGPT as the human-computer interaction mechanism for smart cars and smart home devices carries much smaller legal risk. These environments are more private, so incorrect content returned by the AI is unlikely to cause major legal risk. In addition, the accuracy requirements for content in such scenarios are not high, and the business model is more mature.

3. Legal risks and compliance paths for ChatGPT

First, the overall regulatory picture for artificial intelligence in China

Like many emerging technologies, the natural language processing technology represented by ChatGPT faces the "Collingridge dilemma", which comprises an information dilemma and a control dilemma. The information dilemma is that the social consequences of an emerging technology cannot be predicted in its early stages. The control dilemma is that by the time the adverse social consequences are discovered, the technology has often become part of the overall social and economic structure, so those consequences can no longer be effectively controlled.

At present, artificial intelligence, and natural language processing in particular, is developing rapidly and is likely to fall into the "Collingridge dilemma", while the corresponding legal regulation does not seem to be keeping pace. China currently has no national legislation on the artificial intelligence industry, but there have been some local legislative attempts. In September 2022, Shenzhen issued dedicated legislation on the artificial intelligence industry, the "Regulations of Shenzhen Special Economic Zone on Promoting Artificial Intelligence Industry", followed by Shanghai's adoption of the "Regulations of Shanghai on Promoting the Development of Artificial Intelligence Industry". Similar legislation is expected to be introduced in other regions soon.

As for the ethical regulation of artificial intelligence, the National Professional Committee for the Governance of the New Generation of Artificial Intelligence published the Code of Ethics for the New Generation of Artificial Intelligence in 2021, proposing that ethics be integrated into the full life cycle of AI research, development and application. Perhaps in the near future, the "Three Laws of Robotics" from Asimov's novels will become the iron laws of AI regulation.

Second, the legal risk of false information brought by ChatGPT

Let us shift our focus from the macro to the micro: setting aside the overall regulatory picture of the AI industry and the ethical regulation of AI, we turn to the practical compliance problems of chat AI such as ChatGPT.

The first thorny problem is the false information in ChatGPT's replies. As discussed earlier in this article, the way ChatGPT works makes it possible for its replies to be "serious nonsense". Such false information, which looks plausible but is in fact wildly wrong, is highly misleading. Of course, a false reply to a question like "What are the tourist attractions in Dalian?" may not have serious consequences, but if ChatGPT is used in search engines, customer complaint systems and similar settings, the false information it returns can create extremely serious legal risks.

In fact, such legal risks have already emerged. Galactica, a language model from Meta aimed at scientific research that was launched in November 2022, around the same time as ChatGPT, was taken offline after only three days of public testing because its answers mixed truth with falsehood. Since the underlying technical principles cannot be overcome in the short term, ChatGPT and similar language models must be adapted for compliance before they are applied to search engines, customer complaint systems and other fields. When the system detects that a user may be asking a professional question, it should guide the user to consult the appropriate professional instead of seeking the answer from AI. Users should also be prominently reminded that the accuracy of answers returned by chat AI may need further verification, so as to minimize the corresponding compliance risks.
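The two safeguards suggested above, redirecting professional questions and attaching a prominent verification notice, could be sketched roughly as follows. Everything here is a hypothetical illustration: the keyword lists, the referral messages, the notice wording, and the function names are assumptions made for this example, and simple keyword matching is far cruder than what a production system would need.

```python
import re

# Hypothetical keyword lists; a real system would need a far more robust classifier.
PROFESSIONAL_TOPICS = {
    "legal": {"lawsuit", "contract", "sue", "liability"},
    "medical": {"diagnosis", "dosage", "symptom", "prescription"},
}

# Hypothetical referral messages shown instead of an AI-generated answer.
REFERRALS = {
    "legal": "This looks like a legal question. Please consult a qualified lawyer.",
    "medical": "This looks like a medical question. Please consult a licensed physician.",
}

# Hypothetical reminder appended to every AI-generated answer.
VERIFY_NOTICE = (
    "Notice: this answer was generated by AI and may be inaccurate. "
    "Please verify it independently."
)

def detect_topic(question):
    """Return the professional topic whose keywords appear as whole words, if any."""
    words = set(re.findall(r"[a-z]+", question.lower()))
    for topic, keywords in PROFESSIONAL_TOPICS.items():
        if words & keywords:
            return topic
    return None

def compliant_reply(question, model_answer):
    """Redirect professional questions; otherwise append the verification notice."""
    topic = detect_topic(question)
    if topic is not None:
        return REFERRALS[topic]
    return f"{model_answer}\n{VERIFY_NOTICE}"

print(compliant_reply("Can I sue my landlord?", "(model answer)"))
print(compliant_reply("What are the tourist attractions in Dalian?", "Zhongshan Park."))
```

Matching on whole words rather than substrings avoids false hits such as "issue" triggering the keyword "sue", but a deployed system would still need to handle paraphrases, other languages, and topics not covered by any list.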

Third, the intellectual property compliance issues raised by ChatGPT

At the micro level, apart from the accuracy of AI replies, compliance personnel should also pay attention to the intellectual property issues of chat AI, especially large language models like ChatGPT.

The first compliance problem is whether "text data mining" requires intellectual property licensing. As mentioned above, ChatGPT relies on a huge amount of natural language text (the corpus). ChatGPT needs to mine and train on the data in the corpus and copy its content into its own database. In natural language processing, this behavior is often called "text data mining". Where the text data may constitute a copyrighted work, whether text data mining infringes the right of reproduction remains controversial.

From a comparative law perspective, both Japan and the European Union have expanded the scope of fair use in their copyright legislation, adding "text data mining" for AI as a new fair use case. Although some scholars argued during the 2020 revision of China's Copyright Law that the fair use system should move from "closed" to "open", this proposal was not adopted. China's Copyright Law still maintains a closed fair use system: only the circumstances listed in Article 24 of the Copyright Law can be recognized as fair use. In other words, the current Copyright Law does not bring "text data mining" for AI within the scope of fair use, so text data mining in China still requires the corresponding intellectual property authorization.

The second compliance problem is whether the replies generated by ChatGPT are original. On the question of whether AI-generated output is original, Sa's team believes the criteria should not differ from the existing ones. In other words, whether a reply was produced by AI or by a human, it should be judged under the existing criteria for originality. Behind this question lies another, more controversial one: if AI produces original replies, can the AI be the copyright owner? Under the intellectual property laws of most countries, including China, the author of a work can only be a natural person, and AI cannot be the author of a work.

Finally, if ChatGPT splices third-party works into its reply, how should the intellectual property issues be handled? Sa's team believes that if ChatGPT's reply splices in copyrighted works from the corpus (although this is unlikely given how ChatGPT works), then under China's current Copyright Law the copyright owner's authorization must be obtained before copying, unless the use constitutes fair use.