Source: Negentropy Capital
Over the past year, the cryptocurrency market has been stuck in a bear market, and nothing has seemed able to reignite its enthusiasm. Recently, however, the AI text generator ChatGPT emerged and passed 1 million users in just five days. Tens of millions of users are eager to try it, instantly igniting the global market's enthusiasm for AI technology.
Yet even amid this unprecedented boom, general-purpose AI focused on natural language processing, as represented by ChatGPT, still faces a serious shortage of high-quality training datasets.
GPT-3, the predecessor of ChatGPT, was reportedly trained mainly on five datasets, with the following rounded shares of the training mix:
1. Common Crawl: A dataset drawn from over 45 TB of web page text (before filtering), accounting for 60% of GPT-3 training data.
2. WebText2: A text dataset extracted from links on Reddit, accounting for 22% of GPT-3 training data.
3. Books1: A dataset of books collected from Project Gutenberg and other sources, accounting for 8% of GPT-3 training data.
4. Books2: A dataset of books sourced from the internet, accounting for 8% of GPT-3 training data.
5. Wikipedia: A dataset containing all English Wikipedia articles, accounting for 3% of GPT-3 training data.
Although these five datasets are rich in content, they still contain a great deal of noise, errors, duplicates, and irrelevant or inappropriate content (especially Common Crawl, which supplies 60% of the content and is of low quality). Such content is heavily downweighted because it does not help GPT learn and improve.
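To make the noise and duplication problem concrete, here is a minimal sketch of a cleaning pass over raw web text. It is not the pipeline OpenAI used; the function names `normalize` and `clean_corpus` and the `min_words` threshold are assumptions chosen purely for illustration.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants hash the same."""
    return re.sub(r"\s+", " ", text).strip().lower()


def clean_corpus(docs, min_words=5):
    """Drop very short documents and exact duplicates (after normalization).

    `min_words` is an assumed heuristic, not a documented threshold.
    """
    seen = set()
    kept = []
    for doc in docs:
        norm = normalize(doc)
        if len(norm.split()) < min_words:
            continue  # too short to carry useful signal
        digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # duplicate of a document already kept
        seen.add(digest)
        kept.append(doc)
    return kept
```

Real pipelines usually add fuzzy deduplication and learned quality classifiers on top of exact matching, which is part of why building a clean dataset at this scale is expensive.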
However, no high-quality dataset of comparable scale is currently available to large language models, which has greatly hindered the development of ChatGPT.
MetaGPT's vision is to use Web3 incentive models to become foundational infrastructure for the zero-to-one stage of AI training and thereby address the issues above.
MetaGPT combines AI with Web3, and its Train to Earn model incentivizes users to take part in training and optimizing AI models for the Web3 industry. MetaGPT is about to launch Kryptal, an AI chatbot that uses a convenient, user-friendly question-and-answer format to give newcomers to Web3 up-to-date guidance drawn from a crypto industry knowledge base.
According to Cointime, on February 10th, 2023, Web3 investment firm Negentropy Capital officially announced a $2 million investment in MetaGPT.AI.
MetaGPT is also reportedly about to release its latest AI product, Kryptal, a chatbot trained on a vertical knowledge base for the Web3 industry.
As mentioned earlier, there is a problem with the lack of high-quality datasets for AI training.
MetaGPT lets anyone provide feedback on, and help adjust parts of, the unsupervised language model, and earn incentives through Train to Earn, with the aim of building a more user-friendly general-purpose AI. The process is as follows:
First, design a prompt dataset that contains a large number of prompt texts. These texts describe the task, such as "write a poem" or "give me a joke".
Then, a human annotator provides the expected output for each prompt text, along with a rating from 0 to 5 indicating the quality of the output.
In this way, a supervised dataset is obtained for training GPT-3 to generate outputs that better meet user expectations.
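The supervised step can be sketched roughly as follows. The `Annotation` record, the `build_supervised_dataset` helper, and the `min_rating` cutoff are hypothetical names and values introduced for illustration; the article does not specify how MetaGPT stores or filters annotations.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    prompt: str           # task description, e.g. "write a poem"
    expected_output: str  # output written by the human annotator
    rating: int           # quality score from 0 (poor) to 5 (excellent)


def build_supervised_dataset(annotations, min_rating=4):
    """Keep only well-rated annotations as prompt/completion pairs.

    `min_rating` is an assumed cutoff, not a value from the article.
    """
    examples = []
    for ann in annotations:
        if not 0 <= ann.rating <= 5:
            raise ValueError(f"rating out of range: {ann.rating}")
        if ann.rating >= min_rating:
            examples.append(
                {"prompt": ann.prompt, "completion": ann.expected_output}
            )
    return examples
```

Filtering on the rating is only one option; the score could instead be kept as a per-example weight during fine-tuning.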
Next, design a feedback dataset that contains a large amount of user input, such as "give me a joke" or "give me a pie chart".
Then, a human feedback provider gives the best output for each user input, along with a rating from 0 to 1 indicating how reasonable the output is.
In this way, a dataset for reinforcement learning is obtained, which is used to train GPT-3 to generate more reasonable outputs based on user input.
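One common way to use rated outputs in reinforcement learning from human feedback is to turn them into pairwise preferences for training a reward model. The article does not say whether MetaGPT does this, so the sketch below is an assumption; `build_preference_pairs` and `min_gap` are hypothetical names.

```python
from collections import defaultdict
from itertools import combinations


def build_preference_pairs(feedback, min_gap=0.2):
    """Convert (user_input, output, rating) triples into chosen/rejected pairs.

    Ratings are expected in [0, 1]. Pairs whose ratings differ by less than
    `min_gap` (an assumed threshold) are skipped as too close to call.
    """
    by_input = defaultdict(list)
    for user_input, output, rating in feedback:
        if not 0.0 <= rating <= 1.0:
            raise ValueError(f"rating out of range: {rating}")
        by_input[user_input].append((output, rating))

    pairs = []
    for user_input, scored in by_input.items():
        for (out_a, r_a), (out_b, r_b) in combinations(scored, 2):
            if abs(r_a - r_b) < min_gap:
                continue
            chosen, rejected = (out_a, out_b) if r_a > r_b else (out_b, out_a)
            pairs.append(
                {"input": user_input, "chosen": chosen, "rejected": rejected}
            )
    return pairs
```

An input with only one rated output yields no pairs, so this approach benefits from several contributors responding to the same input.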
Crowdsourced participation can support the human-feedback tuning of large language models because it supplies diverse, high-quality, real-time data that helps the model better understand and meet the user's intent. For example, MetaGPT can let users rate or comment on the model's answers or suggestions, and the model's parameters or strategies can then be adjusted based on that feedback, making the model more accurate, user-friendly, and useful.
In this way, we can align the model with users in various tasks to improve user satisfaction and trust.
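As a rough illustration of how crowdsourced ratings might be combined before they influence a model, the sketch below averages the scores for each answer and ignores answers with too few votes. The `aggregate_ratings` helper and the `min_votes` threshold are assumptions, not part of any published MetaGPT design.

```python
from collections import defaultdict
from statistics import mean


def aggregate_ratings(votes, min_votes=3):
    """Average user ratings per answer, skipping answers with few votes.

    `votes` is an iterable of (answer_id, rating) tuples. Requiring at
    least `min_votes` ratings (an assumed value) limits the influence of
    any single user.
    """
    by_answer = defaultdict(list)
    for answer_id, rating in votes:
        by_answer[answer_id].append(rating)
    return {
        answer_id: mean(ratings)
        for answer_id, ratings in by_answer.items()
        if len(ratings) >= min_votes
    }
```

A production system would likely also weight raters by reliability, since open incentive schemes attract low-effort or adversarial submissions.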
Meanwhile, MetaGPT is a Web3+AI project that uses the open API of GPT-3 to let anyone train open-source fine-tuned models on it. After training a fine-tuned model for a specific field, the trainer can choose to charge for it or offer it for free, with all fees going to the model trainer. Free models are also rewarded with equity tokens based on their usage on MetaGPT.
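The article states only that free models earn equity tokens "based on usage". One simple reading is a split proportional to usage, sketched below; the proportional rule, the `split_rewards` function, and the model names in the usage example are all hypothetical.

```python
def split_rewards(usage_by_model, reward_pool):
    """Split a token reward pool across free models in proportion to usage.

    The proportional rule is an assumption; the article does not describe
    the actual reward formula.
    """
    total = sum(usage_by_model.values())
    if total == 0:
        return {model: 0.0 for model in usage_by_model}
    return {
        model: reward_pool * usage / total
        for model, usage in usage_by_model.items()
    }
```

For example, `split_rewards({"defi-faq": 300, "nft-guide": 100}, 1000.0)` would give the first model three quarters of the pool.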
MetaGPT aims to become a bridge connecting AI creators and users, promoting innovation and popularization of AI. This goal may be achievable through its creation of Train to Earn.
Founded in early 2020, Negentropy Capital is a globally diversified venture capital firm focused on the digital asset, cryptocurrency, and blockchain technology industries. Its main departments cover strategic mergers and acquisitions, strategic investments, asset management, and external cooperation. It runs three dedicated funds: the Metaverse Ecological Fund, the NFT Special Fund, and the DeFi Special Fund. Negentropy Capital invests across the upstream and downstream of the blockchain industry and is committed to positioning itself at the industry's forefront, aiming to empower visionary cryptocurrency believers and teams with capital.
This article is from a submission and does not represent the views of BlockBeats.