IPFS (InterPlanetary File System) is a distributed system for peer-to-peer content storage and transmission. Unlike traditional Internet services, it is decentralized and not controlled by any central organization. Anyone can join the system as a node and contribute idle storage space on their computing devices to store and share content (files).
For example, when you look up a term on Wikipedia, your computer asks Wikipedia's centralized servers to send you the relevant content. In IPFS, your computer instead asks other computers in the network to share the content with you.
IPFS was launched in 2015 by Protocol Labs. Its goal is to build a system that can connect distant places, even across interplanetary distances, which is where the name IPFS comes from.
Content addressing is a method of finding content by what it is rather than by where it is stored. Continuing the previous example, when you search for "Bitcoin" on TokenInsight, the browser jumps to a specific URL, https://tokeninsight.com/en/cryptocurrencies, which is the location of the Bitcoin details page on TokenInsight. With IPFS, your computer instead asks other computers which of them has the page and requests that they share it with you. The content here can be files, web pages, applications, or metadata.
To quote an example from the official IPFS documentation: when you want to find a book in a library, you usually look for it by its title, not by its location. Few people would say, "I want the fourth book from the left in the third row of bookshelf No. 1 on the second floor." If you did ask that way and the book were moved, it would be hard to find.
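The difference between the two lookup styles can be sketched in a few lines of Python. This is a toy in-memory model, not IPFS code: the names `shelf`, `store`, `put`, and `get` are hypothetical, the book title is only sample data, and a plain SHA-256 hex digest stands in for a real CID.

```python
import hashlib

# Location addressing: the key says WHERE the content lives.
shelf = {"floor2/shelf1/row3/4": b"Moby-Dick"}

# Content addressing: the key is derived from WHAT the content is.
store = {}

def put(content: bytes) -> str:
    """Store content under the hex SHA-256 digest of its bytes."""
    key = hashlib.sha256(content).hexdigest()
    store[key] = content
    return key

def get(key: str) -> bytes:
    """Fetch content by digest, regardless of where the bytes are kept."""
    return store[key]

book_id = put(b"Moby-Dick")

# Moving the book invalidates the location-based key ...
shelf["floor1/shelf9/row1/1"] = shelf.pop("floor2/shelf1/row3/4")
old_location_gone = "floor2/shelf1/row3/4" not in shelf

# ... while the content-based key still resolves.
found = get(book_id)
```

After the move, the old shelf position no longer leads to the book, while `book_id` keeps working because it never referred to a position in the first place.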
When a user adds content to IPFS, the content is assigned a unique content identifier (CID) and stored on the user's node. The content is then made available to the rest of the network, where it can end up stored on multiple nodes. To keep the network fast, IPFS may split the added content into several pieces, each labeled with its own CID and stored separately. The system then uses a data structure called a Merkle DAG to link and organize the pieces.
If a user wants to find specific content, they look up the corresponding CID; the system finds which nodes hold the content and asks them to deliver it to the requester.
IPFS retrieves content through content identifiers (CIDs). Each file or sub-file in the IPFS network has a unique ID, its CID. The CID is a hash derived from the content itself. Therefore, the same content always produces the same CID, and any change to the content is reflected directly in the CID.
IPFS uses the SHA-256 hashing algorithm by default, but supports many other algorithms.
To organize CIDs in the network, IPFS uses the Merkle DAG data structure. Merkle DAG stands for Merkle directed acyclic graph, a variant of the DAG.
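To make the link between content and identifier concrete, here is a minimal sketch that builds a CIDv1 string for a single raw block using only the Python standard library. The function name `cid_v1_raw` is hypothetical. It assumes the block is encoded with the "raw" codec and SHA-256; files added through an IPFS client are usually chunked and wrapped first, so the CID reported by a real node for the same bytes may differ.

```python
import base64
import hashlib

def cid_v1_raw(content: bytes) -> str:
    """Build a CIDv1 string for `content` treated as one raw block."""
    digest = hashlib.sha256(content).digest()
    # Multihash: 0x12 = sha2-256 code, 0x20 = digest length (32 bytes).
    multihash = bytes([0x12, 0x20]) + digest
    # CIDv1: 0x01 = version, 0x55 = "raw" codec, then the multihash.
    cid_bytes = bytes([0x01, 0x55]) + multihash
    # Multibase: the "b" prefix marks lowercase base32 without padding.
    encoded = base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
    return "b" + encoded

same_1 = cid_v1_raw(b"hello IPFS")
same_2 = cid_v1_raw(b"hello IPFS")
changed = cid_v1_raw(b"hello IPFS!")
```

`same_1` and `same_2` are identical because the bytes are identical, while the single extra character in the third input yields a completely different `changed` value.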
What is a Merkle DAG?
We can break the term Merkle DAG into its four words. First, a graph represents relationships between objects; it consists of objects (also called nodes) and edges. Directed means that each edge has a direction, pointing from one object to the next. Acyclic means that there are no cycles in the graph: following the edges from an object never leads back to that object. Together these give the DAG, or directed acyclic graph. A Merkle DAG is a type of DAG in which each object has a unique ID that is the hash of its content.
To keep the network fast, IPFS splits the content (files) uploaded by users into small blocks or sub-files (for example, 256 KB per block), stores each block separately, and then uses a Merkle DAG to connect the blocks. The Merkle DAG can show the relationship between a piece of content and its sub-files, and can even be used to organize files hierarchically (similar to folders and files).
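The idea that each object's ID is the hash of its content, including its outgoing edges, can be illustrated with a small sketch. This is not the encoding IPFS uses; the `nodes` dictionary, the `add_node` helper, and the JSON serialization are assumptions chosen for readability.

```python
import hashlib
import json

nodes = {}  # node ID -> serialized node

def add_node(data: str, links=()) -> str:
    """Store a node and return its ID: the hash of its data plus its child IDs."""
    payload = json.dumps({"data": data, "links": list(links)}).encode("utf-8")
    node_id = hashlib.sha256(payload).hexdigest()
    nodes[node_id] = payload
    return node_id

# Children must be created first, because the parent embeds their IDs.
leaf_a = add_node("contents of cat.jpg")
leaf_b = add_node("contents of notes.txt")
root = add_node("folder", [leaf_a, leaf_b])

# Editing one leaf yields a new leaf ID and therefore a new root ID.
leaf_b2 = add_node("contents of notes.txt, edited")
root2 = add_node("folder", [leaf_a, leaf_b2])
```

Because a parent's ID depends on the IDs of its children, a change anywhere below propagates up to the root, and the unchanged leaf `leaf_a` is shared by both versions of the folder rather than stored twice. The same dependency is why a cycle cannot be built in practice: a node would have to contain its own hash before that hash could be computed.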
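The splitting step described above can be sketched as follows. It assumes fixed-size chunks of 256 * 1024 bytes and a root block that simply lists the block hashes in order; real IPFS nodes use their own chunking options and file format, and the helper names (`chunk`, `add_file`, `read_file`) and the `blockstore` dictionary are hypothetical.

```python
import hashlib

CHUNK_SIZE = 256 * 1024  # the example block size from the text

def chunk(data: bytes, size: int = CHUNK_SIZE) -> list:
    """Split data into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def block_id(block: bytes) -> str:
    """Identify a block by the hex SHA-256 digest of its bytes."""
    return hashlib.sha256(block).hexdigest()

def add_file(data: bytes, blockstore: dict) -> str:
    """Store every block under its hash, then a root block listing the hashes."""
    ids = []
    for block in chunk(data):
        bid = block_id(block)
        blockstore[bid] = block
        ids.append(bid)
    root = "\n".join(ids).encode("ascii")
    root_id = block_id(root)
    blockstore[root_id] = root
    return root_id

def read_file(root_id: str, blockstore: dict) -> bytes:
    """Follow the root block's links and concatenate the blocks in order."""
    ids = blockstore[root_id].decode("ascii").split()
    return b"".join(blockstore[bid] for bid in ids)

blockstore = {}
# 640,000 bytes of non-repeating sample data, which splits into 3 blocks.
data = b"".join(i.to_bytes(4, "big") for i in range(160_000))
root_id = add_file(data, blockstore)
restored = read_file(root_id, blockstore)
```

Only `root_id` needs to be shared: anyone holding it can fetch the root block, learn the IDs of the data blocks, and reassemble the file in the original order.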
Features of Merkle DAG:
In the IPFS network, users can use a distributed hash table (DHT) to find the target content, that is, which participants hold the content you want and where they are located.
A hash table is a database made up of keys and values. Each value has a unique corresponding key, so users can find a value through its key. Here the key can be understood as the CID and the value as the content. A distributed hash table means that the system has multiple participants, each holding a hash table of their own. Users can query the participants' hash tables to find the content they want.
A distributed hash table is well suited to finding information in large amounts of data, because every key has the same format and each participant organizes its data in a way that makes retrieval efficient.
Once users have located the target content, they connect to the computing devices of the participants that hold it and fetch the content (this exchange is handled by the Bitswap module in IPFS). After receiving the target file, the user can check that the CID of the received content matches the CID of the target content.
Currently, a number of web3 projects already use IPFS as their storage infrastructure across various fields, such as Filecoin (storage service provider), Audius (decentralized music service provider), Pinata (NFT hosting service provider), OpenBazaar (peer-to-peer e-commerce platform), and Morpheus.Network (supply chain network service provider).
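A toy model may help show how a table can be spread over many participants while lookups stay cheap. The sketch below assigns each key to the peer whose ID is closest by XOR distance, the metric used by Kademlia-style DHTs such as the one IPFS builds on. It stores the names of the peers that hold the content as the value, and it deliberately skips routing tables, network hops, and replication. `Peer`, `provide`, `find_providers`, and the sample key are all hypothetical.

```python
import hashlib

def h(value: str) -> int:
    """Map a string to a 256-bit integer so keys and peer IDs share one format."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest(), "big")

class Peer:
    def __init__(self, name: str):
        self.name = name
        self.id = h(name)
        self.table = {}  # this peer's slice of the table: key -> provider names

def responsible_peer(peers, key: str) -> Peer:
    """Pick the peer whose ID is closest to the hashed key by XOR distance."""
    target = h(key)
    return min(peers, key=lambda p: p.id ^ target)

def provide(peers, key: str, provider: str) -> None:
    """Record on the responsible peer that `provider` holds the content for `key`."""
    responsible_peer(peers, key).table.setdefault(key, set()).add(provider)

def find_providers(peers, key: str) -> set:
    """Ask only the responsible peer, instead of scanning every table."""
    return responsible_peer(peers, key).table.get(key, set())

peers = [Peer(f"peer-{i}") for i in range(8)]
provide(peers, "cid-of-some-file", "peer-3")
provide(peers, "cid-of-some-file", "peer-5")
providers = find_providers(peers, "cid-of-some-file")
```

Because every key hashes into the same fixed-size format, any participant can work out which peer is responsible for a key without a central index, and the lookup touches one table rather than all eight.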
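The final check can be sketched as a plain hash comparison. A hex SHA-256 digest stands in for the CID here, and the `verify` helper is hypothetical rather than part of any IPFS API.

```python
import hashlib

def verify(expected_digest: str, received: bytes) -> bool:
    """Return True only if the received bytes hash to the requested digest."""
    return hashlib.sha256(received).hexdigest() == expected_digest

original = b"block contents"
requested = hashlib.sha256(original).hexdigest()

ok = verify(requested, original)
tampered = verify(requested, b"block contents (modified)")
```

Since the identifier is derived from the content, the requester does not have to trust the peer that served the data: a corrupted or substituted block fails the comparison.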