Original title: Data is Human: The Quest for Decentralized Cloud Storage
By Hunter Lampson
The Block unicorn
Data defines human beings. Society's pursuit of technological innovation and the digitization of human life have created explosive demand for data storage and retrieval. From agriculture, healthcare discovery, and political profiling to self-driving cars, protein folding, and neural networks, data is a core enabler of the solutions we build to achieve our goals. It is the irreducible input that both constrains and empowers our ability to act, grants access, and gives meaning to our digital and physical lives. Because data defines so much of human life, we should care a great deal about how our data is stored, managed, and owned.
Today, more than 63% of the global population of roughly 7.7 billion people, over 5 billion in total, uses the Internet, and that number continues to grow by more than 10% per year.
But the cloud storage market is growing even faster. The global datasphere (the amount of data created, captured, copied, and consumed worldwide) is expected to grow at a 58% CAGR from 2015 to 2025. By 2025, the amount of data created, stored, and replicated will exceed 180 ZB (1 ZB is roughly one billion TB, and 1 TB equals 1,024 GB). If you stacked enough 10 TB hard drives to meet the world's data storage needs in 2025, the stack could reach the moon.
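As a sanity check on that claim, here is a back-of-the-envelope calculation. The drive height (roughly 26 mm for a 3.5-inch drive) and the decimal ZB-to-TB conversion are my assumptions for illustration, not figures from the article.

```python
# Back-of-the-envelope check of the "stack to the moon" claim.
# Assumptions (mine, not the article's): decimal units (1 ZB ~= 1e9 TB)
# and a standard 3.5-inch drive height of ~26.1 mm.

ZB_IN_TB = 1e9               # 1 zettabyte is roughly one billion terabytes
datasphere_tb = 180 * ZB_IN_TB
drive_capacity_tb = 10
drive_height_m = 0.0261      # typical 3.5" HDD height

drives_needed = datasphere_tb / drive_capacity_tb
stack_height_km = drives_needed * drive_height_m / 1000
moon_distance_km = 384_400

print(f"Drives needed:  {drives_needed:.2e}")        # ~1.8e10 drives
print(f"Stack height:   {stack_height_km:,.0f} km")  # ~470,000 km
print(f"Moon distance:  {moon_distance_km:,} km")    # the stack comfortably exceeds it
```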
Figure 1. Global datasphere size by year. Source: Uygun and Dngl, 2021.
From an economic perspective, the cloud storage market was valued at roughly $76 billion in 2021; by 2028 it is expected to reach $390 billion (a CAGR of 26.2%). Despite this explosive growth, cloud storage vendors continue to consolidate market share. As of Q2 2022, the three largest cloud providers, Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (which I refer to collectively as the Big Three), accounted for 65% of the cloud computing market. Centralized cloud storage providers combine network effects, reputation, technical infrastructure, and balance sheets to the point where new competitors simply cannot compete.
Data storage solutions fall into three broad categories:
1. Local (on-premises) storage
2. Centralized cloud storage (CCS)
3. Decentralized cloud storage (DCS)
Local storage and CCS providers, the Big Three plus Alibaba Cloud, Box, iCloud, and others, are all characterized by a centralized approach to storage. Information is stored and maintained in a single location (or a handful of locations), managed in a single database, and operated by a single entity, leaving both on-premises and CCS solutions exposed to single points of failure.
The proliferation of CCS solutions calls for a brief history of the economics of on-premises data storage. At first, users stored data on their own hardware. Both data storage and maintenance took place in the same physical location (such as a company's own data servers), which I call Phase 1.
Figure 2. The three phases of data storage. Source: Hunter Lampson.
As the network effects of cloud storage enabled cheaper (and often more secure) storage, consumers and companies moved to centralized clouds (Phase 2), and CCS providers built cloud computing, APIs, and other SaaS offerings on top. Although centralized solutions have been the simplest, cheapest, and most efficient option on the market, their basic limitation remains: a single container is responsible for 100% of an entity's data. CCS solutions were an improvement over on-premises solutions, but what was once economically optimal has become expensive and restrictive. Today, DCS providers offer the cheapest and most secure storage solutions on the market.
When users upload data to a CCS provider, they no longer own their data. Apple's controversial (and later reversed) decision to scan iCloud users' photos is a case in point. Apple maintains a strict privacy policy for data stored on its hardware (iPhone, Mac, etc.). But crucially, once a user uploads a byte of data to iCloud, Apple considers that data to be in its domain, no longer the user's. The precedent: data stored locally belongs to the user, while data stored in the cloud belongs to the storage provider.
One does not have to look far to find large-scale data breaches among CCS vendors. Amazon, Azure, and Google have all suffered breaches as a result of their single-point-of-failure architectures.
The centralized structure of these providers lets them build massive walled gardens and offer a higher level of security than in-house solutions. At the same time, the bigger and more centralized a database becomes, the more attractive it is to attackers. Data outages are also common among CCS solutions; examples can be seen here: Amazon, Azure, Google.
CCS providers not only lose data unintentionally, they also delete it deliberately. Just a few weeks ago, Bankless, a popular YouTube channel, was terminated without warning, notice, or reason. Google, which owns YouTube and stores its content on its own cloud, thankfully restored the channel, but the fact that Google and other CCS providers have the power to erase the existence of certain data is detrimental to society.
Perhaps the most critical drawback of CCS solutions is their high cost. While the cost of storing data has fallen by an average of 30.5% per year over the past 50 years, CCS prices have remained essentially flat for the past seven years. This is a result of the cumulative network effects of CCS providers. Thanks to those network effects, the Big Three have come to dominate cloud computing, and as their combined market share grows, they behave like an oligopoly with the power to set prices and keep out new entrants.
Figure 3. Data storage costs over time. Source: Arweave Yellow Paper.
Figure 4. Data storage costs for AWS, Azure, and Google over time. Sources: AWS, Azure, Google, Hunter Lampson.
The main reason for the gap between storage prices and storage costs is the market dominance that CCS providers currently enjoy. DCS solutions take a different path.
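To make that gap concrete, the short sketch below compounds the article's 30.5% annual decline in storage cost against a flat cloud price over seven years. The starting prices are hypothetical placeholders chosen only to illustrate the widening markup, not quoted rates.

```python
# Illustrative gap between falling raw storage cost and a flat cloud price,
# using the article's ~30.5%/yr hardware cost decline over 7 years.
# Both starting figures below are placeholders, not quoted prices.

hardware_cost_per_gb = 0.03    # hypothetical raw hardware cost ($/GB) in year 0
cloud_price_per_gb = 0.03      # hypothetical CCS price ($/GB) in year 0, held flat
decline_rate = 0.305           # annual decline in hardware cost (from the article)

for year in range(8):
    markup = cloud_price_per_gb / hardware_cost_per_gb
    print(f"Year {year}: hardware ${hardware_cost_per_gb:.4f}/GB, "
          f"cloud ${cloud_price_per_gb:.4f}/GB, markup {markup:.1f}x")
    hardware_cost_per_gb *= (1 - decline_rate)

# After 7 years, the flat cloud price is roughly 12-13x the underlying hardware cost.
```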
Building on the weaknesses of CCS, decentralized cloud storage (DCS) represents a paradigm shift in data storage (Phase 3). DCS solutions utilize free disk space across geographically distributed sets of nodes by matching supply and demand for storage. This creates a more efficient market, reduces costs, eliminates the single-point-of-failure risk present in local and CCS solutions, and returns data ownership to the user.
Figure 5. Cumulative cost of storing 1 GB per year by platform. Sources: AWS, Azure, Google, Storj, SiaStats, Arweave Fees, File.app, Hunter Lampson.
While the geographic distribution of data centers and storage nodes is not the only factor determining network concentration, it is a useful litmus test. The distribution of nodes across space also helps determine how well data is replicated, retrieved, and protected. In general, the more nodes in a network, the faster retrieval is and the stronger the protection against natural disasters (when do we put storage nodes on the moon?!). It is therefore important to understand node decentralization as a prerequisite for effective cloud storage.
What is revolutionary about DCS solutions compared to CCS is just how decentralized they are. There are more than 114 times as many active nodes running across Sia, Storj, Filecoin, and Arweave as there are data centers managed by AWS, Azure, and Google Cloud combined.
Figure 6. Total number of active nodes by service. Sources: Filscan, Viewblock, Storj, SiaStats, Peterson 2015, Baxtel, Google, Sam Williams, Hunter Lampson.
The number of nodes on Arweave is difficult to quantify because Viewblock's statistics treat each storage pool as a single storage node. In an offline conversation, Arweave founder Sam Williams told me that the 59 current storage pools (per Viewblock) may each be backed by hundreds or even thousands of nodes, so Viewblock likely underestimates the actual node count by roughly 10-100x. For this reason, and to be conservative, I use "500+" as the node count. Also note that active node count is an imperfect measure of decentralization: the absolute number of nodes tells us nothing about who is running them (or how many nodes each entity operates).
To borrow from Spencer Applebaum and Tushar Jain, an important distinction among DCS services is the difference between contract-based storage and permanent storage. In short: all DCS services currently on the market use contract-based models, with the exception of Arweave.
Filecoin, Sia, and Storj use a contract-based pricing model, the same one CCS providers use today. Contract-based pricing means users pay continuously to store data, similar to a subscription (paid monthly or yearly). Despite their subtle differences, Filecoin, Sia, and Storj compete directly with existing CCS providers.
Arweave, on the other hand, offers a permanent storage model: users pay a one-time fee and, in return, their data is stored permanently. Though often grouped loosely and imprecisely with other DCS and CCS providers, Arweave's defining feature, the one that sets it apart from its competitors, is data permanence.
Figure 7. Conceptual diagram of CCS and DCS solutions. Source: Hunter Lampson.
A closer look at Filecoin, Sia, and Storj helps us better understand how they differ from CCS providers and Arweave.
Figure 8. Key characteristics of the DCS solution. Sources: Filecoin, Storj, Sia, Arweave, CoinMarketCap, Crunchbase.
Filecoin launched its mainnet in October 2020 and is currently the most widely adopted and best-funded DCS project on the market. As of July 12, 2022, Filecoin's fully diluted market capitalization was roughly $1.19 billion, against an all-time high of $12.3 billion. Juan Benet is the founder and CEO of Protocol Labs, the company behind Filecoin and its underlying technology, the InterPlanetary File System (IPFS). To date, Filecoin has raised $258.2 million, most of it from its initial coin offering (ICO) in late 2017.
To understand Filecoin, we must first understand IPFS, a peer-to-peer (P2P) distributed system for storing and retrieving data. Built to address the shortcomings of the HTTP-based Internet, IPFS uses content addressing to organize data, meaning that information is requested and delivered based on what it is rather than where it is stored. This is achieved by issuing a content identifier (CID) for each block of data, generated by hashing the file's contents, which makes the reference immutable.
To locate requested information (represented by a unique CID), IPFS uses a distributed hash table (DHT), which records the network locations of the nodes storing the content associated with each CID. When a user requests information from an IPFS node, the node checks its own table to see whether it can locate (and then retrieve) the requested file. If the node does not hold the requested information, it can download the content from its peers and deliver it to the user. In this model, information is replicated across multiple nodes rather than living in a single, centralized location as in the HTTP model. This eliminates the risk of a single point of failure while improving retrieval speed, because data can be retrieved from multiple peers simultaneously.
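For readers who want to see the idea in miniature, here is a toy sketch of content addressing. It is not IPFS code: real CIDs wrap the digest in multihash and multibase encodings, and real retrieval goes through the DHT. But the core mechanic, deriving the address from the content and verifying the bytes on retrieval, is the same.

```python
import hashlib

def toy_content_id(data: bytes) -> str:
    """Toy content identifier: a hash of the content itself.
    Real IPFS CIDs add multihash/multibase encoding on top, but the core
    idea is identical: the address is derived from the data."""
    return hashlib.sha256(data).hexdigest()

# A tiny in-memory "network": each node maps content IDs to blocks it stores.
node_a: dict[str, bytes] = {}
node_b: dict[str, bytes] = {}

block = b"hello, decentralized storage"
cid = toy_content_id(block)
node_a[cid] = block                       # node A pins the block

# Retrieval is by content, not location: any node holding the CID can serve it,
# and the requester re-hashes the bytes to verify integrity.
retrieved = node_a.get(cid) or node_b.get(cid)
assert retrieved is not None and toy_content_id(retrieved) == cid
print(cid[:16], "->", retrieved.decode())
```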
Filecoin is an economy built on top of IPFS, the communication network used to store and transmit data. IPFS by itself does not incentivize users to store other people's data; Filecoin does. It does so through two proof mechanisms: Proof of Replication (PoRep) and Proof of Spacetime (PoSt). PoRep is run once to verify that storage miners actually hold the data they claim to hold. Each on-chain PoRep includes 10 SNARKs (Succinct Non-interactive ARguments of Knowledge) demonstrating completion of the deal. PoSt, by contrast, is run continuously to prove that storage miners keep dedicating storage space to the same data over time. The on-chain interactions required to validate this process are data-intensive, so Filecoin uses zk-SNARKs (zero-knowledge SNARKs) to generate these proofs and compress their data by a factor of 10.
Of the four DCS protocols discussed here, Sia was the first to launch, in June 2015. Founded by David Vorick and Luke Champine at HackMIT in 2013, Sia has strong user appeal, with a fully diluted market capitalization of $190 million against an all-time high of $2.97 billion.
Sia was launched by Nebulous, which was founded in 2014. Like Filecoin, Sia divides uploaded data into constituent pieces (in this case, shards) and distributes them to hosts around the globe. Unlike Filecoin, Sia achieves this through a different mechanism, a proof of storage. This proof requires hosts to share small amounts of randomly selected data over time; the proofs are verified and stored on the Sia blockchain, and hosts receive Siacoin rewards.
Like Filecoin and Sia, Storj has gained significant traction since its launch in October 2018. Storj differs from Filecoin and Sia in that it does not rely on blockchain consensus to store data. Instead, Storj relies entirely on erasure coding and satellite nodes, which increases data redundancy while reducing bandwidth usage. Storj's exclusive use of erasure coding means that data durability (the probability that data remains available in the event of failures) does not scale linearly with the expansion factor (the additional cost of storing data reliably). On Storj, higher durability therefore does not require a proportional increase in bandwidth. Given node churn (the rate at which nodes go offline or leave the network), erasure coding may be valuable in the long run because it requires less disk space and bandwidth for storage and repair, though it increases CPU time.
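The sketch below illustrates why durability need not scale linearly with the expansion factor in a k-of-n erasure scheme. The parameters (20-of-40 coding, 90% node availability) are illustrative assumptions, not Storj's production values.

```python
from math import comb

def durability(n: int, k: int, p_node_up: float) -> float:
    """Probability that a file survives when it is erasure-coded into n
    pieces, any k of which suffice to reconstruct it, and each piece's
    node is independently available with probability p_node_up."""
    return sum(comb(n, i) * p_node_up**i * (1 - p_node_up)**(n - i)
               for i in range(k, n + 1))

# Illustrative parameters (not Storj's actual configuration):
p = 0.90  # assume each storage node is online 90% of the time

# Full replication: 3 copies -> expansion factor 3.0
print("3x replication :", 1 - (1 - p)**3)         # 0.999

# 20-of-40 erasure coding -> expansion factor only 2.0, yet far more durable
print("20-of-40 coding:", durability(40, 20, p))  # ~0.99999999998
```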
Storj also differs from Filecoin and Sia in its network architecture and pricing mechanism. On Storj, pricing is determined by satellite nodes, which sit between storage users (including applications) and storage nodes and are responsible for negotiating prices and bandwidth utilization. Storj's pricing model is therefore not purely a product of free-market activity but is subject to concentrated power, since satellite operators represent a potentially centralized intermediary between nodes and end users.
Storj also offers native Amazon S3 compatibility, which means existing Amazon S3 users can migrate to Storj and use basic features without changing their code base, potentially reducing the friction of leaving the Amazon S3 ecosystem.
Unlike Filecoin, Sia, and Storj, Arweave offers permanent data storage. Founded by CEO Sam Williams and William Jones in June 2018, Arweave had a fully diluted market capitalization of $890 million as of July 12, 2022, against an all-time high of $4.18 billion.
Arweave seeks to provide permanent data storage in a decentralized manner for a single upfront payment, which it accomplishes through its endowment mechanism. Given that the cost of data storage has declined by roughly 30.5% per year over the past 50 years, Arweave reasons that a dollar of storage purchasing power today buys more than a dollar of storage will cost in the future. This delta is what makes Arweave's endowment pool possible: the "principal" is the upfront fee paid by the user, and the "interest" is the purchasing power gained over time as storage costs fall. Arweave conservatively assumes that storage prices decline by only 0.5% per year, which allows the endowment pool to remain solvent indefinitely.
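Here is a minimal sketch of the endowment arithmetic under the article's stated assumptions: if the annual cost of storing a gigabyte declines by a fixed fraction each year, the cost of storing it forever is a convergent geometric series. The dollar figure for today's cost is a placeholder; Arweave's real endowment accounting is denominated in AR and is more involved.

```python
# Minimal sketch of the endowment arithmetic: if storing 1 GB costs c0 this
# year and that cost declines by a fraction d every year, the cost of storing
# it forever is a convergent geometric series:
#
#   total = c0 * (1 + (1-d) + (1-d)**2 + ...) = c0 / d
#
# The figures below are illustrative placeholders.

c0 = 0.001   # hypothetical cost to store 1 GB for one year today, in USD
d = 0.005    # Arweave's conservative assumption: storage cost falls 0.5%/yr

perpetual_cost = c0 / d
print(f"Upfront endowment needed: ${perpetual_cost:.2f} per GB "
      f"(= {perpetual_cost / c0:.0f} years of today's cost)")

# At the historical ~30.5%/yr decline, the same series converges to only
# ~3.3 years of today's cost -- that delta is what funds the endowment.
print(f"At a 30.5%/yr decline:    ${c0 / 0.305:.4f} per GB")
```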
Arweave's current cost of roughly $3.85 per gigabyte reflects the terminal value of data storage. In the short term, Sia and Filecoin (and even the Big Three) are cheaper, but over the long run Arweave becomes the smarter choice. Even in the short term, users will pay a premium for something no one else offers: data permanence. For some users, the cost of permanent storage is relatively inelastic, because certain files, such as NFTs, require permanent storage.
Arweave is powered by the blockweave, a blockchain-like data structure in which each block is linked to both the previous block and a recall block. A recall block is any previously mined block other than the most recently mined one. Arweave's structure is therefore not just a chain linking successive blocks together; it is a weave linking the current block to the previous block and to another, randomly chosen block, the recall block.
To mine new blocks and receive mining rewards, miners must demonstrate that they have access to the recall block; Arweave's Proof of Access (PoA) mechanism guarantees that each newly mined block incorporates data from a recall block. This means that to store new data, miners must also store existing data. PoA also encourages miners to replicate all data evenly across nodes: when a less-replicated block is chosen as the recall block, the miners who hold it compete for the same reward in a smaller pool. All else being equal, miners who store less-replicated blocks earn larger rewards over time.
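The toy sketch below illustrates the incentive rather than Arweave's exact algorithm: because the recall block is derived pseudo-randomly from chain state, a miner's chance of being eligible in any given round scales with how much of the weave it stores.

```python
import hashlib
import random

def pick_recall_block(prev_block_hash: bytes, height: int) -> int:
    """Illustrative recall-block selection: derive a pseudo-random index over
    all previously mined blocks from the previous block's hash. (Arweave's
    actual scheme differs in detail, but the effect is the same: the recall
    block is unpredictable, so storing more of the weave wins more rounds.)"""
    digest = hashlib.sha256(prev_block_hash + height.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") % height

# Toy miner that stores only a random 10% subset of the weave.
height = 1_000_000
my_stored_blocks = set(random.sample(range(height), k=height // 10))

recall_index = pick_recall_block(b"\x12" * 32, height)
eligible = recall_index in my_stored_blocks
print(f"Recall block #{recall_index}: "
      f"{'can mine' if eligible else 'cannot mine'} this round "
      f"(storing ~10% of the weave -> eligible in ~10% of rounds)")
```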
Built on top of the blockweave is the permaweb, similar to today's World Wide Web but permanent. Arweave's blockweave is the base layer that powers the permaweb; the permaweb is the layer users interact with. Because Arweave is built on top of HTTP, traditional browsers can access all data stored on the network, enabling seamless interoperability.
While DCS solutions may be superior to CCS solutions in theory, they should be evaluated by their usefulness in practice. We can gauge each project's traction by examining the following:
1. Data stored
2. Node distribution
3. Search interest
4. Ecosystem strength
5. Demand-side revenue
Demand is measured most directly by the amount of data stored over time, a key KPI for DCS providers. On this metric alone, Filecoin dominates: as of this writing, Filecoin holds more than 90% of all data stored across DCS networks, up from 82.8% 90 days ago.
Figure 9. Share of DCS-stored data by protocol. Sources: Storj Stats, SiaStats, Viewblock, File.app, Hunter Lampson.
Filecoin not only stores the most data, but it's also growing the fastest. Over the past 90 days, data stored on the Filecoin platform grew by 112%.
Figure 10. Data Storage Growth (last 90 days). Sources: Storj Stats, SiaStats, Viewblock, File.app, Hunter Lampson.
Given that these are storage protocols, the amount of data stored is an important metric, though it has serious limitations. It tells us nothing about revenue to the protocol or about the data itself (what it is worth, what it does, how long it will be stored, and so on). There is an ongoing debate among DCS and CCS vendors about how to characterize stored data, because not all data is valued (or treated) equally; some data matters more than others. Users may rank storage providers on this metric, but the amount of data stored paints only a partial picture.
The metric also lacks context on how that data contributes to the protocol's demand-side revenue, which is particularly problematic for Filecoin, the only DCS service offering essentially free storage. Users may choose Filecoin to store data simply because of its current pricing (more on that later). While I could not find public sources on this (for obvious reasons), it is notable that numerous builders and researchers in the space, all of whom I respect greatly, have told me that Filecoin tends to partner with large institutions to offer free storage in order to inflate its storage metrics. In theory, Filecoin could store infinitely more data than any other DCS protocol and still generate zero demand-side revenue.
While data stored is a direct measure of storage demand, we can also look at indirect measures. Node distribution is important to understand because it highlights the geographic makeup of demand-side and supply-side actors. We can evaluate this by looking at 1) the geographic distribution of storage nodes and 2) the geographic distribution of search interest.
The more spatially distributed storage nodes are, the better. Greater dispersion (usually) yields greater decentralization and shorter retrieval times from nodes to end users. It also reduces the risk of unrecoverable data loss (typically from environmental factors such as natural disasters). Ideally, nodes are not scattered arbitrarily across space but track storage demand (roughly, technology saturation times population density). Given that the US, China, and Europe have the most concentrated storage demand, we would expect them to have the most concentrated storage nodes, so it makes sense that node distributions for both CCS and DCS solutions cluster in the US, Europe, and China. That the distribution of DCS nodes mirrors the distribution of CCS data centers is a positive sign that DCS solutions have reached an important level of market maturity.
Figure 11. Geographical distribution of DCS nodes. Sources: Filscan, Viewblock, Storj, SiaStats, Sam Williams.
Figure 12. Geographical distribution of CCS nodes (data centers). Source: Peterson 2015, Baxtel, Google.
If we think of node distribution as a view into DCS supply, then we can, at least in part, think of the distribution of search interest as a view into DCS demand. (This assumes that, for any given search about a DCS solution, the searcher is more likely to be a would-be user of storage than a supplier of it.)
On this metric, Filecoin clearly has the highest search-interest dominance globally relative to Storj, Sia, and Arweave, so one might expect Filecoin to have the highest relative demand.
Figure 13. Relative search Interest Advantage by country/region. Source: Google Trends. Note: I use the term "Sia" instead of "Siacoin"
These conclusions are based on current supply and demand indicators, but looking back in time yields similar ones. Filecoin has been the most-searched DCS solution since mid-2021. Notably, Arweave in August 2021 and Storj in November and December 2021 nearly surpassed Filecoin in search interest.
Figure 14. Relative search interest over time. Source: Google Trends
While search interest can be a useful metric, it has serious limitations. It only shows us how individual users turn to Google for information about each project; it tells us nothing about actual protocol demand.
It's easy to conclude that because Filecoin has raised the most money to date, it probably has the most to spend on marketing. Perhaps marketing budgets alone explain the variation in search interest across projects; perhaps search-interest dominance is a better predictor of funding than of protocol demand. Who is to say? In addition, Filecoin owns easy-to-understand, keyword-heavy domains such as web3.storage and nft.storage, which may also bias the data. Users searching for "web3 storage" may land on Filecoin's services purely through search engine optimization and domain ownership.
Another limitation is that search interest can be wildly uncorrelated with storage demand. If a user intends to move hundreds of terabytes of data to a DCS provider, their search activity (a single search) will not reflect their actual storage needs. External variables, such as how heavily centralized exchanges (CEXs) like Coinbase market these individual tokens, may also play an important role here.
Because DCS solutions live at the infrastructure layer, their ecosystems typically reflect user demand, since their users (consumers, companies, developers) can choose which ecosystem to use or build on. The strength of an ecosystem comes from 1) the projects built on the protocol and 2) the existing projects that partner with it. Given the maturity of its partnerships and the rate at which new projects are being added, Filecoin has the strongest ecosystem: over the last 18 months, Filecoin's ecosystem has grown from 40 projects to more than 300.
Filecoin has an impressive list of partners, among them Chainlink, Polygon and Polygon Studios, The Graph, Near, ConsenSys, Brave, ENS, Flow, Hedera, ChainSafe, Ceramic, Livepeer, Audius, Decrypt, MoNA, and Skiff.
Figure 15. Filecoin ecosystem. Source: Messari
To help grow its ecosystem, the Filecoin Foundation invests heavily in ecosystem and grant programs. Protocol Labs, the team behind Filecoin, has made 46 direct investments to date, deploying more than $480 million into ecosystem projects including Decrypt, Syndicate, ConsenSys, and Spruce.
Figure 16. Protocol Labs investment over time. Source: Crunchbase
Next to Filecoin is Arweave's ecosystem: Filecoin has nearly 300 partners compared to Arweave's roughly 60. Although many partners can benefit from both platforms (Mirror and Skiff, for example, can offer users both Filecoin and Arweave), other projects, such as Solana, are unlikely to use both. This means that many of the most critical web3 infrastructure projects, protocols, dApps, and NFT platforms, will find product-market fit with whichever storage protocol, Filecoin or Arweave, aligns with their ideology and specific use case. The strength of each ecosystem will play a crucial role in each platform's long-term viability, so the ability to win the hearts and minds of both established and new builders is paramount.
It's worth noting that, relative to Filecoin, Arweave's ecosystem skews toward new projects built directly on the platform, projects that depend on the technology to exist, rather than established projects that selectively leverage it. This also explains why Filecoin partners with more established projects: not because Filecoin's partnerships have grown faster or more successfully, but because Filecoin's partners (such as Cloudflare and Opera) have simply been around longer. Arweave's partners, by contrast, are typically early-stage companies that built their businesses natively on the network. Some of Arweave's notable partners include Solana, Polkadot, The Graph, Mirror, Bundlr, Glass, KYVE, Decent Land, and ArDrive.
Figure 17. The Arweave ecosystem. Source: @axo_pas (on Twitter)
Since 2020, Arweave has poured nearly $55 million into 15 ecosystem projects, including Mask, Fluence, and Pianity. Through its Open Web Foundry accelerator program, Arweave helps developers build permanent web applications, and it also invests through its community-run ecosystem fund, ARCA DAO.
Sia and Storj have smaller ecosystems, with roughly 30 and 13 projects respectively. Despite their smaller size, both have excellent partnerships. Some of Sia's partners include CoinMarketCap, Crypto.com, Kraken, Filebase, Render, Akash, and Quant, while Storj's partners include Microsoft Azure, Fastly, Couchbase, and Pokt.
Importantly, Storj's strategy is built around capturing existing Amazon S3 users, including large incumbents. As a result, many of Storj's partners are likely reluctant to be publicly listed as partners; they may see no benefit in it. In contrast, new web3-native projects built on Arweave may genuinely benefit from being listed as partners to demonstrate their immersion in the ecosystem. These differing promotional incentives make ecosystem comparisons challenging, because we lack complete data sets.
Today, Sia is primarily used by Filebase (the first Amazon S3-compatible dApp) and Arzen (a consumer-facing decentralized storage application).
Data stored may be the most direct measure of user engagement, but demand-side revenue measures the value of that engagement, or a project's ability to monetize it. As Sami and Mihai (both Messari analysts) explain in their article on Filecoin's revenue model, demand-side revenue is a useful metric for infrastructure projects because it measures what people pay to use the network (in this case, what they pay to store data). Importantly, demand-side revenue excludes block rewards paid to miners.
While demand-side revenue for Arweave, Sia, and Storj is available on the Web3 Index, demand-side revenue data for Filecoin is hard to find (if anyone can locate it, I would love to see it). We therefore cannot include Filecoin in the demand-side revenue comparison.
What we do know about Filecoin's revenue is that, even as the data stored on its platform has grown, its revenue has stayed flat. There are likely two reasons for this. First, the HyperDrive upgrade increased storage onboarding throughput by 10-25x, lowering demand for block space (as we will see later, this hurts Filecoin's token value). Second, storing more data without generating more revenue suggests that Filecoin is effectively storing data for free. Miners are therefore sustained by an unsustainable block reward of roughly 20.56 FIL per block, which will decline over time; in the near future, Filecoin will need to raise storage prices to keep miners incentivized to participate in the network.
Figure 18. Filecoin's protocol revenue has remained flat despite ecosystem growth. Source: Messari.
For Storj, Sia, and Arweave, however, we can see the demand-side revenue generated over the last 90 days.
Figure 19. Demand side Revenue for Storj, Sia, and Arweave (last 90 days). Source: Web3 Index
Because Filecoin, Sia, and Storj are contract-based (temporary) solutions in which users pay continuously to keep data stored, we can assume that a non-zero portion of their demand-side revenue is recurring. By contrast, we can assume that 100% of Arweave's demand-side revenue is non-recurring, since users pay only once to store data permanently. That means the only way for Arweave to grow demand-side revenue is to store net-new data. This is a significant challenge for Arweave, and it reminds us of the limitations of comparing Arweave to other DCS providers.
What may prove to be a moat, or a competitive disadvantage, is the difference in demand-side revenue efficiency (roughly, price) across DCS platforms. I define demand-side "revenue efficiency" as the amount of demand-side revenue generated per byte of data stored. Over the past 90 days, Storj and Sia generated roughly $96.50 and $89.90 of demand-side revenue per terabyte uploaded, respectively, while Arweave generated about $10,200 per terabyte uploaded. This pricing is another fundamental difference between Arweave and its DCS competitors: Arweave charges a premium for a service with unique properties. It also means Arweave can store roughly 113x less data and still generate the same demand-side revenue as its DCS competitors, which suggests Arweave should not be expected to store the same amount of data as other solutions, because its service and pricing mechanism are not directly comparable.
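The ~113x figure can be reproduced directly from the per-terabyte numbers above:

```python
# Demand-side "revenue efficiency" = demand-side revenue per TB stored,
# using the trailing-90-day figures quoted in the text.

revenue_per_tb = {
    "Storj":   96.50,       # $/TB uploaded
    "Sia":     89.90,
    "Arweave": 10_200.00,
}

for name in ("Storj", "Sia"):
    ratio = revenue_per_tb["Arweave"] / revenue_per_tb[name]
    print(f"Arweave earns ~{ratio:.0f}x more per TB than {name}")
    # -> ~106x vs Storj, ~113x vs Sia: Arweave can store roughly two orders of
    #    magnitude less data and still match their demand-side revenue.
```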
Storj, Sia, Arweave, and Filecoin tokens are best understood as a combination of 1) a utility token (a gift card) and 2) a medium of exchange (money). A utility token is valued on its expected future utility; money is valued on supply and demand. Holders of utility tokens can redeem them for a service, in this case cloud storage. The ability to redeem a utility token for a specific service makes it structurally similar to a traditional gift card or voucher. Unlike gift cards and vouchers, however, utility tokens are programmatically supplied and autonomous, whereas gift cards and money are issued by businesses or governments and are (almost) always denominated in another currency (usually fiat). Programmatic supply guarantees a specified issuance schedule, which lets us calculate token supply precisely. (The recent 9.1% CPI print shows us how powerful programmatic monetary supply can be.) We combine this with traditional monetarist theory to derive the intrinsic value of each token.
To be clear: I am interested in valuing how token prices change over time using traditional monetary theory and discounted cash flow analysis. I do not attempt to value the protocols themselves (i.e., total revenue generated [though revenue matters...]), nor the profits of storage miners or storage providers. I also recognize the extraordinary cultural value these protocols generate, especially as they (inevitably) become public goods. That said, each protocol's token price likely does not account for these things, so neither do I.
First, an important distinction: traditional public securities (e.g., $AAPL: Apple) and tokens (issued by protocols) represent different things. Although protocols generate asset flows, they do not generate cash flows the way Apple does, so tokens should not be conflated with public equities. A token represents the right to use or transact; a public equity represents ownership. (Tokens can represent more than rights to utility or transactions; they can also carry governance rights, for example.) Valuing a token's price over time therefore requires incorporating different mechanisms: monetary theory and discounted cash flow analysis.
The main framework I use to value token prices is the model Chris Burniske lays out in his seminal work, Cryptoasset Valuations. Rather than modeling a traditional DCF, Chris argues, we can keep the same structure but substitute the equation of exchange for cash flows, which lets us derive each token's current utility value. We then apply a discount rate to future utility value to derive intrinsic value today.
Substituting in the equation of exchange, MV = PQ, helps us incorporate the monetary nature of tokens. As Chris (and countless others) acknowledges, this model has its limitations (all forecasting models do), but it may be the best we have. Given the lack of fully efficient markets and the inherently wide margin of error in forecasting the future, the model is best used to illustrate the various levers that drive token value.
"Crypto asset valuation primarily consists of solving M, where M = PQ/V. M is the size of the monetary base needed to support a crypto economy of size PQ and speed V," Chris wrote.
Block unicorn note: M = size of the monetary base; V = velocity of the asset; P = price of the resource provisioned by the token; Q = quantity of the resource being provisioned.
To estimate M, V, P, and Q, I use the following inputs:
1. Maximum supply
2. Circulating supply
3. CAGR of circulating supply
4. Storage cost ($/GB/year or $/GB)
5. Annual decline in storage cost (CAGR)
6. Size of the data storage market
7. Annual growth of data (CAGR)
The annual decline in storage cost is informed by the Big Three's average annual decline in storage cost ($/GB/year) over the past decade, shown below (Figure 20):
Figure 20. Data storage costs and CAGR for AWS, Azure, and Google. Sources: AWS, Azure, Google, Hunter Lampson.
I assume that, in 2021, 50% of tokens were held rather than transacted. This assumption stems from the fact that, historically, roughly half of Coinbase users have treated Bitcoin strictly as an investment, while the other half have treated it as a medium of exchange.
I assume that, starting in 2022, the share of tokens held declines by 1% per year. As the market moves toward equilibrium and the potential for value accrual shrinks, the number of tokens in circulation will increase (i.e., the hold rate falls). This is difficult to estimate; again, it is best understood as one lever contributing to token valuation.
I assume a velocity of 20 for each token. Given that Bitcoin's velocity has historically hovered around 14, using 20 here is a conservative choice.
I assume Arweave can address 10% of the global data market, while Filecoin, Sia, and Storj can address the remaining 90%. Permanent data storage is a brand-new market, so it is hard to determine what share of the existing data market it can serve; I use 10% here to be conservative. Temporary data storage, today's dominant storage model, necessarily accounts for 100% of the data market today, so if 10% of the existing market transitions to Arweave, the remaining 90% is left for Filecoin, Sia, and Storj.
I assume the maximum percentage of TAM captured is 1% for Arweave, Sia, and Storj. Arweave therefore captures 1% of its 10% of the global data market (0.1% of the global data market in total), while Sia and Storj each capture 1% of the remaining 90% (0.9% each). Given its traction and maturity, I assume Filecoin captures 25% of its available TAM (25% of 90% = 22.5% of the global data market).
I assume 2024 is the year each network reaches an inflection point, the year it reaches 10% of its maximum TAM capture. This is nearly impossible to predict; it is another illustrative lever.
I assume the time to saturation (the time it takes a network to go from 10% to 90% of its maximum TAM capture) is 10 years for Arweave, Sia, and Storj, and 4 years for Filecoin, another prediction that cannot be made with precision.
I assume a discount rate of 40%, the industry standard for assets with this level of risk.
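Putting the pieces together, here is a compressed, illustrative sketch of the Burniske-style model described above. The velocity of 20, the 40% discount rate, and the 10-year horizon come from the text; every other number is a placeholder I made up for the example, not a value from the article's spreadsheet.

```python
# Compressed sketch of the equation-of-exchange valuation described above.
# Velocity = 20, discount rate = 40%, and the 2032 horizon come from the text;
# all other inputs are assumed placeholders for a hypothetical DCS token.

def utility_value_per_token(gb_stored, price_per_gb, velocity,
                            circulating_supply, pct_hodled):
    """Utility value per token in a given year: M = PQ / V, spread over the float."""
    pq = gb_stored * price_per_gb            # annual $ of storage provisioned
    monetary_base = pq / velocity            # M required to support that economy
    float_tokens = circulating_supply * (1 - pct_hodled)
    return monetary_base / float_tokens

def intrinsic_value_today(future_utility_value, years_out, discount_rate=0.40):
    """Discount a future utility value back to the present."""
    return future_utility_value / (1 + discount_rate) ** years_out

# --- assumed 2032 inputs ---
gb_stored_2032 = 5e12          # assumed: 5 billion TB stored via the protocol
price_per_gb = 0.002           # assumed: $0.002/GB/yr
velocity = 20                  # from the text
circulating_supply = 2e9       # assumed
pct_hodled = 0.40              # assumed (50% in 2021, declining 1%/yr)

uv_2032 = utility_value_per_token(gb_stored_2032, price_per_gb, velocity,
                                  circulating_supply, pct_hodled)
print(f"2032 utility value per token: ${uv_2032:.2f}")
print(f"Intrinsic value today (40% discount, 10 yrs): "
      f"${intrinsic_value_today(uv_2032, 10):.2f}")
```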
Below is a condensed view of all of the inputs: fixed, variable, mathematically derived, and subjectively assumed:
Figure 21. Condensed view of the fixed, variable, mathematically derived, and subjectively assumed inputs used in the token valuation model. Sources: CoinMarketCap; Uygun and Dngl, 2021; Chris Burniske; Hunter Lampson.
For Filecoin specifically, two inputs deserve explanation:
1. Data storage cost decline (CAGR) = 0%
2. Assumed storage cost ($/GB/year) = $0.002/GB/year
In the table, I explicitly label Filecoin's data storage cost decline (CAGR) and assumed storage cost ($/GB/year) as highly subjective assumptions, even though explicit data for both exists. I did this because Filecoin's current pricing is too low to be sustainable.
First, let's start with the assumed storage cost ($/GB/year). Currently, storage on Filecoin costs roughly $0.0000017/GB/year, or 0.0011% of the cost of storage on the Big Three. As I discussed above, Filecoin's pricing model is unsustainable because it is heavily subsidized by block rewards. Since its $200M+ initial coin offering, Filecoin has subsidized the cost of storage on its network. As those subsidies wind down, we can expect its storage cost to rise from current levels. All else being equal, a higher storage cost against fixed demand makes $FIL more valuable (as it would any token), but we can also assume that as Filecoin inevitably raises prices, demand for storage on its network may fall, lowering $FIL's intrinsic value.
It is hard to say how the team will execute a price increase, even one that keeps prices below the Big Three. If we run the model at current pricing of roughly $0.0000017/GB/year, the 2022 intrinsic value comes out to roughly $0.00/FIL, again suggesting that FIL's pricing model today is unsustainable. I therefore estimate Filecoin's storage cost at $0.002/GB/year over the next 10 years (100x cheaper than the Big Three's storage price), with an assumed 0% decline (CAGR) in data storage cost. This keeps Filecoin price-competitive, 100x cheaper than the Big Three, while providing meaningful value to the token price. Think of this input as a personal expectation, even a requirement, for Filecoin's sustainability.
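To see why current pricing produces a near-zero intrinsic value, the small sensitivity check below runs the same equation-of-exchange logic with the two prices side by side; all figures other than the two prices are placeholders I assumed for illustration.

```python
# Sensitivity of the sketch model to Filecoin's price input. All figures
# besides the two prices are illustrative placeholders (assumed values).

def token_value(gb_stored, price_per_gb, velocity, float_tokens,
                years_out, discount_rate=0.40):
    m = gb_stored * price_per_gb / velocity          # M = PQ / V
    return (m / float_tokens) / (1 + discount_rate) ** years_out

gb_stored_2032 = 5e12      # assumed 2032 storage volume (GB)
float_tokens = 1.2e9       # assumed circulating, non-hodled supply

for label, price in [("current  ($0.0000017/GB/yr)", 0.0000017),
                     ("assumed  ($0.002/GB/yr)    ", 0.002)]:
    v = token_value(gb_stored_2032, price, velocity=20,
                    float_tokens=float_tokens, years_out=10)
    print(f"{label}: intrinsic value today ~ ${v:.5f}/token")

# At current pricing the intrinsic value rounds to roughly $0.00, echoing the
# text's conclusion that today's FIL pricing is unsustainable.
```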
Figure 22. Token valuation model inputs. Sources: CoinMarketCap, Hunter Lampson.
Figure 23. Illustration of the various levers that drive token value. Source: Hunter Lampson.
This table describes the general effect of changing each lever, not a guarantee that the effect holds when a lever is pushed to an arbitrarily high or low value. Consider velocity: on average, as velocity increases, token price falls. But an arbitrarily low velocity, say zero, would mean the token is transacted zero times per year, so a monetary base of zero would suffice to serve the ecosystem. Provided we stay away from the extremes, the general trends referenced in the table are useful.
Arweave's unit economics are the most defensible, driven by AR's relatively low token supply and relatively high storage cost ($/GB). This builds in part on my earlier conclusion that Arweave is the most demand-side-revenue-efficient DCS offering, meaning it can store 113x less data than comparable products and still generate the same demand-side revenue. In addition, I assume Arweave captures 0.10% of the global datasphere, an assumption conservative enough to be reasonable. If that is achieved, the model projects Arweave's 2032 token price at +182.91x current levels. While Arweave's higher relative pricing may strengthen its unit economics, it may also be the Achilles' heel that hinders adoption; users will ultimately decide whether Arweave's service is worth the price.
Even if we assume users are theoretically willing to pay this premium, they must still be persuaded to use the product in practice. Because Arweave's product is fundamentally different from its competitors', switching costs may be too high and the service too unusual to win over new users. Despite Arweave's potential advantages, high costs and reliance on an entirely new market could prove insurmountable. As mentioned earlier, the only way for Arweave to grow demand-side revenue is to store new data. On the surface, Arweave does not appear to earn recurring demand-side revenue per byte of data stored, something all of its CCS and DCS competitors benefit from. Instead, I would argue Arweave earns demand-side revenue that recurs involuntarily: rather than charging users in perpetuity, Arweave collects "perpetual recurring demand-side revenue" upfront, which may be one of the most valuable aspects of its endowment mechanism.
Currently, Filecoin's economics are the least defensible because of its low pricing. Given a fixed token supply, the lower the cost of the underlying utility, the smaller the monetary base needed to support it. This framing treats low pricing as a negative attribute for token value rather than a positive one. It is equally possible that Filecoin's low pricing is laying the groundwork for widespread adoption; low pricing may even be Filecoin's key differentiator, and perhaps a necessary moat.
However, I worry about the important role pricing power will play in determining Filecoin's future. As Tushar and Spencer point out, Filecoin (along with Sia and Storj) competes directly with the Big Three in the temporary storage market, and a price war with the Big Three could be disastrous. If Filecoin can keep prices low without unsustainable subsidies, its maturity, ecosystem strength, and industry-wide clout make it the most viable challenger to the Big Three. If it comes down to a price war, as it likely will, things could get ugly.
According to the model, based on the difference between current pricing and the 2032 price projection, Sia's token economics make it 4.5x more valuable than Storj. Sia and Storj are often lumped together as Filecoin's little siblings, and given their less robust ecosystems it is hard to imagine either displacing Filecoin's dominance in this space any time soon. Even so, Sia's and Storj's token economics are more attractive than Filecoin's. Pricing power is integral both to token valuation and to each project's long-term viability.
A few open questions and risks apply to the DCS space as a whole:
1. Cloud storage is not cloud computing. As Christine Deakers points out, many cloud storage users also run cloud compute on the data they store, and DCS solutions must address this. Filecoin has already begun building its virtual machine, and other DCS solutions may follow suit.
2. DCS solutions need more integrations. As Mark Gritter points out, most IoT applications require not only distributed storage but also decentralized databases. If DCS solutions lack native integrations with traditional time-series databases, that could be a major barrier to adoption.
3. DCS solutions should allow location selection. One example Mark Gritter mentions is self-driving cars: the stream of sensor data collected by autonomous vehicles must be stored in a decentralized manner while achieving the lowest possible latency. If the data uploader (the car and the car company) cannot choose a nearby location to store the data, DCS solutions may not serve this use case well.
(1) While cloud computing is different from cloud storage, two assumptions are reasonable. First, companies that provide cloud computing services (such as the Big Three) tend to provide them on top of the data they store; in other words, customers often use compute and storage on the same platform.
Second, we can assume that as cloud computing companies capture more market share, they benefit from increasingly better unit economics. The bigger a company is, the more effectively it can negotiate hardware pricing, which lowers costs for customers, attracts more users, and further strengthens its negotiating power. So when I note that the Big Three hold a 65% share of the cloud computing market, we can assume they hold a similar share of the cloud storage market.
(2) Throughout this article, I use the term "secure" to describe data that is highly replicated across a distributed node set, which yields higher redundancy, more consistent uptime, and a reduced likelihood of censorship and single points of failure.