Detailed performance bottlenecks and challenges of blockchain modules: network module, consensus module and execution module

22-11-22 17:03
Original author: Chenxing Li


Blockchain performance optimization is a very hot topic. However, because blockchain systems are complex, the barrier to a systemic understanding of performance optimization is very high, which leaves room for inflated performance claims. In the past there was the "million tps" great leap forward, and later the "800,000 tps" chain that kept halting.


Therefore, I want to walk through the performance bottlenecks and challenges of each module that determines blockchain performance, and see how much padding lies behind those impressive numbers.

 

1. Network module

 

In a decentralized system, network communication is the foundation of the whole system; some people call it Layer 0.

 

I abstract the network module into three layers: the network facility layer, the node connection layer, and the broadcast protocol layer. Each layer builds on the one below it, and the performance of each layer caps the performance of the layer above.

 

The bandwidth and delay of the network module form the basis of the tps and finality delay of the blockchain system.

 

1.1 Network facility layer

 

Bandwidth: mainly depends on the development of network infrastructure and on the configuration requirements set for blockchain nodes. In the past few years, public chains generally required 20 Mbps to 100 Mbps of bandwidth; by 2022, Aptos already required 1 Gbps. In short, the higher the bandwidth requirement, the higher the barrier to running a node and the more centralized the chain.

 

Delay: latency has a hard optimization limit, the speed of light, and transmission delay on the Internet is always greater than the speed-of-light delay. The intercontinental node latency measured by Conflux reaches 200-300 ms. If it is the kind of "machine-room chain" whose nodes all sit in one data center, the delay can be ignored.

 

1.2 Node connection layer

 

Node connection layer: message broadcast in the network is realized mainly through communication between neighboring nodes.

 

Bandwidth: in general, the node connection layer can obtain bandwidth close to that of the network facility layer. One can also choose to sacrifice bandwidth to reduce latency: for example, when broadcasting a message, send it to all neighbors simultaneously (multiplying the bandwidth requirement) instead of sending to one neighbor and then the next.
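As a toy illustration of that trade-off (all numbers hypothetical), consider a node that must deliver one message to each of its neighbors over a single uplink:

```python
def delivery_times_ms(num_neighbors: int, send_ms: float, parallel: bool):
    """Time at which each neighbor finishes receiving one copy of a message.

    send_ms: time to push one copy onto the wire at the node's baseline
    uplink bandwidth. parallel=True models a node provisioned with
    num_neighbors times that bandwidth, pushing all copies at once.
    """
    if parallel:
        # Extra bandwidth buys latency: everyone is done after one send slot.
        return [send_ms] * num_neighbors
    # Baseline bandwidth: copies go out back to back; the last neighbor waits.
    return [(i + 1) * send_ms for i in range(num_neighbors)]
```

With 8 neighbors and 50 ms per copy, the slowest neighbor waits 400 ms when copies are sent sequentially, but only 50 ms if the uplink is provisioned 8x wider.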

 

Delay: message broadcast delay is related to the number of nodes; the more nodes, the higher the delay.


Currently, Bitcoin and Ethereum each have on the order of thousands of nodes. According to our experiments, if the whole network has 10,000 nodes spread across the world, the median broadcast delay is 3-6 seconds and the maximum can reach 15 seconds. With some protocol optimizations, the maximum latency can be roughly halved.
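A back-of-the-envelope gossip model (my own illustration, not the methodology behind those measurements) shows why delay grows with node count:

```python
import math

def gossip_delay_floor_ms(num_nodes: int, fanout: int, per_hop_ms: float) -> float:
    """Rough lower bound on broadcast delay in a gossip network.

    After h hops a message can reach at most fanout**h nodes, so covering
    the whole network needs about log_fanout(num_nodes) hops.
    """
    hops = math.ceil(math.log(num_nodes, fanout))
    return hops * per_hop_ms

# 10,000 globally distributed nodes, fanout 8, and an assumed 150 ms per
# intercontinental hop give a 750 ms floor; real networks add validation,
# queueing and serialization on top, which is how medians reach seconds.
```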


By contrast, the public chains claiming a confirmation delay of 1-2 seconds can only support a much smaller number of nodes.

 

1.3 broadcast protocol layer

 

The node connection layer is only responsible for forwarding data blocks, regardless of their content. The broadcast protocol layer defines the specific forwarding rules for blocks and transactions.

 

Bandwidth: the key issue is reducing redundant transmission. Imagine if every neighbor sent you the same transaction: wouldn't that be a waste? Shrec, a forwarding protocol designed by Conflux, increases the tps of broadcast transactions by 6x under the same network bandwidth by reducing redundancy.


However, as long as the bandwidth of the network facility layer is high enough (say 1 Gbps), the broadcast protocol layer will not become a bottleneck even without optimization.
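One standard way to cut this redundancy (the general idea behind announce-then-request relaying; not necessarily Shrec's exact design) is to exchange short transaction IDs first and fetch the full body only once:

```python
class TxRelay:
    """Announce-then-request relay: peers announce short transaction IDs,
    and the full transaction body travels the link at most once."""

    def __init__(self):
        self.known = {}         # tx_id -> full transaction body
        self.requested = set()  # ids we have already asked some peer for

    def on_announce(self, tx_id) -> bool:
        """Return True if we should request the full transaction."""
        if tx_id in self.known or tx_id in self.requested:
            return False        # a duplicate announcement costs only the ID
        self.requested.add(tx_id)
        return True

    def on_tx(self, tx_id, tx) -> None:
        """Store a received transaction body."""
        self.requested.discard(tx_id)
        self.known[tx_id] = tx
```

Only the first announcing peer is asked for the body; every later announcement of the same ID is answered for free.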

 

Delay: some consensus protocols amplify the delay of the broadcast protocol layer several times over. For example, Bitcoin's block interval needs to be about 5 times the broadcast protocol layer's delay, and confirmation needs 6 blocks. Optimizing latency here is therefore critical. In 2016, Bitcoin reduced block broadcast delay from 120 seconds to less than 10 seconds through the compact block design.

 

A compact block does not contain complete transactions, only the first 6 bytes of each transaction hash, because those transactions have already been broadcast in the network and received by most nodes. This speeds up block broadcasting, letting the broadcast protocol layer achieve delay close to that of the node connection layer. Since 2017, high-performance public chains have basically all adopted this design.
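A minimal sketch of the idea (Bitcoin's BIP 152 actually derives its 6-byte short IDs with SipHash and per-block keys; truncated SHA-256 is used here for brevity):

```python
import hashlib

def short_id(tx_bytes: bytes) -> bytes:
    """6-byte short identifier for a transaction (truncated SHA-256 here)."""
    return hashlib.sha256(tx_bytes).digest()[:6]

def make_compact_block(header, txs):
    """A compact block carries the header plus short IDs, not full bodies."""
    return header, [short_id(tx) for tx in txs]

def reconstruct(compact, mempool):
    """Rebuild the full block from the local mempool.

    Returns (header, recovered transactions, short IDs still missing,
    which must be fetched from the sending peer)."""
    header, ids = compact
    by_id = {short_id(tx): tx for tx in mempool}
    txs, missing = [], []
    for sid in ids:
        if sid in by_id:
            txs.append(by_id[sid])
        else:
            missing.append(sid)
    return header, txs, missing
```

If the receiver's mempool already holds every transaction, the block round-trips as a few kilobytes of IDs instead of megabytes of bodies.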

 

2. Consensus module

 

The consensus protocol is the most complex and delicate part of the blockchain system. It coordinates the nodes that do not trust each other and provides trusted decentralized services for upper-layer applications. For a long time, the performance optimization of the consensus module has been a hot spot.

 

Bandwidth: flaws in the Nakamoto consensus force consensus bandwidth to be kept very low; otherwise forks increase across the network and system security decreases.


New protocols proposed after 2017 can basically make full use of the bandwidth, so this is no longer a problem.


However, some projects conflate the tps of the consensus module with the tps of the whole blockchain system, and call "making full use of bandwidth" "infinitely scalable", as if network bandwidth were unlimited.

 

Delay: consensus delay refers to how long it takes for a block to be finalized. The confirmation delay of the Nakamoto consensus is very poor: it needs about 30-60 times the delay of the broadcast protocol layer. Subsequent PoW protocols such as Bitcoin-NG and OHIE did not optimize this delay. Prism improves latency by 23x and Conflux by 3x. My understanding of PoS protocols is limited; I estimate they need about 5 times the broadcast delay.


However, there is a big difference in how PoW and PoS protocols report delay: PoW figures refer to the maximum delay, PoS figures to the median delay, and the maximum can be about 3 times the median. So PoS consensus generally has better delay performance, and with few nodes, getting under 10 seconds is not impossible. As for Ethereum, whose PoS consensus is even slower than that, it can only be called an oddity.

 

The consensus module is where inflated parameters are most serious. For example, when security actually requires waiting for 6 blocks, the project team tells you 1 block is enough. As long as no one attacks, the problem is never exposed; and if there are no assets on the chain, no one bothers to attack.


There is also a technique called sharding: nodes are divided into groups and transactions are distributed among them. Each group processes only its own transactions and trusts the results of the other groups. By increasing the number of groups, this technique can easily produce a high tps number for bragging, but trusting the other groups introduces security risks. Sharding therefore suits scenarios that do not demand high security, such as domestic consortium chains.
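The scaling trick itself is just deterministic partitioning; a minimal sketch (the hash-of-sender assignment rule is hypothetical, not any specific project's design):

```python
import hashlib

def shard_of(sender: str, num_shards: int) -> int:
    """Map a transaction to a shard by hashing its sender address.

    Nominal tps scales with num_shards because shards run in parallel,
    but each shard is validated by only 1/num_shards of the nodes and
    must trust the rest."""
    digest = hashlib.sha256(sender.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_shards
```

Doubling `num_shards` doubles the headline tps while also halving the validator set behind each shard, which is exactly the security trade-off described above.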

 

3. Execution module

 

Ethereum was able to open up a world beyond Bitcoin because it created programmable digital assets. The transaction execution module is therefore also an important part of a blockchain system, and it is a link that early performance optimization overlooked.


For execution we no longer distinguish bandwidth from delay; we only care about the number of transactions or computing tasks processed per unit time.


The efficiency of the execution module is limited by the various resources of the computer system.

 

3.1 CPU resources


Under serial execution, the CPU bottleneck is very obvious. Over the past 5 years, CPU single-core performance has less than doubled. In the EVM, ignoring storage access, the fastest CPUs can execute roughly 100 million gas in about 1 second, which is about 80 times Ethereum's current throughput (only a rough estimate of the magnitude).
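A sanity check on that order of magnitude, assuming a 30M block gas limit and 12-second blocks (typical Ethereum mainnet parameters around the time of writing; the article's 80x is an order-of-magnitude figure, not an exact ratio):

```python
# Article's estimate of peak serial EVM execution, storage ignored:
cpu_gas_per_s = 100_000_000
# Assumed Ethereum mainnet parameters (not from the article):
block_gas_limit = 30_000_000
block_time_s = 12

chain_gas_per_s = block_gas_limit / block_time_s   # 2.5M gas/s on-chain
speedup = cpu_gas_per_s / chain_gas_per_s          # CPU headroom over chain
print(f"CPU headroom over the chain: {speedup:.0f}x")
```

Under these assumptions the gap comes out around 40x, the same order of magnitude as the 80x figure above.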

 

Parallel execution is a key step in making use of CPU resources. Some projects are proposing language models that are friendlier to parallelism, such as Move.


According to Conflux's research on EVM parallelization, the transactions currently on the Ethereum chain have a parallelization potential of about 9x tps.

 

However, parallelizing the VM raises many challenges. In the ideal case, transactions are highly parallel; in the worst case, transactions all depend on each other and can only be executed serially. How, then, should gas pricing and gas limits be designed so that the ideal case fully exploits parallel optimization while the worst case does not leave execution unable to keep up?
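A minimal sketch of the optimistic approach many parallel VMs take (my own illustration, not any specific project's scheduler): execute everything against a snapshot, then commit in order and count the transactions whose reads were invalidated and would need serial re-execution.

```python
def commit_batch(txs, state):
    """txs: list of (read_keys, writes), with writes mapping key -> value.

    All transactions were speculatively executed against the same snapshot;
    here we commit them in order. A transaction that read a key written by
    an earlier transaction in the batch is a conflict: its speculative
    result is invalid and it must be re-executed against fresh state."""
    committed = set()   # keys written by already-committed transactions
    conflicts = 0
    for read_keys, writes in txs:
        if committed & set(read_keys):
            conflicts += 1        # speculative result invalid: re-execute
        state.update(writes)      # toy model: final writes applied in order
        committed |= writes.keys()
    return state, conflicts
```

Two transfers touching disjoint accounts commit without conflicts; a chain of dependent transfers degenerates back to serial execution, which is exactly the worst case the gas-pricing question above must cover.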

 

3.2 Storage access resources

 

Like the network facility layer, performance here mainly depends on hardware progress and on the minimum configuration required of blockchain nodes. Unless the data is cached in memory, read and write performance during transaction execution cannot exceed that of the disk.

 

Take Aptos as an example: their node storage requirement is 40K IOPS, and a transaction may modify the state of two accounts, the sender and the receiver. In the worst case, then, the network can only support 20,000 tps.
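The arithmetic behind that ceiling:

```python
def tps_upper_bound(disk_iops: int, state_accesses_per_tx: int) -> int:
    """Hard storage ceiling on throughput: each transaction needs at least
    state_accesses_per_tx random state accesses that miss the in-memory
    cache, and the disk serves at most disk_iops of those per second."""
    return disk_iops // state_accesses_per_tx

# 40K IOPS and two touched accounts (sender + receiver) per transfer:
print(tps_upper_bound(40_000, 2))  # -> 20000
```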


But their claimed tps is 160,000, so one can imagine how many undisclosed preconditions lie behind that number.

 

3.3 Verifiable storage structure

 

The verifiable storage structure is an important data structure for blockchain storage. It allows a light node to query the state of the chain from a full node it does not trust, which is the most important link in making blockchains trustless. In Ethereum, accessing the verifiable storage structure, the MPT, is 10 times slower than accessing the database directly. Therefore, some blockchains simply remove the verifiable storage structure in exchange for better performance.
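The core mechanism, sketched with a plain binary Merkle tree rather than Ethereum's MPT: the light node holds only the root (taken from a block header) and can verify any answer from an untrusted full node.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Root of a binary Merkle tree (odd levels duplicate the last node)."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def verify(leaf, proof, root):
    """proof: list of (sibling_hash, sibling_is_right) from leaf to root.

    The full node supplies leaf and proof; the light node checks them
    against the root it already trusts from a block header."""
    node = h(leaf)
    for sibling, is_right in proof:
        node = h(node + sibling) if is_right else h(sibling + node)
    return node == root
```

The 10x slowdown mentioned above comes from maintaining and traversing exactly this kind of hash-linked structure on every state access, instead of a single key-value lookup.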

 

To conclude: blockchain performance optimization is not a pursuit of absolute limits, but a trade-off among security, efficiency, and decentralization under various constraints.


Some trade-offs can be optimized away; for the Nakamoto consensus, for example, the contradiction between consensus bandwidth and security was later resolved.


Some trade-offs are unavoidable: if you require 256 GB of memory per node, you doom the number of independent participants to be small.

 

Blindly pursuing high on-paper performance only produces a centralized chain that crashes. Genuinely facing and solving the problems in performance optimization is the right way to improve performance.

 

Due to space limits, many security-related considerations have not been mentioned. But the content above is enough to puncture quite a few hyped promises.

 

