Zero-Knowledge Machine Learning (ZKML): How will ZK and AI collide?

2023-04-06 22:00

This article describes the motivation for building ZKML, current efforts, and potential applications.

AN INTRODUCTION TO ZERO-KNOWLEDGE MACHINE LEARNING (ZKML)

Original source: Worldcoin

Deep Tide TechFlow

Zero-knowledge machine learning (ZKML) is a field of research and development that has recently been making waves in cryptography circles. But what is it, and what is it useful for? First, let's break the term down into its two components and explain what they are.

What is ZK?

A zero-knowledge proof is a cryptographic protocol in which one party (the prover) can prove to the other party (the verifier) that a given statement is true without revealing any additional information other than that the statement is true. This is an area of research that is making great progress on all fronts, from research to protocol implementation and application.

ZK provides two main "primitives" (or building blocks). The first is the ability to create a proof of computational integrity for a given set of computations, where the proof is much easier to verify than the computation is to perform. (We call this property "succinctness.") The second is the option to hide parts of a computation while preserving its correctness. (We call this property "zero knowledge.")
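To make the prover/verifier roles concrete, here is a minimal sketch of the classic Schnorr identification protocol, in which a prover convinces a verifier that it knows a secret exponent x without revealing it. This is a swapped-in textbook example, not one of the proof systems discussed in this article: it illustrates only the zero-knowledge property (and only against an honest verifier), not succinctness. The group parameters P, Q, and G are toy values chosen for readability and are far too small for real use.

```python
import secrets

# Toy parameters (assumptions for illustration only, NOT secure):
# P is prime, Q is a prime dividing P - 1, and G has order Q modulo P.
P = 23
Q = 11
G = 2  # 2**11 % 23 == 1, so G generates a subgroup of order 11


def keygen():
    """Prover picks a secret x and publishes y = G^x mod P."""
    x = secrets.randbelow(Q - 1) + 1
    return x, pow(G, x, P)


def commit():
    """Prover picks a fresh random nonce r and sends t = G^r mod P."""
    r = secrets.randbelow(Q)
    return r, pow(G, r, P)


def challenge():
    """Verifier replies with a random challenge c."""
    return secrets.randbelow(Q)


def respond(x, r, c):
    """Prover answers with s = r + c*x mod Q; the random r masks x."""
    return (r + c * x) % Q


def verify(y, t, c, s):
    """Verifier checks G^s == t * y^c mod P without ever seeing x."""
    return pow(G, s, P) == (t * pow(y, c, P)) % P


x, y = keygen()
r, t = commit()
c = challenge()
s = respond(x, r, c)
print(verify(y, t, c, s))  # an honest prover is always accepted
```

The verifier only ever sees y, t, c, and s. Because r is fresh and uniformly random, the response s on its own reveals nothing about x, which is the intuition behind the zero-knowledge property.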

Generating zero-knowledge proofs is very computationally intensive: producing a proof is about 100 times more expensive than performing the original computation. This means that, in some cases, zero-knowledge proofs are not a viable option, because the time required to generate them on even the best available hardware makes them impractical.

However, advances in cryptography, hardware, and distributed systems in recent years have made zero-knowledge proofs an increasingly powerful and computationally viable option. These advances have made it possible to create protocols that can use computationally intensive proofs, thus expanding the design space for new applications.

ZK use cases

Zero-knowledge cryptography is one of the most popular techniques in the Web3 space because it allows developers to build scalable and/or private applications. Here are some examples of how it is being used in practice (note that many of these projects are still works in progress):

1. Scaling Ethereum with ZK rollups

Starknet

Scroll

Polygon Zero, Polygon Miden, Polygon zkEVM

zkSync

2. Building privacy-preserving applications

Semaphore

MACI

Penumbra

Aztec Network

3. Identity primitives and data sources

WorldID

Sismo

Clique

Axiom

4. Layer 1 protocols

Zcash

Mina

As ZK technology matures, we believe there will be an explosion of new applications because the tools used to build these applications will require less domain expertise and will be easier for developers to use.

Machine learning

Machine learning is a field of research in artificial intelligence ("AI") that allows computers to automatically learn and improve from experience without having to be explicitly programmed. It uses algorithms and statistical models to analyze and identify patterns in data, and then makes predictions or decisions based on those patterns. The ultimate goal of machine learning is to develop intelligent systems that can learn adaptively, without human intervention, and solve complex problems in everything from health care to finance and transportation.

Recently, you may have seen the advances in large language models, such as ChatGPT and Bard, as well as text-to-image models, such as DALL-E 2, Midjourney, and Stable Diffusion. As these models get better and are able to perform a wider range of tasks, it becomes important to know which model performed a given operation, or whether it was performed by a human instead. In the next section, we will explore this idea.

ZKML's motivation and current efforts

We live in a world where AI/ML-generated content is becoming increasingly difficult to distinguish from human-generated content. Zero-knowledge cryptography will enable us to make statements such as: "Given a piece of content C, it was generated by model M applied to some input X." We will be able to verify that an output was generated by a large language model (such as ChatGPT), a text-to-image model (such as DALL-E 2), or any other model for which we have created a zero-knowledge circuit representation. The zero-knowledge property of these proofs will also allow us to hide parts of the input or the model as needed. A good example is the application of machine learning models to sensitive data, where users can learn the result of model inference on their data without disclosing the input to any third party (for example, in the medical industry).
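The statement "content C was generated by model M applied to input X" can be written down as a relation between public and private values. The sketch below is only that relation in plain Python, not a zero-knowledge proof: in a real ZKML system, a proof system would convince the verifier that the relation holds without revealing the private values. The names commit, model_m, and statement_holds, the tiny linear "model," and the unsalted hash commitment are all assumptions made for illustration (a real scheme would use a hiding commitment).

```python
import hashlib
import json


def commit(weights):
    """Illustrative hash commitment to the model weights (unsalted, so not hiding)."""
    return hashlib.sha256(json.dumps(weights).encode()).hexdigest()


def model_m(weights, x):
    """Hypothetical model M: one linear layer followed by a sign threshold."""
    score = sum(w * xi for w, xi in zip(weights, x))
    return 1 if score >= 0 else 0


def statement_holds(model_commitment, content_c, weights, x):
    """Relation a ZKML circuit would enforce.

    Public values:  model_commitment, content_c
    Private values: weights, x (known only to the prover)
    """
    same_model = commit(weights) == model_commitment
    same_output = model_m(weights, x) == content_c
    return same_model and same_output


weights = [2, -1, 3]          # private to the prover
x = [1, 4, 1]                 # private input
published = commit(weights)   # made public ahead of time
print(statement_holds(published, model_m(weights, x), weights, x))
```

A verifier who holds only the commitment and the content C would check a proof of this relation instead of re-running it, which is what allows the input or the weights to stay hidden.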

Note: When we talk about ZKML, we are referring to creating zero-knowledge proofs of the inference step of an ML model, not of ML model training (which is already very computationally intensive on its own). Currently, state-of-the-art zero-knowledge systems running on high-performance hardware are still several orders of magnitude away from proving large models such as the currently available large language models (LLMs), but some progress has been made in creating proofs of smaller models.

We did some research into the state of the art in zero-knowledge cryptography in the context of creating proofs for ML models, and put together a collection that aggregates related research, articles, applications, and codebases. These ZKML resources can be found in the ZKML community's awesome-zkml repository on GitHub.

The Modulus Labs team recently published a paper entitled "The Cost of Intelligence," which benchmarks existing ZK proof systems against multiple models of varying sizes. Currently, a proof system such as plonky2, running for about 50 seconds on a powerful AWS machine, can create proofs for models with about 18 million parameters. Here's a chart from the paper:

Another initiative aimed at advancing the state of the art of ZKML systems is Zkonduit's ezkl library, which allows you to create ZK proofs for ML models exported using ONNX. This enables any ML engineer to create ZK proofs for the inference step of their models and to prove the output to any correctly implemented verifier.

Several teams are working on improving ZK technology, creating optimized hardware for the operations that occur within ZK proofs, and building optimized implementations of these protocols for specific use cases. As the technology matures, it will become possible to prove larger models on less powerful machines in shorter periods of time. We hope that these advances will lead to new ZKML applications and use cases.

Potential use cases

To determine whether ZKML is suitable for a particular application, we can consider how the characteristics of ZK cryptography will solve problems related to machine learning. This can be illustrated by a Venn diagram:

Definitions:

1. Heuristic optimization: A problem-solving approach that uses rules of thumb, or "heuristics," to find good solutions to difficult problems, rather than using traditional optimization methods. Instead of trying to find the optimal solution, heuristic optimization methods aim to find a good or "good enough" solution in a reasonable amount of time, given the relative importance and difficulty of the optimization.

2. FHE ML: Fully homomorphic encryption ML allows developers to train and evaluate models in a privacy-preserving manner. However, unlike with ZK proofs, there is no way to cryptographically prove the correctness of the computation performed.

Teams like Zama.ai are working in this area.

3. ZK vs. validity: These terms are often used interchangeably in the industry, because a validity proof is a ZK proof that does not hide any part of the computation or its result. In the context of ZKML, most current applications take advantage of the validity-proof aspect of ZK proofs.

4. Validity ML: ZK proofs of ML models in which no computations or results are kept secret. They prove only the correctness of the computation.

Here are some examples of potential ZKML use cases:

1. Computational integrity (validity ML)

Modulus Labs

On-chain verifiable ML trading bot: RockyBot

Vision of self-improving blockchains (examples):

Enhancing the AMM of the Lyra Finance options protocol with intelligent features

Creating a transparent AI-based reputation system for Astraly (ZK oracle)

Working on the technical breakthroughs needed for contract-level compliance tools that use ML, for Aztec Protocol (a zk-rollup with privacy features)

ML as a Service (MLaaS) transparency

ZK anomaly/fraud detection: This use case makes it possible to create ZK proofs of exploitability or fraud. Anomaly detection models could be trained on smart contract data and agreed upon by DAOs as useful metrics for automating security procedures, such as pausing contracts in a more proactive, preventive way. There are already startups working on ways to use ML models for security purposes in smart contract environments, so ZK proofs of anomaly detection seem like a natural next step.

Generic validity proofs for ML inference: The ability to easily prove and verify that an output is the product of a given model and input pair.

2. Privacy (ZKML)

Decentralized Kaggle: Prove that a model's accuracy on some test data is greater than x%, without revealing its weights.

Privacy-preserving inference: Feed private patient data into a model for medical diagnosis and send the sensitive inference result (e.g., a cancer test result) to the patient.

Worldcoin:

IrisCode upgradeability: World ID users will be able to self-custody their biometrics in encrypted storage on their mobile device, download the ML model used to generate the IrisCode, and create a zero-knowledge proof locally showing that their IrisCode was created successfully. This IrisCode can then be permissionlessly inserted as one of the registered Worldcoin users, because the receiving smart contract can verify the zero-knowledge proof and thereby the creation of the IrisCode. This means that if Worldcoin upgrades the machine learning model in the future to create IrisCodes in a way that breaks compatibility with the previous version, users won't have to go to an Orb again; they can create this zero-knowledge proof locally on their device.

Orb security: The Orb currently implements several fraud and tamper detection mechanisms in its trusted environment. However, we could create a zero-knowledge proof that these mechanisms were active when the image was taken and the IrisCode was generated, in order to provide better liveness guarantees for the Worldcoin protocol, since we could then be sure that these mechanisms ran throughout the entire IrisCode generation process.
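As an illustration of the "Decentralized Kaggle" use case listed above, the claim "this committed model scores more than x% on the test data" can be sketched as a check over a private set of weights and a public test set. As before, this is only the relation that a ZK circuit would need to enforce, written in plain Python under stated assumptions; it is not a proof system, and the helper names, the toy linear classifier, and the unsalted hash commitment are all hypothetical.

```python
import hashlib
import json


def commit(weights):
    """Illustrative (unsalted, non-hiding) hash commitment to the weights."""
    return hashlib.sha256(json.dumps(weights).encode()).hexdigest()


def predict(weights, features):
    """Hypothetical binary classifier: the sign of a dot product."""
    score = sum(w * f for w, f in zip(weights, features))
    return 1 if score >= 0 else 0


def accuracy_claim_holds(weight_commitment, test_set, threshold_percent, weights):
    """True if the committed model is correct on more than threshold_percent of test_set.

    Public values:  weight_commitment, test_set, threshold_percent
    Private values: weights
    """
    if commit(weights) != weight_commitment:
        return False
    correct = sum(
        1 for features, label in test_set if predict(weights, features) == label
    )
    # Compare with integer cross-multiplication instead of division,
    # since integer arithmetic maps more naturally onto circuit constraints.
    return correct * 100 > threshold_percent * len(test_set)


weights = [1, -1]  # private to the model owner
test_set = [([2, 1], 1), ([1, 2], 0), ([3, 0], 1), ([0, 1], 1)]
print(accuracy_claim_holds(commit(weights), test_set, 70, weights))
```

In this toy data the model gets three of the four examples right, so a claim of "more than 70%" holds while a claim of "more than 75%" does not.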

In short, ZKML technology has a wide range of applications and is developing rapidly. As more teams and individuals join the field, we believe that ZKML's application scenarios will become more diverse and widespread.