
ETHGlobal New Delhi: Advancing Client-Side Privacy

· 6 min read
Moven Tsai
Developer on the Mopro Team

At ETHGlobal New Delhi this September, the Mopro and zkPDF teams sponsored two tracks focused on bringing general privacy to Ethereum. The hackathon delivered impressive projects that pushed boundaries in both infrastructure and application development.

Several submissions exceeded expectations with standout UX features. Deeplink integration enables seamless transitions between mobile apps and browsers, allowing native mobile proving across existing browser applications that require ZK (like age verification for websites). NFC integration demonstrated tap-to-prove and tap-to-verify capabilities, creating an experience as intuitive as Apple Pay. These implementations show the maturity of client-side ZK proving and its readiness for real-world adoption.

ZeroSurf demo: Privacy-preserving age verification with deeplink integration

Infrastructure Track: Client-Side Privacy

🏆 Grand Prize: AccessFI

View Project | GitHub

AccessFI reimagines event payments with NFC-powered privacy. Users receive P-256 compatible SECP256k1 NFC cards linked to their wallets, enabling instant tap-to-pay for tickets, registration, food, and merchandise while preserving privacy through deterministic encryption.

The system eliminates payment friction with 5-second NFC transactions that work on any EVM chain. Privacy is maintained through ZK proofs that verify user eligibility without exposing personal data. A single card handles all event interactions: tap-to-buy tickets, tap-to-register, tap-to-pay for concessions.

AccessFI Flow

Application Track: General Privacy

🥇 First Prize: zkETHer

View Project | GitHub

zkETHer implements a privacy-preserving protocol for ERC20 tokens, functioning as a non-custodial mixer. Users deposit fixed amounts by submitting cryptographic commitments to an on-chain Merkle tree, then withdraw to new addresses using ZK proofs generated on their mobile devices.

The protocol uses X25519 (ECDH) for key exchange, HKDF-SHA256 for deriving secrets, and Poseidon2 hash for commitments. Mopro enables computationally intensive ZK proofs to be generated directly on phones, making privacy accessible without specialized hardware.
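To make the derivation step concrete, here is a minimal sketch of HKDF-SHA256 (RFC 5869) feeding a commitment. It is not zkETHer's code: the `commitment` helper, the `info` label, and the placeholder shared secret are hypothetical, and SHA-256 stands in for Poseidon2, which is not available in the Python standard library.

```python
import hashlib
import hmac

HASH_LEN = 32  # SHA-256 output size in bytes


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF with SHA-256 as specified in RFC 5869 (extract, then expand)."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested output is too long for HKDF-SHA256")
    # Extract: an empty salt is replaced by HASH_LEN zero bytes, per the RFC.
    prk = hmac.new(salt or b"\x00" * HASH_LEN, ikm, hashlib.sha256).digest()
    # Expand: chain HMAC blocks T(1), T(2), ... until enough bytes exist.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def commitment(secret: bytes, amount: int) -> bytes:
    """Hypothetical commitment; SHA-256 is a stand-in for Poseidon2."""
    return hashlib.sha256(secret + amount.to_bytes(32, "big")).digest()


# Placeholder for the output of an X25519 key exchange (not a real key).
shared_secret = bytes(range(32))
note_secret = hkdf_sha256(shared_secret, salt=b"", info=b"example/note-secret")
leaf = commitment(note_secret, amount=10**18)
```

In a real deployment the hash would need to match the one used inside the circuit, since the ZK proof has to recompute the same commitment.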

The circuit implementation is robust, though real-world feasibility needs improvement for production deployment. The architecture demonstrates how mobile-first proving can bring mixer-style privacy to standard ERC20 tokens.

zkETHer demo: Privacy-preserving ERC20 mixer with mobile ZK proving

🥈 Second Prize: Wisk

View Project | GitHub

Wisk rethinks background verification for the digital age with zkPDF. Instead of sharing documents with third parties, users prove specific claims about government-issued certificates without revealing the full content.

The system integrates with India's DigiLocker to verify official PDFs. Using zkPDF, Wisk validates the government's digital signature embedded in PDFs and generates ZKPs for requested fields (name, PAN number, credentials). The entire process happens in the browser—raw documents never leave the user's device.

🥉 Third Prize: ZeroSurf

View Project | GitHub

ZeroSurf is a mobile browser with built-in ZK age verification using Anon Aadhaar. The smooth deeplink integration allows users to prove age requirements without revealing birth dates, enabling privacy-preserving access to age-restricted content.

The implementation showcases how deeplinks can bridge mobile browsers and ZK proving apps, creating frictionless user experiences for privacy-preserving authentication.
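As a rough illustration of the deeplink pattern, the sketch below builds and parses a proof request link. The `zkprover` scheme, the parameter names, and the callback URL are all hypothetical and are not taken from ZeroSurf; only the standard library's `urllib.parse` is used.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def build_proof_request(scheme: str, challenge: str, callback_url: str, min_age: int) -> str:
    """Website side: encode what the proving app needs into a deeplink."""
    query = urlencode({"challenge": challenge, "min_age": min_age, "callback": callback_url})
    return f"{scheme}://prove?{query}"


def parse_proof_request(link: str):
    """Proving-app side: recover the request parameters from the deeplink."""
    parts = urlparse(link)
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    params["min_age"] = int(params["min_age"])
    return parts.scheme, params


link = build_proof_request(
    "zkprover", "abc123", "https://example.com/verify?session=42", 18
)
scheme, params = parse_proof_request(link)
```

The proving app would generate the proof on-device and then open the callback URL with the proof attached, which is what returns the user to the browser.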

Key Takeaways & Future Explorations

The hackathon revealed several promising directions for client-side privacy:

UX innovations like NFC and deeplinks proved that privacy-preserving technology can match the convenience of traditional systems. These features should be modularized within Mopro to improve developer experience. We'll invite teams that built these integrations to contribute reusable components.

Photo-identity integrity emerged as a recurring theme across multiple projects, adding security layers to identity verification. Integrating solutions like Rarimo's Bionetta and zkCamera with mobile-native proving through Mopro could strengthen this approach.

We're excited to see more UX innovations emerge in future hackathons. Whether it's tap-to-prove bringing native mobile experiences, smooth deeplink transitions between apps and browsers, or entirely new interaction patterns—the goal is providing developers with easy-to-use building blocks. By modularizing these patterns in Mopro, we can transform what took teams days to build during the hackathon into features that take minutes to integrate.

Beyond these UX enhancements, there are also two fundamental challenges worth exploring:

zkTLS and Mobile Proving

zkTLS enables portable, verifiable data from any HTTPS connection without server cooperation. Using multi-party computation (MPC), zkTLS allows users to prove statements about web data—like account balances, transaction histories, or credentials—without revealing the underlying information or requiring platform APIs.

TLSNotary leads the MPC-based approach, using garbled circuits to split TLS session keys between users and notaries, ensuring neither party can forge proofs alone. This creates portable proofs of web data while preserving privacy.
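The key-splitting idea can be illustrated with a toy XOR secret sharing. This is not TLSNotary's protocol: there the shares are produced jointly inside the MPC and the full key is never assembled on one machine, whereas this sketch starts from a whole key purely to show that neither share alone reveals it.

```python
import secrets
from typing import Tuple


def split_key(key: bytes) -> Tuple[bytes, bytes]:
    """Split a key into two XOR shares; each share alone looks uniformly random."""
    user_share = secrets.token_bytes(len(key))
    notary_share = bytes(k ^ u for k, u in zip(key, user_share))
    return user_share, notary_share


def combine(share_a: bytes, share_b: bytes) -> bytes:
    """XOR the two shares back together to recover the key."""
    return bytes(a ^ b for a, b in zip(share_a, share_b))


session_key = secrets.token_bytes(16)  # hypothetical 128-bit session key
user_share, notary_share = split_key(session_key)
recovered = combine(user_share, notary_share)
```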

Mobile integration remains an open challenge. While TLSNotary works well on desktop, coordinating MPC between mobile apps and browsers presents unique technical hurdles. Solving this would unlock powerful use cases: proving income from banking apps, verifying social media reputation, or demonstrating transaction history—all without sharing credentials or raw data.

Unified ZK Registry System

The ZK identity landscape is fragmented. Projects like Anon Aadhaar, passport-based zkID solutions, and zkPDF each maintain separate on-chain registries. Users face redundant verifications, and developers must integrate with each system independently.

ERC-7812 proposes a solution: a singleton on-chain registry using Sparse Merkle Trees to store commitments to private data. Statements can be verified via ZK proofs without revealing underlying information.
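For readers unfamiliar with Sparse Merkle Trees, the following is a minimal sketch of the data structure, not the ERC-7812 contract. The depth of 16, the SHA-256 hash, and all names are assumptions chosen to keep the example small; an on-chain registry would use a much deeper tree and a circuit-friendly hash.

```python
import hashlib

DEPTH = 16  # toy depth (assumption); supports keys in range(2 ** DEPTH)


def h(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(left + right).digest()


# EMPTY[d] is the root of an empty subtree of height d.
EMPTY = [b"\x00" * 32]
for _ in range(DEPTH):
    EMPTY.append(h(EMPTY[-1], EMPTY[-1]))


class SparseMerkleTree:
    def __init__(self):
        self.nodes = {}  # (height, index) -> hash; empty nodes are not stored

    def _get(self, height: int, index: int) -> bytes:
        return self.nodes.get((height, index), EMPTY[height])

    def root(self) -> bytes:
        return self._get(DEPTH, 0)

    def update(self, key: int, leaf: bytes) -> None:
        """Write a leaf and recompute the hashes on its path to the root."""
        self.nodes[(0, key)] = leaf
        index = key
        for height in range(DEPTH):
            left = self._get(height, index & ~1)
            right = self._get(height, index | 1)
            index >>= 1
            self.nodes[(height + 1, index)] = h(left, right)

    def prove(self, key: int) -> list:
        """Sibling hashes from the leaf level up to just below the root."""
        return [self._get(height, (key >> height) ^ 1) for height in range(DEPTH)]


def verify(root: bytes, key: int, leaf: bytes, siblings: list) -> bool:
    node = leaf
    for height, sibling in enumerate(siblings):
        # Bit `height` of the key says whether the node is a right child.
        node = h(sibling, node) if (key >> height) & 1 else h(node, sibling)
    return node == root


tree = SparseMerkleTree()
commitment = hashlib.sha256(b"hypothetical private credential").digest()
tree.update(5, commitment)
```

Because empty subtrees have known hashes, the same `verify` function also checks non-membership: a proof for an unused key verifies against the empty leaf `EMPTY[0]`.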

With unified client libraries built around ERC-7812 and integrated with Mopro, developers would call one API after generating proofs on-device, regardless of proof type. The real power emerges in cross-application identity: a user proves their age once with Anon Aadhaar in one Mopro app, committing to ERC-7812. Later, a Mopro app using a different scheme verifies that commitment without re-proving. The unified registry enables seamless credential reuse across mobile applications while preserving privacy.

Announcing the ETHGlobal Cannes 2025 Mopro Track Winners: Advancing Mobile Proving

· 6 min read
Moven Tsai
Developer on the Mopro Team

ETHGlobal Cannes brought together some of the brightest minds in blockchain development, and we're thrilled to announce the winners of the Mopro track. Over 36 intense hours, developers pushed the boundaries of client-side proving with real-world applications on mobile devices.

With innovative submissions ranging from protocol-level integrations to consumer-facing applications, the Mopro track showed how mobile ZK proving is becoming increasingly practical. Teams leveraged Mopro's mobile proving capabilities to build everything from tamper-proof media verification to post-quantum secure smart accounts with EIP-7702, proving that privacy-preserving technology can be both powerful and user-friendly.

The Power of Client-Side Proving

Before diving into the winners, it's worth highlighting what made this hackathon special. While many ZK applications rely on server-side proving or trusted execution environments (TEEs), Mopro enables developers to generate zero-knowledge proofs directly on mobile devices. This approach puts privacy control back in users' hands and opens up new possibilities for decentralized applications.

The submissions demonstrated two key trends: protocol developers integrating zkVMs such as RISC-0 and libraries like Gnark with Mopro, and application developers building user-facing products that leverage existing ZK circuits. Both approaches are critical for the ecosystem's growth.

The Winners

🥇 First Place: Mobiscale - Photo-ID Verification with Apple's Secure Enclave

Project: Mobiscale
GitHub: ElusAegis/MobiScale

Mobiscale achieved what many thought impossible in a hackathon setting: a complete end-to-end proof of photo-ID verification running entirely on a mobile device in under 90 seconds. The team cleverly combined Apple's Secure Enclave for facial recognition, RISC-0 for TEE attestation verification, and Noir with the Barretenberg proving backend for ECDSA signature validation.

What makes this project remarkable is its practical approach to identity verification. By computing cosine similarity between a passport photo and a live selfie within the TEE, then proving this computation happened correctly using ZK proofs, Mobiscale demonstrates a privacy-preserving liveness check that could be deployed in the near future. The integration with Mopro shows how mobile proving can complement hardware security features to create robust identity solutions.
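The similarity check at the heart of this flow is small enough to sketch. The embeddings, the `same_person` helper, and the 0.8 threshold below are hypothetical; Mobiscale's actual model, vector size, and threshold are not stated here.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_person(passport_embedding, selfie_embedding, threshold: float = 0.8) -> bool:
    """Hypothetical decision rule: accept when similarity reaches the threshold."""
    return cosine_similarity(passport_embedding, selfie_embedding) >= threshold


# Made-up four-dimensional embeddings; real face embeddings are much longer.
passport = [0.12, 0.80, 0.35, 0.41]
selfie = [0.10, 0.78, 0.40, 0.39]
match = same_person(passport, selfie)
```

In the design described above, this computation runs inside the TEE, and the ZK proof attests that it ran correctly rather than revealing either embedding.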

🥈 Second Place: Zkipper - PQ-Secure EIP-7702 Smart Accounts

Project: Zkipper
GitHub: ZKNoxHQ/ZKipper

The ZKNox team showcased their cryptographic expertise at the hackathon with "Zkipper," a project that turns the ARX chips in ETHGlobal wristbands into transaction signers, enabling post‑quantum–secure smart accounts via EIP‑7702.

The technical achievement here is twofold: First, they successfully integrated Gnark with Mopro within the 36-hour timeframe, which is a significant contribution to the ecosystem. Second, they implemented Falcon512 post-quantum signatures to secure smart accounts, preventing "Bybit-style" attacks by separating admin commands onto distinct devices. This approach shows how Mopro can enable hardware-based security solutions that are both quantum-resistant and user-friendly.

🥉 Third Place: 👀Proov! - Tamper-Proof Media with ZK

Project: 👀Proov!
GitHub: undefinedlab/PROOV_ZK

👀Proov! stood out not just for its technical implementation but for its exceptional UI/UX design. The team created a complete solution for tamper-proof media capsules, combining Mopro-generated proofs with AI-powered image summaries, decentralized storage on Walrus, and verification on Flow blockchain.

The project offers one-tap proof generation with instant cryptographic capsule creation, selective privacy controls, and future-disclosure capabilities. By embedding tamper-proof QR codes in image capsules, they made every photo machine-verifiable and portable across platforms like Instagram, X, and Telegram. This shows how ZKP can be packaged into consumer-friendly applications without sacrificing security or privacy.

Other Notable Submissions

Beyond the top three winners, the Mopro track at ETHGlobal Cannes attracted submissions that collectively show the diversity of ZK use cases on mobile. Teams explored verification and attestation use cases through projects like ProofOfParticipation (GPS-based event attendance) and ZKAge Proof Mobile (age verification for restricted services). Bidet's privacy-preserving NFC tag game showcased gaming and social use cases, and ProofOfFunds showed financial privacy by letting users prove they meet a cryptocurrency balance threshold without disclosing exact amounts or wallet addresses.

These submissions combined Mopro's proving capabilities with a range of technical stacks: different proving frameworks (Circom and Noir) and different mobile platforms (Swift for iOS, Kotlin for Android, React Native, and Flutter). This integration validates the platform's flexibility across use cases and tech stacks. The diversity of both applications and technical approaches highlights the ecosystem's readiness for real-world deployment and shows how mobile proving can address privacy challenges across multiple industries.

Key Insights and Developer Feedback

The hackathon provided valuable insights into the state of mobile ZK proving:

What's Working Well:

  • Mopro's developer experience received consistently positive feedback
  • Teams successfully integrated various proving systems (RISC-0, Noir, Gnark) with Mopro
  • The ecosystem is mature enough for developers to build meaningful applications in 36 hours

Challenges Identified:

  • Writing circuits remains the biggest pain point for developers
  • On-chain verification varies in stability (Circom is more mature, Noir is catching up)
  • Developers need single-architecture templates for iOS-only or Android-only builds

Looking Forward: The Future of Client-Side Proving

The diversity of submissions, from geo-proof games adapted from zkVerify's proof-of-geolocation circuits to age verification systems, shows that mobile ZK proving is ready for mainstream adoption. While Mopro may not pursue the same level of direct adoption as protocols aimed at end users, it serves a critical role as an incubation platform for client-side ZK applications, especially on mobile phones.

Based on developer feedback, we're prioritizing several improvements:

  • Enhanced Templates - Expanding variety for different use cases (Issue #503, Issue #438)
  • Single Architecture Support - iOS-only and Android-only bindings for cross-platform frameworks like Flutter (Issue #502) and React Native (Issue #501)
  • Improved DevEx - Better naming for custom bindings (Issue #500)
  • Documentation - Simplified architecture overview (Issue #498)

Acknowledgments

We'd like to thank all teams who participated in the Mopro track at ETHGlobal Cannes. Your innovation, dedication, and feedback are invaluable in advancing the state of mobile proving.

Special recognition goes to the ETHGlobal team for organizing an exceptional event and providing the infrastructure that makes these innovations possible.

Get Involved

The work showcased at ETHGlobal Cannes is just the beginning. If you're interested in building with Mopro or contributing to the ecosystem:

Metal MSM v2: Exploring MSM Acceleration on Apple GPUs

· 12 min read
Moven Tsai
Developer on the Mopro Team

Key Takeaways

  • Hybrid CPU-GPU approaches are essential for fully exploiting limited hardware such as mobile devices, improving MSM computation and accelerating proof generation.
  • To unify GPU acceleration efforts, WebGPU's vendor-agnostic API and WGSL offer promising solutions that compile to native formats like SPIR-V (for Vulkan on Android) and MSL (for Metal on Apple devices).
  • GPU acceleration for post-quantum proving systems could enable their widespread adoption.

Introduction

GPU acceleration harnesses the massive parallelism of Graphics Processing Units (GPUs) to dramatically speed up tasks that would otherwise overwhelm traditional CPUs. Because GPUs can execute thousands of threads simultaneously, they have become indispensable for compute-intensive workloads such as machine-learning model training and modern cryptographic algorithms.

This technology plays a crucial role in advancing privacy-preserving applications, as zero-knowledge proofs (ZKPs) currently face a significant bottleneck due to the high computational cost of their core operations. By accelerating these operations, we can generate proofs more quickly and cost-effectively, which is essential for the broader adoption of privacy-focused solutions across Ethereum and other blockchain platforms.

Currently, research on GPU acceleration for cryptography remains fragmented, with each platform relying on its own framework: Metal on Apple devices, Vulkan on Android, and CUDA on NVIDIA hardware. Aside from CUDA, most GPU frameworks lack mature ecosystems of cryptographic libraries (e.g., NVIDIA's cuPQC and cuFFT).

Therefore, Mopro is investing in GPU acceleration through related grants (Issue #21, Issue #22, and Issue #153), as it advances our mission to make mobile proving both accessible and practical.

A Primer on Multi-Scalar Multiplication

Multi-Scalar Multiplication (MSM) is an essential primitive in elliptic curve cryptography, particularly in pairing-based proving systems widely used for privacy-preserving applications. MSM involves computing a sum of the form $Q = \sum_{i=1}^{n} (k_i \cdot P_i)$, where $k_i$ are scalars and $P_i$ are points on an elliptic curve, such as BN254. This computationally intensive operation is ideal for GPU acceleration.
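As a reference point, the definition can be written out directly. The sketch below is a naive, unoptimized MSM over the BN254 G1 group using affine coordinates and Python integers; it is meant only to pin down what is being computed and bears no resemblance to how Metal MSM is implemented. The function names are illustrative.

```python
# BN254 G1: y^2 = x^3 + 3 over the prime field F_p, with generator (1, 2).
P_MOD = 21888242871839275222246405745257275088696311157297823662689037894645226208583
G = (1, 2)
INFINITY = None  # the identity element of the group


def is_on_curve(point) -> bool:
    if point is INFINITY:
        return True
    x, y = point
    return (y * y - x * x * x - 3) % P_MOD == 0


def add(p, q):
    """Affine point addition, including doubling and the identity cases."""
    if p is INFINITY:
        return q
    if q is INFINITY:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return INFINITY  # q is the negation of p
    if p == q:
        lam = 3 * x1 * x1 * pow((2 * y1) % P_MOD, -1, P_MOD)
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD)
    lam %= P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def mul(k: int, p):
    """Double-and-add scalar multiplication for a non-negative scalar."""
    acc = INFINITY
    while k:
        if k & 1:
            acc = add(acc, p)
        p = add(p, p)
        k >>= 1
    return acc


def msm(scalars, points):
    """Q = sum_i k_i * P_i, computed one term at a time."""
    acc = INFINITY
    for k, p in zip(scalars, points):
        acc = add(acc, mul(k, p))
    return acc


result = msm([2, 3], [G, mul(7, G)])  # 2*G + 3*(7*G) = 23*G
```

This naive loop performs a full scalar multiplication per term, which is the cost that bucket-based algorithms such as Pippenger's are designed to avoid.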

Metal MSM v2 is an open-source implementation, licensed under MIT and Apache 2.0, that optimizes MSM on Apple GPUs using the Metal Shading Language (MSL). Building on its predecessor, Metal MSM v2 offers significant performance improvements through algorithmic and GPU-specific optimizations, laying the foundation for further research into mobile proving acceleration with GPUs.

Recap on Metal MSM v1

The first version of Metal MSM (v1) was an initial attempt to bring MSM computations on the BN254 curve to Apple GPUs, leveraging parallelism and optimizations like precomputation from the EdMSM paper by Botrel and El Housni [1]. While it showed the potential for GPU acceleration, profiling results revealed key limitations:

  • Low GPU Occupancy: At only 32%, the GPU was underutilized, leading to inefficient computation.
  • High Memory Footprint: Peak VRAM usage was excessive, causing GPU hang errors on real mobile devices for instance sizes ≥ 2^14.
  • Performance Bottlenecks: For an input size of 2^20 points, v1 took 41 seconds on an M1 Pro MacBook, indicating substantial room for improvement.

These challenges drove the development of a newer version, which introduces targeted optimizations to address these issues. For full context, refer to the detailed Metal MSM v1 Summary Report and Metal MSM v1 Profiling Report.

Metal MSM v2: What's New

Metal MSM v2 introduces key enhancements over v1, significantly improving performance and resource efficiency. It adopts the sparse matrix approach from the cuZK paper by Lu et al. [2], treating MSM elements as sparse matrices to reduce memory usage and convert operations described in Pippenger's algorithm into efficient sparse matrix algorithms.

The implementation draws inspiration from Derei and Koh's WebGPU MSM implementation for ZPrize 2023. However, targeting the BN254 curve (unlike the BLS12-377 curve used by ZPrize 2023) required different optimization strategies, particularly for Montgomery multiplications and for using Jacobian coordinates instead of projective or extended twisted Edwards coordinates.
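For readers new to Montgomery multiplication, here is a minimal sketch over the BN254 base field using Python big integers. It shows only the arithmetic idea (replacing a division by the modulus with shifts and masks); a GPU kernel works on fixed-width limbs instead, and the choice of R = 2^256 and all names here are assumptions for illustration.

```python
# BN254 base field modulus.
P = 21888242871839275222246405745257275088696311157297823662689037894645226208583

R_BITS = 256                       # assumed Montgomery radix R = 2^256 > P
R = 1 << R_BITS
MASK = R - 1
N_PRIME = (-pow(P, -1, R)) % R     # -P^{-1} mod R, precomputed once
R2 = (R * R) % P                   # R^2 mod P, used to enter Montgomery form


def redc(t: int) -> int:
    """Montgomery reduction: return t * R^{-1} mod P for 0 <= t < P * R."""
    m = ((t & MASK) * N_PRIME) & MASK   # makes t + m * P divisible by R
    u = (t + m * P) >> R_BITS           # exact division by R via a shift
    return u - P if u >= P else u       # u < 2P, so one subtraction suffices


def to_mont(a: int) -> int:
    return redc((a % P) * R2)           # a * R mod P


def from_mont(a_mont: int) -> int:
    return redc(a_mont)                 # a_mont * R^{-1} mod P


def mont_mul(a_mont: int, b_mont: int) -> int:
    return redc(a_mont * b_mont)        # (aR)(bR)R^{-1} = (ab)R mod P


a, b = 123456789, P - 5
product = from_mont(mont_mul(to_mont(a), to_mont(b)))
```

The conversion cost is paid once on the way in and once on the way out, which is why points are encoded into Montgomery form before the long chain of field multiplications begins.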

Due to differences between shading languages (CUDA for cuZK, WGSL for WebGPU, and MSL for Metal), additional GPU programming efforts were necessary. For instance, dynamic kernel dispatching, which is straightforward in CUDA, required workarounds in Metal through host-side dispatching at runtime.

Key improvements include:

  • Dynamic Workgroup Sizing: Workgroup sizes are adjusted based on input size and GPU architecture using a scale_factor and thread_execution_width. These parameters were optimized through experimentation to maximize GPU utilization as mentioned in PR #86.
  • Dynamic Window Sizes: A window_size_optimizer calculates optimal window sizes using a cost function from the cuZK paper, with empirical adjustments for real devices, as detailed in PR #87.
  • MSL-Level Optimizations: Loop unrolling and explicit access qualifiers, implemented in PR #88, enhance kernel efficiency, with potential for further gains via SIMD refactoring.

Benchmarks on an M3 MacBook Air with 24GB memory show 40x–100x improvements over v1 and ~10x improvement over ICME Labs' WebGPU MSM on BN254, adapted from Derei and Koh's BLS12-377 work. While still slower than CPU-only Arkworks MSM on small and medium input sizes, v2 lays the groundwork for a future CPU+GPU hybrid approach.

How Metal MSM v2 Works

The general flow follows Koh's technical writeup. We pack affine points and scalars on the CPU into a locality-optimized byte format, upload them to the GPU, and encode points into Montgomery form for faster modular multiplications. Scalars are split into signed chunks to enable the Non-Adjacent Form (NAF) method, halving both bucket count and memory during accumulation.
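The signed-chunk step can be sketched as a signed-digit decomposition. This is an illustrative version, not the Metal MSM code: the function name, the window width, and the carry handling are assumptions. Each digit lands in [-2^(w-1), 2^(w-1)), so only 2^(w-1) bucket magnitudes are needed, because a negative digit reuses the bucket of its absolute value with a negated point.

```python
def signed_chunks(scalar: int, window: int, num_chunks: int) -> list:
    """Digits d_j in [-2^(w-1), 2^(w-1)) with sum(d_j * 2^(w*j)) == scalar."""
    half, full = 1 << (window - 1), 1 << window
    digits, carry = [], 0
    for _ in range(num_chunks):
        d = (scalar & (full - 1)) + carry   # next unsigned chunk plus carry-in
        scalar >>= window
        if d >= half:
            d -= full                       # borrow from the next chunk
            carry = 1
        else:
            carry = 0
        digits.append(d)
    if scalar or carry:
        raise ValueError("num_chunks is too small for this scalar")
    return digits


def recompose(digits, window: int) -> int:
    return sum(d * (1 << (window * j)) for j, d in enumerate(digits))


digits = signed_chunks(0xDEADBEEF, window=4, num_chunks=9)
```

With unsigned chunks a window of width w needs 2^w - 1 non-zero buckets; the signed form needs 2^(w-1), which is where the halving of bucket count and memory comes from.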

Next, we apply a parallel sparse-matrix transposition (adapted from Wang et al.'s work [3]) to identify matching scalar chunks and group points into buckets. Then, using a sparse-matrix–vector product (SMVP) and the pBucketPointsReduction algorithm (Algorithm 4 in the cuZK paper [2]), we split buckets among GPU threads, compute each thread's running sum, and scale it by the required factor.

After GPU processing, we transfer each thread's partial sums back to the CPU for final aggregation. Since the remaining point count is small and Horner's Method is sequential and efficient on the CPU, we perform the final sum there.
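The bucket reduction and the final Horner step can be sketched sequentially. To keep the example checkable, plain integers stand in for curve points (integer addition plays the role of point addition), chunks are unsigned, and everything runs on one thread; none of this reflects the parallel GPU layout, and the names are illustrative.

```python
def window_sum(chunks, points, window: int) -> int:
    """Compute sum(chunk_i * P_i) for one window with buckets and a running sum."""
    buckets = [0] * (1 << window)       # buckets[d] collects points whose chunk is d
    for d, p in zip(chunks, points):
        if d:
            buckets[d] += p             # stand-in for elliptic-curve point addition
    running, total = 0, 0
    for d in range(len(buckets) - 1, 0, -1):
        running += buckets[d]           # running = buckets[d] + ... + buckets[max]
        total += running                # buckets[d] ends up counted d times
    return total


def msm_buckets(scalars, points, window: int = 4, scalar_bits: int = 16) -> int:
    num_windows = -(-scalar_bits // window)   # ceiling division
    mask = (1 << window) - 1
    sums = [
        window_sum([(k >> (window * j)) & mask for k in scalars], points, window)
        for j in range(num_windows)
    ]
    # Horner's method, highest window first: acc = acc * 2^w + W_j.
    # On a curve this is `window` doublings followed by one addition.
    acc = 0
    for w_j in reversed(sums):
        acc = acc * (1 << window) + w_j
    return acc


scalars = [3, 65535, 1024, 0, 4660]
points = [5, 7, 11, 13, 17]   # integers standing in for curve points
result = msm_buckets(scalars, points)
```

The running-sum loop uses only additions to weight each bucket by its index, and Horner's method needs one pass over the per-window sums, which is why the last step is cheap enough to leave on the CPU.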

The use of sparse matrices is a key innovation in Metal MSM v2, reducing memory requirements and boosting parallelism compared to previous approaches.

Understanding the Theoretical Acceleration Upper Bound

In the Groth16 proving system, Number Theoretic Transform (NTT) and MSM account for 70–80% of the proving time. According to Amdahl's Law, the maximum speedup is limited by unoptimized components:

$$\text{Speedup}_{\text{overall}} = \dfrac{1}{(1 - \text{time}_{\text{optimized}}) + \dfrac{\text{time}_{\text{optimized}}}{\text{speedup}_{\text{optimized}}}}$$

If 80% of the prover time is optimized with infinite speedup, the theoretical maximum is 5x. However, data I/O overhead reduces practical gains. For more details, see Ingonyama's blog post on Hardware Acceleration for ZKP.
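A few lines of code make the bound easy to explore. The function name and the sample speedups below are illustrative; only the formula comes from Amdahl's Law.

```python
import math


def amdahl_speedup(optimized_fraction: float, component_speedup: float) -> float:
    """Overall speedup when a fraction of the runtime is accelerated."""
    if not 0.0 <= optimized_fraction <= 1.0:
        raise ValueError("optimized_fraction must be between 0 and 1")
    if component_speedup <= 0:
        raise ValueError("component_speedup must be positive")
    remaining = 1.0 - optimized_fraction
    return 1.0 / (remaining + optimized_fraction / component_speedup)


# 80% of proving time (NTT + MSM) accelerated by various factors.
for factor in (2, 10, 100, math.inf):
    print(f"component speedup {factor}: overall {amdahl_speedup(0.8, factor):.2f}x")
```

For instance, a 10x faster MSM and NTT on 80% of the runtime yields only about 3.57x overall, since 1 / (0.2 + 0.08) ≈ 3.57, and the result approaches 5x only as the component speedup grows without bound.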

Benchmark Results

Benchmarks conducted on an M3 MacBook Air compare Metal MSM v2 with the Arkworks v0.4.x CPU implementation across various input sizes.

Metal MSM v2 Benchmark Results

Runtime in milliseconds by input size (number of points); values in parentheses are relative to the CPU baseline.

| Scheme | 2^12 | 2^14 | 2^16 | 2^18 | 2^20 | 2^22 | 2^24 |
|---|---|---|---|---|---|---|---|
| Arkworks v0.4.x (CPU, Baseline) | 6 | 19 | 69 | 245 | 942 | 3,319 | 14,061 |
| Metal MSM v0.1.0 (GPU) | 143 (-23.8x) | 273 (-14.4x) | 1,730 (-25.1x) | 10,277 (-41.9x) | 41,019 (-43.5x) | 555,877 (-167.5x) | N/A |
| Metal MSM v0.2.0 (GPU) | 134 (-22.3x) | 124 (-6.5x) | 253 (-3.7x) | 678 (-2.8x) | 1,702 (-1.8x) | 5,390 (-1.6x) | 22,241 (-1.6x) |
| ICME WebGPU MSM (GPU) | N/A | N/A | 2,719 (-39.4x) | 5,418 (-22.1x) | 17,475 (-18.6x) | N/A | N/A |
| ICICLE-Metal v3.8.0 (GPU) | 59 (-9.8x) | 54 (-2.8x) | 89 (-1.3x) | 149 (+1.6x) | 421 (+2.2x) | 1,288 (+2.6x) | 4,945 (+2.8x) |
| ElusAegis' Metal MSM (GPU) | 58 (-9.7x) | 69 (-3.6x) | 100 (-1.4x) | 207 (+1.2x) | 646 (+1.5x) | 2,457 (+1.4x) | 11,353 (+1.2x) |
| ElusAegis' Metal MSM (CPU+GPU) | 13 (-2.2x) | 19 (-1.0x) | 53 (+1.3x) | 126 (+1.9x) | 436 (+2.2x) | 1,636 (+2.0x) | 9,199 (+1.5x) |

Negative values indicate slower performance relative to the CPU baseline. The performance gap narrows for larger inputs.

Notes:

  • For ICME WebGPU MSM, input size 2^12 causes M3 chip machines to crash; sizes not listed on the project's GitHub page are shown as "N/A"
  • For Metal MSM v0.1.0, the 2^24 benchmark was abandoned due to excessive runtime.

While Metal MSM v2 isn't faster than CPUs across all hardware configurations, its open-source nature, competitive performance relative to other GPU implementations, and ongoing improvements position it well for continued advancement.

Profiling Insights

Profiling on an M1 Pro MacBook provides detailed insights into the improvements from v1 to v2:

| Metric | v1 | v2 | Gain |
|---|---|---|---|
| End-to-end latency | 10.3 s | 0.42 s | 24x |
| GPU occupancy | 32 % | 76 % | +44 pp |
| CPU share | 19 % | <3 % | –16 pp |
| Peak VRAM | 1.6 GB | 220 MB | –7.3× |

These metrics highlight the effectiveness of v2's optimizations:

  • Latency Reduction: A 24-fold decrease in computation time for 2^20 inputs.
  • Improved GPU Utilization: Occupancy increased from 32% to 76%, indicating better use of GPU resources.
  • Reduced CPU Dependency: CPU share dropped below 3%, allowing the GPU to handle most of the workload.
  • Lower Memory Footprint: Peak VRAM usage decreased from 1.6 GB to 220 MB, a 7.3-fold reduction.

Profiling also identified buffer reading throughput as a primary bottleneck in v1, which v2 mitigates through better workload distribution and sparse matrix techniques. See detailed profiling reports: v1 Profiling Report and v2 Profiling Report.

Comparison to Other Implementations

Metal MSM v2 is tailored for Apple's Metal API, setting it apart from other GPU-accelerated MSM implementations:

  • Derei and Koh's WebGPU MSM on BLS12: Designed for WebGPU, this implementation targets browser-based environments and may not fully leverage Apple-specific hardware optimizations.
  • ICME Labs WebGPU MSM on BN254: Adapted from Derei and Koh's WebGPU work for the BN254 curve, it is ~10x slower than Metal MSM v2 for inputs from 2^16 to 2^20 on an M3 MacBook Air.
  • cuZK: A CUDA-based implementation for NVIDIA GPUs, operating on a different hardware ecosystem and using different algorithmic approaches.

Metal MSM v2's use of sparse matrices and dynamic workgroup sizing provides advantages on Apple hardware, particularly for large input sizes. While direct benchmark comparisons are limited, internal reports suggest that v2 achieves performance on par with or better than other WebGPU/Metal MSM implementations at medium scales.

It's worth noting that the state-of-the-art Metal MSM implementation is Ingonyama's ICICLE-Metal (since ICICLE v3.6). Readers can try it by following:

Another highlight is ElusAegis' Metal MSM implementation for BN254, which was forked from version 1 of Metal MSM. To the best of our knowledge, his pure GPU implementation further improves the allocation and algorithmic structure to add more parallelism, resulting in 2x faster performance compared to Metal MSM v2.

Moreover, by integrating this GPU implementation with optimized MSM on the CPU side from the halo2curves library, he developed a hybrid approach that splits MSM tasks between CPU and GPU and then aggregates the results. This strategy achieves an additional 30–40% speedup over a CPU-only implementation. This represents an encouraging result for GPU acceleration in pairing-based ZK systems and suggests a promising direction for Metal MSM v3.
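The hybrid split relies on MSM being a plain sum: any partition of the inputs can be computed independently and the partial results added. The sketch below shows only that structure. Integers stand in for curve points, two threads stand in for the CPU and GPU workers, and the `gpu_fraction` parameter is hypothetical; it does not reproduce ElusAegis' implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def msm(scalars, points) -> int:
    """Stand-in MSM: integers play the role of elliptic-curve points."""
    return sum(k * p for k, p in zip(scalars, points))


def hybrid_msm(scalars, points, gpu_fraction: float = 0.5) -> int:
    """Split the inputs, run both parts concurrently, then add the partial sums."""
    split = int(len(scalars) * gpu_fraction)
    with ThreadPoolExecutor(max_workers=2) as pool:
        # In a real system this call would dispatch a GPU kernel.
        gpu_part = pool.submit(msm, scalars[:split], points[:split])
        # And this one would run an optimized CPU MSM.
        cpu_part = pool.submit(msm, scalars[split:], points[split:])
        return gpu_part.result() + cpu_part.result()


scalars = list(range(1, 101))
points = [3 * i + 1 for i in range(100)]
result = hybrid_msm(scalars, points, gpu_fraction=0.7)
```

In practice the split ratio would be tuned so that both devices finish at roughly the same time, since the slower side determines the total latency.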

Future Work

The Metal MSM team has outlined several exciting directions for future development:

  • SIMD Refactoring: Enhance SIMD utilization and memory coalescing to further boost performance.
  • Advanced Hybrid Approach: Integrate with Arkworks 0.5 for a more sophisticated CPU-GPU hybrid strategy.
  • Android Support: Port kernels to Vulkan compute/WebGPU on Android, targeting Qualcomm Adreno (e.g., Adreno 7xx series) and ARM Mali (e.g., G77/G78/G710) GPUs.
  • Cross-Platform Support: Explore WebGPU compatibility to enable broader platform support.
  • Dependency Updates: Transition to newer versions of objc2 and objc2-metal, and Metal 4 to leverage the latest MTLTensor features, enabling multi-dimensional data to be passed to the GPU.

Beyond these technical improvements, we are also interested in:

  • Exploration of PQ proving schemes: With the limited acceleration achievable from pairing-based proving schemes, we're motivated to explore PQ-safe proving schemes that have strong adoption potential over the next 3–5 years. These schemes, such as lattice-based proofs, involve extensive linear algebra operations that can benefit from GPUs' parallel computing capabilities.
  • Crypto Math Library for GPU: Develop comprehensive libraries for cryptographic computations across multiple GPU frameworks, including Metal, Vulkan, and WebGPU, to expand the project's overall scope and impact.

Conclusion

Metal MSM v2 represents a leap forward in accelerating Multi-Scalar Multiplication on Apple GPUs. By addressing the limitations of v1 through sparse matrix techniques, dynamic thread management, and other novel optimization techniques, it achieves substantial performance gains for Apple M-series chips and iPhones.

However, two challenges remain:

  • First, GPUs excel primarily with large input sizes (typically around 2^26 or larger). Most mobile proving scenarios use smaller circuit sizes, generally ranging from 2^16 to 2^20, which limits the GPU's ability to fully leverage its parallelism. Therefore, optimizing GPU performance for these smaller workloads remains a key area for improvement.
  • Second, mobile GPUs inherently possess fewer cores and comparatively lower processing power than their desktop counterparts, constraining achievable performance. This hardware limitation necessitates further research into hybrid approaches and optimization techniques to maximize memory efficiency and power efficiency within the constraints of mobile devices.

Addressing these challenges will require ongoing algorithmic breakthroughs, hardware optimizations, and seamless CPU–GPU integration. Collectively, these efforts pave a clear path for future research and practical advancements that enable the mass adoption of privacy-preserving applications.

Get Involved

We welcome researchers and developers interested in GPU acceleration, cryptographic computations, or programmable cryptography to join our efforts:

For further inquiries or collaborations, feel free to reach out through the project's GitHub discussions or directly via our Mopro community on Telegram.

Special Thanks

We extend our sincere gratitude to Yaroslav Yashin, Artem Grigor, and Wei Jie Koh for reviewing this post and for their valuable contributions that made it all possible.

Footnotes

  1. Botrel, G., & El Housni, Y. (2022). "EdMSM: Multi-Scalar-Multiplication for SNARKs and Faster Montgomery multiplication." IACR Cryptology ePrint Archive, 2022/1400: https://eprint.iacr.org/2022/1400

  2. Lu, Y., Wang, L., Yang, P., Jiang, W., Ma, Z. (2023). "cuZK: Accelerating Zero-Knowledge Proof with A Faster Parallel Multi-Scalar Multiplication Algorithm on GPUs." IACR Cryptology ePrint Archive, 2022/1321: https://eprint.iacr.org/2022/1321

  3. Wang, H., Liu, W., Hou, K., Feng, W. (2016). "Parallel Transposition of Sparse Data Structures." Proceedings of the 2016 International Conference on Supercomputing (ICS '16): https://synergy.cs.vt.edu/pubs/papers/wang-transposition-ics16.pdf