Slides for P3642R1
Carry-less product: std::clmul
- Document number:
- P3647
- Date:
- 2025-05-27
- Audience:
- SG22
- Project:
- ISO/IEC 14882 Programming Languages — C++, ISO/IEC JTC1/SC22/WG21
- Reply-To:
- Jan Schultke <janschultke@gmail.com>
- Source:
- github.com/Eisenwave/cpp-proposals/blob/master/src/clmul-slides.cow
Carry-less product: std::clmul
P3642R1
Introduction
Intuition: "carry-less" means we use XOR instead of plus.
Regular multiplication | Carry-less multiplication |
---|---|
- useful for CRC, AES-GCM, parsing, bit manipulation, …
- widespread hardware support (x86_64, ARM, RISC-V)
- a.k.a. "polynomial multiplication" and "XOR multiplication"
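The "XOR instead of plus" intuition can be made concrete with a minimal sketch. The helper name clmul_naive is hypothetical and is not the proposed std::clmul; it is a plain shift-and-XOR loop over 32-bit operands, written only to show where the carry-less product differs from regular multiplication.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper for illustration, not the proposed std::clmul.
constexpr std::uint32_t clmul_naive(std::uint32_t x, std::uint32_t y) noexcept
{
    std::uint32_t result = 0;
    for (int i = 0; i < std::numeric_limits<std::uint32_t>::digits; ++i) {
        // For every set bit i of y, combine a copy of x shifted left by i.
        // Regular multiplication would use += here; carry-less uses ^=.
        if ((y >> i) & 1u) {
            result ^= x << i;
        }
    }
    return result;
}

// 0b110 times 0b110: the partial products are 0b01100 and 0b11000.
static_assert(6u * 6u == 0b100100u);             // 01100 + 11000, carries propagate
static_assert(clmul_naive(6u, 6u) == 0b010100u); // 01100 ^ 11000, no carries
```

Both operations form the same partial products; only the way they are combined differs.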
Motivating example
- clmul(x, -1u) computes bitwise parity (inclusive)
- i.e. for each bit in x: /* is 1-bit count to right odd? */ ? 1 : 0
- can be used to check if character is inside/outside string in parallel

    abc xxx "foobar" zzz "a"
    000000001000000100000101 // quotes
    000000000111111100000011 // clmul(quotes, -1u)
    000000000111111000000010 // clmul(quotes, -1u) & ~quotes
This technique is used to accelerate string parsing in
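The quote-mask example from this slide can be reproduced with a short sketch. It assumes a hypothetical 32-bit helper clmul_naive standing in for the proposed function, and a hypothetical inside_quotes wrapper. The bit patterns are the ones shown on the slide, read as ordinary binary literals with the least significant bit on the right.

```cpp
#include <cstdint>

// Hypothetical stand-in for the proposed function: naive shift-and-XOR loop.
constexpr std::uint32_t clmul_naive(std::uint32_t x, std::uint32_t y) noexcept
{
    std::uint32_t result = 0;
    for (int i = 0; i < 32; ++i) {
        if ((y >> i) & 1u) {
            result ^= x << i;
        }
    }
    return result;
}

// Hypothetical wrapper: mask of the positions strictly between paired quotes.
constexpr std::uint32_t inside_quotes(std::uint32_t quotes) noexcept
{
    // Multiplying by all-ones (~0u equals -1u) XORs every left shift of
    // `quotes` together, so bit k becomes the parity of bits 0..k (inclusive).
    const std::uint32_t parity = clmul_naive(quotes, ~0u);
    // Clear the quote positions themselves.
    return parity & ~quotes;
}

// Quote positions for: abc xxx "foobar" zzz "a"
constexpr std::uint32_t slide_quotes = 0b000000001000000100000101;
static_assert(clmul_naive(slide_quotes, ~0u) == 0b000000000111111100000011);
static_assert(inside_quotes(slide_quotes)    == 0b000000000111111000000010);
```

A single multiplication replaces a loop over all characters, which is what makes the mask computation parallel.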
Hardware support
Operation | x86_64 | ARM | RV64 |
---|---|---|---|
Marked rows are integrated in this proposal.
Proposed design
- name clmul used because it is most common (Intel, LLVM, RV64, etc.)
- SIMD support could be separate paper
Implementation and wording
Implementation
- naive fallback implementation is trivial
- just need to wrap platform intrinsics when available
- portable support with @llvm.clmul
- could be wrapped in __builtin_clmul
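As a rough sketch of "wrap platform intrinsics when available", the following uses the x86_64 PCLMULQDQ intrinsic when the compiler advertises it (GCC and Clang define __PCLMUL__ under -mpclmul) and otherwise falls back to the naive loop. The names clmul64 and clmul_fallback are hypothetical, and only the low 64 bits of the product are returned; this is not how any particular standard library implements the proposal.

```cpp
#include <cstdint>

#if defined(__PCLMUL__) && defined(__x86_64__)
#include <immintrin.h>
#endif

// Hypothetical portable fallback: naive shift-and-XOR loop.
constexpr std::uint64_t clmul_fallback(std::uint64_t x, std::uint64_t y) noexcept
{
    std::uint64_t result = 0;
    for (int i = 0; i < 64; ++i) {
        if ((y >> i) & 1u) {
            result ^= x << i;
        }
    }
    return result;
}

// Hypothetical wrapper: low 64 bits of the carry-less product of x and y.
inline std::uint64_t clmul64(std::uint64_t x, std::uint64_t y) noexcept
{
#if defined(__PCLMUL__) && defined(__x86_64__)
    const __m128i a = _mm_cvtsi64_si128(static_cast<long long>(x));
    const __m128i b = _mm_cvtsi64_si128(static_cast<long long>(y));
    // PCLMULQDQ computes a 64x64 -> 128-bit carry-less product;
    // the immediate 0x00 selects the low 64-bit lane of both operands.
    const __m128i product = _mm_clmulepi64_si128(a, b, 0x00);
    // Keep only the low 64 bits of the 128-bit result.
    return static_cast<std::uint64_t>(_mm_cvtsi128_si64(product));
#else
    return clmul_fallback(x, y);
#endif
}
```

Because the fallback is constexpr and the intrinsic path is not, a real implementation would also need to choose the loop during constant evaluation; that detail is omitted here.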
Wording
- based on P3161R4, but easy to change
- see paper