Slides for P3642R1
Carry-less product: std::clmul

Document number:
P3647
Date:
2025-05-27
Audience:
SG22
Project:
ISO/IEC 14882 Programming Languages — C++, ISO/IEC JTC1/SC22/WG21
Reply-To:
Jan Schultke <janschultke@gmail.com>
Source:
github.com/Eisenwave/cpp-proposals/blob/master/src/clmul-slides.cow

Carry-less product:
std::clmul
P3642R1

Jan Schultke  |  Slides for P3642R1 — Carry-less product: std::clmul  |  SG22 Telecon 2025-06-04  |  Slide 1

Introduction

Intuition: "carry-less" means we use XOR instead of plus.

Regular multiplication:
x * 0b0110 == (x << 3) * 0 + (x << 2) * 1 + (x << 1) * 1 + (x << 0) * 0

Carry-less multiplication:
clmul(x, 0b0110) == (x << 3) * 0 ^ (x << 2) * 1 ^ (x << 1) * 1 ^ (x << 0) * 0
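
The identity above can be checked with a portable shift-and-XOR loop. This is a minimal sketch, not the proposed implementation: `clmul_sketch` and `check_identity` are hypothetical names, the operand width is fixed to 32 bits, and the loop is the naive bit-by-bit method rather than a hardware instruction.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: naive carry-less product, truncated to 32 bits.
// For every set bit i of y, XOR (x << i) into the result.
constexpr std::uint32_t clmul_sketch(std::uint32_t x, std::uint32_t y) noexcept {
    std::uint32_t result = 0;
    for (int i = 0; i < 32; ++i) {
        if ((y >> i) & 1u) {
            result ^= x << i;
        }
    }
    return result;
}

// 0b0110 has bits 1 and 2 set, so only those two shifted copies of x contribute.
inline void check_identity(std::uint32_t x) {
    assert(clmul_sketch(x, 0b0110u) == ((x << 2) ^ (x << 1)));
}

// Regular multiplication of the same operands differs once carries propagate.
static_assert(clmul_sketch(0b1011u, 0b0110u) != 0b1011u * 0b0110u);
```

Calling `check_identity` with any 32-bit value should pass, since the two zero terms in the expansion drop out.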

Motivating example

abc xxx "foobar" zzz "a"
000000001000000100000101 // quotes
000000000111111100000011 // clmul(quotes, -1u)
000000000111111000000010 // clmul(quotes, -1u) & ~quotes

This technique is used to accelerate string parsing in simdjson.
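
The bit patterns above can be reproduced in a few lines. The sketch below is an illustration only: `clmul_sketch`, `quote_mask`, and `inside_quotes` are hypothetical names, the carry-less product is a naive loop standing in for `std::clmul`, and the input is mapped so that its last character is bit 0, which matches the most-significant-bit-first rows above.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// Hypothetical stand-in for std::clmul on 32-bit operands (naive shift-and-XOR).
constexpr std::uint32_t clmul_sketch(std::uint32_t x, std::uint32_t y) noexcept {
    std::uint32_t result = 0;
    for (int i = 0; i < 32; ++i) {
        if ((y >> i) & 1u) {
            result ^= x << i;
        }
    }
    return result;
}

// One bit per character, set where the character is a quote.
// The LAST character maps to bit 0; the input must be at most 32 characters.
constexpr std::uint32_t quote_mask(std::string_view s) noexcept {
    std::uint32_t mask = 0;
    for (char c : s) {
        mask = (mask << 1) | (c == '"' ? 1u : 0u);
    }
    return mask;
}

// Multiplying by all-ones yields a prefix XOR: bit i of the product is the
// parity of quote bits 0..i. Clearing the quote bits themselves leaves only
// the characters strictly between a pair of quotes.
constexpr std::uint32_t inside_quotes(std::uint32_t quotes) noexcept {
    return clmul_sketch(quotes, ~0u) & ~quotes;
}

// The three rows of the example, written as hexadecimal constants.
static_assert(quote_mask("abc xxx \"foobar\" zzz \"a\"") == 0x8105u);
static_assert(clmul_sketch(0x8105u, ~0u) == 0x7F03u);
static_assert(inside_quotes(0x8105u) == 0x7E02u);
```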


Hardware support

Operation              x86_64       ARM            RV64
clmul u64×4 → u128×4   vpclmulqdq
clmul u64×2 → u128×2   vpclmulqdq
clmul u64 → u128 *     pclmulqdq    pmull+pmull2   clmul+clmulh
clmul u64 → u64 *                   pmull          clmul
clmul u8×8 → u16×8                  pmull
clmul u8×8 → u8×8                   pmul

Rows marked with * are integrated in this proposal.

Proposed design

template<unsigned-integer T>
T clmul(T x, T y) noexcept;

template<class T>
struct mul_wide_result {   // yoinked from P3161R4:
    T low_bits;            // Unified integer overflow arithmetic
    T high_bits;
};

template<unsigned-integer T>
constexpr mul_wide_result<T> clmul_wide(T x, T y) noexcept;

  • The name clmul is used because it is the most common (Intel, LLVM, RV64, etc.)
  • SIMD support could be a separate paper
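
To make the relationship between the narrow and wide forms concrete, here is a minimal software sketch. It is not the reference implementation: `wide_result_sketch`, `clmul_wide_sketch`, and `clmul_sketch` are hypothetical names, the operand type is fixed to 64 bits instead of being a template, and the loop is the naive bit-by-bit method.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the shape of the result type on the slide, fixed to 64-bit halves.
struct wide_result_sketch {
    std::uint64_t low_bits;
    std::uint64_t high_bits;
};

// Hypothetical helper: full 128-bit carry-less product of two 64-bit operands.
constexpr wide_result_sketch clmul_wide_sketch(std::uint64_t x, std::uint64_t y) noexcept {
    wide_result_sketch r{0, 0};
    for (int i = 0; i < 64; ++i) {
        if ((y >> i) & 1u) {
            r.low_bits ^= x << i;
            // Bits of x shifted past bit 63 land in the high half.
            // i == 0 is skipped because shifting by 64 would be undefined.
            if (i != 0) {
                r.high_bits ^= x >> (64 - i);
            }
        }
    }
    return r;
}

// The narrow product is the wide product truncated to its low half.
constexpr std::uint64_t clmul_sketch(std::uint64_t x, std::uint64_t y) noexcept {
    return clmul_wide_sketch(x, y).low_bits;
}

// (1 + x)^2 == 1 + x^2 over GF(2): the middle term cancels under XOR.
static_assert(clmul_sketch(3, 3) == 5);
```

On hardware, the two halves would typically come from the instruction pairs listed on the hardware support slide rather than from a loop like this one.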

Implementation and wording

Implementation

Wording

k thx bye („• ֊ •„)