Notebook

Prerequisites¶

Homomorphic encryption pdf
Modular arithmetic
Lattices

Theory¶

Homomorphic encryption -- Definition
Let (KeyGen, Encrypt, Decrypt, Evaluate) be a tuple of procedures $(KeyGen, E, D, V)$
Let $f \in \mathcal F$ be a function in a family of functions.

$(sk, pk) \gets KeyGen(1^\lambda, 1^d)$ - key generation, $\lambda$ is a security parameter, $d$ is a functionality parameter (degree of the polynomial / multiplicative depth allowed).
$c_i \gets E(pk, m_i)$ - Encryption of a message $m_i$. -- known as fresh ciphertexts.
$c^* = V(pk, f, c_1, ... c_n)$ - Evaluate the function $V$ on the ciphertexts. -- known as evaluated ciphertexts

Corectness
Correctly decrypt both fresh and evaluated ciphertexts $$D(sk, c^*) = m^* = f(m_1, ..., m_n)$$

The HE can be classified based on the type of functions $f$ that it supports.

Partially homomorphic encryption
Given $E(m_1)$, and $E(m_2)$ you can do limited operations. Ex: Only addition or multiplication.
Somewhat homomorphic encryption
Limited number of multiplications (Circuits of a maximum depth)
Given $E(m_1), ..., E(m_n)$ you can compute $E(f(m_1, ... m_n))$ where $f$ is a polynomial of a limited degree.
Fully homomorphic encryption
Unlimited multiplications and additions.

SHE over integers¶

DGHV paper - read and follow the paper for details
Public key compression for DGHV paper
FHE over integers with shorter public keys
An implementation

Symmetric version¶

Key generation
The secret key is an $\eta$-bit odd integer $p$

Encryption
Let $m \in \{0, 1\}$ be a bit. The ciphertext is an int whose residue mod $p$ which has the same bit parity as the plaintext. Randomly choose 2 integers, large random $q$ and small random $r$ $$E(m) = pq + 2r + m$$

Decryption $$D(c) = (c \bmod p) \bmod 2$$

When the noise $r$ is sufficiently smaller than the secret key $p$ this encryption scheme is somewhat homomorphic - It can be used to evaluate low degree polynomias over encrypted data. If parameters are chosen corectly ($r \approx 2^\sqrt \eta, q \approx 2^{\eta^3}$) the scheme may even be secure.

Evaluation
Given a multivariate function (circuit) that takes in ciphertexts, apply the addition and multiplication over the integers, and return the resulting integer.

Corectness $$D(E(m)) = (c \bmod p) \bmod 2 = (pq + 2r + m \bmod p) \bmod 2 = 2r + m \bmod 2 = m $$

Notice that this decryption can happen only if $2r \bmod p = 2r \iff 2r < p$

For the addition we have $$ \begin{align*} D(c_1 + c_2) &= (pq_1 + pq_2 + 2r_1 + 2r_2 + m_1 + m_2 \bmod p) \bmod 2 \\ & = (2r_1 + 2r_2 + m_1 + m_2) \bmod 2 \\ &= m_1 + m_2 \bmod 2 \end{align*}$$ with the similar constraint that $2r_1 + 2r_2 \leq p$

And for multiplication we have: $$ \begin{align*} D(c_1 \cdot c_2) &= ((pq_1 + 2r_1 + m_1)(pq_2 + 2r_2 + m_2) \bmod p) \bmod 2 \\ &= 4r_1r_2 + 2r_1m_2 + 2r_2m_1 + m_1m_2 \bmod 2 \\ &= m_1m_2 \end{align*} $$ with a similar constraint.

Remarks:

We have to be careful how we bound the noise, because it will decide how many circuits we can evaluate and decrypt correctly after. If the final noise becomes bigger than the secret key $p$ the decryption will no longer be correct.
The noise doubles when doing additions but grows exponentially when doing multiplications.

PK version¶

Turning the secret key scheme into a public key scheme is easy. The main idea is to base the PK security on the hardness of the approximate integer GCD. The public key will consist of many "encryptions of zero": $$x_i = q_i \cdot p + 2 \cdot r_i$$ where $q_i, r_i$ are chosen as above. The number of encryptions is denoted by $\tau$ and is a security parameter. So we generate a set $\{x_0, ... x_\tau\}$.Then reorder the set such that $x_0$ is the maxmimum from the set.

Then we encrypt $m$ by adding a subset-sum of $x_i$'s instead of $pq$. So we choose a random subset from $S \subseteq \{1, 2, ..., \tau\}$ and we compute: $$c = m + 2r + 2 \cdot \sum_{i\in S} x_i \bmod x_0$$

Security¶

Security reduction
The scheme security can be reduced to solving the approximate integer GCD:

Given many an integer $p$ and many $x_i = q_i + p + 2r_i$ find $p$

Intuition for the assumption: If you sample lots of numbers that are close to multiples of $p$ but not exactly multipes of $p$ they are indistinguishable from random.

Security parameters

To construct the SHE scheme the paper proposes many security parameters:

$\eta$ is the bit-length of the secret key $p$ (which is the hidden approximate-gcd of all the public-key integers),
$\gamma$ is the bit-length of the integers in the public key $x_i$,
$\tau$ is the number of integers in the public key
$\rho$ is the bit-length of the noise $r_i$ (i.e., the distance between the public key elements and the nearest multiples of the secret key)
$\rho^\prime$ secondary noise parameter used in encryption

These parameters are set under various constraints to stop different attacks and support HE.

Circuit / function homomorphic bounds
Consider a multivariate function $f$. Denote $|\vec f|$ its $l_1$ norm of the coefficient vector of $f$. Denote $d = \deg(f)$. $f$ can be evaluated if the following condition is satisfied: $$d \leq \dfrac {\eta - 4 - \log(|\vec f|)} {\rho^\prime + 2}$$

Proof:

"fresh" ciphertexts have nosie at most $2^{\rho^\prime + 2}$. A poly with degree $d$ will raise this noise to $(2^{\rho^\prime + 2})^d$
ciphertext output after Evaluate has noise at most $2^{\eta - 4} < \frac p 8$. Although $2^{\eta - 2} < \frac p 2$ is sufficient, later the former bound will be used to allow decryption with a shallow circuit (function)
So we want the following condition: $|\vec f| \cdot (2^{\rho^\prime + 2})^d \leq 2^{\eta - 4}$. Applying the logarithm we get to the inequality above.

Making the scheme more efficient¶

There are a few problems with our scheme

impractically big public keys
ciphertext size doubling with each multiplication

The following papers attempt to make the scheme more efficient:

Batching¶

Batch FHE over the integers

Making the scheme FHE¶

In this transformation, we add to the public key some extra information about the secret key, and use this extra information to “post process” the ciphertext. The post-processed ciphertext can be decrypted more efficiently than the original ciphertext, thus making the scheme bootstrappable.

We pay for this saving by having a larger ciphertext, and also by introducing another hardness assumption (basically assuming that the extra information in the public key does not help an attacker break the scheme).

More details can be found in the paper

Code¶

In [116]:

import math
import random

import numpy as np
from Crypto.Util.number import getPrime, getRandomInteger, getRandomNBitInteger
from tqdm.notebook import tqdm

In [91]:

# Utils
def randodd(n: int) -> int:
    """Generate n bit long odd number"""
    return 2 * (2 ** (n - 2) + random.randint(0, 2 ** (n - 2)) - 1) + 1


def quot_near(a: int, b: int) -> int:
    """nearest integer to a / b"""
    return (2 * a + b) // (2 * b)


def mod_near(a: int, b: int) -> int:
    """r = a mod b with r in [-b/2, b/2]"""
    return a - b * quot_near(a, b)

SHE Scheme¶

In [92]:

def generate_x(p: int, gamma: int, rho: int) -> int:
    # 2**gamma // p - 1 in bits is gamma - eta becuase p is eta bits long
    q = randodd(gamma - eta)
    # q = random.randint(0, 2**gamma // p - 1)
    r = random.randint(-(2**rho) + 1, 2**rho - 1)
    x = p * q + r
    return x


def generate_pk(p: int, gamma: int, rho: int, tau: int, trials=100) -> list[int]:
    while True:
        xs = [generate_x(p, gamma, rho) for _ in range(tau)]
        xs.sort(reverse=True)
        mod_x0 = mod_near(xs[0], p)
        if xs[0] % 2 == 1 and mod_x0 % 2 == 0:
            break
    return xs

In [93]:

def encrypt_sk(m: int, p: int, gamma: int, eta: int, rho: int) -> int:
    q = random.randint(1, 2 ** (gamma - eta - 1))
    r = random.randint(-(2**rho) + 1, 2**rho - 1)
    c = p * q + 2 * r + m
    return c


def encrypt_pk(m: int, xs: list[int], rho_: int) -> int:
    r = random.randint(-(2**rho_) + 1, 2**rho_ - 1)
    x0 = xs[0]

    # Select a random subset
    selected = [random.choice([True, False]) for _ in range(len(xs) - 1)]

    # Compute the sum of the selected subset
    sum_selected = 0
    for xi, t in zip(xs, selected):
        if t:
            sum_selected = (sum_selected + xi) % x0

    c = (m + 2 * r + 2 * sum_selected) % x0

    return c


def decrypt(c: int, p: int) -> int:
    # return (c % p) % 2
    return mod_near(c, p) % 2

In [235]:

lam = 42
eta = 988  # Secret key p bit length
rho = 26  # bit length of the noise
rho_ = 42  # Secondary noise parameter

gamma = 147456  # Bit length of integers in the public key
tau = 158  # number of integers in the public key

In [95]:

p = getPrime(eta)

In [112]:

m0 = 0
m1 = 1

# Generate secret key encrypions
c0 = encrypt_sk(m0, p, gamma, eta, rho)
c1 = encrypt_sk(m1, p, gamma, eta, rho)

In [113]:

# basic decryptions
assert decrypt(c0, p) == 0
assert decrypt(c1, p) == 1

# Homomorphic addition
assert decrypt(c0 + c0, p) == 0
assert decrypt(c0 + c1, p) == 1
assert decrypt(c1 + c1, p) == 0

# Homomorphic multiplication
assert decrypt(c0 * c0, p) == 0
assert decrypt(c0 * c1, p) == 0
assert decrypt(c1 * c1, p) == 1

In [114]:

# Where does it break?
n_iter = 0
c_sum = c0
while True:
    n_iter += 1
    c_sum = c_sum * c1
    if decrypt(c_sum, p) != 0:
        break

In [115]:

# Scheme didnt work after n_iter
n_iter

Out[115]:

In [107]:

(eta - 4) / (rho_ + 2)

Out[107]:

22.363636363636363

In [175]:

m0 = 0
m1 = 1

# Generate public key.
xs = generate_pk(p, gamma, rho, tau)

# Generate public key encryptions
c0 = encrypt_pk(m0, xs, rho_)
c1 = encrypt_pk(m1, xs, rho_)

In [176]:

# basic decryptions
assert decrypt(c0, p) == 0
assert decrypt(c1, p) == 1

# Homomorphic addition
assert decrypt(c0 + c0, p) == 0
assert decrypt(c0 + c1, p) == 1
assert decrypt(c1 + c1, p) == 0

# Homomorphic multiplication
assert decrypt(c0 * c0, p) == 0
assert decrypt(c0 * c1, p) == 0
assert decrypt(c1 * c1, p) == 1

In [177]:

# Where does it break?
n_iter = 0
c_sum = c0
while True:
    n_iter += 1
    c_sum = c_sum * c1
    if decrypt(c_sum, p) != 0:
        break

In [178]:

n_iter

Out[178]:

In [227]:

# Test the bound

# Consider the following poly:
# f(c0, c1) =  a0 * c0**d + a1 * c1

a0 = 10
a1 = 20
coef_norm = a0 + a1
d = 24  # Degree of poly

# Evaluate poly
c_ = a0 * c0**d + a1 * c1
m_ = (a0 * m0**d + a1 * m1) % 2

In [228]:

def degree_bound(eta: int, rho_: int, coef_norm: int):
    return (eta - 4 - math.log2(coef_norm)) / (rho_ + 2)

In [229]:

degree_bound(eta, rho_, coef_norm)

Out[229]:

22.25211612282708

In [230]:

decrypt(c_, p) == m_

Out[230]:

False