Fully Homomorphic Encryption Part One: A Gentle Intro
16 Jun 2020Recently I have taken CS355 (Topics in Cryptography) at Stanford. This was a comprehensive course on advanced crypto topics.
Throughout the 3-month course, the instructors covered various topics that span the history of Cryptography, starting from One-way Functions, PRFs all the way to applied cryptosystems such as MPC, Zero-Knowledge, and PIR. This was really a great course to take, and I’ve surely learned a lot about modern cryptosystems.
In order to strengthen my understanding of these topics, I’ve decided to start a series of blog posts that (gently) introduces these cool crypto topics. I’ll be summarizing lecture notes and paraphrase them into my own words. Hopefully, it should be an interesting (and not obscure) read that helps you understand these topics as well.
For the first series of posts, I want to talk about Fully Homomorphic Encryption (FHE), a fairly hot topic in the security industry.
Note that I’ll try to make my explanation as simple as possible, but still make sure that you have some security/cryptography context before continuing…
What is FHE?
I shall begin the post with a brief introduction of FHE, or Fully Homomorphic Encryption.
According to Wikipedia, the definition of Homomorphic Encryption is:
A form of encryption that allows computation on ciphertexts, generating an encrypted result which, when decrypted, matches the result of the operations as if they had been performed on the plaintext.
The definition is exactly right - Homomorphic Encryption is simply the encryption scheme where one can combine encrypted ciphertexts and obtain interesting computations on the original plaintext in an encrypted form without knowing the actual encryption key.
An Overview of Encryption Schemes
If you are familiar with crypto/security materials, you might know a few common encryption schemes such as AES and RSA. Although there’s a difference between symmetric and asymmetric encryption schemes, every scheme could be generalized in the following form.
From the diagram, we can see the overall composition of an encryption scheme:
- There is a $KeyGen$ algorithm that generates the key used for both encrypting and decrypting. Notice that in a symmetric scheme both keys are the same, and in an asymmetric scheme the keys are different. (One is called Public Key and the other is called Private Key)
- The $Encryption$ algorithm applies encryption to a given plaintext using the encryption key. Then it produces the ciphertext, which is the encrypted plaintext.
- The $Decryption$ algorithm inverts the encryption done on a ciphertext using the decryption key. It restores the ciphertext back to the original plaintext.
There it is. It doesn’t matter which algorithm we use (AES or RSA or something else), the overall structure is always the same.
In Cryptography, we tend to always write about properties of a system after introducing it. Therefore, I should also talk about it for the sake of completeness.
First, an encryption needs to have Correctness. This means that applying decryption on a ciphertext that was the result of applying encryption on some plaintext $pt$, using the keys derived from $KeyGen$, will always guarantee to result in the same plaintext $pt$ as before. To capture this relation in the language of probability, we use the following expression: \(\forall pt \in PT, (k_{enc}, k_{dec}) \leftarrow KeyGen(1^\lambda):\\ Pr[Decryption(k_{dec}, Encryption(k_{enc}, pt)) = pt] = 1\) This means that the probability of applying decryption on encrypted ciphertext using the appropriate keys will always result in the exact same plaintext as before.
The second property that we want is Semantic Security. I’m not going to do the full proof here, but the basic idea is that, given two encrypted ciphertexts $ct_0, ct_1$ that corresponds to the encryption of $pt_0, pt_1$, one cannot distinguish which ciphertext is the encryption of $ct_0$. This basically means that the encrypted ciphertext looks random and doesn’t provide any hint to its original plaintext, therefore ensuring that this scheme is secure against eavesdropping.
Homomorphic Encryption - an unexpected property
If encrypted ciphertexts look just like random garbage to any viewers, does this mean that ciphertexts are totally useless unless the corresponding decryption key is present?
Both luckily and unluckily, the answer is NO.
Some encryption schemes actually exhibits an amazing property, where if you obtain the encryption of number 1 and number 2, although you have no idea what the original plaintexts are, you can simply add these two ciphertexts together, and obtain the encrypted ciphertext for $1 + 2 = 3$.
This property is called Homomorphism - where the algebraic relationship on plaintexts is also preserved in the ciphertexts. And encryption schemes that exhibits this trait are therefore referred as Homomorphic Encryption.
The example we gave above is just an instance of Homomorphic Encryption. Namely, this is an instance of Additively Homomorphic Encryption, where one can freely add ciphertexts together and obtain any linear combination of the original plaintexts, in encrypted form. Of course, there is also Multiplicative Homomorphic Encryption, the counterpart of AHE. In a MHE scheme, one can freely multiply ciphertexts together and obtain any products of the original plaintexts.
Pretty interesting right? However, if you think this concept is still a bit vague, allow me to provide an example.
Example: Anonymous Polling
We all know how polling works - for example, when a group of people want to vote on something (say whether they want to order Japanese for lunch), they could start a poll.
In the poll process, everyone will submit their choice of either going (1) or not going (0) to the host. Then the host simply adds everyone’s share together, and compares whether this final result is larger than $1/2$ of the population.
However, the problem with this scheme is that the polling process is not anonymous. This means that anyone who can see the data traffic will know who likes and dislikes Japanese food - just by looking at everyone’s shares.
We can convert this protocol into an anonymous one easily with the help of Additively Homomorphic Encryption.
First, the host will run the $KeyGen$ algorithm that generates a pair of encryption key $pk$ and decryption key $sk$. (Here, we are using an asymmetric encryption scheme.) Then the host distributes the encryption key $pk$ to the public.
Now the polling begins. Every participant will encrypt their choice (either 1 or 0) under the encryption key $pk$ using the $Encryption$ algorithm. Then they will all send their ciphertexts to the host. \(ct_i = Encryption(pk, pt_i):pt_i \in \{0,1\}\) The host adds up all the encrypted ciphertexts and forms one ciphertext $\hat{ct}$. Finally the host then runs the decryption algorithm on the ciphertext and obtain the result of the poll. \(Decryption(sk, \sum_i ct_i) = \sum_i pt_i\) This might not do the full justice to the true power of HE. But this should be a solid example where HE allows us to do some kind of Secure Delegated Computation - meaning that we can delegate some third party to compute on secret inputs we have by encrypting our inputs using HE schemes. The third party then applies whatever computation directly on the ciphertext and obtains the result ciphertext that is the encryption of the computation result. Finally we can decrypt that ciphertext and receive the result.
Levels of Homomorphic Encryption
We should be pretty comfortable with what HE is, and what kind of application it enables us to do. Now, I want to briefly go over different levels (or stages) of HE that will eventually lead us to Fully Homomorphic Encryption.
Partially Homomorphic Encryption
This is the very first stage of HE. If there exists an encryption scheme that is partially homomorphic, this means that this scheme is either additively homomorphic or multiplicatively homomorphic.
In other words, let’s say that in the Delegated Computation example, we would like the remote server to compute some functionality $F$. If the HE scheme we choose is only partially homomorphic (for example, additively homomorphic), it means that the only thing we can do to the ciphertexts is to obtain an encrypted linear combination of the plaintexts.
Therefore, if we have inputs $ct_0, ct_1$ that corresponds to $pt_0, pt_1$, this HE scheme allows us to compute $Enc(pk, c_0 \cdot pt_0 + c_1 \cdot pt_1)$ where $c_0, c_1$ are some constants. However, with this scheme it is not possible to compute the product of the two plaintexts $Enc(pk, pt_0 \cdot pt_1)$ or anything like that.
To summarize, if the HE scheme is just additively homomorphic, then the functionality $F$ that the third party can compute is limited to all the functions where their outputs can be expressed as linear combinations of all the inputs.
This is exactly the same for schemes that are only multiplicatively homomorphic - the functionality $F$ then will be limited to all functions where their outputs can be expressed as products of their inputs.
Somewhat Homomorphic Encryption
This next category brings us closer to our ideal world. If we say that an HE scheme is somewhat homomorphic, this means that the HE scheme is capable of doing both addition and multiplication to the original plaintexts but its capability is heavily limited.
A typical HE scheme example would be - let’s say that there is a somewhat homomorphic HE scheme where you can perform unlimited additions but only one level of multiplication to the ciphertext. In such scheme, given $ct_0, …, ct_2$, we can obtain something like $Enc(pk, c_0 \cdot pt_0 \cdot pt_1 + pt_2)$, but we cannot go any further to get something like $Enc(pk, c_0 \cdot pt_0 \cdot pt_1 \cdot pt_2)$.
To summarize again, in a somewhat homomorphic scheme, the functionality $F$ is limited to all functions where their outputs can be expressed as linear combinations of plaintexts and one-round products.
Leveled Fully Homomorphic Encryption
Now we are getting even closer. If there exists an HE scheme in this category, then we are able to obtain both additions and multiplications to the plaintexts freely. There wouldn’t be any limits on how we combine ciphertexts together.
However, the fact that this category is called “leveled” is because it introduces an upper complexity limite $L$ to the functionality $F$. If the functionality $F$ can be expressed in a boolean circuit $C$ such that $\mid C \mid \le L$ (the depth of the circuit is lower than the bound $L$), then it can be evaluated in the leveled FHE scheme.
A good way to understand a leveled scheme is to think about that the HE addition and multiplication operations inevitably introduces some “noise” into the ciphertext. The noise level increases as the functionality $F$’s boolean circuit goes deeper. When $F$’s complexity approaches $L$, eventually the noise blows up and destroys the obsersable state of the ciphertext and we can no longer recover the original plaintext from the ciphertext anymore.
Fully Homomorphic Encryption
Finally, we arrived at our ultimate goal and the last category - FHE. In a FHE scheme, we can do arbitrary computation to the plaintexts by manipulating the ciphertexts. There’s no complexity requirements on the functionality $F$. Also, a FHE scheme will always gate the ciphertext noise at a manageable threshold so it doesn’t blow up and destroy its observable state.
At this stage, we can truly enable Secure Delegated Computing. If we can find efficient and practical FHE schemes, then we can basically securely offload all of our computations to remote servers without compromising a single bit of data!
Before we switch gears, let’s give a formal definition of a FHE scheme. The syntax of a proper FHE scheme should have four essential algorithms:
- $KeyGen(1^\lambda) \rightarrow sk$: This is the key generation algorithm that produces the secret key. (This might vary depending on if it’s symmetric/asymmetric.)
- $Enc(sk, \mu \in {0, 1}) \rightarrow ct$: This is the bit-wise encryption algorithm that encrypts a single bit into some ciphertext.
- $Dec(sk, ct) \rightarrow \mu$: This is the decryption algorithm that recovers the bit $\mu$ from the ciphertext.
- $Eval(F, ct_1, …, ct_l) \rightarrow \hat{ct}$: This is the evaluation algorithm that combines $l$ ciphertexts together to obtain an encrypted representation of the evaluation of functionality $F$ under the original $l$ plaintexts. Here the functionality $F$ is represented by a boolean circuit on $l$ inputs.
Aside from the Correctness and Semantic Security properties that every encryption scheme needs to have, we need one more property here: Compactness. To be exact, the FHE scheme needs to satisfy: \(\forall F, sk, ct_i \leftarrow Enc(sk, \mu_i):\\ \mid Eval(F, ct_1, ..., ct_l) \mid = poly(\lambda)\) In other words, the size of the output from the evaluation algorithm needs to be independent from the complexity of the functionality $F$.
Why is this property important? In fact, without this property, I can claim that there exists a super trivial construction of a perfectly valid FHE scheme. This construction operates as the following:
- The $KeyGen, Enc$ algorithms work the same as before.
- When we want to evaluate a set of ciphertexts on functionality $F$, the $Eval$ algorithm will simply output ciphertext $\hat{ct} = (F, ct_1, …, ct_i)$. Notice that we don’t restrict the output size here, so this size can just be as large as the input plus the description of the functionality $F$.
- Finally, when we want to decrypt the result ciphertext $\hat{ct}$, the decryption algorithm $Dec$ simply reads the description of the functionality $F$, decrypts all the following ciphertexts one-by-one, and then apply $F$ on the decrypted plaintexts.
Now you should get the idea - if we don’t restrict the output size of the $Eval$ algorithm, then this scheme can basically just “cheat” by not doing any FHE computation at all. All it needs to do is to keep appending the description of the functionality $F$ into the ciphertext and let whoever decrypts the ciphertext do all the work.
History of FHE
Before we proceed further into the theoretical land, I’d say let’s review the brief history of FHE together :)
The concept of FHE was proposed back in the late 70s. In 1978, Rivest, Adleman, and Dertouzos together proposed a scheme that enables secret evaluations of plaintexts by manipulating only the ciphertext without knowing the decryption key - basically FHE! Also, this was only two years after Diffie-Hellman public key exchange was proposed in 1976.
After this idea was proposed, the whole academia and industry started searching for the ideal candidate that has this amazing property. Unfortunately, people couldn’t find a worthy scheme that both satisfies the FHE requirement and is unbroken by cryptanalysis attacks.
It was until 2009 when Craig Gentry (a PhD at Stanford) made a major breakthrough on FHE. In his PhD thesis, he gave the first construction of a valid FHE scheme based on the assumption of ideal lattices.
In this thesis, he also introduced this novel idea of bootstrapping. With this bootstrapping trick, he was able to turn a leveled FHE scheme into a complete FHE scheme. The essence of this trick is to use some “black magic” to somehow “refresh” and ameliorate the noise level of a noisy ciphertext so the noise never blows up when computing arbitrarily-complex functionality $F$. We will come back to this in upcoming posts.
After Gentry’s discovery, the whole industry started the second round of massive search for ideal FHE candidates.
In 2011, Brakerski, Vaikuntanathan proposed a new FHE scheme - a scheme that is based on a lattice-based crypto assumption called Learning With Errors (LWE). Later, Brakerski, Gentry, Vaikuntanathan concluded this work by formally proposing the BGV FHE scheme. This scheme is a leveled FHE scheme and is based on better lattice assumptions compared to Gentry 09’. Commonly, we refer to the BGV scheme as the 2nd Generation FHE scheme.
Later in 2013, Gentry strikes back again! Gentry, Sahai, Waters together proposed the 3rd Generation FHE scheme - the GSW scheme. The GSW scheme is again based on the LWE lattice crypto assumption (but slightly better) and uses bootstrapping to make itself a truly FHE scheme (capable of evaluating any arbitrary-sized functionality $F$).
After 2013, there are a lot more follow up works in the industry to further improve the BGV and the GSW schemes to make them more practical. IBM worked on the HElib project which is an open-source FHE library that’s based on the BGV scheme. Another group developed the TFHE project which is derived from the GSW scheme. Both projects are actively maintained and optimized for performance and practicality.
There are also a few attempts to accelerate FHE schemes using hardware. Projects such as cuFHE aim to speed up FHE evaluations using CUDA-based GPU. There are also attempts to move this onto FPGA and ASIC chips to further boost its performance.
And here we are today - standing on top of a well-paved road that are built by avant-gardes in this field. The major focus today is still about performance. With the state-of-the-art technologies, we are still looking at the unit of seconds when evaluating useful functionalities on a moderately-rigged machine.
Worth-Mentioning Attempts at FHE
Before we continue explaining how Gentry’s FHE schemes work, I think it’ll be fun to go over some worth-mentioning attempts at build a FHE scheme.
RSA: Multiplicative HE
The very first candidate that exhibits some HE properties is RSA.
If you know about RSA already, feel free to skip this intro section! But if not, allow me to briefly summarize this algorithm in a few sentences.
RSA works by exponentiating a message in a cyclic finite field (or a group). It can be broken down into a few steps:
- First, fix a large number $N = p \cdot q$ where $p, q$ are large primes.
- Then, find a pair of numbers $e, d$ such that $e \cdot d = 1 \text{ mod } \Phi(N)$. Here, $\Phi(\cdot)$ is the Euler totient function and $\Phi(N) = (p-1)(q-1)$.
- Set $(N, e)$ to be the public key, and $(N, d)$ to be the private key.
- When encrypting a message $m$, simply compute $Enc(pk, m) \rightarrow m^e \text{ mod } N$.
- When decrypting a ciphertext $c$, simply compute $Dec(pk, c) \rightarrow c^d \text{ mod } N$.
The correctness of this scheme is assured by the fact that $(m^e)^d = m \text{ mod } N$.
Since the encryption of RSA is simply just exponentiation, we can immediate see some HE properties here. Assume that we have the encryption of message $m_0, m_1$ which is $c_0, c_1$. We can easily see that: \(\hat{c} = c_0 \cdot c_1 = m_0^e \cdot m_1^e = (m_0 \cdot m_1)^e\) By simply multiplying two ciphertexts together, we just obtained the encryption of the product of the original plaintexts. Indeed, we can tell that RSA is Multiplicative HE.
However, we couldn’t find additive HE properties within RSA. Therefore RSA is a Partially HE algorithm.
Cyclic Group ElGamal: Additive HE
The second candidate that people tried was the ElGamal encryption scheme over cyclic groups.
There are a few well-known cyclic groups such as the Integer Finite Field and the Elliptical Curve Group. However, we don’t need to understand these groups are actually implemented in detail.
All we need to know is that there’s a group $\mathbb{G}$ that contains a set of elements. Inside of that group there is a generator $g \in \mathbb{G}$ where its exponentiation results in every single element of $\mathbb{G}$. If that still sounds a bit fuzzy, just try to think that every element $h \in \mathbb{G}$ can be expressed as the generator raised to some exponent $x$: \(h = g^x \in \mathbb{G}\) ElGamal Encryption is an encryption scheme that works on top of generic cyclic groups. It works as the following:
- First, sample a random element integer $\alpha$ that is less than the size of $\mathbb{G}$, and set it to be the private key. We set $g^\alpha \in \mathbb{G}$ to be the public key.
- When we want to encrypt a message $m$, we will first sample a random integer $\beta$, and then we will output ciphertext $ct = (v = g^\beta, e = pk^\beta \cdot g^m)$.
- Later, when we want to decrypt this ciphertext $(v, e)$, we will first compute $w \leftarrow v^\alpha = g^{\alpha \beta} = pk^\beta$. Finally, we can obtain the message $m \leftarrow log_g(e/w)$.
For the sake of simplicity, I’ll omit the properties and the proofs here.
If we look at this encryption scheme, it’s not hard to spot out its HE property. Let’s say we have an encryption of $m_0, m_1$ which is $(v_0 = g^{\beta_0}, e_0 = pk^{\beta_0} \cdot g^m_0), (v_1 = g^{\beta_1}, e_1 = pk^{\beta_1} \cdot g^m_1)$. Notice that we can multiply the ciphertexts together: \(\hat{v} = v_0 \cdot v_1 = g^{\beta_0 + \beta_1}\\ \hat{e} = e_0 \cdot e_1 = pk^{\beta_0 + \beta_1} \cdot g^{m_0 + m_1}\) Then we can obtain a pair $(\hat{v}, \hat{e})$ which is the encryption of the of $m_0 + m_1$. We can tell that ElGamal is Additive HE.
We haven’t made much progress in finding the its multiplicative counterpart just by using normal properties of a cyclic group. Therefore, this scheme still lies in the Partially HE realm.
However, it is possible to get some multiplicative HE, if we assume some non-standard properties. Let’s take a look at it closely in the next section.
Pairing-based Crypto: Somewhat HE
After people discovered how crypto based on groups (especially Elliptic Curve Crypto) work, cryptographers went through an extensive research for plausible schemes in thie realm. Around 20 years ago, Pairings became a fairly hot topic.
So what is a pairing? Actually, pairing is just a special operation that can be done between group elements. Consider we have two group elements $g^a, g^b \in \mathbb{G}$. By applying the pairing operation $e(\cdot, \cdot)$, we can obtain a new group element $g_T^{ab} \in \mathbb{G}_T$. \(e(g^a \in \mathbb{G}, g^b \in \mathbb{G}) = g_T^{ab} \in \mathbb{G}_T\) Notice that the result of the pairing operation is a group element with exponent $a \cdot b$ but in a new group $\mathbb{G}_T$. This property allows us to “compute” the product of two unknown exponents $a$ and $b$ and can be useful to do many things. For example, if we want to see whether $r = a \cdot b$ given $g^r, g^a, g^b$, then we can essentially check this equality using pairings: \(e(g^a, g^b) \stackrel{?}{=} e(g^r, g) \iff g_T^{ab} \stackrel{?}{=} g_T^r\) There are many more powerful applications of the pairing operation, such as the BLS Signature Scheme. But here we will focus on the homomorphic properties of groups that allows this pairing operation.
From the previous section, we already know that ElGamal based on cyclic groups exhibits the Additive HE property. Here, with the help of pairings, it takes us further - allowing us to compute multiplications of group element exponents.
With the help of pairings, we are capable of computing both addition and multiplication of the exponents in the cyclic group. However, this scheme is still not FHE. In fact, it falls in the category of Somewhat HE.
The reason behind this is that the pairing operation is very limited and will only work in some groups. Let’s say that we find a group $\mathbb{G}$ that we can do both ElGamal and pairings, it does not guarantee that the target group $\mathbb{G}_T$ that the pairing operation takes us to also has a valid pairing operation.
Therefore, the functionality $F$ that this scheme allows us to homomorphically compute consists of boolean circuits that can be represented by arbitrary additions plus a few multiplications. The number of multiplications allowed in the scheme depends on the actual group $\mathbb{G}$ that we select to begin with.
MPC-based HE
The last worthy attempt that I wanted to mention is MPC-based HE.
MPC, or Secure Multi-Party Computation, stands for a genre of problems in Cryptography where we have a group of (possibly malicious) parties, each holding their own private input $x_i$, and together they wish to evaluate some functionality $F$, and obtain $F(x_1, …, x_n)$. The requirements of MPC is that all the parties do not want to reveal their private input to anyone, therefore every party in the end should only obtain the evaluation result of the functionality $F$, and nothing else.
MPC is actually a well-studies problem already. Starting in the last century where Andrew Yao proposed his famous Garbled Circuits approach that leverages the powerful protocol called Oblivious Transfer. After that this field blossomed with a wide set of frameworks that allows multiple parties to compute arbitrary functionalities with a fairly practical complexity.
Therefore, we can easily solve the FHE problem with any MPC protocol. All we need to do is to have the user with private input be one party of the protocol, and the delegated server to be the other party. The user has its private input $pt_i$, and the server has its private functionality $F$. Assume that the MPC protocol does what it says, then in the end the user can receive its result, and the server receives nothing.
Although the FHE problem sounds kind of trivial in the presence of MPC, but there’s a critical issue with this approach - Since MPC requires interaction between the participating parties, the user has to remain online during the process of the protocol. This kinds of defies the purpose of FHE, because we would want the user to offload computation to a third party, not participate in a computation with it. Therefore, solving the FHE problem while keeping the user’s computation low still remains as a valid and open problem!
Up Next: The GSW FHE Scheme
I think we should all have a pretty solid understanding of what FHE is at this moment.
In this series of posts, I’ll be going over the full details of the GSW FHE scheme, since it’s easiest to understand and makes the most convenient set of assumptions. (Also this was the scheme they taught in class :P)
Since this post is probably already long enough, I will stop writing here. In the next post, I’ll start to go over the fundamental concepts that help us understand how GSW works. Namely I’ll be talking about what Learning With Errors is and how Lattice-based Crypto works.
Credits
Most content of this post were summarized from my notes for CS355. You can locate all the awesome lecture notes from CS355 here.
Thanks again to Dima, Florian, and Saba for teaching this amazing course!