EnRUPT

First all-in-one symmetric cryptographic primitive.

Sean O’Neil
www.enrupt.com

Abstract: EnRUPT is a new simple scalable word-based symmetric cryptographic primitive that can be used to construct secure block ciphers, hash functions, stream ciphers, RNGs and MACs for a wide range of software and hardware applications. It accepts keys and data blocks of any size. It is very easy to memorise. It was found and tested using automated cryptanalysis tools. It does not use any patented or patentable techniques. It is a nearly optimal word-based “ADD-XOR-ROL” design forming an unbalanced Feistel network first proposed in XXTEA. EnRUPT design, all its implementations and all its pictures are in the public domain.

1. Introduction

Block ciphers (TEA, XTEA, XXTEA, RC5), stream ciphers (Salsa20, Phelix) and hash functions (MDx, SHA, RipeMD) based on ADD-XOR-ROL networks have been studied for years. Despite their incredible complexity, many of them were found insecure. It calls for new simpler, faster and more secure primitives.

EnRUPT is a scalable unbalanced Feistel network first proposed in XXTEA[1]. It operates on w-bit words in little-endian format. The word width w is currently fixed at 32 bits. EnRUPT has four parameters: 1) source block k (the key), 2) its size kw, 3) target block x (the state), and 4) its size xw. EnRUPT security is measured by the smallest of the two blocks: 2w*min(xw,kw), but for some applications it is naturally bound by the birthday barrier or by the size of the secret. When data is to be encrypted, it must be supplied as the state. When it is to be hashed irreversibly, it must be supplied as the key and the initial hash value as the state. The reversible EnRUPT round function can be summarised as:

xr ⊕= rotr (2*xr–1xr+1krr, 8) * 9 ⊕ kr (er1)

where ⊕ is XOR (^ in C, Java, JavaScript, Perl and Python), rotr is w-bit rotation right, * is multiplication modulo 2w and all the indexes are modulo their specific block sizes – mod kw for the key and mod xw for the state. The (er1) round function should be iterated for n=s*(2*xw+kw) rounds (normally s=4), with the round number r beginning with 1.

Fig 1. Block enRUPT/unRUPT round function (er1)

The key is processed irreversibly. The target block is reversible (decipherable) only when the key is known. Decryption is identical to encryption, except the state is processed iterating r in reverse. There are no special constants to memorise. The entire message can be encrypted as a single block providing free authentication of packets by verifying a constant or random value inside the block.

1.1 Notation

c – next w-bit ciphertext value, in little-endian format

d previous w-bit ciphertext/keystream value, saved between rounds

k key or source block, in little-endian format

kr current [r mod kw] word of the key

n – number of rounds in the encryption/hashing process; n=s*(2*xw+kw) for block EnRUPT and n=s*xw for stream EnRUPT.

p – next w-bit long plaintext value, in little-endian format

r – round number, 1 to n during encryption and n to 1 during decryption

s – security/performance trade-off parameter; s≤1 is insecure, s=2 defends against passive distinguishers, s=3 against all non-adaptive chosen plaintext, ciphertext or related key attacks and s≥4 against all adaptive attacks; s>4*sw is for the high-assurance applications requiring unreasonably high security margins. We recommend s=4 for all modes of operation.

w – word width, currently fixed at w=32 bits

hw – size of the hash in w-bit words, normally equal to sw*2 (hb=hw*w)

kw – size of the source block (key) in w-bit words (kb=kw*w)

sw – required security level in w-bit words

xw – size of the target block (state) in w-bit words (xb=xw*w)

x state or target block, in little-endian format

xr current [r mod xw] to be updated word of the state

2. Operation

The name must reflect all the size parameters and the required security/performance trade-off parameter s as EnRUPT-kb-xb-hb/s to ensure clarity and compatibility with other applications. If s is omitted, s=4. If hb is omitted, hb=kb. If xb is omitted, xb=kb in block ciphers or 2*kb otherwise. Names of the functions proposed in this paper are as follows:

Block Cipher

Hash Function

Stream Cipher, PRNG

MAC

Stream Hash

enRUPT/unRUPT

mdRUPT

RUPT/aeRUPT

mcRUPT

irRUPT

For the general-purpose applications, we recommend the following three types of functions with the security trade-off parameter s set to offer resistance to all adaptive attacks:

1) Block functions with keys (or data blocks in hashing) and data blocks (hashes) of the same size twice the required security: ciphers with xw=kw=2*sw as in enRUPT-512 equivalent to enRUPT-512-512/4, and MD hash functions with kw=hw=2*sw and xw=s*sw as in mdRUPT-512 equivalent to mdRUPT-512-1024-512/4.

2) Block functions for processing whole messages as single blocks: ciphers with kw=2*sw and xw={message words} as in enRUPT-512-max equivalent to enRUPT-512-max/4, and MD hash functions with xw=s*sw, hw=2*sw and kw={message words} as in mdRUPT-max-512 equivalent to mdRUPT-max-1024-512/4.

3) Stream functions processing one w-bit word of plaintext/ciphertext every two rounds, with kw=2*sw [hw=2*sw] and xw=s*sw, such as RUPT-512, mcRUPT-512, irRUPT-512 or aeRUPT-512, which are all equivalent to xRUPT-512-1024-512/4.

Other variants may use smaller or larger states or reduced or increased numbers of rounds for increased security or increased performance. The name of course must reflect the difference to ensure compatibility with other applications. There are only two absolute restrictions: 1) state x in stream modes and block hash modes cannot be smaller than 2*sw words, and 2) stream modes must execute at least 2 (two) rounds between inputs/outputs.

2.1 Block cipher mode (enRUPT, unRUPT)

2.1.1) Per-block initialisation

Provide the secret key as the source block and its size in w-bit words as kw. If IV is required, concatenate it with the key (key first) and supply it as the sourceblock.

Provide plaintext/ciphertext as the target block and its size in w-bit words as xw. If xw<2, the block cannot be decrypted. If necessary, include a constant or random authenticator value of required size as part of the plaintext target block.

Set n = s*(2*xw+kw), by default s=4.

(if) enRUPT: Set r = 1.

(if) unRUPT: Set r = n.

2.1.2) Iteration

Execute (er1) for n rounds.

(if) enRUPT: Increment r on each round as long as rn.

(if) unRUPT: Decrement r on each round as long as r > 0.

2.1.3) Finalization

(if) unRUPT: If necessary, check the authenticator value inside the decrypted block.

2.2 Stream modes (RUPT, aeRUPT, mcRUPT, irRUPT)

In stream modes, EnRUPT updates the state irreversibly as shown below. The only difference is that the key word kr is replaced by a w-bit feedback variable d storing the previous keystream/ciphertext value linearly combined with xr+[xw/2].

t = rotr(2*xr–1xr+1dr, 8) * 9;

d ⊕= t;

xr ⊕= t;

d ⊕= xr+[xw/2];

r = r+1;

Fig 2. Stream EnRUPT round function (ir1)

Fig 3. aeRUPT/mcRUPT/irRUPT structure

In all the stream modes, the (ir1) round function must be iterated at least twice between inputs/outputs. There is no source block, so the formula for n is also slightly different. The average period of RUPT is ~2w*((xw+1)/2+1), so it should be rekeyed after 2w*xw/4 outputs, although probability of such short loops is infinitesimal.

To ensure w*sw bits of security, the secret state size xw must be at least 2*sw, but we recommend 4*sw to provide a good security margin. The size of the final hash value hw cannot be less than 2*sw. If it is a keyed mode of operation, the key size or concatenated key+IV size kw must also be at least 2*sw. By default, xw=4*sw andkw=hw=2*sw.

2.2.1) Initialisation

Provide the state size in w-bit words as xw. It cannot be less than 2.

(if) If it is a keyed function, provide the secret key or concatenation of the secret key and the IV (key first by default) as the first kw w-bit words of the input stream.

Set all xw words of the state to 0. Set n = s*xw, by default s=4. Set r = 1. Set d = 0.

(if) If it is a keyed function, execute (ir1) for 2*kw rounds incrementing r on each round to load the key [and IV] words as described in 2.2.2. Then execute (ir1) for nmore rounds incrementing r on each round.

2.2.2) Stream processing

Iterate (ir1) twice.

(if) mcRUPT/irRUPT message processing or keyed RUPT/aeRUPT/mcRUPT/irRUPT key/IV loading: set d ⊕= p[r/2] to load the next w-bit word in little-endian format.

(if) RUPT keystream generation or mcRUPT/mdRUPT/irRUPT output of MAC/hash: release w bits of d as keystream output.

(if) aeRUPT: release w bits of ciphertext c[r/2] = dp[r/2] as output during encryption or w bits of plaintext p[r/2] = dc[r/2] during decryption. Save c[r/2] as d.

(if) irRUPT or any keyed stream mode that may have to reuse nonces: iterate the (ir1) round function n times between every xw words of input/output.

2.2.3) Finalization (if necessary)

(if) irRUPT: pad the last word of the stream with a single bit 1 followed by zeros, or load one more word ‘1’ if it is full, or pad the message as the protocol requires.

Execute (ir1) for n more rounds incrementing r on each round.

Execute (ir1) for 2*hw more rounds in keystream generation mode as described in 2.2.2 to produce the required hw w-bit words of the final MAC or hash value.

2.3 Collision-resistant block hashing (Merkle-Damgård) mode (mdRUPT)

To ensure w*sw bits of security, the state size xw, the size of the final hash value hw and the size of the data block to be hashed kw must be ³ 2*sw of w-bit words. If it is a keyed hash function, key or concatenated key+IV (key first by default) must also be ³ 2*sw of w-bit words. By default, xw=4*sw and kw=hw=2*sw, the same as for irRUPT.

The choice of prefix, postfix and padding is left to the protocol/standard designer. mdRUPT core only provides a secure compression/expansion function so it could replace any existing Merkle-Damgård or other construction regardless of its high-level operation.

2.3.1) Per-message initialisation

Set all the xw words of the target block (hash state) to 0 [or to any other constant].

Set n = s*(2*xw+kw), by default s=4. Set r = 1.

If it is a keyed hash function, execute step 2.3.3 as many times as required to hash the secret key and IV/nonce as the first source block[s].

2.3.2) Before processing the last block

If it is a keyed hash function, execute step 2.3.3 as many times as required to process the secret key and the IV/nonce again as the last source block[s].

If processing messages of odd sizes, pad the last word of the message with a single bit 1 followed by zeros to the end of the block, or attach one more word containing the padding ‘1’ if the last word is full, or pad the message as the protocol requires.

2.3.3) Iteration

For each data block, execute (er1) for n rounds incrementing r on each round.

Reset r to 1 between blocks.

2.3.4) Finalization

Set d=0. Set r=1.

Execute (ir1) for 2*hw rounds in the keystream generation mode as in 2.2.2 to produce the required hw w-bit words of the final hash value.

Note: Key expansion/compression can be performed by hashing it with irRUPT, mcRUPT or mdRUPT, concatenated with other constant or random values according to the protocol.

3. Design

Formally speaking, EnRUPT is a consistent incomplete source-heavy heterogenous UFN (unbalanced Feistel network)[2]. Despite its apparent simplicity, it seems to be an optimal structure that can process equally well blocks of any size. Simplicity of every aspect of the complete design was our main goal and turned out to be the hardest thing to achieve while maintaining high performance with a single simple set of rules for all possible sizes and applications that require security for completely different attack scenarios. It is actually very easy to save a few clock cycles per byte at the cost of simplicity.

If ultra-high speed is necessary, EnRUPT can take advantage of CTR or any other parallelisable mode like any other cipher. MAC and hashing can also be parallelised 4 times as proposed in [10]. It is not the purpose of this specification to restrict the user to a certain specific mode of operation, but to provide a primitive that can be safely used in any mode.

3.1 Design principles

First of all, s-boxes, key/data-dependent rotations and multiplications were not considered at all. Key/data-dependent rotations and s-boxes suffer horribly from timing problems – fragility totally unacceptable for a cipher. Multiplications may be cheap in some modern processors, but not only they are terribly expensive in all the embedded processors and in the FPGA and ASIC hardware, they are also not nearly as fast as ADD-XOR-ROL networks at building PRFs according to our tests. This leaves only addition, subtraction, bitwise shifts, rotations and logical operations to use in the search for an optimal construction, ultimately resulting in ADD-XOR-ROL networks that are both fast and simple.

With complexity of software and hardware designs growing exponentially with time, it is important to offer developers a symmetric cryptographic instrument that they will not make a mistake implementing. Developers should not struggle to implement cryptography to secure their increasingly complex systems. Thus the cryptographic primitive should be:

1. Extremely simple

2. Easy to memorise and hard to forget

3. Almost impossible to make a mistake implementing (no timing issues either)

4. Equally strong when iterated in either direction

5. Fast on the latest most widely used software processors

6. Reasonably fast [and small] on embedded processors and in high-level languages

7. Updating xr reversibly (by XOR) so that it could be reused as it is for decryption

3.2 Measuring polynomial pseudorandomness as design methodology

Automated algebraic distinguishers as described in [2] can measure size and randomness of polynomial relationships between all the bits. The number of rounds indistinguishable from random and the bias in the last distinguishable round can be used as the measure of strength of each round function. It also provides a good estimate of the number of rounds required for the cipher to provide sufficient security by combining these results with theoretical works ([4]-[9]) and with cryptanalytic results for well-studied ciphers and hash functions. Resistance to guess-and-determine attacks was measured by other tools.

Due to the sequential structure processing one word at a time, at least xw+kw rounds are required even for a perfect round function of this type to propagate a change in the key or data to the entire block. Since a perfect round function would be very expensive, the second best simple enough proportion is 2*xw+kw. With a stronger round function, either the formula for the total number of rounds would be too complicated or performance would be hindered. So the aim was to find the simplest and the fastest function that takes no longer than that to build a PRF for blocks and keys of any size.

3.2 Search for the best round function

Intel assembly language includes LEA instructions that can shift a 32-bit word by 1, 2 or 3 bits and add to it another 32-bit word and a constant in a single instruction, thus providing multiplication by 3, 5 or 9 in a single adder. All the other shift-ADD combinations referred to as MUL [as the only considered multiplications] have to use an additional temporary register and a bitwise shift. Bitwise shifts are as slow as rotations but are not as strong. While constant shifts and rotations are almost free in hardware, they are the most expensive operations in software processors to be used in an ADD-XOR-ROL cipher. Therefore the use of shifts or rotations must be limited as much as possible. Unfortunately, that also has a limit: combining more than two different ADD-XOR or MUL-XOR operations per rotation shows no further improvement in strength.

Hundreds of different functions with all possible combinations of shifts, rotations and byte-swapping operations were tried. There was no time to research why some of them perform better than others. All the strong functions that allowed parallelisation 2-4 times such as (er2) xr⊕=rotr(xr–2⊕2*xr+2⊕1krr,8)*9⊕kr, were dropped after thorough testing, as their much slower diffusion rate makes up for all the gain from parallelisation but with a more complex round function that would not support 2-3-word blocks and would be faster only on some expensive processors. To achieve top performance, any change from the previous operation should be used in the following operation. The same applies to the round function when it is iterated in the opposite direction. So the best performing function would be of the form:

xr = f (xr, xr–1, xr+1),

of which the most efficient of the usable functions are of the form:

xr ⊕= f ((xr–1<<c1) ± (xr+1<<c2)), c1 ≠ c2,

where f includes a rotation and ± is a bitwise or arithmetic addition or subtraction. This is due to the fact that the register currently storing the last processed word in either direction gets updated with the output of the shift and addition operations and its original value is lost immediately. All other strong constructions require additional registers to implement.

After testing hundreds of thousands of similar variants, the following points became clear:

1) Of all the logical operations, XOR is the best choice. ADD provides enough nonlinearity.

2) Rotations work much better than shifts, but at the same cost in software.

3) It is essential to choose the right shift[s] and/or rotation[s].

4) A set of shifts and/or rotations works better than byte swapping or their combination.

5) At least two addition operations should be performed for every rotation.

6) Constants in one form or another are essential.

7) The round index must be included to remove self-similarity. It can replace the constants.

8) Different rotations or shifts should be used for different neighbours.

9) f (2*xr–1 ± xr+1) and f (xr–1 ± 2*xr+1) are the strongest choices that work equally well.

10) They are also the only functions that work for the smallest 2-word blocks.

11) Continuous addition of xr+[xw/2] values and releasing output after 2+ rounds is the only scheme we could find that is guaranteed to resist guess-and-determine attacks for any size.

12) The simpler the key schedule the faster it builds a PRF. Optimally, each key word should be added twice, before and after the rotation.

This construction works for 2-word blocks thanks to its processing both sides with a slight offset shifting the left side left by 1 bit to satisfy the point (8) above. It is also the reason why an 8-bit rotation works with this round function: xr–1 is thus effectively rotated right by 7 bits feeding forward w–1 bits. Since this addition only diffuses bits of different words, diffusion of bits within each word is also necessary. It was added in the form of multiplication by 9 described above. With the 3-bit shift, the adder thus provides diffusion of the two neighbours and of the key bits shifted/rotated by 0, 4, 5, 7 and 8 bits. Multiplying the sum by 9 before the rotation is actually slightly weaker. The nonlinearity provided by this arithmetic addition is sufficient. It allows all the other operations to be linear.

8-bit rotation is usually a very bad choice for ADD-XOR-ROL ciphers, even in combination with other rotations. But this particular construction seems to be a very lucky case: an 8-bit rotation here works on words of any size as well as almost any other rotation. Although the strongest choice would be rotation right by 11 bits, it is only marginally stronger and only for large blocks. Rotation left or right by 8 bits performs equally well on small blocks where it matters the most. Since rotation right by 8 bits is slightly stronger on very large blocks than rotation left [but at the same low cost on 8-bit processors], it is our final choice.

EnRUPT+ with arithmetic additions before the rotation instead of the linear ones was also considered: xr⊕=rotr(2*xr–1+xr+1+kr+r,8)*9⊕kr. It is as secure as EnRUPT with the same number of rounds and it is faster on Intel processors (3.8 clocks per byte in stream modes), but it is considerably slower on small 8-bit processors and at least two times slower in the FPGA and ASIC hardware.

Thanks to the extensive use of automated cryptanalysis to compare different primitives, the discovered round function is very close to optimal, although it was not the goal of this project. Some of the EnRUPT performance had to be traded in favor of its simplicity. Compare the EnRUPT round function (er1) with the TEA, XTEA and XXTEA shown below:

Fig 4. TEA

Fig 5. XTEA

Fig 6. XXTEA

It should be obvious that EnRUPT round function is much smaller and much simpler than that of its closest relatives. It is also much faster, especially on the 8-bit processors. We designed EnRUPT aiming to replace its slower, more complex and limited predecessors.

3.3 Number of rounds

It is the hardest part of any cipher or hash function to get right. It is especially challenging to find the right formula for a fully scalable design like EnRUPT. And this is exactly where cryptologic theory ends and scientific guessing, guestimating and plain speculating usually begin. The only solid cryptologic base are the theoretical works pioneered by Luby and Rackoff [4] and improved and developed by Jacques Patarin [5], Naor and Reingold [6], Yun Park and Lee [7], Aiello and Venkatesan [8], Maurer [9], etc. These works provide theoretical proofs of security against various generic attacks for ciphers constructed with independent PRFs as round functions. Polynomial time distinguishers and independent PRFs are needed for these proofs of security to be applicable.

Approximately 2*xw+kw rounds of (er1) build a PRF for blocks and keys of any size with a controlled change anywhere in the data or key. The irreversible function (ir1) grows faster: approximately xw+2 rounds build a PRF for a state of any size with a controlled change in the input. Only the smallest sizes require an additional round or two. Those have a lower security margin.

Experimental results compared with the recent developments in cryptanalysis show that 4 times that number is the safest reasonable number of rounds for a secure cipher, which also seems to match Luby-Rackoff results. For example, mdRUPT-512-128-x similar to MD5 in proportions and structure of its round function [except (er1) is slightly stronger] is iterated for 96 rounds compared to the 64 rounds of MD5, and mdRUPT-512-160-x similar to SHA [except (er1) is stronger again] is iterated for 104 rounds compared to the 80 rounds of SHA. These numbers are much more reasonable than the 64 and 80 rounds chosen by the authors of MD5 and SHA. So far no one can claim that they can break 96 rounds of MD5 or 104 rounds of SHA. EnRUPT also demonstrates a much higher resistance to differential and rotational related key attacks[11] than the TEA, XTEA or XXTEA, which is essential if it is to be used as a hash function.

Cipher

PRF rounds (data/IV)

PRF rounds (key)

PRF rounds (both)

Proposed rounds

Maximum broken

TEA

10

64

[11]

XTEA

9

19

19

64

27[12] ≤ ?

XXTEA

10

15

21

64

?

RTEA

10

12

14

48

?

enRUPT-128-64

8

10

10

32

MD5-1

13

27

27

64

64 ≤ ?

MD5-2

11

25

25

64

64 ≤ ?

MD5-3

10

24

24

64

64 ≤ ?

MD5-4

13

26

26

64

64 ≤ ?

mdRUPT-512-128-x

12

24

24

96

SHA-0

16

27

27

80

80 ≤ ?

SHA-1

16

27

27

80

80 ≤ ?

mdRUPT-512-160-x

14

25

25

104

Salsa-20

4*16

4*16

4*16

20*16

7*16 ≤ ?

RUPT-512-512-x