We hear amazing rumours of a 5-time speedup of the AES with SSE5. Wow. A 5-time speedup from what to what? It turns out that the best the experts could squeeze out of it so far is 182 clock cycles per block or 11.375 CPB (cycles per byte). Now this figure sounds believable…
This breathtaking 5-time acceleration from 16 down to almost 11 CPB must be due to the new much ballyhooed SSE5 PPERM instruction. I wonder just how much ïrRUPT with its 5-13 CPB can be accelerated with PPERM instead of the ridiculous MOVE + SHIFT LEFT + SHIFT RIGHT + XOR combination it is forced to use on the previous SSE extensions to perform a simple byte-wise or word-wise rotation operation… I would be more than happy with a mere 2-time speedup, which would make ïrRUPT as fast as MD6 executed on 16 processors.
The 2x parallelisable ïrRUPT stream hashing mode of operation of EnRUPT has been fully tested, implemented and submitted to the NIST SHA-3 competition.

Even with their 8 rounds per word, speed-optimised ïrRUPT implementations hash a stream at 5-13 CPB on Core 2 Duo, which is faster than MD5 with its 4 rounds per word and a lot faster than the SHA family with their 5 rounds per word. ïrRUPT is about 8 times faster than MD6 with all its parallelisability! It is also very small and very fast in high-level languages such as Basic, C#, Java, JavaScript, PHP, Perl, Python or Ruby and in all kinds of hardware – ASIC, FPGA, 8-bit and 16-bit microprocessors.

In big-endian C, ïrRUPT looks as follows:
#if (HASH_BITS>256) // 128,160,192,224,256,384,512,1024,1536
#define w 64
#define uw uint64_t
#define rotr rotr64
#else // 64,80,96,128,160,192,224,256,384,512,768
#define w 32
#define uw uint32_t
#define rotr rotr32
#endif
#define H ((HASH_BITS*2+w-1)/w)
#define ïr1 (f=rotr(2*x[(r^1)%H]^x[(r+4)%H]^d[r&1]^r,w/4)*9,\
x[(r+2)%H]^=f,d[r&1]^=f^x[r++%H])
#define ïr2 (ïr1,ïr1)
#define ïr2s(p) (ïr2,ïr2,ïr2,ïr2,d[1]^=p) // s=4
void ïrRUPT (const uw *msg, const uw msgwords, uw hash[(H+1)/2])
{
uw d[2],f,i,r,x[H];
for (i=0; i<H; i++) x[i]=0; d[1]=d[0]=r=0; // initialize
for (i=0; i<msgwords; i++) ïr2s(msg[i]); // process
for (i=0; i<H; i++) ïr2s(0); // finalize
for (i=0; i<(H+1)/2; i++) hash[i]=ïr2s(0); // output d1
}
// message is in big-endian bit/byte order
// message must be padded with one bit 1
// followed by 0 bits up to the word size
The only differences with the initially proposed irRUPT mode are the big-endian bit/byte order and the ability to update two [or more] words in parallel. There are no structural differences. ïrRUPT is simply a direct extension of irRUPT to processing P words in parallel. ïrRUPT with P=1 is identical to irRUPT.
ïrRUPT is probably the only easily memorisable hash function on Earth. Being a stream mode of EnRUPT, it also doubles as a fast authenticated or binary additive stream cipher or a MAC.
We hope that this information helps other participants improve their submissions. We appreciate your votes of confidence as well as your constructive criticism.
Big thanks go to Karsten Nohl from University of Virginia and to Luca Henzen from ETH Zürich for helping us raise the bar a little higher.
An extensive testing has demonstrated that the initially proposed irRUPT mode of operation must be corrected to load each input word once every 2*s=8 rounds instead of 6. No other modes of operation are affected.
EnRUPT SASC-2008 slide show is finally here! [Colour PDF, 6.4 MB] and [B/W PDF, 825K]
People often ask me “is this or that secure” questions about ciphers and products… Rather than trying to answer those questions directly, which can spark flames of endless arguments and may sound like paranoia or a conspiracy theory, I like to present rhetorical questions, which help people determine what is and what isn’t secure for themselves. These are some of the questions:
1. Hasn’t it always been the NSA’s duty to maintain cryptographic superiority being able to decrypt communications of all the US adversaries while no one else can? Remember their giving away Enigma machines after the WWII?
2. Isn’t it the NSA’s duty to make sure that they are the only ones who can decrypt communications of their own government? Remember the Clipper chip backdoor?
3. Isn’t it the NSA’s duty to enforce digital signatures to ensure tracking of individuals? Since when do authentication or message integrity require the use of digital signatures?
4. Why is AES-128 not accepted as a Type-1 cipher [not allowed to secure top secret documents or communications] while the 64-bit secure Skipjack with its 80-bit keys and a well-known 16-bit NSA backdoor is [along with AES-192 and AES-256]? Since brute-force of a single AES-128 key would take a thousand years on a quadrillion of 10 GHz microchips, what is wrong with it?
5. If the NSA could break the DES in 10 microseconds, how fast could they break 3DES?
6. Why is algebraic cryptanalysis the most under-developed type of cryptanalysis by the academia, while most of the academic efforts are thrown into linear and differential cryptanalysis requiring infinitely large numbers of chosen plaintexts and which have not been responsible for a single known practical break of a cipher? Who influences the lemminghood?
7. Why was Rijndael chosen as the AES against all the warnings that it has the weakest structure of all the AES candidates against algebraic attacks, the only type of attacks that could be of a real practical threat requiring minimal amounts of information about the plaintext?
8. Why did RSA in PGP become limited to 2048-bit keys by “#define MAX_RSA_KEY_SIZE 2048” since Philip Zimmermann sold it to the NSA partner Network Associates Inc?
9. Why is RSA in GPG limited only to digital signatures and is not allowed to be used for encryption?
10. Why is RSA in GPG’s “expert” mode limited to 4096-bit? Are 8192-bit RSA keys too much to ask?
11. Why are the same mysterious prime moduli used in PGP, GPG and all the US government Diffie-Hellman standards and why is there no explanation of how they were generated? Are provably pseudorandom DH moduli too much to ask?
12. Why are the prime moduli over 112 bits chosen for the ECC standards have such very special and obviously weak form (1000…0001) even though the actual difference in speed between implementations of those and pseudorandom moduli is so small? [~18ms vs ~26ms for 128-bit secure 256-bit keys, merely 30% faster]
13. If the slightly higher speed is in fact the key factor and if this special form does not affect security, shouldn’t the 112-bit and shorter prime moduli also have the same special form? Why are those pseudorandom?
14. Why are the random-looking prime ECC moduli up to 112 bits listed in the US standards in hexadecimal form, and the special prime moduli over 112 bits are presented there in decimal form in which they appear random? Why hide all the zeros?
15. Why was there a backdoor in the US government standard for random number generation? ![]()
16. If there is obviously at least one backdoor, who says every one of those standards does not contain a backdoor? Shouldn’t they?
17. If it is the NSA’s responsibility to include hidden backdoors in the US cryptographic standards, how can anyone else trust any of them?
18. If the NSA could break as some people think any cipher or any key exchange algorithm, why would they go through all this trouble to limit and control it?
A slightly off-topic post to demonstrate yet another product that fails miserably due to the lack of properly used cryptography. It is the first day at the Hacker Space Fest. I have just received the first copy of the “Consumer B Gone” MP3, which if played on your mobile phone or iPod next to the wheel of a shopping cart, locks it.
I can truly appreciate the harmless fun of this ingenious way to use the speaker coil to produce higher frequency radio waves that affect electronic devices in its proximity. Check it out for yourself: The Al Quaddie.
The moral of this story is quite simple: if encryption is implemented and used properly in a digital or a security product, there is still a chance that the product may fail. But without cryptography, it is guaranteed to fail.
Automated cryptanalysis of EnRUPT128 (a variant of EnRUPT operating on 128-bit words) has shown that it is much weaker than EnRUPT32 and EnRUPT64 and requires different rotations and shifts to maintain the same speed just barely. It needs a rotation right by 24 bits and a multiplication by 33 (a shift left by 5 bits instead of 3) just to support the same formulas for the number of rounds, but generally it needs 50% more.
On top of that, even SSE cannot perform 128-bit addition operations or bitwise shifts across all 128 bits. It turned out to be terribly complicated. Once again, the absence of a rotation operation in MMX and SSE made me swear like only an Irishman can! How hard is it to implement any left or right shift with a single rotation and one masking AND??? You even need a rotation operation in only one direction in your processor! But try rotating with two shifts, a spare register and another operation to combine them! It’s a nightmare! Somebody, please help Intel out with a little bit more intelligent engineers or their managers…
While EnRUPT64 is a little slower on the old IA32 platforms, its MMX or SSE implementations can be as fast or faster than EnRUPT32, while its simple ANSI C implementations are certainly much faster on IA64. It also has the advantage of the absent rotation operations on 16-bit microprocessors. It seems that for the large block sizes and secret states, EnRUPT64 is the best candidate for the foreseeable future.
The updated specification paper includes EnRUPT64 since its automated cryptanalysis has been completed and revealed no significant differences comparing to the originally proposed EnRUPT32. Both variants have roughly the same security properties. Unfortunately, there is no way to keep them simple yet compatible with each other. They can only be used separately.
While originally EnRUPT specification only proposed a variant operating on 32-bit words, it was designed to support variable word length. The difference in word length only affects the rotation operation. It is w/4 (a quarter of the word length), not simply 8. Other word sizes have not been fully analysed yet and may require a different number of rounds to be secure as it is affected nonlinearly. While larger word sizes can be made secure, 16-bit words demonstrate significant weakness and cannot be used efficiently.
Performance of EnRUPT64 is much better than EnRUPT32 on 64-bit processors but much worse on 32-bit processors (about 2 times slower than EnRUPT32). Both perform equally well on 8-bit microprocessors except for small 64-bit blocks [minimum block size of EnRUPT64 is 128 bits], while EnRUPT64 is slightly faster on 16-bit microprocessors due to the absence of rotations.