To end the confusion that has started about EnRUPT: it is not just a tiny block cipher like TEA, XTEA or XXTEA. The EnRUPT round function can also be used continuously in various stream modes. For example, the RUPT-256 stream cipher can be implemented in plain C as follows:
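#include <stdint.h>
typedef uint32_t u32;
// 32-bit rotate right (n from 1 to 31)
#define rotr32(v,n) (((v)>>(n))|((v)<<(32-(n))))
#define sw 8 // 256-bit security
#define xw (4*sw) // means 1024-bit state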
// one (ir1) round: same as the EnRUPT round (er1), except that the
// 32-bit accumulator d is added on every round instead of the key
// word, and the word half of the state away is added to d
#define ir1(p) (f=rotr32(2*x[(r-1)%xw]^x[(r+1)%xw]^d^r,8)*9,\
x[r%xw]^=f,d^=f^p^x[(xw/2+r++)%xw])

// the complete [double] stream cipher round: two ir1 rounds;
// the comma expression yields accumulator d as the keystream word
#define rupt(p) (ir1(0), ir1(p))
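// key and IV sizes in 32-bit words [default: 256-bit each];
// data_words, the message length in words, is set by the caller
#define key_words 8
#define iv_words 8
// Complete RUPT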
{
// [default: 256-bit] key followed by [default: 256-bit] IV
u32 key[key_words], iv[iv_words];
// secret state: xw words (4*security bits), accumulator and round index
u32 x[xw], d, r;
// local variables: loop index i and round output f (used by ir1)
u32 i, f;
// plaintext <=> ciphertext
u32 text[data_words];
...
// initialise, load the key, the IV, seal
for (i = 0; i < xw; i++) x[i] = 0; d = 0; r = 1;
for (i = 0; i < key_words; i++) d = rupt(0) ^ key[i];
for (i = 0; i < iv_words; i++) d = rupt(0) ^ iv[i];
for (i = 0; i < 2*xw; i++) rupt(0);
...
// to encrypt or decrypt:
for (i = 0; i < data_words; i++) text[i] ^= rupt(0);
}
That’s all. The code above is the complete RUPT [EnRUPT used as a stream cipher]. Its speed is [so far] about the same as that of Salsa20/12 on a Core 2 Duo, with significantly smaller and simpler code. That makes it about 3 times faster than RC4, while needing about half of RC4's memory even at 256-bit security (about 136 bytes for x, d and r, versus 258 bytes for RC4's 256-byte permutation and its two indices), and without the pain of implementing RC4 in ASIC/FPGA or Salsa20 on 8-bit processors. Any PRNG that produces 32-bit numbers can be quickly and easily replaced with RUPT, providing top speed and the cryptographically secure output of a stream cipher at the same time.
Note: the only advantage of Salsa20 is its operation in CTR mode, which allows fast-forwarding to any block in the stream. If that is ever required, RUPT can also be used in CTR mode, and with any desired block size. The per-block overhead of loading and sealing the counter is insignificant for sufficiently large blocks, whose contents are encrypted one word at a time anyway.