Jan 17, 1996 This file has examples of compiling and running the ELL.C program. Compilation on my Sparc workstation. I like to include the date & compile conditions in the name of the binary. Compile takes 5-10 minutes. ------------------------------------------------------- leibniz% gcc2 -O9 -o ell-960117-gcc2O9 -DMAIN ell.c leibniz% gcc2 -v Reading specs from /usr/local/gnu-2.5.7/lib/gcc-lib/sparc-sun-sunos4.1.1/2.5.7/specs gcc version 2.5.7 ------------------------------------------------------- Running. With 0 or 1 arguments, the program does a little bit of everything - a sampler. Be sure to use at least 2 arguments if you want to do anything else. ------------------------------------------------------- leibniz% ell-960117-gcc2O9 GF155 here. testing prarr: %00000000 00000000 000bcdef 0006789a 00012345 % gf155m2 = 07ffffff ffffffff ffffffff ffffffff fffffffe tp2 before:00000000 00000000 00000000 00000000 00000000 tp2 after zeroing: 00000000 00000000 00000000 00000000 00000000 tp2 after copy: 00000000 00000000 000bcdef 0006789a 00012345 sum of tp1 + tp2: 00000000 00000000 00000000 00000000 00000000 sum of tp1 + tpx: 00000000 00000000 000bcdef 0006789a 00012347 tp2 after zeroing: 00000000 00000000 00000000 00000000 00000000 product of tp1 * x: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00179bde 000cf134 0002468a tp1 * tp1: 00000000 00000000 00000000 00000000 00000045 50515455 00000014 15404144 00000001 04051011 square of tp1: 00000000 00000000 00000000 00000000 00000045 50515455 00000014 15404144 00000001 04051011 square checks: 1 1 after mod: 00000000 00000000 00000000 00000000 00000045 00515455 00000014 1540436e 80000001 040518bb modpow tp1^2: 00515455 00000014 1540436e 80000001 040518bb modpow tp1^3: 04185df8 30702882 8f020a0e 21c99ef5 71933b82 modinvold x: 04000000 00000000 00000000 20000000 00000000 modinv x: 04000000 00000000 00000000 20000000 00000000 modinvold = modinv? 1 modinvold tp1: 032020b2 a433507f 11c4eb55 8b55d084 055b41f8 modinv tp1: 032020b2 a433507f 11c4eb55 8b55d084 055b41f8 modinvold = modinv? 1 check (* tp1): 00000000 00000000 00000000 00000000 00000001 a6calc ectax: 00000000 00000000 00000000 00000000 0000007b ectay: 00000000 00000000 00000000 00000000 000001c8 a6: 00000000 00000000 00000000 00000000 0007338f Edouble ectax: 00000000 00000000 00000000 00000000 0000007b ectay: 00000000 00000000 00000000 00000000 000001c8 ectbx: 03d5af92 c8311d9e 8f56be4b 3e690aec 7cd200f2 ectby: 04024707 23532114 26377516 41226e7c 5abb09c3 a6 check: = 1 Eadd a+b x: ectcx: 0022cbce d6574009 2435995c 484de42b 96721bea ectcy: 031f6fc9 c25d2764 90a8144f 262db4aa ea9e1858 a6 check: = 1 ectdx: 00aa089a e4666a42 2e714651 ad937221 3fa65a93 ectdy: 012d29c6 30dda760 10397809 a6816be6 d2ffa815 3+1 = 2*2 check: =1 2*ectax,y = 03d5af92 c8311d9e 8f56be4b 3e690aec 7cd200f2 04024707 23532114 26377516 41226e7c 5abb09c3 3*ectax,y = 0022cbce d6574009 2435995c 484de42b 96721bea 031f6fc9 c25d2764 90a8144f 262db4aa ea9e1858 expo3*ectax,y = 07f495fb 649627d7 e3116cb3 0fea3628 918ea0b2 0059827a a0d93b10 4efca562 11a048d0 11edb4e5 expo4*3*ectax,y = 01f3437e 52ebbbf5 9ce149f4 7b207c60 3f790efa 01ae622b d3dd8a0a 33d5eb93 eab52496 7db3cbd0 expo4*ectax,y = 03cc4c97 4c4ac25b ffd5bc31 888c76ef 743484c3 027b41c2 686298ba 06270858 eb61b45a 93057056 expo3*4*ectax,y = 01f3437e 52ebbbf5 9ce149f4 7b207c60 3f790efa 01ae622b d3dd8a0a 33d5eb93 eab52496 7db3cbd0 check =: x 1 y 1 ------------------------------------------------------- This runs the DHKX algorithm 1 time. The check is verifying that the two different ways of computing the answer agree. ------------------------------------------------------- leibniz% ell-960117-gcc2O9 -dhkx 1 GF155 here. expo3*ectax,y = 07f495fb 649627d7 e3116cb3 0fea3628 918ea0b2 0059827a a0d93b10 4efca562 11a048d0 11edb4e5 expo4*3*ectax,y = 01f3437e 52ebbbf5 9ce149f4 7b207c60 3f790efa 01ae622b d3dd8a0a 33d5eb93 eab52496 7db3cbd0 expo4*ectax,y = 03cc4c97 4c4ac25b ffd5bc31 888c76ef 743484c3 027b41c2 686298ba 06270858 eb61b45a 93057056 expo3*4*ectax,y = 01f3437e 52ebbbf5 9ce149f4 7b207c60 3f790efa 01ae622b d3dd8a0a 33d5eb93 eab52496 7db3cbd0 check =: x 1 y 1 ------------------------------------------------------- This runs DH, using the known-generator optimizations. This is the fastest version of the key exchange, and represents a realistic timing for a real exchange between two Sparcs, after adding network latency. The time for the arithmetic here is 13.9sec/100 = 139 milliseconds for one key exchange. ------------------------------------------------------- leibniz% ell-960117-gcc2O9 -dhtm 1 GF155 here. DHTM test, 1 times: check =: x 1 y 1 leibniz% time ell-960117-gcc2O9 -dhtm 10 GF155 here. DHTM test, 10 times: check =: x 1 y 1 1.5u 0.0s 0:02 64% 0+336k 0+1io 0pf+0w leibniz% time ell-960117-gcc2O9 -dhtm 100 GF155 here. DHTM test, 100 times: check =: x 1 y 1 13.9u 0.0s 0:14 94% 0+344k 0+0io 0pf+0w ------------------------------------------------------- This runs the DHKX code without the known-generator optimization, and doing all four point-multiplication in the exchange. ------------------------------------------------------- leibniz% time ell-960117-gcc2O9 -ecv 100 GF155 here. ECV test for DHKX, 100 times: check =: x 1 y 1 36.4u 0.1s 0:37 97% 0+340k 0+0io 0pf+0w ------------------------------------------------------- Multiplication of two field elements. The product is 309 bits. The time per multiplication is 11.2sec/100000 = 112 microseconds. ------------------------------------------------------- leibniz% time ell-960117-gcc2O9 -m55 100000 GF155 here. Multiply test M55, 155*155 bits, 100000 times: 07531fca 53579ace ff003742 98765432 abcdef12 05371fac abdecf12 35795ace f003f742 98754632 001a94d3 ce915148 99c5bf6f 64daaef0 0f620c12 d6a32b04 d25a6f7a aeb1a2e5 c59b9857 71e42144 11.2u 0.0s 0:12 91% 0+336k 0+1io 0pf+0w ------------------------------------------------------- Inversion (reciprocal) of a field element. The inversion is done 100000 times. The check is done once at the end - the starting element is multiplied by the alleged inverse, and reduced. The answer should be (and is) 1. The time per inversion is 27.9sec/100000 = 279 microseconds. ------------------------------------------------------- leibniz% time ell-960117-gcc2O9 -ih2w 100000 GF155 here. Modular inverse test H2W, 155 bits, shift 261, 100000 times: 07531fca 53579ace ff003742 98765432 abcdef12 0122f978 80cfd72c f0e0a23e d44b572a 8cac28bd check: 00000000 00000000 00000000 00000000 00000001 27.9u 0.1s 0:31 90% 0+336k 0+0io 0pf+0w ------------------------------------------------------- An example of using the program for doing a field calculation - here, the field element "3 0 0 0 0" representing the polynomial u+1 is inverted. Notice a kludge: the inputs are specified in the reverse of the natural order, with the low order word of the field element first. Since we are interested in the answer and not the time, we only call the inversion routine 1 time. ------------------------------------------------------- leibniz% ell-960117-gcc2O9b -ih2w 1 3 0 0 0 0 GF155 here. Modular inverse test H2W, 155 bits, shift 155, 1 times: 00000000 00000000 00000000 00000000 00000003 07ffffff ffffffff ffffffff c0000000 00000000 check: 00000000 00000000 00000000 00000000 00000001 ------------------------------------------------------- Now we do the reciprocal of the answer from the previous problem. Naturally, we expect to get back the starting value, "3 0 0 0 0". The arguments are reversed because of the backward kludge; and they must be prefixed with "0x" for the number reader to read them in hex. We get the expected answer, 3. ------------------------------------------------------- leibniz% ell-960117-gcc2O9b -ih2w 1 0 0xc0000000 0xffffffff 0xffffffff 0x7ffffff GF155 here. Modular inverse test H2W, 155 bits, shift 155, 1 times: 07ffffff ffffffff ffffffff c0000000 00000000 00000000 00000000 00000000 00000000 00000003 check: 00000000 00000000 00000000 00000000 00000001 ------------------------------------------------------- An example of using the program to calculate the double of a curve point. The input point is X = "23 0 0 0 0", Y = "115 0 0 0 0". These are printed in hex as x1,y1. The polynomials are (x1,y1) = hex (17,73) = (u^4+u^2+u+1, u^6+u^5+u^4+u+1). The program automatically calculates the curve parameter A60 from the coordinates of the input point. The answer is (xd,yd). I won't try to write out the polynomials. ------------------------------------------------------- leibniz% ell-960117-gcc2O9b -e2calc 1 23 0 0 0 0 115 0 0 0 0 GF155 here. x1 = 00000000 00000000 00000000 00000000 00000017 y1 = 00000000 00000000 00000000 00000000 00000073 xd = 02feebfb afeebfba feebfbaf f948e523 948e5329 yd = 0030ebd5 830ebd58 30ebd583 0f3a069c f3a079cc ------------------------------------------------------- One defect of the program is that I haven't provided a full set of calls for doing all the various calculations you might want to do. And you'll have to look at the code for "main" to figure out which of the tests take optional arguments. These examples are on our 175MHz Alpha workstation. ------------------------------------------------------- DEC OSF/1 V3.0 Worksystem Software (Rev. 347) ------------------------------------------------------- First, the compilation. This is using the native compiler. I've named the binary to reflect the date & compiler information. The compiler declines to optimize "main" because it's too big. Just as well - a waste of time. ------------------------------------------------------- % cc -O4 -o ell-960117-ccO4 -DALPHA -DMAIN ell.c mult155b64c.s ell.c: mult155b64c.s: uopt: Warning: main: this procedure not optimized because it exceeds size threshold; to optimize this procedure, use -Olimit option with value >= 1400. ------------------------------------------------------- With 0 or 1 arguments, the program runs a sampler of various tests. Notice the print format for field elements is different from the Sparc, since only 3 words are required to contain a 155bit field element. ------------------------------------------------------- % ell-960117-ccO4 GF155 here. testing prarr: %0000000000000000 00000000000bcdef 0006789a00012345 % gf155m2 = 0000000007ffffff ffffffffffffffff fffffffffffffffe tp2 before:0000000000000000 0000000000000000 0000000000000000 tp2 after zeroing: 0000000000000000 0000000000000000 0000000000000000 tp2 after copy: 0000000000000000 00000000000bcdef 0006789a00012345 sum of tp1 + tp2: 0000000000000000 0000000000000000 0000000000000000 sum of tp1 + tpx: 0000000000000000 00000000000bcdef 0006789a00012347 tp2 after zeroing: 0000000000000000 0000000000000000 0000000000000000 product of tp1 * x: 0000000000000000 0000000000000000 0000000000000000 0000000000179bde 000cf1340002468a tp1 * tp1: 0000000000000000 0000000000000000 0000004550515455 0000001415404144 0000000104051011 square of tp1: 0000000000000000 0000000000000000 0000004550515455 0000001415404144 0000000104051011 square checks: 1 1 after mod: 0000000000000000 0000000000000000 0000000000515455 000000141540436e 80000001040518bb modpow tp1^2: 0000000000515455 000000141540436e 80000001040518bb modpow tp1^3: 0000000004185df8 307028828f020a0e 21c99ef571933b82 modinvold x: 0000000004000000 0000000000000000 2000000000000000 modinv x: 0000000004000000 0000000000000000 2000000000000000 modinvold = modinv? 1 modinvold tp1: 00000000032020b2 a433507f11c4eb55 8b55d084055b41f8 modinv tp1: 00000000032020b2 a433507f11c4eb55 8b55d084055b41f8 modinvold = modinv? 1 check (* tp1): 0000000000000000 0000000000000000 0000000000000001 a6calc ectax: 0000000000000000 0000000000000000 000000000000007b ectay: 0000000000000000 0000000000000000 00000000000001c8 a6: 0000000000000000 0000000000000000 000000000007338f Edouble ectax: 0000000000000000 0000000000000000 000000000000007b ectay: 0000000000000000 0000000000000000 00000000000001c8 ectbx: 0000000003d5af92 c8311d9e8f56be4b 3e690aec7cd200f2 ectby: 0000000004024707 2353211426377516 41226e7c5abb09c3 a6 check: = 1 Eadd a+b x: ectcx: 000000000022cbce d65740092435995c 484de42b96721bea ectcy: 00000000031f6fc9 c25d276490a8144f 262db4aaea9e1858 a6 check: = 1 ectdx: 0000000000aa089a e4666a422e714651 ad9372213fa65a93 ectdy: 00000000012d29c6 30dda76010397809 a6816be6d2ffa815 3+1 = 2*2 check: =1 2*ectax,y = 0000000003d5af92 c8311d9e8f56be4b 3e690aec7cd200f2 0000000004024707 2353211426377516 41226e7c5abb09c3 3*ectax,y = 000000000022cbce d65740092435995c 484de42b96721bea 00000000031f6fc9 c25d276490a8144f 262db4aaea9e1858 expo3*ectax,y = 0000000007f495fb 649627d7e3116cb3 0fea3628918ea0b2 000000000059827a a0d93b104efca562 11a048d011edb4e5 expo4*3*ectax,y = 0000000001f3437e 52ebbbf59ce149f4 7b207c603f790efa 0000000001ae622b d3dd8a0a33d5eb93 eab524967db3cbd0 expo4*ectax,y = 0000000003cc4c97 4c4ac25bffd5bc31 888c76ef743484c3 00000000027b41c2 686298ba06270858 eb61b45a93057056 expo3*4*ectax,y = 0000000001f3437e 52ebbbf59ce149f4 7b207c603f790efa 0000000001ae622b d3dd8a0a33d5eb93 eab524967db3cbd0 check =: x 1 y 1 ------------------------------------------------------- Run the DHKX algorithm. This is the basic test that the ecurve code is playing. The check verifies that the X and Y coordinates of the final point (calculated two ways) match. ------------------------------------------------------- % ell-960117-ccO4 -dhkx 1 GF155 here. expo3*ectax,y = 0000000007f495fb 649627d7e3116cb3 0fea3628918ea0b2 000000000059827a a0d93b104efca562 11a048d011edb4e5 expo4*3*ectax,y = 0000000001f3437e 52ebbbf59ce149f4 7b207c603f790efa 0000000001ae622b d3dd8a0a33d5eb93 eab524967db3cbd0 expo4*ectax,y = 0000000003cc4c97 4c4ac25bffd5bc31 888c76ef743484c3 00000000027b41c2 686298ba06270858 eb61b45a93057056 expo3*4*ectax,y = 0000000001f3437e 52ebbbf59ce149f4 7b207c603f790efa 0000000001ae622b d3dd8a0a33d5eb93 eab524967db3cbd0 check =: x 1 y 1 ------------------------------------------------------- This is the speed test for the DHKX algorithm, with the known-generator optimization. This is a realistic test of the latency for a key-exchange between two Alphas, although network latency would add noticeably to this time (11.6 msec). ------------------------------------------------------- % time ell-960117-ccO4 -dhtm 1000 GF155 here. DHTM test, 1000 times: check =: x 1 y 1 11.59u 0.07s 0:11 99% 0+1k 0+0io 18pf+0w ------------------------------------------------------- This runs the elliptic curve arithmetic for a key exchange without the known-generator optimization. It does the four point-multiplications. ------------------------------------------------------- % time ell-960117-ccO4 -ecv 1000 GF155 here. ECV test for DHKX, 1000 times: check =: x 1 y 1 31.71u 0.20s 0:31 99% 0+1k 0+0io 18pf+0w ------------------------------------------------------- This is an older, slow multiplication routine. It takes 13.6 microseconds. ------------------------------------------------------- % time ell-960117-ccO4 -m33 100000 GF155 here. Multiply test M33, 155*155 bits, 100000 times: 0000000007531fca 53579aceff003742 98765432abcdef12 0000000005371fac abdecf1235795ace f003f74298754632 001a94d3ce915148 99c5bf6f64daaef0 0f620c12d6a32b04 d25a6f7aaeb1a2e5 c59b985771e42144 1.36u 0.01s 0:01 100% 0+1k 0+0io 18pf+0w ------------------------------------------------------- This is the production multiplication routine. It takes only 7.07sec/1000000 = 7.07 microseconds per multiplication. ------------------------------------------------------- % time ell-960117-ccO4 -m1b6b 1000000 GF155 here. Multiply test M1B6B, 155*155 bits, 1000000 times: 0000000007531fca 53579aceff003742 98765432abcdef12 0000000005371fac abdecf1235795ace f003f74298754632 001a94d3ce915148 99c5bf6f64daaef0 0f620c12d6a32b04 d25a6f7aaeb1a2e5 c59b985771e42144 7.07u 0.04s 0:07 99% 0+1k 0+0io 18pf+0w ------------------------------------------------------- Three other alternative multiplication routines. 8.84, 7.62, and 9.92 microseconds. ------------------------------------------------------- % time ell-960117-ccO4 -m1b6c 1000000 GF155 here. Multiply test M1B6C, 155*155 bits, 1000000 times: 0000000007531fca 53579aceff003742 98765432abcdef12 0000000005371fac abdecf1235795ace f003f74298754632 001a94d3ce915148 99c5bf6f64daaef0 0f620c12d6a32b04 d25a6f7aaeb1a2e5 c59b985771e42144 8.84u 0.04s 0:08 99% 0+1k 0+0io 18pf+0w % time ell-960117-ccO4 -m1b6 1000000 GF155 here. Multiply test M1B6, 155*155 bits, 1000000 times: 0000000007531fca 53579aceff003742 98765432abcdef12 0000000005371fac abdecf1235795ace f003f74298754632 001a94d3ce915148 99c5bf6f64daaef0 0f620c12d6a32b04 d25a6f7aaeb1a2e5 c59b985771e42144 7.62u 0.05s 0:07 99% 0+1k 0+0io 30pf+0w % time ell-960117-ccO4 -m33d 1000000 GF155 here. Multiply test M33D, 155*155 bits, 1000000 times: 0000000007531fca 53579aceff003742 98765432abcdef12 0000000005371fac abdecf1235795ace f003f74298754632 001a94d3ce915148 99c5bf6f64daaef0 0f620c12d6a32b04 d25a6f7aaeb1a2e5 c59b985771e42144 9.92u 0.07s 0:10 99% 0+1k 0+0io 18pf+0w ------------------------------------------------------- This is the fastest multiplication, taking only 6.75 microseconds. Unfortunately, it's too big to share the Icache with the inversion routine. ------------------------------------------------------- % time ell-960117-ccO4 -m33e 1000000 GF155 here. Multiply test M33E, 155*155 bits, 1000000 times: 0000000007531fca 53579aceff003742 98765432abcdef12 0000000005371fac abdecf1235795ace f003f74298754632 001a94d3ce915148 99c5bf6f64daaef0 0f620c12d6a32b04 d25a6f7aaeb1a2e5 c59b985771e42144 6.75u 0.03s 0:06 99% 0+1k 0+0io 18pf+0w ------------------------------------------------------- Here's the compilation with the ECV switch to select the minimum code to just run the bare elliptic curve key exchange. ------------------------------------------------------- % cc -O4 -o ell-960117-ccO4ECV -DECV -DALPHA -DMAIN ell.c mult155b64c.s ell.c: mult155b64c.s: ------------------------------------------------------- The -DECV version uses a much simpler "main": the program is called with precisely one argument, the number of times to run the test. The time is about the same as for the -ecv test in the non-stripped version. ------------------------------------------------------- % time ell-960117-ccO4ECV 1000 GF155 here. ECV test for DHKX, 1000 times: check =: x 1 y 1 32.26u 0.16s 0:32 99% 0+1k 0+0io 16pf+0w -------------------------------------------------------