Path: i2pn2.org!i2pn.org!eternal-september.org!news.eternal-september.org!.POSTED.3.80-202-33.nextgentel.com!not-for-mail
From: terje.mathisen@tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Subject: Re: Two New 128-bit Floating-Point Formats
Date: Wed, 9 Aug 2023 22:21:47 +0200
Organization: A noiseless patient Spider
Message-ID: <ub0sgr$2cu8$1@dont-email.me>
References: <439bb4ce-e70d-4e81-a6f3-2bb9e6e654b3n@googlegroups.com>
<af45f172-2766-4711-a1de-d0650a0f011dn@googlegroups.com>
<f0673c9a-a6a4-46b7-8ec8-ec37d8bc769dn@googlegroups.com>
<ub05dl$3v9rs$1@dont-email.me>
<14c4d8ca-9ded-4562-a6be-125b8435c643n@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 9 Aug 2023 20:21:47 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="3.80-202-33.nextgentel.com:80.202.33.3";
logging-data="78792"; mail-complaints-to="abuse@eternal-september.org"
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
Firefox/91.0 SeaMonkey/2.53.17
In-Reply-To: <14c4d8ca-9ded-4562-a6be-125b8435c643n@googlegroups.com>

MitchAlsup wrote:
> On Wednesday, August 9, 2023 at 8:47:36 AM UTC-5, Terje Mathisen wrote:
>> MitchAlsup wrote:
>>> On Tuesday, August 8, 2023 at 5:41:47 PM UTC-5, JimBrakefield wrote:
>>>> On Tuesday, August 8, 2023 at 3:28:23 PM UTC-5, Quadibloc wrote:
>>>>> Only the _first_ of which is _intentionally_ silly.
>>>>>
>>>>> I have a section on my web site which discusses the history of the computer,
>>>>> at
>>>>> http://www.quadibloc.com/comp/histint.htm
>>>>>
>>>>> On that page, one of the many computer systems I discuss is the
>>>>> HP 9845, from 1978. This computer had amazing capabilities
>>>>> for its day; some have termed it the "first workstation".
>>>>> Unlike anything by Sun or Apollo, though, the processor for this
>>>>> computer, designed by HP, had an architecture based on the
>>>>> HP 211x microcomputer, but did calculations in decimal floating
>>>>> point.
>>>>> Hey, wait a moment. Isn't that a description of the processor chip
>>>>> used in HP pocket calculators, and the earlier HP 9830? How on
>>>>> Earth can something that does floating-point calculations at the
>>>>> speed of a pocket calculator, even a good one, be called a
>>>>> "workstation"?
>>>>> Well, further study allowed me to resolve this doubt. The CPU
>>>>> module of the 9845 included a chip called EMC, which did its
>>>>> floating-point arithmetic. It did it within a 16-bit ALU, and the
>>>>> floating-point format had a *binary* exponent with a range from
>>>>> -511 to +511. It *did* do its arithmetic at speeds considerably
>>>>> greater than those of pocket calculators.
>>>>>
>>>>> Well, the HP 85 may have been the world's cutest computer, but
>>>>> the HP 9845C seemed to me to have taken the crown for the
>>>>> most quintessentially geeky computer ever to warm the heart of
>>>>> a retrocomputing enthusiast.
>>>>>
>>>>> Inspired by this computer, and by another favorite of mine, the
>>>>> famous RECOMP II computer, the one that's capable of handling
>>>>> numbers that can go 2 1/2 times around the world, I came up with
>>>>> this floating-point format...
>>>>>
>>>>> the intended goal of which is to be included, along with more
>>>>> conventional floating-point formats, in the processor for a
>>>>> computer that boots up as a calculator, but can then be
>>>>> switched over to full computer operation when desired.
>>>>>
>>>>> Here it is:
>>>>>
>>>>> (76 bits) Mantissa: 19 BCD digits
>>>>> (1 bit) Sign
>>>>> (51 bits) Excess-1,125,899,906,842,642 decimal exponent
>>>>>
>>>>> Initially, I had conceptualized the format as being closer to that
>>>>> of the RECOMP II, with one word of mantissa, and the sign and
>>>>> exponent in the second word.
>>>>> But then I thought of making the first 64-bit word into one BCD
>>>>> digit, and six groups of three digits encoded by Chen-Ho encoding.
>>>>> That would allow nineteen-digit precision.
>>>>> Then I decided that a 63-bit exponent was so large that it would
>>>>> be preferable to sacrifice some exponent bits, and have the same
>>>>> increase of precision without going to the extra gate delays
>>>>> required for Chen-Ho encoding.
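
Just to make that layout concrete, here is a rough sketch in C of packing
the 76/1/51 split above into two 64-bit words. The field order within the
words and the helper name are my own guesses, not anything specified in
the proposal:

#include <stdint.h>

typedef struct {
    uint64_t hi;   /* upper 64 bits of the 76-bit BCD mantissa        */
    uint64_t lo;   /* low 12 mantissa bits, sign bit, 51-bit exponent */
} dec128;

/* pack 19 BCD digits (digit[0] most significant), a sign, and an
   already-biased (excess-coded) exponent into the 128-bit layout */
static dec128 dec128_pack(const uint8_t digit[19], int sign, uint64_t biased_exp)
{
    unsigned __int128 mant = 0;               /* 19 digits * 4 bits = 76 bits */
    for (int i = 0; i < 19; i++)
        mant = (mant << 4) | (digit[i] & 0xF);

    dec128 r;
    r.hi = (uint64_t)(mant >> 12);               /* top 64 mantissa bits    */
    r.lo = ((uint64_t)mant << 52)                /* bottom 12 mantissa bits */
         | ((uint64_t)(sign & 1) << 51)          /* sign                    */
         | (biased_exp & ((1ULL << 51) - 1));    /* 51-bit biased exponent  */
    return r;
}
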
>>>>>
>>>>> The ideas I played with in that chain of thought then turned my
>>>>> attention to how they might be used for a more serious
>>>>> purpose.
>>>>> Remember John Gustafson, and his quest, first with Unums, and
>>>>> then with Posits, to devise a better floating-point format that
>>>>> would help combat the dangerous numerical errors that abound
>>>>> in conventional floating-point arithmetic?
>>>>> Perhaps I could come up with something more conventional
>>>>> that would go partway, at least, towards providing the facilities
>>>>> that his inventions provide.
>>>>>
>>>>> And here is where that chain of thought went:
>>>>>
>>>>> (1 bit) Sign
>>>>> (31 bits) Excess-1,073,741,824 binary exponent
>>>>> (96 bits) Significand
>>>>>
>>>>> Providing a wide exponent range (like Posits and Unums) and a high
>>>>> precision (like Unums) but both within the bounds of reason, and
>>>>> without any unconventional steps, like decreasing precision for
>>>>> large exponents, or having the length of the number variable.
>>>>> But there's something *else* that I also came up with to do when
>>>>> implementing this floating-point format in order to help it achieve
>>>>> its ends.
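
The same kind of sketch for the 1 + 31 + 96 binary format, with the word
layout again being my own assumption rather than part of the proposal:

#include <stdint.h>

typedef struct { uint64_t hi, lo; } bin128;  /* hi: sign, exponent, top of significand */

static int sign_of(bin128 v)
{
    return (int)(v.hi >> 63);                  /* 1 sign bit */
}

static int64_t exponent_of(bin128 v)
{
    /* 31-bit field with excess-1,073,741,824 (2^30) coding */
    return (int64_t)((v.hi >> 32) & 0x7FFFFFFF) - 1073741824;
}

static unsigned __int128 significand_of(bin128 v)
{
    /* 32 bits from hi plus all 64 bits of lo = 96-bit significand */
    return ((unsigned __int128)(v.hi & 0xFFFFFFFF) << 64) | v.lo;
}
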
>>>>>
>>>>> Seymour Cray was the designer of the Control Data 6600 computer.
>>>>> It had a 60-bit word. When he designed the Cray I computer, although
>>>>> he surrendered to the 8-bit byte, and gave it a 64-bit word, apparently
>>>>> he still felt that the 60-bit floats of the 6600 provided all the precision
>>>>> that anyone needed.
>>>>> So the floating-point format of the Cray I had an exponent field that
>>>>> was 15 bits long. But the defined range of possible exponents in that
>>>>> format would fit in a *14-bit* exponent field.
>>>>> I guess this would make it easier to detect and, even more importantly,
>>>>> to recover from floating-point overflows and underflows.
>>>>>
>>>>> At first, I thought that simply copying this idea would be useful.
>>>>> Then, inspired by the inexact bit of the IEEE-754 standard, I
>>> <
>>>> I would think that in most calculations, most calculated values are inexact.
>>>> I previously considered taking one mantissa bit to indicate inexact,
>>>> which is a painful loss of accuracy.
>>> <
>>> The most precise thing we can routinely measure is 22-bits.
>>> The most precise thing we can measure in 1ns is 8-bits.
>>> The most precise thing we have ever measured is 44-bits.
>>> {and this took 25+ years to decrease the noise to this}
>>> <
>>> However: there are lots of calculations that expand (ln, sqrt) and
>>> compress (^2, exp, erf) the number of bits needed to retain precision
>>> "down the line". Just computing ln2() to IEEE accuracy requires
>>> something-like 72-bit of fraction in the intermediate x*ln2(y) part
>>> to achieve IEEE 754 accuracy in the final result of Ln2() over the
>>> entire range of x and y.
>>> <
>>> This is the problem, not the number of bits in the fraction at one
>>> instant. It is a problem well understood by numerical analysts
>>> ..........And something casual programmers remain unaware of
>>> for decades of experience.........
>>>>
>>>> So how many values are exact? Can those values be encoded into
>>>> the NAN bits? If so, why not let inexact be the default, thereby allowing
>>>> one to use round-to-odd, which eliminates double rounding issues?
>>>> (one would still follow the rule of rounding the nearest representable value)
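
For reference, a bare-bones sketch of round-to-odd when narrowing a wide
intermediate (the 128-bit width and the helper are just for illustration):
truncate, then OR any discarded bits into the lowest kept bit, so that a
later rounding to nearest cannot double-round incorrectly.

#include <stdint.h>

/* keep the top 'keep' (<= 64) bits of a 128-bit significand, rounding to odd */
static uint64_t round_to_odd(unsigned __int128 sig, int keep)
{
    int drop = 128 - keep;
    uint64_t kept = (uint64_t)(sig >> drop);
    int sticky = (sig & (((unsigned __int128)1 << drop) - 1)) != 0;
    return kept | (uint64_t)sticky;   /* set the LSB if anything was discarded */
}
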
>>> <
>>> Nobody doing real FP math gives a crap about exactness. The 99%
>>> Only people testing FP arithmetic units do. The way-less-than 0.1%
>>> <
>>> Consider: COS( 6381956970095103×2^797) = -4.68716592425462761112E-19
>>> <
>>> Conceptually, this requires calculating over 800-bits of intermediate
>>> INT(2/pi×x) !!! to get the proper reduced argument which will result in
>>> the above properly rounded result.
>>> <
>>> To get that 800-bits one uses Payne-Hanek argument reduction which
>>> takes somewhat longer than 100 cycles--compared to computing the
>> 100 cycles seems to be way too large, but maybe this is what current
>> implementations need?
>>> COS(reduced) polynomial taking slightly less than 100 cycles.
>>> <
>>> I have a patented method that can perform reduction in 5 cycles: and a
>>> designed function unit that can perform the above COS(actual)
>>> in 19 cycles.
>> This is of course really wonderful, to the point where I want _all_ FPUs
>> to license and/or emulate your algorithms.
>>
>> However, the actual range reduction should be significantly faster than
>> 100 cycles, even in pure SW:
>>
>> Load and inspect the exponent, use it to determine a byte-granularity
>> starting point in the reciprocal pi expansion, then multiply the
>> mantissa by a 96-128 bit slice (two 64x64->128 MUL operations).
> <
> static void ReduceFull(double *xp, int *a, double x)
> {
> Double X = { x };
> int ec = X.s.exponent - (1023+33);
> int k = (ec + 26) * (607*4) >> 16;
> int m = 27*k - ec;
> int offset = m >> 3;
> x *= 0x1p-400;
> double xDekker = x * (0x1p27 + 1);
> double x0 = xDekker - (xDekker - x);
> double x1 = x - x0;
> const double *p0 = &TwoOverPiWithOffset[offset][k]; // 180 DP FP numbers
> const double fp0 = p0[0];
> const double fp1 = p0[1];
> const double fp2 = p0[2];
> const double fp3 = p0[3];
> const double f0 = x1 * fp0 + fp1 * x0;
> double f = x1 * fp1 + fp2 * x0;
> const double fi = f0 + f;
> static const double IntegerBias = 0x1.8p52;
> Double Fi = { fi + IntegerBias };
> *a = Fi.s.significand2;
> double fint = Fi.d - IntegerBias;
> const double fp4 = p0[4];
> const double fp5 = p0[5];
> const double fp6 = p0[6];
> f = f0 - fint + f;
> f += x1 * fp2 + fp3 * x0;
> f += x1 * fp3 + fp4 * x0;
> f += x1 * fp4 + fp5 * x0;
> f += x1 * fp5 + fp6 * x0;
> *xp = f * 0x3.243F6A8885A3p-1;
> }
> along with a large array of FP numbers representing 2/pi. From an
> old SUN library. Double (with the capital first letter is a union overlay)
> on a double.
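
For anyone wanting to compile this: the Double overlay itself is not shown,
but judging from the accesses above it presumably looks something like the
following (little-endian bit-field layout and field widths are my guess):

#include <stdint.h>

typedef union {
    double d;
    struct {                            /* little-endian bit order assumed */
        uint64_t significand2 : 32;     /* low 32 bits of the significand  */
        uint64_t significand  : 20;     /* high 20 bits of the significand */
        uint64_t exponent     : 11;     /* biased exponent                 */
        uint64_t sign         : 1;
    } s;
} Double;
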
>>
>> We discard the whole circles and normalize, then we use the top N (3-5)
>> bits to select the reduced range poly to use.
> <
> After you create this result, the top 2 bits are the quadrant, but up to
> 61 bits between there and the hidden bit of the reduced argument
> can be 0 and have to be normalized away.
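
In integer terms that step could look something like this sketch (my own
illustration, assuming the product has already been folded down to a
128-bit fixed-point value of x*2/pi modulo 4, with the quadrant bits at
the top):

#include <stdint.h>

/* frac = (x * 2/pi) mod 4, scaled so the quadrant sits in bits 127..126 */
static void split_quadrant(unsigned __int128 frac,
                           int *quadrant, int *leading_zeros,
                           unsigned __int128 *normalized)
{
    *quadrant = (int)(frac >> 126);        /* top 2 bits select the quadrant */
    unsigned __int128 rest = frac << 2;    /* fraction within the quadrant   */

    int n = 0;                             /* up to ~61 leading zeros possible */
    while (n < 128 && !(rest >> 127)) {
        rest <<= 1;
        n++;
    }
    *leading_zeros = n;                    /* becomes the exponent adjustment */
    *normalized = rest;
}
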
>>
>> All this should be eminently doable in less than 20 cycles, right?
> <
> See function.

Thanks for posting!

I see immediately that, in order to be portable, they are using a lot of
redundant storage and doing a lot more FMULs than I would need when
using integer 64x64->128 MULs, on top of misaligned loads, as the
building block.

I would of course also do the actual poly evaluation with integer
operations, but that happens later.

Accessing a byte array means that I get at most 7 extra leading bits, so
after the 53 x 128 mul, I have at least 121 useful bits, with the bottom
part capable of swallowing any carries from the truncation error at the
end of the 128-bit reciprocal, right?
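
Roughly what I have in mind, as a sketch (the table name is a placeholder,
and this ignores the sticky bits that fall below the kept 128):

#include <stdint.h>

extern const uint8_t TwoOverPiBytes[];  /* bits of 2/pi as a byte array, MSB first (assumed) */

static uint64_t load_be64(const uint8_t *p)   /* byte-granularity, possibly misaligned load */
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* multiply the 53-bit mantissa by a 128-bit slice of 2/pi starting 'offset'
   bytes into the expansion, using two 64x64->128 MULs; the result is the
   product shifted down by 64 bits, truncation being swallowed at the bottom */
static unsigned __int128 mul_53x128(uint64_t mant53, int offset)
{
    uint64_t slice_hi = load_be64(TwoOverPiBytes + offset);
    uint64_t slice_lo = load_be64(TwoOverPiBytes + offset + 8);

    unsigned __int128 p_hi = (unsigned __int128)mant53 * slice_hi;
    unsigned __int128 p_lo = (unsigned __int128)mant53 * slice_lo;

    return p_hi + (p_lo >> 64);
}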

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"
