Number Systems (PC Magazine Vol 4 No 13, June 25, 1985, PC Tutor)

One of the first things newcomers notice when they start reading computer magazines is the frequency with which they encounter hexadecimal (base 16) numbers. There are sound reasons for using hex notation in preference to decimal (base 10) notation. Even binary (base 2) notation should be used in certain instances. The figure below shows (a) the symbols used to denote the different digits and (b) several examples of numeric equivalences for the four number bases commonly used in computer writing and programming.

- - - - -

(a)

Name           Base    Numeral Set
----           ----    -----------
hexadecimal     16     0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F
decimal         10     0,1,2,3,4,5,6,7,8,9
octal            8     0,1,2,3,4,5,6,7
binary           2     0,1

(b)

hex     decimal      octal      binary
---     -------      -----      ------
1       1            1          1
8       8            10         1000
A       10           12         1010
F       15           17         1111
10      16           20         10000
3E8     1000         1750       1111101000
400     1024 (1K)    2000       10000000000
FFFF    65535        177777     1111111111111111

- - - - -

One thing evident is the "obvious" truth that the more different symbols you have available to represent a number, the shorter the number can be written. FFFF (hex), 65535 (decimal), 177777 (octal), and 1111111111111111 (binary) are all the same number, but the first representation takes 4 places and the last takes 16. Humans have trouble keeping track of long strings of digits, but for a machine this system is much easier than keeping track of a lot of differently valued symbols. Two is the minimum number of symbols needed to represent 1 and not-1 (i.e., 0), and all machine code consists of nothing but strings of binary numbers. As a concession to human weakness, references to "computer numbers" are conventionally translated (even by a utility like DEBUG, which works directly on machine code) into hex form. Each hex digit is the equivalent of four binary digits (bits):

hex   binary        hex   binary
0     0000          8     1000
1     0001          9     1001
2     0010          A     1010
3     0011          B     1011
4     0100          C     1100
5     0101          D     1101
6     0110          E     1110
7     0111          F     1111

Two hex digits thus represent 1 byte of information, and from the table above you can always translate down into binary to see which bits are 1s and which are 0s. Bytes need not have been defined as containing 8 bits. Earlier in computer history, grouping bits 3 at a time was popular, giving rise to octal notation. (That is the last mention of octal.) But the "powers of x" idea is as strong in computer design (where x is 2) as in "human" design (where x is 10), because it is the most efficient way to handle place value. Mathematically, if a number in base b is written as the sequence of digits:

   dn ... d3 d2 d1 d0

then the value of that number is:

   dn*b^n + ... + d3*b^3 + d2*b^2 + d1*b^1 + d0*b^0

A few examples of working with hex in terms of the PC's 8088 microprocessor will show how much clearer hex makes things than does decimal. The 8088 has 20 address lines. Each line can be a 0 or a 1 (this is digital; ones are +5 volts and zeros are 0 volts). Thus, the highest address the 8088 can generate is:

   11111111111111111111 binary (FFFFF hex, or 1,048,575 decimal)

which gives it 1,048,576 addressable characters (bytes). We generally say that the 8088 has a 1-megabyte memory capacity. The exact number means little in base 10, but it is very meaningful in either binary or hex. Both are much closer to the way the computer thinks. Thus, if you work in hex, just by knowing that the 8088 has 20 address lines you immediately know its directly addressable memory capacity.
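If you want to verify a few of these equivalences for yourself, the short sketch below does so in IBM BASIC. Line 20 spells out the place-value sum for 3E8 hex by hand (E is 14 decimal), and the loop lets BASIC's own HEX$ and OCT$ functions translate a few of the decimal values taken from table (b):

10 'a quick check of the figure above, sketched in IBM BASIC
20 PRINT 3*16^2 + 14*16 + 8      'place-value sum for 3E8 hex (E is 14 decimal); prints 1000
30 FOR I=1 TO 5                  'now let BASIC translate a few rows of table (b)
40 READ N
50 PRINT N, HEX$(N), OCT$(N)     'HEX$ and OCT$ are built into IBM BASIC
60 NEXT I
70 DATA 1, 8, 10, 15, 1000

As far as I know, BASIC offers no binary counterpart to HEX$, so for the binary column you still need the hex-to-binary table above.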
Similarly, if you tell me that an integer is stored at 0305 hex, I know immediately that (1) the machine uses 16-bit addresses, and (2) when the integer is accessed, address lines 9, 8, 2, and 0 should be high and the rest low. This is because 0305 hex = 0000001100000101 binary. (The lines are numbered from the rightmost bit, starting with 0.)

Consider next the way the 8088 computes the address of objects. It does this by combining the contents of two registers (normally, DS and DX), each of which contains a 16-bit quantity (4 hex digits). If we refer to DS:DX, for example, the actual address in decimal would be calculated via:

   effective address = DS * 16 + DX

This looks a bit strange to me in decimal notation, and it is not easy to calculate -- try it, assuming DS is 4096 and DX is 256, for example. If we switch to hex, however, the problem becomes:

   effective address = DS * 10 + DX

which looks a lot simpler. It says: put a 0 at the end of the number in DS and add it to the number in DX. Now, using the same numbers as above (in hex), we have:

   DS                1000
   DX               + 100
   DS * 10          10000
   DS * 10 + DX     10100
                    =====

Thus, the effective address is 10100 hex. If you did the arithmetic in decimal, you'd come out with 65792 -- the same address, but a far less memorable number. (A two-line BASIC check of this calculation appears a few paragraphs below.) Conclusion: when referring to the IBM PC, it often makes more sense to state computer references in hex or binary.

Now let's see why computer languages use binary internally for calculations. The internal representation of a number is the way a computer encodes it into its internal memory (registers). There is actually no intrinsic reason that a computer or computer language could not use decimal notation internally, but there are good mathematical reasons why it usually doesn't.

We should start by understanding why computer chips have small registers. Why does the 8088 on the IBM PC have 16-bit registers rather than 100-bit registers, which would produce greater accuracy? Today's digital logic requires one line (an electrical path) for each binary digit (bit) in a number. This is a consequence of recognizing only two voltage levels: low (0) and high (1). For each line on a chip, there are probably 5 to 10 more supporting chips on the computer motherboard for buffering and control. Thus, the fewer outside lines a microprocessor chip has, the cheaper it is to design and build both the chip and its support. The 8088 is relatively cheap since it provides only 8 data lines to the outside world, even though internally it "thinks" with 16 data lines. (The 8088 has an 8-bit external data bus and a 16-bit internal data bus.) This means that sometimes a numeric transfer requires two successive accesses to memory: one for the low 8 bits and one for the high 8 bits. The 80286 used by the PC AT, on the other hand, has a full 16-bit data bus, so programs with a lot of memory-based arithmetic automatically run more quickly, even ignoring such other speedups as a higher clock rate. The 8086 (used on the AT&T 6300 and the Compaq Plus, among others) also has a 16-bit data bus. Even though the 8086 was available when the PC was developed, IBM opted for the 8088 because of the dollar savings resulting from using 8 rather than 16 data lines. Thus, to reduce cost in computer designs the hardware people use as few data bits as possible, both internally and externally. Microcomputers are usually 8- or 16-bit designs, while mainframes are typically in the 60- to 80-bit range.
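Here is the BASIC check of the DS:DX arithmetic promised above. The values are the ones from the example; &H simply marks a hex constant, so BASIC does the base conversion for us and we only have to supply the multiply-by-16:

10 DS = &H1000 : DX = &H100     'the example register contents, written as hex constants
20 PRINT DS * 16 + DX           'prints 65792, which is 10100 hex

Line 20 is just the effective-address formula with the arithmetic done in decimal; the shortcut of appending a hex 0 to DS is something only we, reading in hex, get to enjoy.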
While the hardware designers have an interest in reducing the number of bits in a register, the language writers, on the other hand, want to minimize the errors that result from arithmetic roundoff and overflow. One way they do this is by using binary arithmetic internally. Let's compare the arithmetic strength -- that is, the accuracy and range -- of an 8088 language that represents numbers in binary form internally (which we read as hex) with one that uses decimal internally. A single hex digit has a maximum value of 15 decimal (1111 binary). Hence it requires 4 bits in order to represent the full complement of hex digits. A decimal digit also requires 4 bits, however. (Try to represent 8 or 9 with 3 bits, if you're skeptical.) An 8088 register holds 16 bits of information. Thus, the 8088 can manipulate either (but no more than) 4 hex digits or 4 decimal digits (16 bits total). Now consider a number, x, that equals 1237 decimal (4D5 in hex). A decimal language stores x as:

   1     2     3     7
   0001  0010  0011  0111

A binary language stores x as:

   0     4     D     5
   0000  0100  1101  0101

In this example, the only difference lies in the encoding method, but a hint is given by the fact that the binary/hex version needed only three digits to represent the decimal quantity 1237. If you now return to the last example shown in the figure above, you'll see that since the 8088 has 16-bit registers it can easily manipulate any number up to FFFF (65535 decimal) by using hex encoding. On the other hand, if the 8088 uses decimal encoding it can only easily manipulate numbers up to 9999. This means that by thinking internally in hex (or binary), a language running on the PC can increase the usable range of its arithmetic by 6.5 times, in comparison with decimal. The disparity gets even larger with bigger word sizes. A machine with 32-bit internal arithmetic increases its arithmetic range by 43:1 if it uses hex, not decimal, internal representation.

(continuation of the above) (PC Magazine Vol 4 No 14, July 9, 1985, PC Tutor)

The situation with floating-point numbers is harder to describe, but the end result is essentially the same. In floating-point arithmetic, the average error involved in representing a floating-point number is lower if the computer thinks in binary (hex). This may need some explanation, however, for one result of using binary notation was that BASIC had a bit of "trouble" when it was first released. While one would expect that 10*0.3 = 3.0, this does not necessarily happen when using binary languages. The reason for the seeming inaccuracy in floating-point calculations can be easily demonstrated using decimal notation. Consider the equation:

   1 - 3 * (1/3) = x

You would think x could only be 0, but consider the way a 4-digit decimal language might do the arithmetic:

   1/3 = .3333
   3 * (1/3) = 3 * .3333 = .9999
   1 - 3 * (1/3) = 1 - .9999 = 0.0001

In fact, if the floating-point routines were written poorly, that last statement could even be as bad as:

   1 - 3 * (1/3) = 1 - .9999 = 1.000 - 0.999 = 0.001

This would occur if the third step required that each number be represented by a 4-digit number, which would drop off the last "9" digit. In reality, it takes a tougher test to cause the binary floating-point routines in IBM's current BASIC to conk out. If you're interested in doing so, however, you might try:

   PRINT 5 - 25 * (2/25 + 3/25)

We, who can scan the algebra, can see that the answer should be 0, but the computer cannot. (Some high-level computer languages do support algebraic reduction, but that's a separate issue.)
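If you want to run that test yourself, the sketch below tries the same expression in single precision and then again with the constants forced to double precision (the # suffix). I won't promise exactly what digits your interpreter will print -- that is rather the point -- but comparing the two lines lets you see what the extra mantissa bits of double precision buy you:

10 PRINT 5 - 25 * (2/25 + 3/25)       'single precision: look for a small nonzero residue
20 PRINT 5 - 25 * (2#/25 + 3#/25)     'the same expression carried out in double precision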
Conclusion: computer languages generally process numbers internally in binary. This increases the range of the arithmetic, decreases roundoff errors, and reduces the number of overflow errors. Binary representation, however, may sometimes cause unexpected results, since noncomputer people think in decimal. To avoid this last difficulty, some business languages (notably C-BASIC) use decimal notation internally, thus ensuring that dollars and cents line up. Although that makes sense at first blush, consider some arithmetic such as:

   purchase 1 item at 3 for $1.00 -- is that $.33 or $.34 apiece?

In any problem involving division, the greater number of digits made available by using binary rather than decimal representation internally will yield potentially greater accuracy.

As we've seen, an integer can be represented internally in many different forms. Here we must discuss two standards: the IBM BASIC representation and the proposed IEEE standard (used by the 8087 coprocessor implementation). In each case, there are different levels of precision, which allow tradeoffs between storage and accuracy requirements, just as there are when you select between single and double precision in BASIC or FORTRAN. The IEEE (Institute of Electrical and Electronics Engineers) is an august body that, among other things, tries to set standards for the industry. This is a worthwhile task, since it promotes transportability between different manufacturers' hardware and different language writers' compilers and interpreters. The 8087 as implemented in the IBM PC is almost completely IEEE compatible, both in its data formats and in the algorithms it uses to handle rounding, overflow, and truncation errors. I know this because, among other reasons, I once had a number of sophisticated numerical-analysis programs running on an Amdahl mainframe computer in APL. When these applications were moved to the IBM PC, they worked without modification. What is more, the answers were identical to the 14th decimal place, because the language on the Amdahl and the language on the PC used the same floating-point format. For some reason, however, IBM BASIC does not adhere to the IEEE standard, so we need to know two different formats to understand floating-point math on a PC. The various formats are:

                 Number                 Mantissa   Exponent
   Name          of Bits    Range       Bits       Bits
   ----          -------    -------     --------   --------
   IBM BASIC:
   Integer         16       32767          -          -
   Single          32       1E+38         24          8
   Double          64       1E+38         56          8

   IEEE (8087):
   Word Int.       16       32767          -          -
   Short Int.      32       1E+9           -          -
   Long Int.       64       1E+19          -          -
   Packed BCD      80       1E+18          -          -
   Short Real      32       1E+38         24          8
   Long Real       64       1E+308        53         11
   Temp. Real      80       1E+4932       64         15

As you examine the above, you will see that the IEEE short real format has the same overall characteristics as IBM BASIC's single precision (although the two are not absolutely identical). The IEEE long real format, however, is very different from double-precision BASIC, even though both representations use the same number of bits. The IEEE long real uses more bits for the exponent and fewer for the mantissa. Thus, IBM's double-precision BASIC is more precise (in digits of precision) but has less range than the IEEE long real. You may wish to note in passing that for business applications the IEEE defines a data type called packed BCD (Binary Coded Decimal). This is our old friend the decimal-language representation. Packed BCD consists of an 18-digit fixed-point number (which requires 9 bytes to store and process), with the 10th byte reserved for the sign [+ or -] of the number.
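Incidentally, you can translate the mantissa column of that table into decimal digits of precision: each binary digit is worth about .3 decimal digits (the base-10 logarithm of 2). The sketch below does the arithmetic for the three real formats; the figures are rough, but they show why IBM's 56-bit double precision out-resolves the IEEE 53-bit long real, and why both land in the same neighborhood as packed BCD's 18 digits:

10 D = LOG(2)/LOG(10)               'LOG is the natural log, so this is log10 of 2, about .30103
20 PRINT "24-bit mantissa:"; 24*D   'single precision / short real: about 7 decimal digits
30 PRINT "56-bit mantissa:"; 56*D   'IBM double precision: about 17 decimal digits
40 PRINT "53-bit mantissa:"; 53*D   'IEEE long real: about 16 decimal digits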
Returning to floating-point notation, one useful question to consider is this: What does a known mantissa, sign, and exponent in binary notation translate into in decimal terms? The equation for this is:

   x = sign * .mantissa * (2 ^ exponent)

The period before the mantissa signifies that the mantissa comes after the radix point (the base-2 equivalent of the decimal point). Thus, for example, suppose we have a sign of +1, a mantissa of C000 hex, and an exponent of 12 hex. The number in hex then is:

   + 1 * .C000 * (2 ^ 12)

or (in decimal):

   + 1 * .75 * (2 ^ 18) = 196608

Why is .C000 equal to .75? Consider that in decimal .3 is 3/10; in hex .C is C/10 (hex), or 12/16 = .75 in decimal. Since the sign can be only plus 1 or minus 1, it is usually encoded as a single bit (a 0 bit for plus, a 1 bit for minus). Further, in base 2 the mantissa must begin with a 1. (Otherwise you could shift the mantissa one digit to the left, decrease the exponent by 1, and gain a bit of precision.) Thus, sometimes the leading 1 is omitted from the representation entirely. On the 8088, numbers are stored low byte first, so byte 0 (the exponent byte in what follows) ends up at the highest address and the rightmost, least significant byte at the lowest address. Further, in order to be able to represent negative exponents, the actual exponent is calculated by taking the stored binary (unsigned) value and subtracting a bias. The bias values are:

   BASIC:
   Single-Precision      128
   Double-Precision      128

   IEEE:
   Short Real            127
   Long Real            1023
   Temporary Real      16383

The following short program allows us to see just how the various pieces of a floating-point number fit together when we're using IBM single-precision BASIC.

5 'Assignment
10 J=0:N=0                 'so VARPTR works, set all scalars
20 FLOAT=.xxx              'surprise (actually, I used 0.12)
30 J=VARPTR(FLOAT)         'where is the number?
40 FOR N=3 TO 0 STEP -1    'since bytes are stored backwards
50 PRINT PEEK(N+J)         'read the bytes that make up FLOAT
60 NEXT N
70 END

Suppose that the printout from the program gives us the following decimal numbers:

   125  117  194  143

What is the value of xxx? First, we know that the exponent is stored in byte 0 (single precision), so the exponent is 125 - 128 = -3 decimal (the bias is 128). Thus, the first part of the puzzle is:

   2 ^ -3 = .125   (exponent portion)

The sign bit is the first bit of byte 1. Since 117 decimal = 75 hex, we can deduce that the first bit is 0, so we can tell that the result is positive. How? If the first hex digit were 8 or greater, the first bit in the byte would be a 1, and the sign would be negative. The second piece thus becomes:

   + 1 * 2 ^ -3

Finally, we remove the first bit (0 anyway) and add to that byte the leading 1 that was implicit, and we get the bytes:

   byte 1 = 75 + 80 (hex) = F5 hex
   byte 2 = 194 (decimal) = C2 hex
   byte 3 = 143 (decimal) = 8F hex

In this way, our mantissa becomes .F5C28F hex. Note the radix point: .F5C28F = F5C28F/1000000 (hex). The mantissa takes up 6 hex digits (24 bits), so when we put everything together, we have, finally:

   + 1 * (F5C28F/1000000) * (2 ^ -3)   (all in hex)

In decimal, this is 0.1199999...., which is fortunate since, in point of fact, I had actually set FLOAT to 0.12 originally to get the numbers for the example. (A short program that automates this byte-by-byte decoding follows at the end of this column.) As you can see, understanding how IBM BASIC handles even single-precision floating-point arithmetic is no piece of cake. However, if interest warrants, I'll go into more detail about the 8087 routines in a future column.
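Here is the sketch promised above. It takes the four bytes the program printed (hard-coded here in memory order, PEEK(J) through PEEK(J+3)) and reassembles the value by exactly the steps of the worked example: strip the bias from the exponent byte, peel off the sign bit, restore the implied leading 1, and scale the 24-bit mantissa. Only the byte values and the bias of 128 come from the example; the variable names are my own.

100 'rebuild FLOAT from the four bytes printed by the program above
110 B0=143 : B1=194 : B2=117 : B3=125         'PEEK(J) through PEEK(J+3); B3 is what the text calls byte 0
120 E = B3 - 128                              'remove the bias: 125 - 128 = -3
130 S = 1 : IF B2 >= 128 THEN S = -1          'top bit of the next byte is the sign (0 means plus)
140 H = (B2 MOD 128) + 128                    'replace the sign bit with the implied leading 1: 75 + 80 hex = F5 hex
150 M# = (H*65536# + B1*256 + B0)/16777216#   'the mantissa .F5C28F hex, expressed as a fraction
160 PRINT S * M# * 2^E                        'prints roughly .12 (0.1199999..., as in the text)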