This file explains the workaround that Microsoft has implemented in
its compiler and run-time library for Intel Pentium processors that
contain a potential flaw in four instructions.


----------
Compiler

By default, the compiler generates safe code for FDIV and FPREM.
Hooks are also provided for (user-written) replacements for FPTAN and
FPATAN.

Uses of the flawed instructions in inline-assembly code are flagged
(warning C4725, -W4) but not corrected by the compiler. Safe runtime
routines for a single, flexible form of the FDIV and FPREM instructions
are provided to aid manual user conversion of the code to the safe form
(see the "Runtimes" section below).

The compiler generates the above-mentioned safe (or replaceable) sequences
by default. You can turn off the fix by using the compiler option "-QIfdiv-".

----------
Runtimes

The following runtime routines are provided. The names given here are
C names; prefix the name with an underscore ( _ ) to get the "true"
assembler/OBJ name.

    _adjust_fdiv -- Flag that indicates whether a 'flawed' Pentium is installed.
	Used to 'short-circuit' calls to the safe routines in speed-critical
	sections of code.

	Example of use:
		...
		pushfd			; save flags (if needed)
		fld	op1		; load dividend
		fld	op2		; load divisor
		cmp	__adjust_fdiv,0	; 0=ok, !0=flawed
		je	ok		; branch if ok
		call	__safe_fdiv	; safe version
		jmp	done
	    ok:
		fdivp	st(1),st(0)	; hdwr version
	    done:
		; either way, args gone, result on top of NDP
		popfd			; restore flags
		...

	Alternatively, one could always call the safe version
	(slower, but safe):

		pushfd			; save flags (if needed)
		...
		fld	op1
		fld	op2
		call	__safe_fdiv
		...
		popfd			; restore flags

    _safe_fdiv -- safe divide routine

	Interface is the same as for the x87 NDP 'FDIV' instruction
	    (aka FDIVP ST(1),ST(0)).

	Takes two arguments on the NDP stack, pops them, divides, and pushes
	the result onto the NDP stack.

	The routine performs a 'safe' version of the divide.

    _safe_fdivr -- safe reverse divide routine

	As for _safe_fdiv, but does the reverse operation.

	Interface is the same as for the x87 NDP 'FDIVR' instruction
	    (aka FDIVRP ST(1),ST(0))

    _safe_fprem -- safe remainder routine (x87 compatible)

	As for _safe_fdiv, but does remainder.
	
	Interface is the same as for the x87 NDP 'FPREM' instruction.

    _safe_fprem1 -- safe remainder routine (IEEE conformant)

	As for _safe_fdiv, but does IEEE remainder.

	Interface is the same as for the x87 NDP 'FPREM1' instruction.

    _adj_fptan -- unsafe tangent routine (replaceable)

	As for _safe_fdiv, but (n.b.!!!) provides a hook only; it does *not*
	do a safe version.

	Interface is the same as for the x87 NDP 'FPTAN' instruction.

	Users who want a safe version must replace this routine with one of
	their own.

    _adj_fpatan -- unsafe arctangent routine (replaceable)

	As for _adj_fptan, but does atan.

	Interface is the same as for the x87 NDP 'FPATAN' instruction.

	Note: Does *not* do a safe version.
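
The flag-guarded call pattern shown for _adjust_fdiv can also be sketched
in C. This is an illustration only, not the runtime's implementation:
fdiv_flawed, detect_flawed_fdiv, safe_divide_placeholder and checked_divide
are hypothetical names, and this file does not say how the runtime sets
_adjust_fdiv. The detection step assumes the widely published test division
4195835 / 3145727, whose residual is about 0 on a good processor and about
256 on a flawed one.

```c
/* Hypothetical stand-in for the runtime's _adjust_fdiv flag:
   0 = processor divides correctly, nonzero = flawed processor. */
static int fdiv_flawed = 0;

/* Counts slow-path calls so the short-circuit can be observed. */
static int safe_path_calls = 0;

/* Assumed detection method (not taken from this file): divide a
   well-known operand pair and check the residual. A correct divider
   leaves a residual of about 0; a flawed one is off by about 256. */
static int detect_flawed_fdiv(void)
{
    volatile double x = 4195835.0;   /* volatile blocks constant folding */
    volatile double y = 3145727.0;
    double residual = x - (x / y) * y;

    if (residual < 0.0)
        residual = -residual;
    return residual > 1.0;
}

/* Placeholder for _safe_fdiv: the real routine works around the
   hardware flaw; this stand-in only divides and counts the call. */
static double safe_divide_placeholder(double x, double y)
{
    safe_path_calls++;
    return x / y;
}

/* Same decision as the assembly example: test the flag, use the
   hardware divide when it is 0, otherwise call the safe routine. */
static double checked_divide(double x, double y)
{
    if (fdiv_flawed != 0)
        return safe_divide_placeholder(x, y);
    return x / y;
}
```

A program using this sketch would set fdiv_flawed = detect_flawed_fdiv()
once at startup and then call checked_divide wherever it divides, so that
good processors pay only for the flag test.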

In summary:

	routine		safe?	replaceable?
	-------		-----	------------
	_adjust_fdiv	n/a	n/a
	_safe_fdiv	y	y
	_safe_fprem	y	y
	_safe_fprem1	y	y
	_adj_fptan	n	y
	_adj_fpatan	n	y
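
The difference between the two remainder routines (_safe_fprem is x87
compatible, _safe_fprem1 is IEEE conformant) comes down to how the quotient
is rounded. The sketch below illustrates the two conventions in plain C; the
function names are hypothetical and the code is a simplification that is
only valid while the quotient fits in a long. The C library functions fmod
and remainder follow the same two conventions.

```c
/* Truncating remainder, the FPREM convention: the quotient is chopped
   toward zero, so the result takes the sign of the dividend.
   Simplified sketch: valid only while x / y fits in a long. */
static double truncating_remainder(double x, double y)
{
    long q = (long)(x / y);          /* the cast truncates toward zero */
    return x - (double)q * y;
}

/* IEEE remainder, the FPREM1 convention: the quotient is rounded to the
   nearest integer, with ties going to the even integer.
   Same range limitation as above. */
static double ieee_remainder(double x, double y)
{
    double quotient = x / y;
    long q = (long)quotient;         /* start from the truncated quotient */
    double frac = quotient - (double)q;
    double mag = frac < 0.0 ? -frac : frac;
    long step = frac < 0.0 ? -1 : 1; /* direction away from zero */

    /* Move to the neighbouring integer when it is nearer, or when the
       quotient is exactly halfway and the truncated value is odd. */
    if (mag > 0.5 || (mag == 0.5 && (q % 2) != 0))
        q += step;
    return x - (double)q * y;
}
```

For example, 5 divided by 3 leaves 2 under the truncating convention and -1
under the IEEE convention, because the quotient 1.67 truncates to 1 but
rounds to 2.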


----------
Performance

Performance of FDIV was measured for the following two cases:

-- Worst Case: an (unrealistic) program that does nothing but FDIVs
-- Realistic Case: FPSpec, a set of FP-intensive programs

Relative run times (unsafe code on a good Pentium = 1.0) are as follows:

    - Worst Case:
			flawed Pentium	good Pentium
    unsafe code		(error)		1.0
    safe code		2.0		1.1

    - Realistic Case:
			flawed Pentium	good Pentium
    unsafe code		(error)		1.0
    safe code		1.10		1.01

In other words, the (extremely unlikely) worst-case penalty is 10% on a good
Pentium and 2x on a flawed Pentium; the realistic penalty is about 1% on a
good Pentium and 10% on a flawed Pentium.

As always, "Your mileage may vary."  That is, you may see no slowdown in a
realistic program, or you may see 2x in a realistic program.  If performance
is an issue for you, measure it and see what your actual results are.
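
One way to take such a measurement is a small timing loop. The sketch below
is only a starting point: time_divisions and overhead_percent are
hypothetical helper names, the loop measures nothing but divides (so it
corresponds to the worst case above, not to a realistic program), and
clock() has limited resolution on many systems.

```c
#include <time.h>

/* Time n dependent divisions and return the elapsed CPU seconds. */
static double time_divisions(long n)
{
    volatile double divisor = 1.0000001; /* volatile: forces a real divide */
    volatile double sink;                /* keeps the result observable */
    double value = 1.0e6;
    clock_t start, stop;
    long i;

    start = clock();
    for (i = 0; i < n; i++)
        value = value / divisor;
    stop = clock();

    sink = value;
    (void)sink;
    return (double)(stop - start) / CLOCKS_PER_SEC;
}

/* Penalty of a candidate build relative to a baseline build, in percent.
   Relative times of 1.0 and 1.10 give a penalty of about 10%. */
static double overhead_percent(double baseline_seconds,
                               double candidate_seconds)
{
    return (candidate_seconds / baseline_seconds - 1.0) * 100.0;
}
```

Building the same source once with the default options and once with
"-QIfdiv-", timing each build, and passing the two times to
overhead_percent gives the penalty for that particular workload.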

----------
CAVEAT:

We have tested this fix extensively.  However, as with all software, there is
always a possibility that bugs remain.  We assume that customers will rigorously
test their applications to ensure correctness.

Accuracy of floating point operations is a complex subject.  Even with an
accurate set of 'atomic' operations, such as +,-,*,/,  a program can give
unexpected results.  The C and C++ standards do not, in general, guarantee
a specific order of evaluation for expressions, nor does it guarantee that
intermediate results will be forced to a particular precision, so two programs
that are logically equivalent on the surface may yield different results.
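
A minimal sketch of this effect: the two functions below compute the same
sum algebraically but group the additions differently. The function names
are hypothetical, and the volatile temporaries are there only to force each
intermediate result to double precision so that the outcome does not depend
on how the compiler keeps intermediates.

```c
/* Computes (a + b) + c. */
static double sum_left_first(double a, double b, double c)
{
    volatile double partial = a + b;  /* rounded to double here */
    return partial + c;
}

/* Computes a + (b + c). */
static double sum_right_first(double a, double b, double c)
{
    volatile double partial = b + c;  /* rounded to double here */
    return a + partial;
}
```

With a = 1e20, b = -1e20 and c = 1.0, the first function returns 1.0 and
the second returns 0.0: in the second grouping the 1.0 is lost when it is
added to -1e20, because it is far smaller than the spacing between
representable values of that magnitude.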

For a more detailed discussion of some of the above, see:

    Visual C++ documentation, "-Op" compiler option
    IEEE Floating-Point Standard
    C Language Standard

The first of these references is probably the most readable overview.

Most people need not worry about this, either because they do not use floating
point at all, or because they do not need an extremely high degree of accuracy.
Those who do need to worry are urged to make sure they understand the issues
rather than blindly assume that the tools will "just work."