PCI EIDE CONTROLLER FLAWS (ABRIDGED)
Revision 18: 1995 October 4

INTRODUCTION

There are serious flaws affecting about 1/3 of all PCI motherboards. The flaws affect any motherboard or EIDE controller paddleboard containing the PC-Tech RZ-1000 PCI EIDE controller chip or the CMD PCIO 640 PCI EIDE controller chip. The flaws affect motherboards from ASUSTeK, AT&T, DEC, Dell, Gateway, Intel, Micron, NEC, Zeos and others. Since Intel makes so many of the motherboards sold under other brand names, the flaws affect many machines, both 486 and Pentium PCI.

The flaws show up most frequently when you run a true multitasking operating system such as OS/2 Warp or NT. They also show up under Windows For WorkGroups in 32-bit mode during tape or floppy backup and restore. In theory the flaws could do damage under DOS, DESQview, Windows and Windows For WorkGroups in 16-bit mode, but so far there have been no damage reports. Windows 95 contains code to bypass the flaws.

The RZ-1000 has two flaws. The CMD-640 has those same two flaws plus three others. To make matters worse, most motherboard manufacturers using these two flawed chips connected them up incorrectly.

There are software bypasses for these flaws. However, the Warp fix for the CMD-640 reduces disk performance by 15 to 50%. The RZ-1000 fix has negligible impact on disk I/O, though it can slow down background processes. I would advise buying new hardware to bypass the CMD-640 flaws, and living with software fixes to bypass the RZ-1000 flaws.

WHAT ARE THE SYMPTOMS?

When you are using an IDE or EIDE hard disk attached to the EIDE motherboard port, the flaws subtly corrupt your files by randomly changing bytes every once in a while. The flaws introduce bugs into EXE files, subtle errors into your spreadsheets, stray characters into your word processing documents, changes to the deductions in last year's tax return files, and random changes to engineering design files.
This corruption happens when you are simultaneously using your EIDE or IDE hard disk and some other device, most commonly the floppy drive or mag tape backup. The same sorts of problems may occur when reading a CD-ROM drive attached to an EIDE port.

TESTING FOR THE FLAWS

I wrote two test programs that run under DESQview, Windows, Windows For WorkGroups, Windows 95, NT and OS/2. EIDEtest verifies that your hard disk is working properly, and CDtest verifies your CD-ROM. If these tests fail, it proves you have a serious problem, but not necessarily that you have the RZ-1000 or CMD-640 chip. If the tests pass, you may still have a problem since, especially under DOS, DESQview and Windows, the flaws may show up only very rarely. If you run the tests under Windows 95 they will always pass, even if you have the defective chip, because the operating system already bypasses the flaws.

WHAT CAN YOU DO IF YOU HAVE A FLAW?

1) Pester the manufacturer. Unfortunately, the EIDE controller chips are soldered in. The only way to repair a flaw is to replace the whole motherboard, recycling the socketed chips: the CPU, DRAM and SRAM cache. It would be very expensive for computer and motherboard manufacturers to fix a flaw.

2) Buy a new unpopulated Triton PCI motherboard and recycle the CPU, DRAM and SRAM cache chips from the old motherboard. Unfortunately, the Triton chipset has design shortcuts that hamper performance in simultaneous I/O situations. At least it does not corrupt data.

3) Run the controller in degraded mode. Some BIOSes have a feature to disable the EIDE prefetch buffer. Vendors may offer a BIOS upgrade that lets you disable prefetch manually. The BIOS may also turn it off automatically if either of the defective chips is present. This will bypass both RZ-1000 flaws and two of the five CMD-640 flaws.

4) Buy a PCI EIDE paddleboard controller such as the DTC 2130S, the Tekram 290N/290S, the Promise 2300+ or the BusLogic BT-910 to replace the one on the motherboard. You must disable the EIDE controller on the motherboard. This fix will waste one of your precious slots. Be careful: you could be leaping out of the RZ-1000 frying pan into the CMD-640 fire, since paddleboards often use the CMD-640.

5) Buy a SCSI hard disk and CD-ROM, and avoid using the EIDE ports entirely. Under OS/2 and Linux, SCSI gives better performance, but costs more. DOS, Windows, Windows For WorkGroups and Windows 95 are unable to exploit the advanced features of SCSI, but if you go pure SCSI you at least avoid the EIDE flaws.

6) Find a software work-around. There are fixes for Warp that bypass all the flaws in the RZ-1000 and CMD-640; Fixpack 10 is the first fixpack to bypass them. Now that Intel and IBM have finally revealed the technical details, all the operating system writers can patch their EIDE drivers to bypass the flaws. There are also fixes for NT 3.1 and 3.5.

7) Get a BIOS upgrade. For DOS, DESQview and Windows 3.1, you may need a new BIOS (an EPROM chip) to bypass the flaws. If you have a flash BIOS, you can update it simply by downloading a file. Most BIOSes already have code to bypass the flaws for DOS, DESQview and Windows. However, more advanced operating systems bypass the BIOS, so even a smart BIOS will not protect you there. The BIOS CMOS settings may still allow you to disable prefetch, which protects you even under true multitasking operating systems.

8) Cut the trace. Cut the trace on the motherboard from the floppy changeline to the EIDE controller. However, this bypasses only one of the CMD-640's five flaws and one of the RZ-1000's two flaws.

9) Use the secondary EIDE controller. Some motherboards, such as the Micron P5-90 M54Pi-N 11P, use different kinds of controllers on the primary and secondary EIDE ports. The primary may be flawed but the secondary OK.

Whatever method you use to bypass the flaws, retest with EIDEtest and CDtest afterwards to be sure your fix worked and you caught all the problems.
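The general idea behind such a retest can be illustrated in a few lines: write data with a known pattern, then read it back repeatedly while some other device is busy, and compare what comes back with what was written. The Python sketch below shows only that idea; it is not how EIDEtest or CDtest work internally, and the names `write_pattern_file`, `background_io` and `stress_verify` are hypothetical. Two assumptions matter: on a suspect machine `busy_path` should point at the floppy or tape drive rather than the disk under test, and a modern operating system's file cache may satisfy the re-reads without touching the disk at all, so a real test would need cache-bypassing I/O.

```python
import hashlib
import os
import tempfile
import threading

# A 64 KiB block of the repeating byte pattern 0, 1, ..., 255.
PATTERN_BLOCK = bytes(range(256)) * 256


def write_pattern_file(path, size):
    """Write `size` bytes of a known pattern to `path`; return its SHA-256 hex digest.

    The digest is computed from the bytes written, not by re-reading the file,
    so a corrupted re-read cannot contaminate the reference value.
    """
    digest = hashlib.sha256()
    with open(path, "wb") as f:
        remaining = size
        while remaining > 0:
            chunk = PATTERN_BLOCK[:min(len(PATTERN_BLOCK), remaining)]
            f.write(chunk)
            digest.update(chunk)
            remaining -= len(chunk)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to commit the data to the device
    return digest.hexdigest()


def background_io(busy_path, stop):
    """Keep a second file busy until `stop` is set.

    Hypothetical stand-in for floppy or tape traffic during the test.
    """
    block = os.urandom(64 * 1024)
    while not stop.is_set():
        with open(busy_path, "wb") as f:
            f.write(block)
            f.flush()
            os.fsync(f.fileno())
        with open(busy_path, "rb") as f:
            f.read()


def stress_verify(test_path, busy_path, size=4 * 1024 * 1024, passes=10):
    """Re-read a pattern file `passes` times while background I/O runs.

    Returns the number of passes whose contents did not match what was written.
    """
    expected = write_pattern_file(test_path, size)
    stop = threading.Event()
    worker = threading.Thread(target=background_io, args=(busy_path, stop), daemon=True)
    worker.start()
    failures = 0
    try:
        for _ in range(passes):
            with open(test_path, "rb") as f:
                if hashlib.sha256(f.read()).hexdigest() != expected:
                    failures += 1
    finally:
        stop.set()
        worker.join()
    return failures


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        bad = stress_verify(os.path.join(d, "pattern.bin"), os.path.join(d, "busy.bin"))
        print("passes with mismatched data:", bad)
```

A non-zero count means the data read back differed from the data written, which is exactly the kind of silent corruption described above. As with the real test programs, a zero count is weaker evidence: the flaws may strike only rarely.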
CLEANING UP THE MESS

Once you have bypassed the flaws, you can start working on the problem of cleaning up your files. The first thing to do is to re-install your operating system and all your application programs. This will replace any damaged EXE and DLL files.

Catching errors in your data files is more difficult. Keep your eyes peeled for any improbable spreadsheet results. You may have to hire a programmer to write some comb programs that sniff through your databases looking for suspicious values. If you routinely use the verify feature of Lotus Magellan, it can detect changes to files that should not have changed. This may help you uncover some of the damage. The flaws are not polite enough to redate the files they corrupt. :-)

If you have backups from before the time you bought the faulty machine, you can restore them and re-key everything. Most people will not be so fortunate: all their backups will also be corrupt. Most people with flaws will just have to put up with random errors dotting their data files ever after.

WHAT ARE THE FLAWS?

IBM confirmed the RZ-1000 has two different flaws:

1. In prefetch mode, multi-sector reads often fail.
2. The chip erroneously responds to floppy status commands and corrupts hard disk or CD-ROM I/O in the process.

IBM confirmed the CMD-640 has five different flaws:

1. It has the same prefetch problem as the RZ-1000.
2. It has the same floppy status problem as the RZ-1000.
3. It does not support simultaneous I/O on the primary and secondary EIDE ports.
4. Confusion over legacy and PCI mode.
5. It does not support 32-bit writes.

TEST PROGRAMS

On the Internet, I have posted my EIDEtest and CDtest programs for DOS, DESQview, Windows, Windows For WorkGroups, Windows 95, NT, OS/2 and Warp. They check that your hard disk and CD-ROM function without interference from background I/O activity, and so indirectly detect the flawed RZ-1000 and CMD-640 chips. The package also includes an unabridged 28-page version of this article, complete with references to essays, tests, and fixes for the various operating systems. By the time you read this, I may have posted a newer version.

When accessing files on the Internet, you generally must use lower case:

ftp://garbo.uwasa.fi/pc/diskutil/eidete18.zip
alternatively
ftp://ftp.cdrom.com/.4/os2/incoming/eidete18.zip
or
ftp://ftp.cdrom.com/.4/os2/sysutil/eidete18.zip

CONTACTING THE AUTHOR

The author, Roedy Green, is a computer consultant who prefers to work on Forth, C++, Delphi, DOS, OS/2 and Internet Web projects. If you send me $5 (US or Canadian) to cover duplication, postage to anywhere in the world, and handling, I will send you a diskette containing the relevant test programs, fixes, Internet postings and essays.

Please report any machines with flaws. Send email to Roedy@bix.com or discuss this problem on the Internet newsgroup comp.os.os2.bugs. You can also write via snail mail:

Roedy Green
Canadian Mind Products
#601 - 1330 Burrard Street
Vancouver, BC
CANADA V6Z 2B8
(604) 685-8412

-30-