CBNORMAL 0.3a (32-bit version)
Copyright Rob Weir, 1994-96
CompuServe: 71165,2722
Internet: rweir@cybercom.net

This program is free for personal use.
=======================================================================
WARNING: This program produces modified ChessBase data files, which is
quite difficult to do and quite undocumented. This program seems to work
for me, but don't you think it would be better to make a backup of your
BIG ChessBase database before using it?!
=======================================================================
New in Version 0.3a!

Fixed a bug where we were ignoring the Pivot option in CBNORMAL.INI.
Also, expanded the "AddComma" option to change:

      "Kasparov Gary" -> "Kasparov,Gary"
=======================================================================
New in Version 0.3!

This is a 32-bit port with many new features:

 a) Rewritten to take advantage of 32-bit features, like memory-mapped
    files and additional available memory.
 b) Several additional heuristics for fixing malformed headers.
 c) CBNORMAL.INI, used to turn the various modifications on or off.
 d) Practice mode, which shows you exactly what would be modified.
=======================================================================
New in Version 0.2!

I've fixed a few bugs:

1) I now detect, and do not pass through, games which have checksum
   errors or which require a user ID to access. I've provided a
   separate program, CBCHKSUM.EXE, which you can download; it can read
   and unprotect such data files.
2) I now enforce a 47-character limit on the name and source fields.
3) I detect and ignore user-defined substitutions which are circular,
   e.g. "New York"="New York City".
4) Fixed a bug where a field could be corrupted if its length was
   increased by a substitution.

Other changes:

5) When deciding what counts as a word (the smallest unit for
   user-defined substitutions), I now use a sequence of letters and
   numbers. Version 0.1 defined a word as a sequence of letters only.
6) Changed the display of percent progress.
7) New, faster code for the search-and-replace operations.

Additions:

8) Change date fields with year=1792 to year=blank (0).
9) Remove text in parentheses from names.
=======================================================================
Files you now have:

  CBNORMAL.TXT   the file you are reading
  CBNORMAL.EXE   the CBNORMAL program
  CBNORMAL.INI   CBNORMAL configuration file
  NAMESUB.DAT    sample name-field substitutions
  PLACESUB.DAT   sample source-field substitutions
  PLAYERS.DAT    large list of preferred player names
=======================================================================
The program CBNORMAL takes a ChessBase data file and creates a new data
file with user-defined modifications made to the text of the player and
source fields in the game header. CBNORMAL writes the modified games to
a new file, always called CBNORMAL.CBF/CBI. The original data files are
left untouched.

Why would you want a program like this? If you are like me, you get
games (directly or indirectly) from many different sources: the
Internet, CompuServe, or other BBSs. These games come in many different
formats, like CBF, PGN, NTF, NIC, etc. After I go through the effort of
converting the games to a single data format (CBF), I find that the
games do not follow a common layout in their headers. For example, I'll
see the same game in several different styles:

  Karpov,A-Kasparov,G        Moscow
  A. Karpov-G. Kasparov      Moskva
  Karpow.A - Kasparov.G      Moskau

And so on. We see differences in spelling, use of accented characters
and umlauts, punctuation, spacing, etc. I wrote CBNORMAL to help with
this problem.

CBNORMAL reads a ChessBase file and creates a new file with the
following modifications:

1) Convert "foreign" characters to their nearest ASCII representation.
   So, an accented letter 'i' is converted to a plain 'i'. German
   umlauts are converted in the standard way (o-umlaut becomes oe, etc.).
   German "ess-zet" is converted to "ss".

2) The player names are put in a standard format like
   "Karpov,A-Kasparov,G". This may involve rearranging the names,
   deleting extraneous spaces and punctuation, etc.

3) User-defined substitutions are applied. These are defined in the
   text files NAMESUB.DAT and PLACESUB.DAT, which apply to the player
   field and the source field respectively. Think of these files as a
   list of search-and-replace operations which are applied, in order,
   to every game. Each entry in these files looks like this:

      "Kortschnoj"="Kortchnoi"

   The string on the left is the string to search for, and the string
   on the right is its replacement. The search is case-sensitive and
   applies only to whole words, not to portions of a word. To be exact,
   CBNORMAL looks for the search pattern, and if it finds it, and the
   character before the pattern is not a letter or number, and the
   character after the pattern is not a letter or number, then the
   substitution is made. (A pattern at the very start or end of the
   field also qualifies, since there is no character there.) For
   example, if your PLACESUB.DAT contains the entry:

      "corr"="cr"

   it will match the following strings:

      "corr match"
      "thematic corr"
      "corr(9)"

   but will not match these:

      "Corr match"
      "correspondence"
      "corr9"

   Take a look at the two files NAMESUB.DAT and PLACESUB.DAT to get a
   feel for what can be done with them.

Along with the user-defined substitutions, CBNORMAL knows how to correct
several categories of malformed headers. Each of these heuristics can be
enabled or disabled individually in CBNORMAL.INI; examine that file for
more details.
=======================================================================
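Sketch: character conversion (modification 1). The Python fragment
below is one possible way to do the conversion described in
modification 1; it is not CBNORMAL's actual code. It assumes the header
text has already been decoded into a Unicode string (how ChessBase
stores accented characters on disk is not covered in this file). German
umlauts and ess-zet are spelled out explicitly, and other accented
letters lose their accent through Unicode decomposition. SPECIAL_CASES
and fold_to_ascii are hypothetical names.

```python
import unicodedata

# Letters whose conventional ASCII spelling is more than "drop the accent":
# German umlauts become a two-letter vowel, and ess-zet becomes "ss".
SPECIAL_CASES = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
}


def fold_to_ascii(text: str) -> str:
    """Return text with non-ASCII letters replaced by a nearby ASCII spelling."""
    out = []
    for ch in text:
        if ch in SPECIAL_CASES:
            out.append(SPECIAL_CASES[ch])
            continue
        # NFD splits e.g. "í" into "i" plus a combining acute accent;
        # keeping only the non-combining characters leaves the base letter.
        decomposed = unicodedata.normalize("NFD", ch)
        base = "".join(c for c in decomposed if not unicodedata.combining(c))
        # Characters with no ASCII base letter (e.g. "ø") are left unchanged.
        out.append(base if base.isascii() else ch)
    return "".join(out)


print(fold_to_ascii("Hübner"))       # Huebner
print(fold_to_ascii("Polgár"))       # Polgar
print(fold_to_ascii("Großmeister"))  # Grossmeister
```

A lookup table handles the umlaut cases first because plain accent
stripping would turn o-umlaut into "o" rather than the "oe" this file
calls standard.
=======================================================================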
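Sketch: putting player names in a standard format (modification 2).
CBNORMAL's real heuristics are not described in detail here, so this
Python fragment only illustrates the idea under simple assumptions: the
field holds two names separated by '-', text in parentheses is dropped
(as in item 9 of the version 0.2 notes), periods and commas are treated
as separators, a name that starts with initials has its surname last,
and any other name is taken to be surname first (which also covers the
AddComma example "Kasparov Gary" -> "Kasparov,Gary"). normalize_player,
normalize_players and MAX_FIELD_LEN are hypothetical names;
MAX_FIELD_LEN is the 47-character limit from the version 0.2 notes.
Hyphenated or multi-word surnames such as "van Wijk" are not handled.

```python
import re

MAX_FIELD_LEN = 47  # name-field limit quoted in the version 0.2 notes


def normalize_player(name: str) -> str:
    """Rewrite one player's name as "Surname,Given" (heuristic sketch)."""
    # Periods and commas act as separators, so "A. Karpov", "Karpov.A",
    # "Karpov,A" and "Kasparov Gary" all become simple word lists.
    words = name.replace(".", " ").replace(",", " ").split()
    if not words:
        return ""
    if len(words) > 1 and all(len(w) == 1 for w in words[:-1]):
        # Leading initials ("A. Karpov"): the surname is the last word.
        surname, given = words[-1], words[:-1]
    else:
        # Otherwise assume surname-first order ("Karpow.A", "Kasparov Gary").
        surname, given = words[0], words[1:]
    return surname + ("," + " ".join(given) if given else "")


def normalize_players(field: str) -> str:
    """Normalize a "White-Black" player field and enforce the length limit."""
    # Drop parenthesized text such as "(2805)" before splitting on '-'.
    field = re.sub(r"\([^)]*\)", " ", field)
    players = [normalize_player(p) for p in field.split("-")]
    return "-".join(players)[:MAX_FIELD_LEN]


print(normalize_players("A. Karpov-G. Kasparov"))  # Karpov,A-Kasparov,G
print(normalize_players("Karpow.A - Kasparov.G"))  # Karpow,A-Kasparov,G
```

Note that the spelling "Karpow" survives this step; fixing it is the
job of the user-defined substitutions in NAMESUB.DAT.
=======================================================================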
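Sketch: whole-word substitutions (modification 3). This Python fragment
shows one way to implement the matching rule spelled out in
modification 3 using regular-expression lookarounds; it is an
illustration, not CBNORMAL's code. It assumes one "search"="replace"
entry per line and ignores any other line (the real DAT files may allow
comments or syntax not covered here). Only ASCII letters and digits
count as word characters, which is reasonable once accented characters
have been converted to ASCII. Circular entries such as
"New York"="New York City" are skipped by checking whether the search
string occurs as a whole word inside its own replacement; that
definition of "circular" is an assumption. ENTRY, whole_word,
load_rules and apply_rules are hypothetical names.

```python
import re

# One entry per line, e.g.  "Kortschnoj"="Kortchnoi"
ENTRY = re.compile(r'^\s*"([^"]*)"\s*=\s*"([^"]*)"\s*$')


def whole_word(search: str):
    """Match search only when not preceded or followed by a letter or digit."""
    # re.escape keeps characters like '.' or '(' in the entry literal.
    return re.compile(r"(?<![A-Za-z0-9])" + re.escape(search) + r"(?![A-Za-z0-9])")


def load_rules(lines):
    """Parse NAMESUB.DAT/PLACESUB.DAT-style lines into (pattern, replacement) pairs."""
    rules = []
    for line in lines:
        m = ENTRY.match(line)
        if m is None or not m.group(1):
            continue  # assumption: anything that is not an entry is ignored
        search, replace = m.groups()
        pattern = whole_word(search)
        if pattern.search(replace):
            continue  # circular, e.g. "New York"="New York City"
        rules.append((pattern, replace))
    return rules


def apply_rules(text: str, rules) -> str:
    """Apply every rule, in file order, to one header field."""
    for pattern, replace in rules:
        # A function replacement keeps any backslashes in replace literal.
        text = pattern.sub(lambda _m: replace, text)
    return text


rules = load_rules(['"corr"="cr"', '"New York"="New York City"'])
print(apply_rules("thematic corr", rules))   # thematic cr
print(apply_rules("correspondence", rules))  # correspondence
```

Lookarounds test the neighbouring characters without consuming them,
which is exactly the "character before / character after" rule, and
they succeed at the start or end of the field where no neighbour exists.
=======================================================================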
CBNORMAL is easy to use. Just pass the name of a ChessBase file as an
argument and let it run. For example, if you have a ChessBase file of
World Championship games called WCH.CBI and WCH.CBF, you run CBNORMAL
like this:

      CBNORMAL WCH.CBF

The results will be found in the files CBNORMAL.CBF and CBNORMAL.CBI.

There is also a practice mode, which is entered like this:

      CBNORMAL WCH.CBF -practice

In this case, no output database is created; instead, CBNORMAL writes
three text files:

  NAMES.TXT      A list of all the modifications which were made to
                 player names
  PLACES.TXT     A list of all the modifications which were made to
                 the source fields
  SUBCOUNT.TXT   A tally of all user-defined substitutions which were
                 made
=======================================================================
Future directions? I'd like to do the following for future releases:

1) Add regular expression support for more sophisticated searches.
2) Add more heuristics for dealing with malformed headers.
3) Figure out how Dutch and German names should be expanded when they
   are abbreviated. When does vWijk become van Wijk or von Wijk, and
   when does vd become "von der" versus "von den" versus "van der" and
   "van den"?
4) Add an option for users to selectively enable/disable the
   automatically applied heuristics, like character conversions.
5) Instead of just deleting the text within parentheses in the name
   field, I'd like to make sense of it. If it is a round number or
   country, I'd like to add it to the source field. If it is a rating,
   I should apply it to the rating field. If it is an ECO code, I should
   put it there.

Let me know if there is something you think would be really neat for
CBNORMAL to do, or if you come up with a useful set of name or place
substitutions you'd like to share.
=======================================================================
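Sketch: a practice-mode dry run. With -practice, CBNORMAL writes
reports instead of a database. The Python fragment below illustrates
that idea only: it applies whole-word (search, replace) pairs to a list
of header fields, collects the (before, after) pairs for fields that
would change (the kind of information NAMES.TXT and PLACES.TXT hold),
and counts how often each search string was replaced (as SUBCOUNT.TXT
does). The layout of those report files is not documented here, so the
sketch writes nothing to disk; practice_run is a hypothetical name.

```python
import re
from collections import Counter


def practice_run(fields, rules):
    """Report what whole-word substitutions would change, without writing output.

    fields: list of header strings (player or source fields).
    rules:  list of (search, replace) string pairs, applied in order.
    Returns (changes, tally): (before, after) pairs for every field that
    would change, and a Counter of replacements made per search string.
    """
    changes = []
    tally = Counter()
    for before in fields:
        after = before
        for search, replace in rules:
            pattern = r"(?<![A-Za-z0-9])" + re.escape(search) + r"(?![A-Za-z0-9])"
            # re.subn returns the new string and the number of replacements.
            after, n = re.subn(pattern, lambda _m: replace, after)
            if n:
                tally[search] += n
        if after != before:
            changes.append((before, after))
    return changes, tally


changes, tally = practice_run(
    ["corr match", "Moscow", "thematic corr"],
    [("corr", "cr"), ("Moskva", "Moscow")],
)
print(changes)  # [('corr match', 'cr match'), ('thematic corr', 'thematic cr')]
print(tally)    # Counter({'corr': 2})
```
=======================================================================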