The following is in response to many questions I have seen posted.

-Paul

ps: with the recent release of PGNSort, another, perhaps easier, method
for breaking out a large file by ECO or opening descriptions is possible.
The procedure outlined here is still useful for updating individual ECO
or opening collections from files composed of miscellaneous games.


(Re) Organizing Your PGN Collection
===================================

There are many reasons for building a collection of chess games, but
common to all is some element of research during a review of the data.
This means there must be a method to locate games of interest.

The large chess file is one approach, and I have heard from someone who
maintains files holding 10,000 games each. One problem then is finding
software which will handle such files. Perhaps only a large editor or
word processor is suitable. ChessU4 and most of the other "U4" utilities
are somewhat more modest--the optimum file size is 4000 games.

Splitting games among smaller files can also have a disadvantage--one
must have a system to locate a given game across multiple files. To
reduce multiple-file searches, there are two reasonable methods for
organizing files:

   1) by ECO (Encyclopaedia of Chess Openings) codes
   2) by PGN Opening descriptions

The second method has only been possible since early 1994, when PGN was
introduced. Realistically, it came about a year later, when chess
software was finally developed to assign opening descriptions.

Two "U4" programs which will add the "Opening" tag and description to
PGN games are NORMAL.exe and ECOClass.exe. NORMAL does so by expanding
previously assigned ECO codes into descriptions. ECOClass is more
sophisticated--it will assign both the ECO codes and the descriptions.

Using NORMAL
   - games must have ECO codes
   - the file, ECOIND.txt, available in many libraries, must be in the
     local (NORMAL) directory
   - run NORMAL with switch "expandECO=1"

Using ECOClass
   - any valid PGN game can be classified
   - all materials necessary for classification are included with the
     download
   - see the ECOClass help file for instructions (ReadMeEC.txt)

If 1000 games in a file of high-quality (GM) games were assigned
descriptions, we might find this distribution,

   195 - Queen's Gambit
   170 - Sicilian
    75 - English
    75 - King's Indian
    75 - Ruy Lopez
    50 - Queen's Indian
    40 - Caro-Kann
    30 - French
    30 - Gruenfeld
    30 - Nimzo-Indian
    30 - Pirc

That would account for all but 200 of the games. Right away we can see a
method for distributing the games across several files.

Is it going to be a big job? Not at all. The main problem will be one of
concentration--not making any mistakes which would botch the process and
force a restart. An old-fashioned pencil and paper will be necessary to
take down game counts and make sure everything is accounted for at the
end.

First, however, check how the descriptions are spelled. If the games
have been taken from numerous sources, it is perhaps best to reclassify
them all to get standard opening names assigned. Looking just at the
Queen's Gambit we might find these descriptions,

   [Opening "Queen's Gambit"]
   [Opening "Queen's Gambit, Accepted"]
   [Opening "Queen's Gambit, Declined"]
   [Opening "Queen's Gambit, Declined, Anti-Meran Defense"]
   [Opening "Queen's Gambit, Declined, Exchange System"]
   [Opening "Queen's Gambit, Declined, Meran Defense"]
   [Opening "Queen's Gambit, Declined, Semi-Slav Defense"]
   [Opening "Queen's Gambit, Slav Defense"]

"Queen's Gambit" -- that makes sense as the search string. All we have
to do is remember the apostrophe. It's not a bad idea either to include
the starting quote in the search, as some opening names are used again
in variations of other openings.
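
For anyone who would rather get the game counts from a script than from
pencil and paper, here is a minimal Python sketch of the tally. It is
not part of any "U4" utility. It assumes each game carries one
[Opening "..."] tag on its own line and treats the text before the
first comma as the opening family. The function names and the
three-game sample are made up for illustration.

```python
import re
from collections import Counter

# Matches a PGN tag pair such as: [Opening "Queen's Gambit, Declined"]
OPENING_TAG = re.compile(r'^\[Opening\s+"(.*)"\]\s*$')


def opening_family(description):
    """Return the part of an opening description before the first comma."""
    return description.split(",", 1)[0].strip()


def tally_openings(pgn_text):
    """Count games per opening family, looking only at Opening tags."""
    counts = Counter()
    for line in pgn_text.splitlines():
        match = OPENING_TAG.match(line.strip())
        if match:
            counts[opening_family(match.group(1))] += 1
    return counts


# Made-up sample: tag values are taken from the list above, the move
# stubs are placeholders and not real games.
SAMPLE = """[Event "?"]
[Opening "Queen's Gambit, Declined, Meran Defense"]

1.d4 d5 2.c4 c6 *

[Event "?"]
[Opening "Queen's Gambit, Accepted"]

1.d4 d5 2.c4 dxc4 *

[Event "?"]
[Opening "Sicilian"]

1.e4 c5 *
"""

if __name__ == "__main__":
    # Print in the same "count - name" layout as the distribution above.
    for family, count in tally_openings(SAMPLE).most_common():
        print(f"{count:4d} - {family}")
```

Run against a real file (read it with open(...).read() and pass the
text in), the counts give the figures to write down in the process log
before any splitting starts.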

Now return to our mixed file of 1000 games. Call it file AAAAA.txt.
Start ChessU4 with AAAAA.txt as input, then enter the find command,

   > f*

You will be prompted for a string...

   > "Queen's Gambit

...then a file name to save the selected games...

   > BBBBB.txt

ChessU4 will reply with,

   "195 games written to BBBBB.txt."

A new prompt will appear,

   Save the NOT component of selected games? y/n >

Enter "y" (yes). Save to a third file,

   > CCCCC.txt

Again, the echo,

   "805 games written to CCCCC.txt."

The "NOT component save" subtracts out the games selected in the first
pass, saving what's left to a separate file. Our source file for the
second pass will now be file CCCCC.txt, not AAAAA.txt.

Now, to continue the process, do a restart (r) with ChessU4, opening the
file CCCCC.txt. Keep recording the game counts and file names on your
process log to assure accuracy. Once file CCCCC.txt has been opened, you
can start from the top again, doing an initial "f*" search on,

   > "Sicilian

When you're through, the final "NOT component" file will contain just
the miscellaneous openings--those that weren't selected out by any of
the searches.

When done, you'll want to rename the files (but you probably already
guessed that meaningful file names could have worked just as well.)

You might also want to combine several of the less-common openings into
a single file. You _could_ do this by appending, but perhaps a
single-pass method is simpler,

   > f*
   > <"Nimzo-Indian>,<"Gruenfeld>,<"Pirc>

The bracketed <> strings separated by commas tell U4 to select matches
on string 1 *OR* string 2 *OR* string 3. If the commas are omitted, the
condition is *AND* (not apt to be suitable for the above example).


Other ChessU4 Searches
======================

The next time you want to find games matching a certain position, you'll
know where to look. ChessU4 has two types of position searches.
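
An aside before the position searches: the select-then-save-the-NOT-
component passes described earlier can be rehearsed outside ChessU4. The
Python sketch below is only an illustration of the bookkeeping, not how
ChessU4 works internally. It assumes every game starts with an [Event
tag line and treats a search string as a plain substring of the game
text. All function names, pile labels and the four-game sample are
hypothetical.

```python
def split_games(pgn_text):
    """Split PGN text into games; a new game starts at each [Event line."""
    games, current = [], []
    for line in pgn_text.splitlines():
        if line.startswith("[Event ") and current:
            games.append(current)
            current = []
        # Skip blank lines that come before the very first game.
        if line.strip() or current:
            current.append(line)
    if current:
        games.append(current)
    return ["\n".join(g).strip() + "\n" for g in games]


def select(games, needles):
    """Return (selected, rest); a game is selected if ANY needle occurs (OR)."""
    selected, rest = [], []
    for game in games:
        (selected if any(n in game for n in needles) else rest).append(game)
    return selected, rest


def cascade(games, passes):
    """Run the passes in order; each pass searches what the last one left."""
    piles = {}
    remaining = games
    for label, needles in passes:
        piles[label], remaining = select(remaining, needles)
    piles["misc"] = remaining  # the final NOT component
    return piles


# Made-up sample; the move stubs are placeholders, not real games.
SAMPLE = """[Event "?"]
[Opening "Queen's Gambit, Declined"]

1.d4 d5 2.c4 e6 *

[Event "?"]
[Opening "Sicilian"]

1.e4 c5 *

[Event "?"]
[Opening "Nimzo-Indian"]

1.d4 Nf6 2.c4 e6 3.Nc3 Bb4 *

[Event "?"]
[Opening "English"]

1.c4 *
"""

# Each search string keeps the starting quote, as recommended earlier.
PASSES = [
    ("queens_gambit", ['"Queen\'s Gambit']),
    ("sicilian", ['"Sicilian']),
    ("minor", ['"Nimzo-Indian', '"Gruenfeld']),
]

if __name__ == "__main__":
    games = split_games(SAMPLE)
    piles = cascade(games, PASSES)
    for label, pile in piles.items():
        print(f"{len(pile)} games -> {label}")
    # The pencil-and-paper check: every game ends up in exactly one pile.
    assert sum(len(p) for p in piles.values()) == len(games)
```

The closing assert is the same check as the process log: the pile
counts must add up to the number of games in the source file.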

The basic position search requires a terminated line of notation,

   Queen's Gambit line
   1.d4 d5 2.c4 c6 3.Nf3 Nf6 4.Nc3 dxc4 5.a4 Bf5 6.Ne5 e6 7.f3 Bb4 1/2

Put it as the only (or top) line in the default file, GamesU4.txt. Start
ChessU4 and open the Queen's Gambit file (mine was BBBBB.txt). Select
the position search option...

   > p

...then verify that the search line is in file GamesU4.txt. On about 200
Queen's Gambit games, you should get several matches. If you did not,
use the "p*" position search command,

   > p*

This tells ChessU4 to keep backing up towards the start of the line
until a match is found. Once a match _has_ been found, answering "No" to
the "search complete?" prompt will continue the search at an ever more
shallow depth. (Pressing Enter is the same as answering "No.") The games
output to a file will be ordered from the nearest matches on down. Use
the (r) restart option to open and review the saved games.

That's enough for a start at reorganizing a chess game collection. There
are many other possible approaches. One would be to use PGNSrt followed
by CChunk and process all the divisions in a single pass.

ps: don't forget that ChessU4 can search game headers and notation in
other ways as well. Here's one example to find all Alekhine's Def. games
by searching on the ECO codes,

   > f*
   > ,,,

...and another that could be used to split games into the five ECO
volumes...

   > f*
   > ECO "A
   ...
   > ECO "B

[fin]