UKC_SIDX.ARJ
~~~~~~~~~~~~

Two Special Surname/Soundex indexes to the 2% Census Sample
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This archive contains:-

UKC_SIDX.TXT    The file you are now reading.

UKC_NI0.TXT  )  Two name indexes, derived from the original name
UKC_SDX.TXT  )  index to the 2% Census Sample.

These files have been constructed specially for use with the program
XTRACT, written by Ron MacRae and Rosemary Lockie, to help with
extraction of households with specified surnames from the UK 2% Census
Sample files, UKC_ccc.ARJ.

Our program will look up the surname(s) you specify and automatically
generate the appropriate search request, selecting the UKC_ccc.ARJ
files for the counties in which the surname occurs.

Both files contain a list of surnames found in the various county
files, and are derived from the original name index, UKC_NIDX.TXT.

UKC_NI0.TXT is a straight alphabetical surname listing.

UKC_SDX.TXT has the soundex code for the surname added, and is sorted
in order of soundex code.

UKC_NI0.TXT began as a straight copy of UKC_NIDX. However, for ease of
use within XTRACT, and to keep the overall size of the index to a
minimum, the following changes were made.

1. All counties for one surname have been combined onto a single line,
   separated by commas. The county trigraphs have been replaced with
   dinomes, 01 to 92, representing the UK counties.

2. Trailing question marks on surnames have been ignored, so that
   entries for BROWN and BROWN? or BROWN?? have been combined together
   in the resultant index.

   N.B. Question marks elsewhere in the surnames have been retained.

3. Some of the entries in the original surname index have been split,
   where there appears to be more than one choice of surname.
   So for instance, two entries, "SINCLAIR" and "MCKELLAR", have been
   made for "SINCLAIR OR MCKELLAR" (found in BUT5101.TXT).

   However, "DE LA MOTTE" (DOR5106.TXT), "VAN DEN HONERT" (WAR5117.TXT)
   and similar have been retained as single names (in these two
   examples, because the first word of the name is shorter than 4
   characters, although the overall algorithm used for splitting is
   rather more complicated than that).

Together, these changes have resulted in a saving of about 29% in the
size of the overall straight name index file:- 553,680 bytes, compared
with 783,438 bytes in the original. UKC_SDX in its raw state adds a
further 733,290 bytes (229,625 bytes compressed).

The format of the two files is as follows:-

UKC_NI0.TXT format      surname{tab}dd,dd,dd...
UKC_SDX.TXT format      sndx{sp}surname{tab}dd,dd,dd...

In UKC_SDX, a single space separates the soundex code from the surname.
A {tab} character (ASCII value 09) is used to separate the surname
(variable length) from the list of dinomes.

The soundex code is always 4 characters long. Either of these indexes
may be imported into a database file if desired. If so, you will need
to know that the maximum line length is 236 characters, and the maximum
length of a surname within those 236 characters is 19. One way to do
this is to create a database with the following structure:-

Soundex     5   (May be reduced to 4 after importing; 5 characters
                allows for the space on import.)
Data      236   Surname, and list of county dinomes.
Surname    19   To be filled in after import.
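
The two line formats above can also be split programmatically. What
follows is only a minimal Python sketch, not part of XTRACT: the
function names parse_ni0_line and parse_sdx_line are hypothetical, and
the sample lines (in particular their dinome lists) are made up for
illustration. It relies solely on the rules stated above: a 4-character
soundex code, a single space, then the surname, a tab, and a
comma-separated list of dinomes.

```python
# Hypothetical helpers for illustration only; they are not part of XTRACT.

def parse_ni0_line(line):
    """Split one UKC_NI0.TXT line into (surname, list of dinomes)."""
    # The surname may itself contain spaces (e.g. "DE LA MOTTE"),
    # so only the tab is treated as a separator.
    surname, _, counties = line.rstrip("\r\n").partition("\t")
    return surname, [d for d in counties.split(",") if d]


def parse_sdx_line(line):
    """Split one UKC_SDX.TXT line into (soundex, surname, list of dinomes)."""
    text = line.rstrip("\r\n")
    soundex = text[:4]  # the soundex code is always 4 characters
    # Skip the single separating space, then parse the rest as an NI0 line.
    surname, dinomes = parse_ni0_line(text[5:])
    return soundex, surname, dinomes


if __name__ == "__main__":
    # Made-up sample lines; the dinome lists are not taken from the real index.
    print(parse_ni0_line("BROWN\t01,17,48\n"))
    print(parse_sdx_line("B650 BROWN\t01,17,48\n"))
```

Taking the soundex code by position rather than splitting on spaces
keeps multi-word surnames intact.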

Please note that if you wish to separate the surname out as a separate
field, you can do so with the following dBase command, or similar in
your own database language:-

        replace all surname with left(data,at(chr(9),data)-1)

A table of the counties and the dinomes chosen follows:-

01 ABD Aberdeen               47 LKS Lanarkshire
02 AGY Anglesey               48 LAN Lancashire
03 ARL Argyll                 49 LEC Leicestershire
04 AYR Ayrshire               50 LIN Lincolnshire
05 BAN Banff                  51 LLS Linlithgow
06 BDF Bedfordshire           52 MER Merioneth
07 BRK Berkshire              53 MDX Middlesex
08 BEW Berwick                54 MLN Midlothian
09 BRE Brecknockshire         55 MON Monmouth
10 BKM Buckingham             56 MGY Montgomery
11 BUT Bute                   57 MOR Moray
12 CAI Caithness              58 NAI Nairn
13 CAM Cambridgeshire         59 NFK Norfolk
14 CGN Cardiganshire          60 NTH Northamptonshire
15 CMN Carmarthenshire        61 NBL Northumberland
16 CAE Carnarvonshire         62 NTT Nottinghamshire
17 CHS Cheshire               63 ORK Orkney
18 CLK Clackmannan            64 OXF Oxfordshire
19 CON Cornwall               65 PEE Peebles
20 CUL Cumberland             66 PEM Pembroke
21 DEN Denbighshire           67 PER Perthshire
22 DBY Derbyshire             68 RAD Radnor
23 DEV Devon                  69 RFW Renfrew
24 DOR Dorset                 70 ROC Ross
25 DNB Dumbartonshire         71 ROX Roxburgh
26 DFS Dumfries               72 SEL Selkirk
27 DUR Durham                 73 SAL Shropshire
28 EDN Edinburgh              74 SOM Somerset
29 ELG Elgin                  75 STS Staffordshire
30 ESS Essex                  76 STI Stirling
31 FIF Fife                   77 SFK Suffolk
32 FLN Flint                  78 SRY Surrey
33 ANS Forfar (Angus)         79 SSX Sussex
34 GLA Glamorgan              80 SUT Sutherland
35 GLS Gloucestershire        81 WAR Warwickshire
36 HAD Haddingtonshire        82 WES Westmorland
37 HAM Hampshire              83 WIG Wigtown
38 HEF Hereford               84 WIL Wiltshire
39 HRT Hertfordshire          85 WOR Worcestershire
40 HUN Huntingdon             86 ERY Yorkshire East Riding
41 INV Inverness              87 NRY Yorkshire North Riding
42 IOW Isle of Wight          88 WRY Yorkshire West Riding
43 KEN Kent                   89 YKS Yorkshire County
44 KCD Kincardine             90 ZET Shetland
45 KRS Kinross                91 ANT Antrim
46 KKD Kirkcudbright          92 RUT Rutland

This information has been prepared by Rosemary Lockie, 2:253/188 in
FidoNet, 2nd September 1993.