SOME ADDITIONAL NOTES on the UKC_ccc.ARJ FILES
==============================================

This document contains notes on:-

   1. History
   2. A Parish Rediscovered
   3. Recently acquired transcripts


1. History
   =======

As you may know, the UKC_ccc.ARJ files, containing the 1851 2% census extract and now available on FidoNet BBSs, were acquired originally by Gordon Grant, with the permission and encouragement of Alan Stanier of Essex University. They are derived from the original data sample lodged in the Economic and Social Research Council (ESRC) Data Archive at Essex.

We all owe a great deal to both Gordon and Alan for this:- Gordon for introducing us to them, and Alan for being responsible not only for retrieving the sample from the ESRC Data Archive, but also for collecting other transcripts from various sources and massaging them into a coherent whole.

At Essex, the data became available for anonymous ftp via the Internet, and in fact still are, for those with an Internet connection. However, I understand Gordon received the files from Alan on a tape cartridge. Coming from a UNIX-based system, they had been archived using a combination of tar (Tape ARchiver) - a way of concatenating a series of files into a single entity, often used on UNIX systems for backup - and gzip (GNU zip), a UNIX compression utility performing a similar function to the more familiar (on DOS systems) PKZIP.

Incidentally, a version of gzip *is* available for DOS; anyone needing to read the Essex archives on a DOS system would need a copy (the latest is version 1.24, GZIP124.ZIP), and likewise a version of tar. Gordon, however, performed the conversion to DOS initially.

In the first flush of enthusiasm to tap this marvellous new source of raw research material, Gordon had made the files available to us on two sets of five disks, the circulation of which round the UK would put the proverbial chain letter writer to shame!
Everyone who received a copy would, we hoped, pass the set on after copying it for themselves, perhaps making a second copy to pass on elsewhere.

My set arrived posted to me by Sheila Jones, then co-SysOp of DDLG BBS in Suffolk. She had received them from Kerry McCandlish of Benbecula Shuttle, in the Outer Hebrides. I in turn copied them, sending them to Barry Prazak of Paradise Valley, Northampton, who in turn sent them on elsewhere, to Dave Roocroft of Time Tunnel, and thereafter to Wally's BBS in Livingston, Scotland, and onwards to other BBSs in the UK which I know nothing about - although I understand disks changed hands for the price of a pint in the pub!

Gordon originally named the files CCC.ARJ, where CCC is the Chapman County Code, to correspond with the equivalent at Essex - census_CCC.tar.Z. However, Gordon's choice had given rise to a problem which none of us noticed at first.

During this time, Mike Fisher of ROOTS (UK!) had also received his copy of the 5-disk set, and was preparing to hatch the files into the Genealogical Software Distribution Service (GSDS). He was the first to notice that a county was missing from his disks! It was the set of transcripts for Cornwall, which would have been named CON.ARJ, if it had ever been copied successfully onto the disks in the first place!

The snag, of course, is that DOS does not permit the use of CON as a file name, as it conflicts with the name of the CONsole device driver. In the same way it is impossible to have a file named LPT1, or PRN. CON.ARJ just never made it!

For this reason, Mike decided to rename the complete set of files to UKC_ccc.ARJ, and that is how they came to be released to GSDS. Today, however, there may be copies of the old CCC.ARJ files still around, both in the UK and abroad - so if you come across them, please be aware that the archive for Cornwall will be missing.
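
The CON.ARJ problem can be illustrated with a short sketch. This is only an illustration in Python, not anything used in preparing the files: the function names are hypothetical, and the set of reserved names is the commonly documented list of MS-DOS device names, included here as an assumption rather than taken from these notes.

```python
# Commonly documented MS-DOS device names (an assumption for this sketch).
# DOS matches these against the base name of a file, ignoring the
# extension, so CON.ARJ collides with the CON device.
DOS_RESERVED = {
    "CON", "PRN", "AUX", "NUL", "CLOCK$",
    "COM1", "COM2", "COM3", "COM4",
    "LPT1", "LPT2", "LPT3",
}


def is_dos_reserved(filename):
    """Return True if the base name of `filename` is a reserved device name."""
    base = filename.split(".", 1)[0].strip().upper()
    return base in DOS_RESERVED


def archive_name(county_code, prefix="UKC_"):
    """Build an archive name from a Chapman County Code (hypothetical helper)."""
    return prefix + county_code.upper() + ".ARJ"


if __name__ == "__main__":
    for code in ("CON", "STS", "WOR"):
        old = archive_name(code, prefix="")   # original CCC.ARJ scheme
        new = archive_name(code)              # renamed UKC_ccc.ARJ scheme
        print(code, old, is_dos_reserved(old), new, is_dos_reserved(new))
```

Adding any prefix to the county code changes the base name, which is why UKC_CON.ARJ is acceptable to DOS where CON.ARJ is not.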

The last released version of XTRACT, v2.2, was originally written to conform only to the GSDS standard, accepting input from files named UKC_ccc.ARJ only. However, since then, following the improved compression offered by PKZIP v2.04g, it has been modified to accept ZIP2, or indeed almost any form of archiving, provided the archiver operates as a single DOS command, with one or more switches, and input and output file names as arguments.


2. A Parish Rediscovered
   =====================

Since the original release of the transcripts into GSDS, two new files have been released to replace the existing UKC_STS.ARJ and UKC_WOR.ARJ.

   UKC_STS.ARJ   179,552   UK census 2% transcript .... REPLACEMENT
   UKC_WOR.ARJ    86,723   UK census 2% transcript .... REPLACEMENT

These files contain an additional parish, DUDLEY, Worcestershire - 1851. It contains 56 unique surnames, and 234 different surnames in all, which were not available at the time of the original GSDS release.

I discovered this parish was missing whilst I was testing XTRACT - one of my initial diagnostic aids having been written to ensure I could pull out a list of *all* surnames (and nothing else) from my UKC_ccc.ARJ set. The list I produced at first did not tally with the name index prepared at Essex University - what is now UKC_NIDX.TXT in the GSDS.

My first thoughts were that my code was deficient, but further investigation revealed that all of the missing surnames were indexed under the one reference, STS.WOR, the dot indicating a parish split between two counties - Staffordshire (STS) and Worcestershire (WOR). It began to look suspiciously as if - again - a file was missing from the FidoNet set!

Thankfully, the files are still available on the Internet, accessible via anonymous ftp from Essex University. I was able to use my Internet connection to retrieve the Staffordshire archive, census_STS.tar.Z (UNIX systems allow for a more descriptive file name than DOS!), and a peek inside revealed the missing parish!
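
The kind of tally described above amounts to comparing two lists of surnames and grouping whatever is missing by its index reference. The following is a minimal sketch in Python, not XTRACT's actual code: the layout of UKC_NIDX.TXT is not described in these notes, so the index is assumed here to be a simple list of (surname, reference) pairs, and the sample surnames are invented placeholders, not census data.

```python
def missing_surnames(extracted, index):
    """Group index surnames that are absent from `extracted` by reference.

    extracted -- iterable of surnames pulled from the archives
    index     -- iterable of (surname, reference) pairs (assumed layout)
    Returns a dict mapping each reference to a sorted list of surnames.
    """
    found = {name.upper() for name in extracted}
    missing = {}
    for surname, reference in index:
        if surname.upper() not in found:
            missing.setdefault(reference, []).append(surname)
    return {ref: sorted(names) for ref, names in missing.items()}


if __name__ == "__main__":
    # Invented placeholder data, for illustration only.
    extracted = ["Smith", "Taylor"]
    index = [
        ("SMITH", "STS"),
        ("TAYLOR", "WOR"),
        ("JONES", "STS.WOR"),
        ("BROWN", "STS.WOR"),
    ]
    for reference, names in missing_surnames(extracted, index).items():
        print(reference, names)
```

If every missing surname falls under a single reference, as happened here with STS.WOR, that points to a missing file rather than a fault in the extraction code.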

Happily - with the additional file, my list now tallies with UKC_NIDX.TXT!

Both counties, STS and WOR, have been re-hatched complete. The file names for parishes which are unique to STS and WOR remain just as they were; however, Murphy's Law being with us as ever, the parish which was omitted was one of TWO split between Staffordshire and Worcestershire, the other being the parish of WARLEY WIGORN.

The existing STS.WOR file had been named STS51WOR.TXT, being the only one in the archive at that time - but in order to conform with the naming convention in the remainder of the archives (and incidentally, that in the Essex University archive), this name had to be changed - to STS51WOR.T02. The new one, containing the parish of Dudley, is named STS51WOR.T01, and STS51WOR.TXT no longer exists, having been renamed to STS51WOR.T02. I emphasise the latter intentionally, just in case anyone thinks they're missing out again!


3. Recently acquired transcripts
   =============================

This year (1995), some additional census material has been added to the store in the ESRC Data Archive, and we will be releasing these additions into GSDS shortly. Counties represented are Devon, Glamorgan, Gloucestershire, and Worcestershire. In some cases the additions are significant; for instance, complete transcripts for the parish of Sevenhampton, and the inmates from Northleach Prison, will be added to the Gloucestershire archive.

In addition, some transcripts have been acquired from other sources, including samples from Yarm, to be added to the Yorkshire, North Riding archive, and portions of Hertfordshire - thanks to kind permission of Paul Joiner, the transcriber; and from the parish of Altarnun in Cornwall, again with kind permission of the transcriber, Jeff Burnard.


Copyright (c) Rosemary Lockie, June 1995