Note: This thread on unique (primary) keys is from the Clipper echo of FidoNet: From: ED DAVIS Sent: 06-10-92 15:59 To: PAUL WILFORD Rcvd: -NO- Re: HOW TO MAKE A UNIQUE ID PW> I often come up against a situation where a database contains people's PW> names, birth dates, sex, and address. For relating to other datasets PW> (and other purposes), I want to use this information to create a unique PW> identifier for the person using this information. Note that it's ..... PW> Solutions like making an id of something like initials, birthdate, sex, PW> and a sequential sequence number are the best I've come up with, but You can't use birthdate, at least not by itself, because of the high probability of having 2 or more people with the same DOB. Combining it with sex, initials, etc is just plain too cumbersome. More than likely, your data sets already have SSN's - so use them! They are _supposed_ to be unique among LIVING individuals. You also mention that you most often need this unique id in hospital/patient datasets. Here again, you probably already have unique data available to you in the form of insurance policy numbers....but, admittedly, if the patient changes insurance companies...... One option I have employed is the use of the full telephone number (including area code) as a unique identifier. The limitation here is that is only useful for tracking households, not individuals. Two patients from the same family would blow you away. But it may prove a good starting point. You should also be aware that people with different names could end up with the same initials, so this, too, is not good data to work with. Sorry I couldn't be more help. From: ROBERT KELLEY Sent: 06-10-92 15:26 To: PAUL WILFORD Rcvd: -NO- Re: HOW TO MAKE A UNIQUE ID F Since your usual application is healthcare related, that means you will likely be able to get access to the people's Social Security Number. Although there ARE rare cases of duplicate SSN's, they work well as a key for most applications, with a very high accuracy rate. Insofar as fuzzy searching, I usually find it more useful to maintain secondary indexes on fields like last name, telephone, zip code, etc. Then use soft seek logic to get close or perform incremental searches. regards, robert === progress? From: TOM HELD Sent: 06-10-92 20:19 To: PAUL WILFORD Rcvd: -NO- Re: HOW TO MAKE A UNIQUE ID F Paul, I use a fairly simple method. I use the first four letters of the last name and the last four digits of the social security number to make an 8 character field. It positively and economically identifies an individual for use in relational databases. If the last name is shorter than four letters, append it with underline characters (_) or dashes (-). If the person's last name is "Duh", and the SSN ends in "1234", the field would contain "DUH_1234". Fewer characters would work nearly as well, but this length has the advantage that the operator can usually tell at a glance to whom the record belongs. This system is handy in the military. A person's SSN is considered FOUO (for official use only), and cannot legally be stored on an unclassified fixed hard disk - but the name and last four can! A related database on a FOUO floppy containing only sensitive information may be inserted in the floppy drive during processing, as required, but locked in a file cabinet at other times. Precise calculations reveal that a duplication of this ID code between 2 random individuals will not occur within 6.125 jillion recyclations, which takes us exactly 2.3 revelations beyond the second coming of Star Trek, the 5th Generation - so replication, duplication or misidentification is not of much concern to us right now, in this dimension. Tom From: TERRY CARMEN Sent: 06-10-92 22:46 To: PAUL WILFORD Rcvd: -NO- Re: HOW TO MAKE A UNIQUE ID F As long as you have the person's data, you can add a unique identifier to ithe record, such as a sequentially generated number which is guaranteed to be unique. The trick is to identify the patient on the next visit so you don't end up with multiple records. The SSN is handy, but babies and illegal immigrants don't ususally have one. Terry From: MIKE NELSON Sent: 06-15-92 08:49 To: ED DAVIS Rcvd: -NO- Re: (R)HOW TO MAKE A UNIQUE I In a message to George Jereza <06-14-92 10:32> Ed Davis wrote: ED> But apply for a credit card - and you have to supply a SSN. Why? ED> Interest is no longer tax deductable. That was never the reason for wanting the ID. If your name were John Smith, you would really appreciate the value of having your SSN associated with your account. I used to work in credit and having the SSN was a VERY valuable way to determine which accounts belonged to whom. You see, credit report bureaus do a 'close match' on credit information... The SSN is a VERY valuable tool to determine which credit file belongs to whom. We used to have a mother and daughter who had the same name (including middle) and lived at the same address. The mother's credit history was golden... the daughter's rotten-to-the-core. So the daughter would try to use the mother's account. SSN and date-of-birth were the only way to keep these things separate. It also comes in very handy when you have a Jr or Sr in the same household. It often happens that the father opens an account before jr is born so the generation indicator never gets attached to the file... The SSN comes in VERY handy. A few years ago, a lot of negative data showed up in my credit profile... Providing the SSAN on my ap as well as the dispute saved my bacon... From: ROBERT KELLEY Sent: 06-15-92 13:58 To: MIKE NELSON Rcvd: -NO- Re: (R)HOW TO MAKE A UNIQUE On , wrote to , regarding "RE: HOW TO MAKE A UNIQUE": >RK> ... access to the people's Social Security Number. > >Although there ARE rare cases of duplicate SSN's, MN>Well, that is fine, assuming that you indeed have the right SSAN. I do >the payroll for a client and recent discovered to our mutual chagrin >that the wrong SSAN was entered in the original payroll program. That is absolutely true. If ANY key is dependant upon actual data, the risk is run that the application will have to provide the ability to correct keys across all applicable files. Can be a PITA. MN>I personally recommend using a key that is unique and generated by the >database program Yep, I agree. That was the second method I mentioned as well. There are many advantages ... and some disadvantages. regards, robert === From: ANGUS SCOTT-FLEMING Sent: 06-15-92 06:15 To: GEORGE JEREZA @ 914/20 Rcvd: 06-19-92 20:07 Re: (R)HOW TO MAKE A UNIQUE On 06-12-92, George Jereza @ 914/201 wrote to Ed Davis: GJ > Hope you don't mind my dropping in? Quick question. I thought it was GJ > illegal to ask for social security numbers? ... or... is that just in GJ > California? Thanks in advance. No, SSNs are limitted in where they can be used. Credit apps, banks, and jobs can require them. Medical offices, schools, driver's license bureaus, etc. can't. Lots of grey areas. If you're in doubt, push back and refuse to give it to 'em. From: ROBERT KELLEY Sent: 06-15-92 18:10 To: ED DAVIS Rcvd: -NO- Re: HOW TO MAKE A UNIQUE ID On , wrote to , regarding "HOW TO MAKE A UNIQUE ID": ED> RK> Database theorists would say that once a unique key is > RK> derived for a record, you NEVER change the key. There > RK> is also a lot to be said for a numerical "serial > RK> number" which is not derived at ALL from underlying > RK> data, if the SSN is not available. ED>Correct - but this too has its' share of problems - mainly that the key >cannot be reproduced, if need be, if it isn't based on the record >itself. And, assigning a unique id in this manner is ok if you're >building the application and its maintenance routines from scratch. That is usually the "cleaner" approach. . Yes I agree that it creates its' own share of problems, mainly that one cannot easily do ad-hoc browse/searches based on the key. So secondary indexes become critical to support any ad-hoc user searches. ED>Trying to relate multiple old datasets, without the "unique keys" is >next to impossible. Using ANY method of generating keys, the above is next to impossible . ED>I think everyone is agreed that using SSN's is the best solution - but >what if the person doesn't have a SSN? How about combining the two approaches. Maybe a concatenated key consisting of two fields. The first field specifies whether the key a derived from the SSN or is a computer-generated serial number. Then use a SSN if you have it in the second field, or generate a unique serialized number if you don't? regards, robert === From: MIKE NELSON Sent: 06-16-92 16:08 To: ANGUS SCOTT-FLEMING Rcvd: -NO- Re: (R)HOW TO MAKE A UNIQUE In a message to George Jereza <06-15-92 06:15> Angus Scott-Fleming wrote: AS> GJ > illegal to ask for social security numbers? ... or... is that AS> just in AS> GJ > California? Thanks in advance. AS> Au contraire on the driver's licenses... At least one state uses the SSAN with an alpha character in front of it as the driver's license number. Credit grantors may ASK for the SSAN but you are under NO obligation to provide it. By the same token, if you are writing a database program that stores SSANs, be prepared to change the field... The Soc. Sec. Admin. is running out of numbers and they have declared that they will NOT reuse the number of anyone who is deceased. As far as I know, they have not decided whether to add another digit or whether to insert alphas. From: DAVID SOLLY Sent: 06-15-92 22:51 To: PAUL WILFORD Rcvd: -NO- Re: HOW TO MAKE A UNIQUE ID Paul... You might like to take a look at the Cutter-Sandbourn tables which many libraries use to convert author's names into a single initial plus a number. The system is flexible enough that you can add subtract numbers as you need them. A typical C-S number looks something like this Solly = S6559 Wilford = W554673 I have a program which will encode a string up to 15 characters long using the Library of Congress variant of the Cutter-Sandbourn tables. I could pass it on to you if you are interested. If you just need a sequence of numbers which are unique but have nothing to do with the patient's name, dob or whatever, PW World published a program which creates a unique series of "filenames" derived from the date and time setting of the computer when run. A typical example looks like C6FA0800 C6FA0801 C6FA0802 C6FA0803 etc. I have been using this system to solve a problem similar to yours. Each morning I print off a sheet of number and use them one-by-one until done and print off a second sheet if I need more. I print a new sheet of numbers every morning even if I have not used up the numbers on yesterday's sheet. Again, I can send you the program if you are interested. Hope this helps David Solly ... OFFLINE 1.22 * Bibliotheca Sagittarii, Ottawa, Canada From: BRIAN ASTILL Sent: 06-11-92 23:50 To: PAUL WILFORD Rcvd: -NO- Re: HOW TO MAKE A UNIQUE ID F In a message of Paul Wilford wrote to All: PW> (My usual application is a hospital clinic where patients make a PW> number of visits and datasets like Patient Information are related PW> to datasets like Clinic Results). First question. Have you thought of using the Social Security Number? If you _can_ use it, you could precede it with the first few letters of the person's last name, for indexing purposes (easier for us humans to find the file for WILF123-456-789 than look for 123-456-789), and to allow a quick search for 'WILF'. Boondoggle BiBi, -=>Brian<=- From: ARTHUR MARSH Sent: 06-15-92 20:40 To: TOM HELD Rcvd: -NO- Re: HOW TO MAKE A UNIQUE ID F TH> Precise calculations reveal that a duplication of this ID code between TH> 2 random individuals will not occur within 6.125 jillion recyclations, TH> which takes us exactly 2.3 revelations beyond the second coming of TH> Star Trek, the 5th Generation - so replication, duplication or TH> misidentification is not of much concern to us right now, in this TH> dimension. Sorry, duplicate SSN's of living persons have been found. It was reported some months back in Communications of the ACM. From: ANGUS SCOTT-FLEMING Sent: 06-17-92 06:17 To: MIKE NELSON Rcvd: -NO- Re: (R)HOW TO MAKE A UNIQUE On 06-16-92, Mike Nelson wrote to Angus Scott-Fleming: MN > In a message to George Jereza <06-15-92 06:15> Angus Scott-Fleming wrote: ASF> GJ > illegal to ask for social security numbers? ... or... is that ASF> just in ASF> GJ > California? Thanks in advance. ASF> ASF> No, SSNs are limitted in where they can be used. Credit apps, banks, ASF> and ASF> jobs can require them. Medical offices, schools, driver's license ASF> bureaus, etc. can't. Lots of grey areas. If you're in doubt, push MN > Au contraire on the driver's licenses... At least one state MN > uses the SSAN with an alpha character in front of it as MN > the driver's license number. Just because they do it that way doesn't make it legal. MN > By the same token, if you are writing a database program that stores MN > SSANs, be prepared to change the field... The Soc. Sec. Admin. is MN > running out of numbers and they have declared that they MN > will NOT reuse the number of anyone who is deceased. As MN > far as I know, they have not decided whether to add MN > another digit or whether to insert alphas. Great. Another one bites the dust. Like the IRS, they'll probably spring this on us _after_ we've had to make adjustments for it. And they'll make it retroactive, assigning a new number consisting of your old number plus the third letter of your maternal grandfather's cat's third kitten's first name. And they'll use this as an excuse to lose everyone's "benefits" from this gigantic Ponzi scheme. GRRR. If only we could opt out of the SS system _now_ I'd forgo all the "benefits" I've save up - and I'd like to retire in less than 20 years. Yours in freedom, -=< Angus S-F >=- From: ANGUS SCOTT-FLEMING Sent: 06-17-92 06:17 To: PAUL WILFORD Rcvd: -NO- Re: (R)HOW TO MAKE A UNIQUE On 06-16-92, Paul Wilford wrote to Bill Clark: PW > What about sex? At least people don't get that wrong a PW > lot. How have you used this? No. But even _that_ isn't fixed any more (or is it called "being fixed" when your sex is changed? I'm sooo confused!). And all programmers _have to_ (well, should) take into account the exception that breaks the rule or their systems will burp, hiccup, and die. Yours in freedom, -=< Angus S-F >=- From: BILL DEVINNEY Sent: 06-17-92 04:10 To: MIKE NELSON Rcvd: -NO- Re: (R)HOW TO MAKE A UNIQUE I ** here's what Mike Nelson said to Robert Kelley on 12 Jun 92 18:16:00 > >I personally recommend using a key that is unique and generated >by the database program--that way you can change any data you >want in the database without worrying about 'losing' data... We use SS numbers as id numbers. We deal with aliens who may not have a social security number, and, assign them a temporary number. When they get a ss number, however, it may not be picked up by the opeartor and entered as a new record. Even when the social security number is substituted for a temporary id, we have two years of inactive records which need to be changed as well. The ss number as main key is part of a dbms given to us by the state department of education. Not my idea. I'm going to change to a unique, computer generated number and convert the file to the state's structure for annual reporting purposes. I've experienced this headache long enough, as have many of you. Now that I've decided to do that, I guess I need to do the conversion and test for possible duplicates. Counting the records with unique on and off would result in different results if there's a duplicate. Am I going in the right direction? Regards From: ED DAVIS Sent: 06-17-92 11:11 To: ROBERT KELLEY Rcvd: -NO- Re: HOW TO MAKE A UNIQUE ID RK> How about combining the two approaches. Maybe a RK> concatenated key consisting of two fields. The first RK> field specifies whether the key a derived from the SSN RK> or is a computer-generated serial number. Then use a RK> SSN if you have it in the second field, or generate a RK> unique serialized number if you don't? This is probably the best approach. The more we discuss the problem of unique ID's, the more I realize that any field we would base a key on (name, address, phone number, telephone, etc.) are all fields containing data that could (and most likely will) change, thereby complicating our applications. And trying to generate keys on portions of these fields increases the possibility of duplication. So, maybe a unique serialized number, with no relation to the data, is the best solution after all...... From: PAUL WILFORD Sent: 06-20-92 11:39 To: BRIAN ASTILL Rcvd: -NO- Re: HOW TO MAKE A UNIQUE ID F BA> In a message of Paul Wilford wrote to All: PW> (My usual application is a hospital clinic where patients make a PW> number of visits and datasets like Patient PW> Information are related PW> to datasets like Clinic Results). BA> First question. Have you thought of using the Social Security Number? BA> If you _can_ use it, you could precede it with the first few letters BA> of the person's last name, for indexing purposes BA> (easier for us humans to find the file for WILF123- BA> 456-789 than look for 123-456-789), and to allow a BA> quick search for 'WILF'. BA> Boondoggle BiBi, -=>Brian<=- Something like the SSN would be great, except that not everyone has one - so it's not reliable. While it may work for some applications, I'd like to find a better - and more universal - method. I wonder if some fancy encoding method exists whereby a long string of birth date, names and so on could be converted to a unique id number through some formula. From: PAUL WILFORD Sent: 06-20-92 11:48 To: DAVID SOLLY Rcvd: -NO- Re: HOW TO MAKE A UNIQUE ID DS> DS> You might like to take a look at the Cutter- DS> Sandbourn tables which many DS> libraries use to convert author's names into a DS> single initial plus a number. DS> The system is flexible enough that you can add DS> subtract numbers as you need DS> them. DS> That's the sort of thing that I had been looking for. In fact I went to the Vancouver Public Library to see how librarians dealt with the problem of duplicate names but only found rules about eliminating duplicates by putting a date after the author's name - beyond which the rule was that you live with any further duplications. I'd like very much to see your program code - can you post it? æ*ÿû Paul. From: TOM HELD Sent: 06-21-92 21:16 To: ARTHUR MARSH Rcvd: -NO- Re: HOW TO MAKE A UNIQUE ID F Arthur, 15 Jun 92, Arthur Marsh writes to Tom Held: TH>> Precise calculations reveal that a duplication of this ID code TH>> between 2 random individuals will not occur within 6.125 jillion TH>> recyclations, which takes us exactly 2.3 revelations beyond the TH>> second coming of Star Trek, the 5th Generation - so replication, TH>> duplication or misidentification is not of much concern to us TH>> right now, in this dimension. AM> Sorry, duplicate SSN's of living persons have been found. It was AM> reported some months back in Communications of the ACM. I did not suggest the use of SSN by itself, but a combination of surname and SSN. Did any of the people with the duplicate SSNs have the same last name as well? Tom From: ANGUS SCOTT-FLEMING Sent: 06-22-92 06:22 To: MIKE NELSON Rcvd: -NO- Re: (R)HOW TO MAKE A UNIQUE On 06-21-92, Mike Nelson wrote to Angus Scott-Fleming: MN > In a message to Jim Lynch <06-19-92 06:19> Angus Scott-Fleming wrote: ASF> I thought this code fragment was germane to the discussion and worth ASF> sharing. I saw it in a page torn out of the May92 DBMS. It is used ASF> to ASF> generate unique IDs. I have taken the liberty of dHanging the ASF> variable ASF> names and translating the code slightly. I believe the original ASF> article ASF> was a Fox article. ASF> ASF> cLN = SubStr(LASTNAME,1,3) ASF> cFN = SubStr(FIRSTNAME,1,2) ASF> nSeq = 0 ASF> lUnique = .F. ASF> do while .not. lUnique: ASF> nSeq = nSeq + 1 ASF> CUSTID = LN + FN + right("000"+trim(transform(nSeq,"999")),3) ASF> lUnique = .not. seek(CUSTID) ASF> enddo ASF> ASF> Author (I only saw last page of article, it had been ripped out of ASF> magazine to give me a copy of an ad on that page GRRR) claimed that: ASF> ASF> "...In a recently completed client application with more than ASF> 300,000 ASF> entries in the CUSTOMER table, the highest sequence number in the ASF> database ASF> was 12! MN > That is fine as long as the person doesn't change their last name. Well, you can still use the original index code if you've built it into the system as a separate field, which is implied in the code fragment above. When you don't FIND someone who claims to be in the database, you just have to ask her if she's changed her surname recently. MN > I'm setting up an accounts payable program which does a MN > similar thing except that I use 3 letters of the company MN > name... Which raises a problem... MN > MN > Would I use MON-001 for Montgomery Wards or WAR-001 or MWA-001? MN > Would I use JCP-001 for J C Pennys or PEN-001 or J_C-001? MN > MN > I've decided to arbitrarily use the 'surname' if the MN > business contains first and last name (similar to what the MN > phone book does). I'd use JCP-001 in the example above, since people generally think of them as "J C Penneys" rather than PENNEYs. I sure wouldn't use J_C out of laziness as I see no reason to make a data-entry clerk use two keys (Shift is required to make the "_"). But your rule should work fine as long as people don't end up making two addresses for such companies. Yours in freedom, -=< Angus S-F >=- * JABBER v1.1 #75 * Mind own business, always cut cards From: ROBERT KELLEY Sent: 06-22-92 12:17 To: ED DAVIS Rcvd: -NO- Re: HOW TO MAKE A UNIQUE ID On , wrote to , regarding "HOW TO MAKE A UNIQUE ID": ED>... The more we discuss the problem of >unique ID's, the more I realize that any field we would base a key on >(name, address, phone number, telephone, etc.) are all fields containing >data that could (and most likely will) change, I'd have to agree. regards, robert From: MIKE NELSON Sent: 06-21-92 15:23 To: ED DAVIS Rcvd: -NO- Re: (R)HOW TO MAKE A UNIQUE I In a message to Mike Nelson <06-17-92 11:18> Ed Davis wrote: ED> So the SSN is used in just the manner that many of us have proposed - ED> as a "near perfect" identifier. I knew it was widely used, and ED> almost always requested on things involving any form of financing or ED> tax reporting. I *STILL* would not use it as a key for relating databases, though... Let me throw additional reasons: 1) It is not at all uncommon for a person to use their SPOUSE's SSAN for id--especially in single-income families... 2) The very REAL mistake of the data-entry clerk putting the WRONG SSAN in. One of the major problems with SSAN is that it is NOT check-digitable. 3) If the person doesn't provide their SSAN and you generate a pseudo-SSAN, there is the possiblity of creating one that is already in use (or will be in the future). I've seen fraud cases where someone transposed SSAN or knew enough about SOUNDEX principals to fool the credit bureaus into pulling perfectly legitimate credit reports--but NOT of the person who applied for credit. I remember such a fraud case about 10 years ago that cost creditors well over $100,000 in fraudulent charge. My employer was lucky... we only got stuck for about $200 but some of the major department stores got hit for over $2000--not to mention banks on VISA/MasterCard and a car loan. Furthermore, I am a firm believer that ALL data going into a database SHOULD be verified--either by a second dataentry operator or by a blind ÿû double-entry system.