DiamondBase  Documentation


               Version  0.2


                 Darren Platt

               Andrew Davison

                 Kevin Lentin


    darrenp@dibbler.cs.monash.edu.au

     davison@bruce.cs.monash.edu.au

      kevinl@bruce.cs.monash.edu.au


               January 18, 1994




1    What Is DiamondBase ?


1.1    Description


DiamondBase is an implementation of a programmer's database. It supports the

basic relational model via a schema compiler and the C++ language. It has been

designed to make usage of the resulting relations simple, via object methods. It

was written by three students at Monash University, Melbourne, Australia, in their

spare time, because they are crazy, and have always harboured secret ambitions

to write a database.  Kevin's previously frustrated attempts met with Darren's

misguided notions that a database would make his PhD easier to write, and Andy

wanted a database for a bibliographic retrieval system called Bibel. And so the

three musketeers set off in search of adventure.


1.2    Platforms


DiamondBase has been compiled for Linux, Ultrix and SunOS 4.1.1 using GCC, for

Irix using GCC and CFRONT, and for OS/2 with the Borland compiler. It has also

been successfully compiled and used on an Amiga using SASC, NeXT using GCC

and on an RS6000. It currently requires gnu-make for its makefile, but a generic

Makefile is provided.


1.3    Distribution


Read the Copyright message in the distribution.


1.4    Limitations


1.4.1  Introduction


This section is depressingly large at the moment. It outlines the areas where we

believe DiamondBase is currently deficient. This is by no means an exhaustive list,

and we are open to suggestions for improvements. Having listed the deficiencies,

it should also be noted that there are no fixed plans to eliminate these deficiencies.

   Database extension is currently driven by the whims and requirements of the

authors. We are quite open to accepting extensions written by other people - so

submit that SQL parser as soon as you have written it.


1.4.2  Limited Multi-User Support


This is a pretty major restriction at the moment. You should ensure that only

one copy of your application is in use at any one time.

    Multi-user support is now available under Linux, Ultrix and SunOS.  In its

current incarnation it is set up to run under one user-id. It can very easily be run

by multiple people. It must run on one machine.

    We believe the Irix port of this part of the product should be merely a

compilation issue. The Amiga and OS/2 versions may require more work. The OS/2

version should be available very soon.


1.4.3   No multi-machine support


The multi-user support uses shared memory segments and IPC communication. It

can therefore not be used over multiple machines. It can be moved from machine

to machine, but at any one time, all clients must be on the same machine as the

server.

    We have used the UNIX ndbm package to implement a multi-user database

by accepting connections on a socket and multi-threading the executable. There

is only one server process accessing the database but many clients can use it

concurrently. If you are interested in such code please let us know.


1.4.4   No SQL


I'm philosophical on this point. SQL isn't god's gift to query languages (it's more

like satan's if the truth is to be known).  I would however like to add a query

language to DiamondBase at some stage, and SQL is certainly a standard choice.

The reliance on compiled code to do the comparison makes an interactive SQL

setup more difficult since it would have to second-guess the compiler's layout for

data structures - or dynamically link the comparison code.

    Embedded SQL would be easier to implement. It would require a compiler to

preprocess the SQL (preferably not by extracting it from the C++ - but having

queries stored in separate text files) - and then generate C++ code to access the

objects in the right order.


1.4.5   No duplicate indexes


This is a nuisance as much as anything. All records must have a unique value for

each index. If you wish to effectively have duplicates, then you must supply an

extra field which makes them unique.

    So, for example, if you wish to index employees by age, then you would also

include employee number in the index.  Finding all the employees who are 12

years old then involves finding the first record with age=12 and id>=0. Continue

fetching the next record until age is not equal to 12.

    This approach is a bit messy and labour intensive, so we would like to permit

duplicates - if we can work out what it will do to the poor Btrees.

    The package does, however, support a unique type.  This is a long that is

assigned a value by DiamondBase and is guaranteed to be unique within the

relation.  Included in an index, it effectively allows duplicate indexes, with

much better semantics.
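
    As a rough illustration of the age/id scan described above, here is a small
self-contained sketch. It does not use DiamondBase at all: a std::map keyed on
(age, id) stands in for the index, and the names AgeIdKey, AgeIndex and
employeesAged are made up for the example. The only point is the access
pattern: position on the first key with the wanted age and id>=0, then keep
stepping until the age changes.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an index on (age, id); this is not DiamondBase code.
// Each (age, id) pair is unique even though many employees share an age.
using AgeIdKey = std::pair<int, long>;
using AgeIndex = std::map<AgeIdKey, std::string>;  // key -> employee name

// Collect the names of all employees with the given age, in id order.
std::vector<std::string> employeesAged(const AgeIndex& index, int age) {
    std::vector<std::string> names;
    // Position on the first entry with this age and id >= 0 ...
    AgeIndex::const_iterator it = index.lower_bound(AgeIdKey(age, 0L));
    // ... then keep fetching the next entry until the age changes.
    for (; it != index.end() && it->first.first == age; ++it) {
        names.push_back(it->second);
    }
    return names;
}
```

    With a real relation the same loop would presumably be written with a
begin/end pair and the find and next calls described in Section 2, using an
index declared on age and id (or on age and a unique field).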


1.4.6  Recovery and Auditing


The Btree indexes could be rebuilt from the data storage file if they become

corrupted - but no facility currently exists to do this. A facility to dump the

records in the record store to a text file is currently available in the form of

the db2txt program. There is no auditing to keep track of all transactions that were

performed. This is not a high priority for us currently. If this is a high priority

for you then please let us know how you would like it done.


1.4.7  Static limits


Various components of the database have static limits.  These are listed in the

table below along with a description and the file in which they are defined.

     Name                  Value  Where      Description
     --------------------  -----  ---------  ------------------------------------
     MAX_QUERY             10     defs.h     The maximum number of queries that a
                                             bTree or dbObj will allow.
     MAX_REG               20     dbase.h    The maximum number of registrations
                                             that the dbObj will allow.
     MAX_DB_INFO           20     dbase.h    The maximum number of relations that
                                             the dbObj will keep open.
     MAX_RELATIONS         20     nserver.h  The maximum number of relations that
                                             the nameserver can handle.
     MAX_FIELDS_IN_INDEX   10     idxinfo.h  The maximum number of fields that an
                                             index is permitted to have.
     MAX_NAME_LENGTH       300    idxinfo.h  The maximum length that an index
                                             name can be.
     MAX_TRANS_AREA        4K     dcdefs.h   The maximum relation size for the
                                             multi-user version.


1.4.8  Fixed length records


Records are currently of a fixed length determined at the creation of the relation.

Support for variable length fields is available through the dbData and dbString

types.  The first is a generic binary object.  The second is a derivative which

behaves like a null terminated string. Details of these features are in Section 5.


2     Interface Functions


2.1    Introduction


This section describes the basic application functions available to the

programmer. They are supplied as methods to the relation class generated by dsc.


2.1.1   General functions


    * bool ok(void)


        -  Description

               Checks if last database operation was successful.

        -  Return codes

               true - successful operation

               false - unsuccessful operation

        -  Example


           fred yourRel;


           yourRel.add();


           if (!yourRel.ok()) {

               cerr << "An error occurred." << endl;

           }


    * void perror(char *description)


        -  Description

               Prints a textual description of the last error code to cout -

               with some textual prefix which you supply.

        -  Example


           fred yourRel;


           yourRel.id = 1234; // Which doesn't exist

           yourRel.get();


           if (!yourRel.ok()) {

                   yourRel.perror("Getting a record:");

           }


    * void stats(void)


        - Description

              Outputs the current database cache hit statistics to cout for

              this relation.

        - Example


          yourReltype yourRel;


          yourRel.stats();


          Output..


          Record Server cache: 1669 attempts, 146 hits = 8.74775%

                   1523 writes and 1513 disposals

          Index 0 cache: 0 attempts, 0 hits = 0%

                   0 writes and 0 disposals

          Index 1 cache: 593 attempts, 454 hits = 76.5599%

                   139 writes and 129 disposals

          Index 2 cache: 3 attempts, 0 hits = 0%

                   3 writes and 0 disposals


2.1.2  Database Manipulation functions


   * dbError add(void)


        - Description

              Insert the current values for this relation into the database.

              Any fields which are of type unique will be assigned a unique

              value as the record is inserted.  The corresponding fields of

              the structure will reflect these assigned values upon return.

              The data in the object is not modified by the call (except the

              unique field).

        - Return codes

              db_ok - successful addition

              db_dup - addition of this record would create a duplicate record

              on one or more indexes.

              db_nopen - database isn't currently open.

        - Example


          employee     newEmp;


          strcpy(newEmp.name,"Joe Bloggs");

          newEmp.age = 12;


          switch (newEmp.add()) {

              case db_ok: break;

              case db_dup:

                  cerr << "This employee already exists" << endl;

                  break;

              case db_nopen:

                  cerr << "Open the database first dummy" << endl;

                  break;

          }


    * dbError begin(int indexNumber = 0)


        -  Description

               A call to begin initiates a query - signalling an intention to

               retrieve one or more records from the database using a

               particular index. Within a query, an implicit pointer is maintained

               to a position within an ordered index list.

               Whilst direct fetches with get may be executed without a

               begin/end pair - begin must be used with all relational retrieval

               operations.  If a previous begin is outstanding when this is

               called, the previous query is terminated. You should therefore

               not use get inside a begin/end pair of the same instance of a

               relation as the get will terminate the query.  If you use 2

               different instances of the relation, then it will work correctly

               as each has its own query.

               The following functions require a query and must therefore

               be executed within a begin/end pair.

                  extract

                  extractNext

                  extractPrev

                  find

                  first

                  last

                  next

                  peekNext

                  peekPrev

                  prev

                  seek

                  seekFirst

                  seekLast

                  write

        - Return codes

              db_ok - the query started correctly

              db_range - the supplied index is out of range

        - Example


          // Retrieve all employees' names.

          employee tempEmp;


          tempEmp.begin(dbIdx_name); // Initiate query


          for (tempEmp.first(); tempEmp.ok(); tempEmp.next()) {

              cout << tempEmp.name << endl;

          }


          // The same thing, using a while instead


          tempEmp.seekFirst();


          while (tempEmp.next() != db_eof) {

              cout << tempEmp.name << endl;

          }


          tempEmp.end();


   * dbError del(void)


        - Description

              Delete the record matching the current values for the index

              from the database. The data in the object is not modified by

              the call.

        - Return codes

              db_ok - successful deletion

              db_nfound - Couldn't find record to delete it

        - Example


          employee exEmp;


          exEmp.id = 1234; // Employee id to delete.

          exEmp.get(); // Fetch this employee.

          if (exEmp.ok()) {

              if (exEmp.del() == db_nfound) {

                  cout << "Couldn't find employee" << endl;

              }

          }


    * dbError end(void)


        -  Description

               End the previously initiated query.  This is mainly for

               efficiency reasons, and to terminate any outstanding locks, since

               the next begin will automatically terminate the previous begin

               anyway.

        -  Return codes

               db_ok - no problems.

               db_noquery - there was no matching begin.

               db_range - query value was not in the valid range

        -  Example - see begin


    * dbError extract(void)


        -  Description

               This call does a find on the supplied object and then locks the

               object if it is found. The record itself is locked, not the key

               being used, so accesses to the record through any key will fail.

               After an extract, any access of the record will return db_locked.

               If this access is an extract, an extract will not occur and the

               operation will be equivalent to a find. If the access is another

               read-type operation - then the data will be retrieved. If this

               access is a write, then it will fail.

               If the requested record does not exist, then the data in the

               object is not modified. In that situation, extract degenerates

               to a seek.

        -  Example

        -  Return codes

               db_nfound

               db_ok

               db_locked


    * dbError find(void)


        -  Description

              Search for the index entry in the database. If the entry exists,

              then the query pointer is positioned at that point and the

              record is retrieved.  Otherwise, the pointer is moved to the

              following record, and that record is retrieved.

              Note that since the retrieval of that record involves a call to

              next, subsequent calls to next will not return this same record.

              If you wish to position the relation before a record without

              advancing the record pointer, use the seek family of functions

              instead.

        - Return codes

              db_ok - a matching record was found and retrieved

              db_nfound - no matching record was found and the data for the

              next record was placed in the class.

              db_locked - the record was found and was locked. The data is still

              returned however.

              db_range - query value was not in the valid range

              db_noquery - query value was not valid

        - Example


          employee exEmp;


          exEmp.id = 1234; // Employee id to find.

          exEmp.find(); // Fetch this employee.


   * dbError first(void)


        - Description

              The first function may only be executed from within a query

              (see begin).  It sets the query pointer to the first record and

              returns the record at that position if one exists.

              Note, that since first retrieves the first record, a subsequent

              call to next will not return that record. Instead, it will return

              the second record. See seekFirst.

        - Return codes

              db_nfound - there are no records in the database

              db_range - query value was not in the valid range

              db_noquery - query value was not valid

        - Example - see begin


   * dbError get(int idxNumber=0)


        -  Description

               Retrieve a record from the database using explicit values for

               an index. This is the only retrieval function which does not

               need to be executed within a begin/end pair.  The function

               issues an implicit begin/end pair around this atomic operation.

        -  Return codes

               db_ok - Record matching index value was found

               db_nfound - Record was not found

               db_range - the index specified was illegal

        -  Example


           //

           // relation host_to_ip {

           //      char  hostname[100];

           //      char ip[16];

           //      index byIp    on ip;

           //      index byName on hostname;

           // }

           host_to_ip myLookup;


           strcpy(myLookup.hostname,"dibbler.cs.monash.edu.au");


           if (myLookup.get(dbIdx_byName)==db_ok) {

               cout << "IP=" << myLookup.ip << endl;

           } else {

               cout << "Not found" << endl;

           }


    * dbError last(void)


        -  Description

               The last function may only be executed from within a query

               (see begin).  It sets the query pointer to the last record and

               returns the record at that position if one exists.

        -  Return codes

               db_ok - last record found and retrieved

               db_nfound - there were no records

               db_noquery - Last must be within a begin/end pair

        -  Example - see prev


   * dbError next(void)


        - Description

              This function may only be executed from within a query (see

              begin). It sets the query pointer to the next record for the

              current query index and retrieves it if it exists.

        - Return codes

              db_ok - Next record is retrieved

              db_eof - No more following records

              db_noquery - Next was not executed within a begin/end pair

              db_locked - data retrieved but record locked by someone else

        - Example - see begin


   * dbError peekNext(void)


        - Description

              Return next record if it exists, but do not advance the btree

              query pointer. Subsequent next, prev, peekNext and peekPrev

              calls will return values as if this call never occurred.

        - Return codes

              db_ok - Next record is read

              db_eof - No more following records

              db_noquery - peekNext was not executed within a begin/end pair

              db_range - query value was not in the valid range

              db_locked - data is retrieved but record is locked

        - Example


          // Given the records,

          // JONES,PERCY,PLATT and SMITH


          employee tempEmp;


          strcpy(tempEmp.name,"PLATT");

          tempEmp.begin(dbIdx_name); // Initiate query

          tempEmp.find(); // Fetches PLATT


          tempEmp.peekNext();

          cout << tempEmp.name << endl; // Prints SMITH


           tempEmp.next();

           cout << tempEmp.name << endl; // Prints SMITH again


           tempEmp.end();


    * dbError peekPrev(void)


        -  Description

               Return previous record if it exists, but do not move the btree

               query pointer. Subsequent next, prev, peekNext and peekPrev

               calls will return values as if this call never occurred.

        -  Return codes

               db_ok - Previous record is read

               db_eof - No more preceding records

               db_noquery - peekPrev was not executed within a begin/end pair

               db_range - query value was not in the valid range

               db_locked - data is retrieved but record is locked

        -  Example


           // Given the records,

           // JONES,PERCY,PLATT and SMITH


           employee tempEmp;


           strcpy(tempEmp.name,"PLATT");

           tempEmp.begin(dbIdx_name); // Initiate query

           tempEmp.find(); // Fetches PLATT


           // NB query pointer is now after PLATT


           tempEmp.peekPrev();

           cout << tempEmp.name << endl; // Prints PLATT !


           tempEmp.next();

           cout << tempEmp.name << endl; // Prints SMITH


           tempEmp.end();
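
           The behaviour in the peekNext and peekPrev examples is easier to
           follow if you picture the query pointer as sitting between two
           records. The toy class below is only a model of that idea, not
           DiamondBase code: ToyQuery and all of its members are invented
           names, a sorted vector of strings stands in for the index, and the
           return codes are reduced to bool.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy model of a query pointer over a sorted index (hypothetical names).
// pos_ counts how many records lie before the pointer.
class ToyQuery {
public:
    explicit ToyQuery(std::vector<std::string> keys)
        : keys_(std::move(keys)), pos_(0) {
        std::sort(keys_.begin(), keys_.end());
    }

    // find: position on the key (or on the one that would follow it),
    // retrieve that record and step past it. True only on an exact match.
    bool find(std::string key, std::string& out) {
        pos_ = static_cast<std::size_t>(
            std::lower_bound(keys_.begin(), keys_.end(), key) - keys_.begin());
        return next(out) && out == key;
    }

    // next: retrieve the record just after the pointer, then advance.
    bool next(std::string& out) {
        if (!peekNext(out)) return false;
        ++pos_;
        return true;
    }

    // prev: retrieve the record just before the pointer, then step back.
    bool prev(std::string& out) {
        if (!peekPrev(out)) return false;
        --pos_;
        return true;
    }

    // peekNext / peekPrev: look either side without moving the pointer.
    bool peekNext(std::string& out) const {
        assert(pos_ <= keys_.size());
        if (pos_ == keys_.size()) return false;
        out = keys_[pos_];
        return true;
    }

    bool peekPrev(std::string& out) const {
        if (pos_ == 0) return false;
        out = keys_[pos_ - 1];
        return true;
    }

private:
    std::vector<std::string> keys_;
    std::size_t pos_;
};
```

           In this model, a find for PLATT leaves the pointer between PLATT
           and SMITH, which is why peekPrev yields PLATT while peekNext and
           next both yield SMITH.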


    * dbError prev(void)


        -  Description

              This function may only be executed from within a query (see

              begin). It sets the query pointer to the previous record for

              the current query index and retrieves it if it exists.

        - Return codes

              db_ok - Previous record is retrieved

              db_eof - No more previous records

              db_noquery - Prev was not executed within a begin/end pair

              db_locked - data retrieved but record locked by someone else

        - Example


          // Retrieve all employees' names - but in reverse

          // order.

          employee tempEmp;


          tempEmp.begin(dbIdx_name); // Initiate query


          for (tempEmp.last(); tempEmp.ok(); tempEmp.prev()) {

              cout << tempEmp.name << endl;

          }

          tempEmp.end();


   * dbError put(int idxNumber=0)


        - Description

              This function writes a record into the database regardless of

              its presence. If the record does not exist, this call does an add,

              otherwise it does a write. This call, like get does not require

              a query and, as such, contains an implicit begin/end pair. It

              can be used to write a record into the relation with no regard

              to the current contents.

        - Return codes

              db_ok - Record was successfully written

              db_dup - The record's keys partially clash with existing records

              db_range - the index specified was illegal

        - Example


          // relation host_to_ip {

          //      char  hostname[100];

          //      char ip[16];

          //      index byIp    on ip;

          //      index byName on hostname;

          // }

          host_to_ip myLookup;


          strcpy(myLookup.hostname,"dibbler.cs.monash.edu.au");

          strcpy(myLookup.ip,"130.194.62.33");


          if (myLookup.put(dbIdx_byName)==db_ok) {

              cout << "Added successfully" << endl;

          } else {

              cout << "Something went wrong" << endl;

          }


    * dbError seek(void)


        -  Description

               Positions the query pointer using the values for the current

               query index.  The next record returned by a call to next or

               peekNext will be the record requested if it exists in the

               relation, or the one that would follow it if it did exist.

        -  Return codes

               db_ok - The specific record was found.

               db_eof - No more preceding records

               db_noquery - seek was not executed within a begin/end pair

               db_range - query value was not in the valid range

               db_nfound - the specific record was not found.

        -  Example


    * dbError seekFirst(void)


        -  Description

               Position the query pointer just before the first record for the

               current query index, so that the next record retrieved will

               be the first one. Like first, this call does not depend on the

               contents of the relation - it only moves to the beginning of

               the index.

        -  Return codes

               db_ok - The record was found.

               db_eof - No records

               db_noquery - seekFirst was not executed within a begin/end pair

               db_range - query value was not in the valid range

        -  Example


   * dbError seekLast(void)


        - Description

              Position the query pointer just after the last record for the

              current query index, so that calling prev will retrieve the last

              record for that index.

        - Return codes

              db_ok - The record was found.

              db_eof - No records

              db_noquery - seekLast was not executed within a begin/end pair

              db_range - query value was not in the valid range

        - Example


   * dbError write(void)


        - Description

              If all the keys are already present in the database and they all

              refer to the same record, then that record will be overwritten

              with the values in this class.  If all the keys are not present

              then the call fails.  The standard way to update a record is

              to perform an extract followed by a write.  If you wish to

              change one of the fields on which the record is indexed, then

              you must create a new record and copy the unchanged details

              from the old record to the new - followed by a deletion of the

              old record. This decision was made to considerably simplify

              the update mechanism within the database.

              If the write is being performed after an extract, then the keys

              are checked to ensure that they have not changed since the

              extract. If they have, then the write will fail. This last check

              is currently unimplemented.

        - Return codes

              db_ok - record was overwritten successfully

              db_dup - neither 0 nor all the indices matched

              db_range - query value was not in the valid range

              db_noquery - query value was not valid

              db_nfound - record is not already in the index

        - Example
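
          The rule in the description above (all keys present and pointing at
          one record: overwrite; no keys present: fail; anything in between:
          duplicate) can be modelled without DiamondBase. The sketch below
          assumes that each index maps a key to a record id. ToyResult,
          ToyIndex and classifyWrite are hypothetical names, and the mapping
          onto db_ok, db_dup and db_nfound is our reading of the return code
          list rather than the library's own code.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for db_ok, db_dup and db_nfound.
enum ToyResult { toy_ok, toy_dup, toy_nfound };

// One toy index: key value -> id of the record that holds it.
typedef std::map<std::string, long> ToyIndex;

// Decide what a write would do. keys[i] is the record's key for indexes[i].
ToyResult classifyWrite(const std::vector<ToyIndex>& indexes,
                        const std::vector<std::string>& keys) {
    assert(indexes.size() == keys.size());
    std::size_t present = 0;   // how many indexes already hold our key
    bool sameRecord = true;    // do all the hits refer to one record?
    long recordId = -1;
    for (std::size_t i = 0; i < indexes.size(); ++i) {
        ToyIndex::const_iterator hit = indexes[i].find(keys[i]);
        if (hit == indexes[i].end()) continue;
        if (present > 0 && hit->second != recordId) sameRecord = false;
        recordId = hit->second;
        ++present;
    }
    if (present == 0) return toy_nfound;  // nothing there to overwrite
    if (present == indexes.size() && sameRecord) return toy_ok;
    return toy_dup;                       // partial or conflicting match
}
```

          Under the same model, put would perform an add where this function
          reports toy_nfound and a write where it reports toy_ok.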


2.1.3   Data Members


status

    Holds the return code from the last database operation.

    Error codes: db_ok ....


3    Internals


3.1    Introduction


As far as the big picture goes, you write a program which needs database

functions. You have used DSC to produce both the physical storage files, and some

compiler generated code which you link into your application.  You also link in

the diamond library which contains functions needed to perform the database

operations you execute as methods to DSC generated objects.

   This section of the manual describes how the internals of DiamondBase are

implemented.  We would like you to understand what is happening inside the

database - as this will aid you in debugging, reporting bugs within the database

code and suggesting improvements and changes.


3.2    Typical application code


#include "myrel.h"


diamondBase theDiamondBase("config.db");


int main()

{

    myRel theRel; // Attach stage          (a)


    theRel.begin();

    theRel.first();

    // Use theRel here.


    theRel.extract(); // Decide to update it.

    // Change some non-indexed member fields


    theRel.write(); // Put it back

    theRel.end();

}


   Note the appalling lack of error checking going on - utterly disgusting. The

definition for the 'myRel' class will inherit both a transfer area and a diaRel class.

The transfer area acts as a sub-struct with only data members which can be used

for memory transfers as records are stored and retrieved. The diaRel base class

has the necessary methods to implement the calls (eg first, next, write).

   Note that in the above example, the myRel instance was in scope for all of

main - it need not have been.  You can declare relations wherever you wish to

use them. Global relation instances are even permitted, provided you can ensure

that the theDiamondBase instance you declare is constructed first.

    Which brings me to the final point for this section - you must have an instance

of diamondBase called `theDiamondBase' at the global scope in your program.

We could have defined one ourselves in the library, but this way you get to name

the database configuration file.


3.2.1   diaRel


There is one instance of diaRel in each user-defined relation class.  It

implements the functions which are used to manipulate the database. diarel.h and

diarel.cc are fairly self explanatory. The constructor for diaRel attaches to the

global theDiamondBase object and obtains a reference Id which is used in all

further correspondence up until the point where the class which inherits diaRel

has been destroyed.

    diaRel also inherits an object class. This has a number of pure virtual methods

which the database ultimately uses to perform functions like fetching the data,

storing the data, extracting the key component of the data and comparing keys.

These functions are performed by DSC generated code and so a pointer to the

base object class is passed to theDiamondBase during attach in order to permit

this.
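
    To make the arrangement concrete, here is a minimal sketch of the pattern:
a plain data struct in the role of the transfer area, an abstract base with
pure virtual methods in the role of the object class, and a class in the role
of DSC generated code that supplies them. Every name below (ToyObject,
ToyEmployeeData, ToyEmployee, compareViaBase) is invented for illustration;
the real declarations are in the DiamondBase headers and may well differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// In the role of the abstract "object" base class (hypothetical interface).
class ToyObject {
public:
    virtual ~ToyObject() {}
    virtual std::size_t recordLength() const = 0;
    virtual void loadFrom(const char* buffer) = 0;  // raw record -> members
    virtual void storeTo(char* buffer) const = 0;   // members -> raw record
    // Compare our id key with the one in a raw record: <0, 0 or >0.
    virtual int compareKey(const char* rawRecord) const = 0;
};

// In the role of the transfer area: data members only.
struct ToyEmployeeData {
    long id;
    char name[32];
};

// In the role of a DSC generated class: the data plus the virtual hooks.
class ToyEmployee : public ToyEmployeeData, public ToyObject {
public:
    ToyEmployee() { id = 0; std::memset(name, 0, sizeof name); }

    std::size_t recordLength() const { return sizeof(ToyEmployeeData); }

    void loadFrom(const char* buffer) {
        ToyEmployeeData tmp;
        std::memcpy(&tmp, buffer, sizeof tmp);
        static_cast<ToyEmployeeData&>(*this) = tmp;
    }

    void storeTo(char* buffer) const {
        ToyEmployeeData tmp = *this;  // copies just the data part
        std::memcpy(buffer, &tmp, sizeof tmp);
    }

    int compareKey(const char* rawRecord) const {
        ToyEmployeeData other;
        std::memcpy(&other, rawRecord, sizeof other);
        return (id < other.id) ? -1 : (id > other.id) ? 1 : 0;
    }
};

// Library-side code only ever sees the base class.
int compareViaBase(const ToyObject& obj, const char* rawRecord) {
    return obj.compareKey(rawRecord);
}
```

    The design point is that code holding only a base pointer can move and
compare records without knowing their layout, which is why that pointer has
to be handed over during attach.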


3.2.2   diaGRel


The diaGRel (or Generic Relation) class can be used in place of a DSC generated
class to manipulate any relation you desire. This can be used to write generic
relation browsers and the like. The use of the diaGRel class is described in
detail in section 4. diaGRel is used to implement the db2txt relation dumping
utility, and the server in the multi-user version.


3.2.3   diamondBase


diamondBase has rather a large job. It oversees all the classes which actually
manipulate the database. On construction it is supplied with a string which is
a path to the name server configuration file. This file is typically called
"config.db", although the default parameter is "ns.dbc". This parameter is
passed to the base TNameServer class.

    As mentioned in the diaRel section, diamondBase handles incoming diaRel

connection requests - assigning a reference id to each new connection. A list of

such class clients is maintained.

   A list of active and recently deregistered relations is kept. Old relations
are kept to avoid unnecessary overhead when a relation class is repeatedly
created then destroyed. If multiple active diaRel classes are accessing the same
relation, then only one relation is maintained for all of them.



   Each diaRel uses only one relation - and provides its name upon registration.
Failure to locate that relation name using TNameServer is an error.

   Each relation has an associated pointer to a dbObj  through which all non

register/deregister operations are handled.
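
   The bookkeeping described in this section can be sketched roughly as
follows. This is an illustration under assumptions, not the library's code:
RegistrySketch, RelationSketch and their members are hypothetical names, and
the sketch never evicts an idle relation, whereas the real list of old
relations is presumably bounded.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the per-relation engine object.
struct RelationSketch {
    std::string name;
    int users;  // attached clients; 0 means "recently deregistered"
    explicit RelationSketch(const std::string& n) : name(n), users(0) {}
};

class RegistrySketch {
public:
    RegistrySketch() : opened(0) {}
    ~RegistrySketch() {
        std::map<std::string, RelationSketch*>::iterator it;
        for (it = relations.begin(); it != relations.end(); ++it)
            delete it->second;
    }
    // Attaches a client to the named relation and returns its reference id.
    // The relation is only opened if nobody has opened it before.
    int attach(const std::string& name) {
        RelationSketch*& slot = relations[name];
        if (slot == 0) {
            slot = new RelationSketch(name);
            ++opened;
        }
        ++slot->users;
        clients.push_back(slot);
        return static_cast<int>(clients.size()) - 1;
    }
    // Detaches a client; the relation stays cached even with no users left.
    void detach(int refId) {
        assert(refId >= 0 && refId < static_cast<int>(clients.size()));
        assert(clients[refId] != 0);
        --clients[refId]->users;
        clients[refId] = 0;
    }
    const RelationSketch* relationOf(int refId) const { return clients.at(refId); }
    int timesOpened() const { return opened; }
private:
    std::map<std::string, RelationSketch*> relations;  // active and old relations
    std::vector<RelationSketch*> clients;               // indexed by reference id
    int opened;
};
```

   Two clients attaching to "artist" get different reference ids but share one
RelationSketch, and re-attaching after both have detached does not open the
relation a second time.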


3.2.4  TNameServer


This class uses the parameter supplied at construction time to open a file
containing entries in the form:


relationName=directory-path


This is simply used to provide some flexibility in the mappings from a relation

name onto the directory where it is actually stored.

   As an alternative, it is possible to pass the empty string ("") to
diamondBase on construction and then the TNameServer will assume that all
relations are in the current directory. This is useful when using diaGRel.
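
   A minimal sketch of such a lookup table is given below. NameServerSketch
and its members are hypothetical names; how the real TNameServer treats
whitespace, comments or malformed lines is not stated here, so the sketch
simply skips any line without a usable `=`.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

class NameServerSketch {
public:
    // With no configuration loaded, every relation maps to the current
    // directory, like passing "" to diamondBase.
    NameServerSketch() : useCurrentDir(true) {}

    // Reads "relationName=directory-path" lines and returns how many lines
    // were accepted. Lines with no '=' or an empty name are skipped.
    int load(std::istream& in) {
        useCurrentDir = false;
        std::string line;
        int accepted = 0;
        while (std::getline(in, line)) {
            std::string::size_type eq = line.find('=');
            if (eq == std::string::npos || eq == 0) continue;
            paths[line.substr(0, eq)] = line.substr(eq + 1);
            ++accepted;
        }
        return accepted;
    }

    // Convenience wrapper for text that is already in memory.
    int loadText(const std::string& text) {
        std::istringstream in(text);
        return load(in);
    }

    // Fills 'dir' and returns true if the relation is known.
    bool lookup(const std::string& relation, std::string& dir) const {
        if (useCurrentDir) { dir = "."; return true; }
        std::map<std::string, std::string>::const_iterator it = paths.find(relation);
        if (it == paths.end()) return false;  // unknown relation: an error
        dir = it->second;
        return true;
    }

private:
    bool useCurrentDir;
    std::map<std::string, std::string> paths;
};
```

   Loading the two lines `artist=/data/music` and `book=/data/library` makes
lookup("book", dir) succeed with dir set to /data/library, while a lookup of a
relation that is not listed fails.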


3.2.5  dbObj


One dbObj instance handles one relation. For each index on that relation, the
dbObj maintains one bTree instance through which operations using that index
are performed. It inherits a recServer base class which it uses to fetch the
actual data associated with a particular record. Actual operations on indices
are performed by calling the appropriate method in a bTree.

   The dbObj maintains a set of active queries which each refer to a particular

bTree query.  Record locking occurs at the dbObj level to prevent two updates

occurring on the one record concurrently.

   The dbObj also collects information about the relation when it is first
instantiated. This allows the diaGRel to find out all the information it needs
about a relation.


3.2.6  bTree


The bTree is probably the most complicated object in the whole hierarchy
(historically it was the first thing to be written). It manages the indexing of
records. It manages its own queries, taking care to ensure that multiple
queries reflect the changes made to each other, whilst optimizing as much as
possible for speed.

   It inherits a recServer to store the actual buckets. The actual
implementation is essentially a B+ tree with pointers to allow sequential
retrieval. At the moment, buckets which drop to less than half full aren't
merged with sibling buckets.
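
   The sequential-retrieval pointers can be pictured with the small sketch
below. It is only an illustration of the general B+ tree idea, not the bTree
class itself: LeafSketch, linkLeaves and scanFrom are hypothetical names, keys
are plain longs, and a real tree would first descend from the root to find the
starting bucket.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One leaf bucket: sorted keys plus a pointer to the next bucket in key order.
struct LeafSketch {
    std::vector<long> keys;
    LeafSketch* next;
    LeafSketch() : next(0) {}
};

// Chains two buckets; the left bucket's keys must not exceed the right's.
void linkLeaves(LeafSketch& left, LeafSketch& right) {
    assert(left.keys.empty() || right.keys.empty() ||
           left.keys.back() <= right.keys.front());
    left.next = &right;
}

// Collects every key >= from by walking the sibling chain, without going
// back up through the interior nodes of the tree.
std::vector<long> scanFrom(const LeafSketch* leaf, long from) {
    std::vector<long> out;
    for (; leaf != 0; leaf = leaf->next) {
        for (std::size_t i = 0; i < leaf->keys.size(); ++i) {
            if (leaf->keys[i] >= from) out.push_back(leaf->keys[i]);
        }
    }
    return out;
}
```

   Note that nothing in the walk depends on how full each bucket is, so a
bucket left with a single key after deletions is still traversed correctly; it
just wastes space.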



   Operations like fetching key data from the user class, setting key data,
comparing keys etc, are performed via calls to member functions in the object
class that is passed in.



4    Generic Relations


4.1    Introduction


The fact that DiamondBase is based upon the DSC relation compiler meant

initially that only those relations that had code compiled into an executable

could be accessed by that executable.

   This was seen as being too restrictive because it meant that no programs

could be written that acted upon an arbitrary relation. Since it was intended to

eventually write such programs as relation browsers and index rebuilders, it was

vital that a mechanism for accessing unknown relations be written.


4.2    diaGRel


The diaGRel class accomplishes this. It is meant to be used in any situation
where a DSC generated class would otherwise be used. It implements all the
functions that the generated code does, except that it is less efficient.
   This class is instantiated in the same way as a DSC generated class, except
that its constructor takes a string as a parameter which specifies the relation
to use. This should be the name of the files which make up the relation, without
their file extension.


4.3    Use


Once instantiated, a diaGRel is used exactly as any DSC generated class is.


4.4    Implementation


Initially, the diaGRel was written to open the relation file and read in the
record and key structure stored there. It would then do some pretty horrendous
traversals of this data to build up arrays of information about the relation.
These arrays contain data in a form that makes implementing the methods
required to imitate a DSC generated class quick and efficient. This process is
complicated by the fact that relations and keys are reordered by DSC to put
them in an order that prevents alignment problems.

   Key access in a diaGRel is the main inefficiency of the class. This is
because much pointer arithmetic has to be done at runtime to extract relevant
data. In a DSC class this is done at compile time by the compiler by way of
structure references.
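
   The difference between the two styles of access can be sketched as follows.
The names FieldSketch, bookFields and fetchField are hypothetical, and the
table is built with offsetof purely for illustration; in a generic relation
the equivalent information would have to come from the relation file at
runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Compile-time access: the compiler knows this layout, so b.barcode is a
// fixed offset baked into the generated code.
struct BookRecord {
    char name[100];
    char author[100];
    long barcode;
};

// Run-time access: the layout is described by a table of offsets and sizes.
struct FieldSketch {
    const char* name;
    std::size_t offset;
    std::size_t size;
};

const FieldSketch bookFields[] = {
    { "name",    offsetof(BookRecord, name),    sizeof(((BookRecord*)0)->name) },
    { "author",  offsetof(BookRecord, author),  sizeof(((BookRecord*)0)->author) },
    { "barcode", offsetof(BookRecord, barcode), sizeof(((BookRecord*)0)->barcode) }
};
const std::size_t bookFieldCount = sizeof(bookFields) / sizeof(bookFields[0]);

// Finds a field by name and copies its bytes out of an untyped record block.
// 'out' must have room for the whole field. Returns false if there is no
// field of that name.
bool fetchField(const void* record, const FieldSketch* fields,
                std::size_t count, const char* wanted, void* out) {
    assert(record != 0 && out != 0);
    for (std::size_t i = 0; i < count; ++i) {
        if (std::strcmp(fields[i].name, wanted) == 0) {
            const char* base = static_cast<const char*>(record);
            std::memcpy(out, base + fields[i].offset, fields[i].size);
            return true;
        }
    }
    return false;
}
```

   Every generic access pays for the table search and the pointer arithmetic,
where the struct reference costs nothing at runtime. That is the overhead
discussed above.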

   Access to the relation data can be via one of two mechanisms. Either the
diaGRel allocates a block of memory or a block of memory is given to it at
construction time. The latter has the benefit that you can inherit a struct
that matches the relation and pass a cast of this to diaGRel, thus allowing you
to reference relation members by name. Without this, you only have access to
the relation as a block of memory.


#include <string.h>

// Transfer area: must match the layout of the "book" relation.
struct BookRel {
    char name[100];
    char author[100];
    long barcode;
};

class Book : public BookRel, public diaGRel {
public:
    Book(void);
};

// Hand diaGRel the relation name and the memory block to use for records.
Book::Book(void) : diaGRel("book", (BookRel *)this)
{
}

diamondBase theDiamondBase("config.db");

int main()
{
    Book ARM;

    strncpy(ARM.name, "Annotated C++ reference", sizeof ARM.name);
    strncpy(ARM.author, "B. Stroustrup", sizeof ARM.author);
    ARM.barcode = 11345654;

    ARM.put();
    return 0;
}


4.5    Performance


The above implementation worked correctly but inefficiently. The reason for
this was the necessity to open, read and dissect the relation file on every
instantiation. This meant that a process using a diaGRel instead of a DSC
generated class took more than 5 times as much system time to perform the same
job. This was not satisfactory.

   The solution to the above was to allow the dbObj to collect the data. This
was already being partially done there and the change involved setting up a
struct to hold all the data and moving a lot of code from the one class to the
other. Now the dbObj can return a pointer to this struct to the diaGRel via the
diamondBase, through the attach method to the diaRel underneath the diaGRel
(phew!).

   The above change meant that unless more than about 10 different relations

are used in a program, the data discussed above is assembled only once.  The

result is that the diaGRel takes about twice as much user time as the DSC

generated classes (about expected) and about 35-40% more system time (also

not a disappointing number).

   This raises interesting performance questions for database engine writers.
Unless you either employ a system whereby you write self-modifying code, or you
link in generated code (as we do) - you are forced to accept this overhead due
to interpreting your relation layout. It basically involves extra levels of
indirection during database access. One could envisage a system whereby the
database engine generates custom machine code in designated parts of its
executable for optimally handling each relation, and then loads it in when that
relation is needed - but the mess associated with that system is not warranted
at this stage.



5     Variable Length Fields


5.1    Introduction


A notable absence in early versions of DiamondBase was variable length records
in some form or another.
   Given the complexities of implementing a variable length record server, we
opted to instead provide variable length fields. This solution satisfies almost
all sensible known uses of length variation in a relation.


5.2    dbData


To this end, two variable length field types are now available: dbData and
dbString. The dbString type is derived from dbData and so I will discuss dbData
first in detail. dbData is a type that can be used to store arbitrary (possibly
binary) data of variable size.

    The following access methods are available.


    * constructors

        -  dbData: Construct from another dbData.
        -  char*: Copy the string and make the size of the dbData equal to 1
           more than the string length.
        -  void*,long: Make the dbData the specified length and copy that much
           data in.
        -  void: Just creates an empty dbData.

    * unsigned int len(void): Returns the current length of the dbData object.

    * unsigned int setSize(unsigned int): Sets the size of the dbData object
      and returns the size set. The result of this method differs from its
      parameter only in dbString cases, see below.

    * unsigned int getSize(void): Returns the current size of the object. Note
      that for dbString, size and length are different.

    * char* cat(char *): Adds a string onto the end of a dbData object. Returns
      a char* pointer to the data in the dbData.

    * char* cat(dbData&): Similar but adds the contents of the dbData passed
      in.

    * char* cpy(char*), char* cpy(dbData&): These two replace the contents of
      the dbData with the relevant data.

    * void fill(char, unsigned int=0): Fills the dbData with the specified
      character. If the second argument is passed then that many characters are
      filled in, otherwise the entire dbData is filled.

    * void clr(void), void dispose(void): These both free any memory associated
      with the dbData and set its content and length to 0.

    * operator char*, operator void*: These return appropriate casts of the
      dbData object.

    * operator +(char*), operator +(dbData&): These add the contents of their
      second operand onto the end of their first.

    * operator <<, operator >>: Perform the appropriate stream operations.

    * operator []: Returns a reference to the appropriate char in the object or
      a reference to a dummy char if the parameter is out of range.

    * operator =(char*), operator =(dbData&): These behave exactly as cpy.

    * operator +=(char*), operator +=(dbData&): These behave exactly as cat.


5.3    dbString


The dbString class was originally written (and rewritten a few times) as the
only variable length type. When the more general dbData class was designed it
became clear that dbString should be a derived class from dbData. This is now
how it works.
   dbString behaves almost exactly as dbData does except that it is a null
terminated string. This means that some of the resizing functions behave
slightly differently. In particular, the size of a dbString and the length of a
dbString are different things. The size is the amount of memory allocated to
the internal char array and the length is the number of characters before the
first null character.



   The difference between size and length can be used to ensure that there is
enough space in a dbString to perform an operation. For instance, you may use
setSize to increase the size of a dbString before passing a char* cast of it to
a function which manipulates it (e.g. the library sprintf function). In this
way, you know there is enough space in the string to extend its length.
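
   The size/length distinction can be illustrated with the sketch below. It is
a stand-in written for this explanation, not dbString itself: StringSketch and
appendNumber are hypothetical names, the storage is a std::vector, and the
choice to clamp a requested size of 0 up to 1 is an assumption made so that
there is always room for the terminating null.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <vector>

class StringSketch {
public:
    StringSketch() : buf(1, '\0') {}
    explicit StringSketch(const char* s) : buf(1, '\0') {
        assert(s != 0);
        buf.assign(s, s + std::strlen(s) + 1);  // size = length + 1
    }
    // Size: bytes allocated to the internal char array.
    unsigned int getSize() const { return static_cast<unsigned int>(buf.size()); }
    // Length: characters before the first null.
    unsigned int len() const { return static_cast<unsigned int>(std::strlen(&buf[0])); }
    // Resizes the array and returns the size actually set. The last byte is
    // always a null, so shrinking truncates the string.
    unsigned int setSize(unsigned int n) {
        if (n == 0) n = 1;
        buf.resize(n, '\0');
        buf[n - 1] = '\0';
        return getSize();
    }
    operator char*() { return &buf[0]; }
private:
    std::vector<char> buf;
};

// Grow first, then let a C library function write through the char* cast.
void appendNumber(StringSketch& s, long value) {
    unsigned int used = s.len();
    s.setSize(used + 32);  // room for any long plus the null
    char* p = s;
    std::snprintf(p + used, s.getSize() - used, "%ld", value);
}
```

   After appendNumber the length has grown to cover the new digits while the
size stays at whatever was reserved. The sketch uses snprintf rather than
sprintf so that the reserved size is also enforced by the library call.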

    The following are the differences in dbString.

    * The following functions behave differently, either by using a different
      definition of length, by storing a 0 to truncate the string during size
      changes, or by adding (or otherwise taking into account) the extra 0
      needed on the end of a string.

        -  unsigned int len(void)
        -  unsigned int setSize(unsigned int)
        -  char* cat()
        -  char* cpy()
        -  void fill(char c, unsigned int = 0)
        -  constructors: by passing a flag to dbData to indicate it is a string


5.4    memServer


The memServer is the class that allows the variable length fields to be used
with diamondBase. It does this by creating a new file with a .str extension to
hold the dbString and dbData records.

   The memServer file is laid out like a block of memory being manipulated by
an allocator such as malloc. It manipulates used and empty blocks and joins
empty blocks should they lie adjacent to each other. Blocks are allocated on a
first-fit basis.
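
   For readers unfamiliar with the scheme, here is a minimal first-fit
allocator with joining of adjacent empty blocks. It works on offsets only and
keeps its block list in memory. FirstFitSketch and its members are
hypothetical names, and the on-disk block format of the real .str file is not
described here, so none is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

class FirstFitSketch {
private:
    struct Block {
        long offset;
        long size;
        bool used;
        Block(long o, long s, bool u) : offset(o), size(s), used(u) {}
    };
    typedef std::list<Block>::iterator It;
    std::list<Block> blocks;  // kept in offset order

public:
    explicit FirstFitSketch(long total) { blocks.push_back(Block(0, total, false)); }

    // Returns the offset of the allocated block, or -1 if nothing fits.
    long allocate(long size) {
        assert(size > 0);
        for (It it = blocks.begin(); it != blocks.end(); ++it) {
            if (it->used || it->size < size) continue;  // first fit wins
            if (it->size > size) {
                // Split: the tail of the hole stays empty.
                It next = it;
                ++next;
                blocks.insert(next, Block(it->offset + size, it->size - size, false));
                it->size = size;
            }
            it->used = true;
            return it->offset;
        }
        return -1;
    }

    // Frees the block starting at 'offset' and joins it with empty neighbours.
    bool release(long offset) {
        for (It it = blocks.begin(); it != blocks.end(); ++it) {
            if (it->offset != offset || !it->used) continue;
            it->used = false;
            It next = it;
            ++next;
            if (next != blocks.end() && !next->used) {
                it->size += next->size;
                blocks.erase(next);
            }
            if (it != blocks.begin()) {
                It prev = it;
                --prev;
                if (!prev->used) {
                    prev->size += it->size;
                    blocks.erase(it);
                }
            }
            return true;
        }
        return false;
    }

    std::size_t blockCount() const { return blocks.size(); }
};
```

   First fit is cheap because the search stops at the first hole that is big
enough. The price is some fragmentation near the start of the file, which the
joining of neighbouring empty blocks helps to keep in check.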

   In order to be able to access these fields normally in a relation, they must
be members of the relation. In order to make it practical to store the relation
in a recServer, only longs are stored for each dbString or dbData. These longs
are offsets into the .str file maintained by the memServer. It is therefore
necessary to have those longs stored in the relation structure so it can be
written out efficiently by the recServer.

   The solution to the above dilemma was to declare another struct to hold the
dbString and dbData members with the same names as given in the relation file.
The long offsets are stored in the main relation struct with slightly mangled
names. Both of these structs are inherited by the relation.

   In this way, users may reference their dbString and dbData fields by the
names they gave them, even though only longs are stored in the relation struct.
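
   The two-struct arrangement can be sketched as follows. All names here are
hypothetical: title_off merely stands in for whatever mangled name DSC
actually produces, std::string stands in for dbString, and an in-memory string
plays the part of the .str file.

```cpp
#include <cassert>
#include <string>

// Fixed-size part: this is all the record server writes out. The variable
// length field appears only as an offset into the string file.
struct ArticleFixed {
    long id;
    long title_off;  // hypothetical mangled name for the offset of 'title'
};

// Variable-length part: carries the name the user wrote in the schema.
struct ArticleStrings {
    std::string title;  // std::string stands in for dbString
};

// The relation class inherits both, so rec.title works as expected.
struct ArticleSketch : public ArticleFixed, public ArticleStrings {
    ArticleSketch() { id = 0; title_off = -1; }
};

// Stand-in for the string file: appends the text and returns its offset.
long storeText(std::string& strFile, const std::string& text) {
    long offset = static_cast<long>(strFile.size());
    strFile.append(text);
    strFile.push_back('\0');
    return offset;
}

// Reads back the null terminated text stored at 'offset'.
std::string fetchText(const std::string& strFile, long offset) {
    assert(offset >= 0);
    assert(static_cast<std::string::size_type>(offset) < strFile.size());
    return std::string(strFile.c_str() + offset);
}

// Before the fixed part is written, the offset is brought up to date.
void beforeWrite(ArticleSketch& rec, std::string& strFile) {
    rec.title_off = storeText(strFile, rec.title);
}

// After the fixed part is read, the string is refreshed from its offset.
void afterRead(ArticleSketch& rec, const std::string& strFile) {
    rec.title = fetchText(strFile, rec.title_off);
}
```

   The fixed part stays the same size however long the title is, which is what
makes it practical to keep in a fixed-length record store.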

   The dbObj makes sure that whenever there are dbString or dbData fields in a
relation, appropriate calls are made to memServer to keep them up to date.
Every time a record is read or written, the strings need to be similarly
updated. This gives a performance penalty compared to using char arrays to
store strings, but the code has been heavily optimised to minimise it.
   The memServer and its integration into the dbObj provide seamless use of
dbData and dbString fields in relations.



6     Multiuser DiamondBase


6.0.1   Disclaimer


Warning: The multi user component of DiamondBase has only been completed very
recently, and is being released as an ALPHA version. There are many
deficiencies documented below (and goodness knows how many undocumented ones).
If you intend to use it, then read this carefully, and please give us feedback
on what you think, and any improvements you think are necessary.


6.1    Introduction


The transition from single user to the multi-user version of DiamondBase has
been quite an adventure. We started by defining what we meant by "multi-user".
To an extent, the existing version already supports a level of concurrency in
that you can have multiple objects in your program which are all accessing the
one relation. There is proper locking to ensure that they don't interfere with
one another.

    At a higher level however, you want multiple processes which may not even

be the same program, accessing the database concurrently without corrupting

the underlying files.  For this version, we have chosen to implement a system

which allows multiple processes on the one machine. The possibility exists for a

distributed database across a TCP network but that shall wait.


6.2    What didn't work


The first attempt at a concurrent version of DiamondBase began at the bottom.
The idea was that if the record management at the recServer level were given
proper locking, then all that sat on top of it would behave transparently. This
turned out to be about as incorrect as is imaginable. For a start, cache
coherence becomes a problem. There are also higher level structures like
B-Trees, for which simply locking a small part is totally inappropriate. As the
changes propagated up the code hierarchy and the changes to the bottom layer
multiplied, this was abandoned. Andy wisely smiled, having predicted this from
the start.

   Having explored its "rear end" if you like, we next commenced with oral
surgery. Putting a layer near the top, between the schema generated code and
the database engine would allow the two to exist in separate applications quite
happily. This worked much more nicely, with the engine left unchanged, and some
surgery at the level of the diamondBase and diaRel objects. There was a problem
however. Having all the comparison code compiled into the application makes for
a very fast single user version (and I would still recommend it unless you
really need concurrent behavior). However, it meant that the server still
needed to convey the object callbacks to the application with the appropriate
data.

   After quite a bit of coding, and the odd kernel IPC fix or two for Linux, we
had a running version which was only 11 times slower than the uni version.
Disgusted, I decided that for this, and other reasons, we would have to
interpret the relation layout and do the comparisons within the server, only
talking to the application when it issued a request, and to supply the reply.
The diaGRel class was born to permit generalized access so the server could get
to any relation, and the communications protocol emasculated to handle the
smaller command set.


6.3    The architecture


This version uses shared memory segments between the server and the application
processes to transfer records, and a message queue shared between all processes
to communicate requests and intentions. Incoming messages on a very low message
number signify attach requests from applications.

   The connection protocol is as follows:

   * Client sends an attach request as a message with id 2, and its pid.

   * Server creates /tmp/diamond.pid and a shared memory segment to go with
     it. It then sends a message on BASE_RESP + pid.

   * Client attaches to the new shared memory area.

   Further transactions are performed as follows:

   * Copy the relation's data into the shared memory segment.

   * Send message with transaction details on BASE_REQ. Initial versions had
     pid added to this but this caused clients with lower pids to be given
     preferential treatment. Now the pid is embodied in the message.

   * Server reads message and gets the database engine to perform the
     transaction after copying the data from the shared area into a diaGRel
     object which is carrying out the database requests on behalf of the
     client.

   * Server copies resultant relation into transfer area and sends reply
     message on BASE_RESP + pid.


   Transactions consist of database transaction ids along with associated index
numbers, new object attach requests, object detach notification messages and
application detach messages. The reference Id numbers handed out by the
database engine are mapped onto "fake" diaGRel objects which do all the work in
the engine on behalf of the client's diaRel objects. Two lists are maintained
to track the attached client applications and their attached objects. Having
the diaGRel functionality makes the whole process quite simple.
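
   The remark about lower pids getting preferential treatment is easier to see
with a small simulation. The sketch below is an in-memory model, not the
server code: QueueSketch and MessageSketch are hypothetical, the numeric
values of BASE_REQ and BASE_RESP are made up, and it is our assumption (the
text does not say so) that the early server received requests with a negative
type argument. What is standard is the selection rule being modelled: System V
msgrcv with a negative type returns the message with the lowest type not
exceeding the absolute value.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

struct MessageSketch {
    long type;     // queue "channel", like mtype in a System V message
    long pid;      // sender's pid, carried in the message body
    int  request;  // opaque transaction code
};

const long ATTACH_TYPE = 2;       // attach requests use id 2, as in the text
const long BASE_REQ    = 10;      // hypothetical value
const long BASE_RESP   = 100000;  // hypothetical value

class QueueSketch {
public:
    void send(const MessageSketch& m) { q.push_back(m); }

    // Mimics the System V type selection rules:
    //   wanted == 0 : oldest message of any type
    //   wanted  > 0 : oldest message whose type equals wanted
    //   wanted  < 0 : message with the lowest type <= -wanted
    bool receive(long wanted, MessageSketch& out) {
        std::size_t best = q.size();
        for (std::size_t i = 0; i < q.size(); ++i) {
            if (wanted == 0) { best = i; break; }
            if (wanted > 0) {
                if (q[i].type == wanted) { best = i; break; }
                continue;
            }
            if (q[i].type <= -wanted &&
                (best == q.size() || q[i].type < q[best].type)) {
                best = i;
            }
        }
        if (best == q.size()) return false;
        out = q[best];
        q.erase(q.begin() + static_cast<std::deque<MessageSketch>::difference_type>(best));
        return true;
    }
private:
    std::deque<MessageSketch> q;
};

// Replies go to a per-client channel so each client only sees its own.
long replyType(long pid) {
    assert(pid > 0);
    return BASE_RESP + pid;
}
```

   With requests sent on BASE_REQ + pid and a negative-type receive, a request
from pid 100 overtakes an earlier one from pid 900. With every request sent on
the single BASE_REQ channel and the pid carried in the body, requests are
served strictly in arrival order.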



6.4    Performance


Performance was not terribly disappointing. It was never going to be faster
than the uni version, but you have to keep in mind some potential gains from
having a separate server:

    * Killing application programs won't jeopardize database integrity.

    * Retrieved data is in a single pooled cache, increasing efficiency.

    * The possibility exists to prevent direct access to the database files.


                 Test            User    System     Total
                 ----------------------------------------
                 Uni             27.9      8.0      35.9
                 ----------------------------------------
                 Multi (1)
                   server        23.19    11.8      50.19
                   client        10.0      5.2
                 ----------------------------------------
                 Multi (2)
                   server        49.0     22.3     104.23
                   client A      11.0     22.3
                   client B      10.83     5.5
                 ----------------------------------------


            Table 1: Performance figures for multi and uni versions


   From the above figures, the multi version is approximately 40% slower than
the uni version - which can largely be attributed to the fact that the server
is interpreting the record comparisons. Note that for the two client test, the
database is essentially doing twice the work. Running the two tests under the
uni version takes twice the figure quoted for the uni version. If you really
want high performance concurrent database management for specialized
applications, it would be theoretically possible to compile the comparison code
into a specialized server which can just do the record comparisons you need.

   The other thing to note on the performance front is that we have not
optimized the code at all (i.e. with -O6) - there is reason to believe that
this could improve things. We have not profiled the multi stuff at all, so
there are many potential improvements in performance there. The next release
will be a little more fleet footed on that front I expect.


6.5    Compiling the Concurrent version


The target multi exists in the Makefile. Typing gmake multi after you have
already built the main system will perform the multi compilation. The target
will build the dbmulti library, and the jeweler server process. It has been
tested under SunOS 4.1.1, Ultrix, IRIX and Linux. There are some prototypes for
the IPC system calls, which seem to be woefully under prototyped on non Linux
systems. It should theoretically work on any UN*X system which supports IPC
(interprocess communication) - although my experience so far suggests that IPC
is about as standard as curses so I am prepared for bug reports.


6.6    Testing the concurrent version


I will distribute a small application called bump. It creates 5000 records,
interspersing requests for old records in the addition calls, and ensuring
their integrity. It takes a parameter indicating a range. If you are running
more than one instance, you should specify parameters spaced about 10,000
apart, and run no test twice without first creating an empty database,
otherwise duplicate records will result and the test code will fail. I have run
up to 8 of them concurrently under SunOS and Linux. Running more than 9
processes at once causes the tests to fail due to limits within the server
(which can be changed if desired).


6.7    Using the Concurrent version


One of the issues we would still like to address more thoroughly is that of
ergonomics from a programming point of view. Ideally, we would like to minimize
the pain of the transition between a uni-database and a concurrent one, to that
of linking with a separate library. We still believe that this can be achieved,
but at the moment, it is necessary to recompile your application with the
preprocessor symbol 'MULTI' defined. You must also link with the libraries
dbmulti and diamond in that order, rather than just diamond. Upon execution,
your application will attempt to connect to a running server in order to
perform its transactions rather than doing the accesses directly. So, to
reiterate the steps:


   * Make sure that your code includes diarel.h rather than dbase.h to get at
     any database definitions.

   * Put -DMULTI into your application CFLAGS.

   * Link your application with -ldbmulti -ldiamond.

   * Ensure that the server process jeweler is running.


   Note: This version of the code creates all rendezvous points with octal
permission 600, meaning that the client and server must be run by the same
user. The permissions can be edited to suit your taste. The whole area of
security management needs careful examination.



   The ergonomics of files used to manage the connection are also less than
satisfactory at the moment. Any relation which you wish to use via a server
process must be in the server.db file in the directory in which the server
process jeweler is run. This file is in the same format as the configuration
file whose name is normally passed to the diamondBase object in the
uni-version.

   The server and client processes initially make their rendezvous via message
passing, and the filename used to make the connection is gibraltar. (It seemed
like a solid place to build a database - sorry). Currently, both the server and
client library code look in the current directory for this file. It must
therefore exist - although its contents are irrelevant for the running of the
server. The server itself creates the file when it is run. It also constrains
the clients to run from the same directory as the server. This is messy and we
will endeavor to rectify it. If you have strong opinions on how databases,
servers and clients should be managed system-wide, we would love to hear from
you.

    So, in order to run an application/server pair:


    * create a file server.db listing your relations

    * run jeweler (optionally in the background)

    * run your application in the same directory


   When you wish to shut the server down, either send it an interrupt signal
(press control-C), or kill -SIGTERM the process. The server process will output
some messages on either stdout or stderr, so you may wish to log these to a
file - they contain some useful information like diagnostics when relation
names can't be looked up.
   If there are clients still attached, then you will have to wait until those
clients are gone before the server will shut itself down.


6.8    Caveats - PLEASE READ THIS CAREFULLY.


This is the section where I warn you about the nasty issues/problems which are
still out there and not dealt with. As mentioned above, there are many ergonomic
issues which we will address as we write some more concurrent applications. I
will list some other problems which I am aware of. I will try to fix the more
serious ones at least - but there are just so many little potential problems
that this may take a while. There has been some pressure to get this out and
testable as soon as possible. My little list of problems follows:


    * Variable length strings and memory aren't supported yet. [I expect to
      have these working very soon. Kev]

    * There is no configuration management, utilities to shut the server down,
      or monitor it etc. Changes to the server.db file won't take place until
      it is restarted. At the moment, sending a suggestive signal like SIGTERM
      or SIGHUP will cause it to exit when convenient.

    * There is a static limit on the size of relation which can be handled (4K
      at the moment). Getting around this efficiently is a little sticky.

    * The rendezvous point between client and server is clumsy and essentially
      means they need to be run in the same directory. Configuration at
      installation time is one possibility, or at runtime which is probably
      more acceptable.

    * The schema compiler (DSC) is still a uni application, so it can blow away
      the database from under the server's feet if you are stupid enough. This
      shouldn't be permitted.

    * Non-vital commands in diaRel like verStr aren't implemented yet.

    * Death of server will freeze the application using it.

    * Security is reliant on protecting the gibraltar file, and the code opens
      the IPC services as 600 currently, so only processes under the same uid
      as the server can access it. This needs to be greatly improved. Ideally,
      client instances should not be able to interfere with one another. We may
      have to resort to encryption to achieve this and there will be
      performance penalties. Denial of service attacks are also possible.


   Just remember that if I have missed something, Use The Source Luke. The
files gib.cc and dclient.cc contain the relevant source for the server and
client library respectively. They aren't too complicated.


6.9    Still to do


   * Reduce the size of the caveat list.

   * Profile the code to remove bottlenecks.

   * Replace the linked lists with hash tables for the client and object lists.

   * Use semaphores instead of message queues for efficiency.

   * An OS/2 port.

   * More thorough test suite.

   * Add dbString support.

   * Port/test on other unix platforms.

   * Go to bed.



7     DSC Users Guide


The DiamondBase Schema Compiler (DSC) is a utility provided with the
DiamondBase database package which takes a lot of the hard work out of writing
database applications. It does this by generating most of the code required to
maintain the database automatically. In addition DSC generates new, empty
databases for each schema you specify.
   Throughout this section and the next, it is expected that the reader has a
good understanding of the concepts involved in relational database design[1].


7.1    A Simple Record Library


Let us begin with a small example. Consider the application of producing a
database for storing record library information. One part of such a database
would be a relation which stores information about a single artist. The
information we would like to store is the artist's name and the artist's
abbreviation. The specification of this relation is given below.

relation Artist
{
     // Field specification.

     char     artName[100];    // Artist's Name.
     char     abbr[10];        // Artist's Abbreviations.
     long     titles = 0;      // Number of titles.

     unique   artId;           // A unique number to identify
                               // this artist by.

     // Index specification

     index    name on artName;     // An index for artist names
     index    abbrev on abbr;      // An index for artist
                                   // abbreviations.

     // Constructors

     construct using titles,artName index name;
     construct titles;
}

    To make starting your DiamondBase application as easy as possible, the
specification of a schema is based on the specification of data structures in
the C or C++ languages. As such, some of the schema will be obvious. However,
before continuing we will look at the important parts of it.



   - relation Artist


   The first word here is a keyword, indicating that this is the start of a new
relation specification. The second is the name of the relation. This will
become the name of the class associated with this relation, and therefore a
type name. If you want to use "Artist" as a variable name in your program,
consider calling the relation something else in the schema.

   Note also the opening brace (`{'). The information belonging to this
relation is contained within braces. You must finish the relation with a
closing brace (`}').

   There is no limit to the number of relations which can appear in a single

schema file. Each must begin with the keyword relation and must be enclosed

within braces.

   An optional "called name" clause may appear after the relation name and
before the opening brace. It allows the relation to have a different name from
the underlying structure.


7.1.1  Field Specification


   - char artName[100];

   Now we begin the business of specifying the fields for the relation. First
the type of the field is given, in this case a character (or as we shall see,
an array of characters). Next a name must be given to the field. In this case
the name is "artName". You will be able to access a class member called
"artName" to manipulate this data in the programs you write.

   Lastly, you have the option of specifying that more than one item of the
type (in this case character) should be allocated. This is the syntax for
specifying an array of data items, and in the case of characters is also the
method for declaring strings.

   Default values can be given using an = sign. These are assigned to the
fields of the relation during construction.

   The valid types for use in a relation are given in the DSC Reference Guide.

   Note that each field specification must be terminated by a semi-colon (`;').


   - unique artId;

   The unique type is special, in that it is never assigned a value by
DiamondBase applications. Instead DiamondBase assigns a value to these fields
when a new record is created, guaranteeing that the value it gives is unique
for that relation at that time.

   It is important to note that these unique numbers are only instantaneously
unique. If a record is deleted from the relation, the unique value it once had
is then free to be re-used.
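   The consequence of instantaneous uniqueness can be seen with a toy
allocator. This is not DiamondBase's code: the class ToyUniqueAllocator and its
"smallest free value" policy are assumptions made purely for illustration. The
sketch only shows why a value handed out once can turn up again after a delete.

```cpp
#include <set>

// Toy model of "instantaneously unique" values (hypothetical; the real
// allocation policy used by DiamondBase is not documented in this section).
class ToyUniqueAllocator
{
    std::set<long> inUse;   // values currently held by live records

public:
    // Hand out the smallest value that no live record holds.
    long allocate()
    {
        long candidate = 0;
        while (inUse.count(candidate) != 0)
            ++candidate;
        inUse.insert(candidate);
        return candidate;
    }

    // Deleting a record frees its value for re-use.
    void release(long value) { inUse.erase(value); }
};
```

   With this model, allocating two values, releasing the first and allocating
again returns the first value a second time. A unique value saved elsewhere
(in another relation, say) may therefore end up referring to a different
record once the original has been deleted.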



7.1.2   Index Specification


Indices are used as mechanisms for locating records within the relation based
on some sorting field or key. As such, their creation does not involve
allocating any additional storage within a relation. Instead DSC generates the
appropriate code to build any keys which are required dynamically. The examples
of index specification above are similar, so the first will be used as an
example.


    - index name on artName;

    The first word is the index keyword. This is followed by an identifier.
This is the name of the index and can be used in application code when
specifying which index should be used for record queries (see footnote 1).

    The name of the index is followed by another keyword, on, and then a list
of one or more identifiers, separated by commas. These identifiers are the
names of the fields which should be used as sort keys for the index and should
be specified in order of importance. For example,

      index myindex on myfield1, myfield2;

would cause a b-tree to be created with myfield1 as the primary sort key, and
myfield2 as the secondary sort key.
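    To make "primary" and "secondary" concrete, here is a minimal sketch of the
ordering such an index implies. The record layout (MyRecord) and the function
name compareMyIndex are hypothetical, and this is not the comparison code that
DSC generates. It only shows that the second field is consulted when the first
field compares equal.

```cpp
#include <cstring>

// Hypothetical record with two indexed fields, mirroring
// "index myindex on myfield1, myfield2;".
struct MyRecord
{
    char myfield1[16];
    long myfield2;
};

// Returns <0, 0 or >0, in the manner of strcmp. myfield1 is compared
// first; myfield2 is only used to break ties.
int compareMyIndex(const MyRecord& a, const MyRecord& b)
{
    int primary = std::strcmp(a.myfield1, b.myfield1);
    if (primary != 0)
        return primary;
    if (a.myfield2 < b.myfield2) return -1;
    if (a.myfield2 > b.myfield2) return 1;
    return 0;
}
```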


7.1.3   Constructor Specification


Constructors may be declared for your relation. You should make sure that you
do not create ambiguous constructors. Your compiler will give an error if you
do. You should also avoid declaring a constructor that takes a char* and a long
in that order. Such a constructor exists already, so that you can construct
using a key and an index number.

    - construct using titles,artName index name;

    This creates a constructor that accepts a long and a char* and then does a
get using the name index. The "index name" portion is optional.


7.2    Compiling a Schema


After creating a schema you will want to compile it using DSC. This process
may generate a number of files, which by default will be created in the current
directory. For large databases this is inconvenient from a programming point
of view, if not simply messy. So as an example, you might set up a directory
structure in your project path which has subdirectories called schema (for
storing schema descriptions) and dbfiles (for storing the generated database
files).
______________________________
   Footnote 1: In fact the name which is made available to application
programmers is prefixed by "dbIdx_", so that the index above would be referred
to in code as "dbIdx_name".



   Your application source files might go in the root directory of your project

path, while schemas are located in the schema path, and the database files

themselves (containing the data) are stored in the dbfiles path.

   Let us assume that such a directory structure exists, and that the schema
file used in section 7.1 is in the schema path and is called artist.ds (the .ds
extension denotes a DiamondBase Schema file; see footnote 2). To compile this
schema and place the database files in the dbfiles path, we use the following
command:


        dsc -I schema -O dbfiles artist

where

        -I schema     specifies the schema path,
        -O dbfiles    specifies the database file path, and
        artist        is the schema filename.


   This could generate three source files in the current path, called artist.h,
artist.cc and artist_s.h.

   It may then create a new data file for each relation in your schema (in this
case only one), named after the relation and given a .db extension. This file
is placed in the dbfiles path.

   Finally DSC may create one .id file for each index specified in the schema.
The extension is suffixed by a number from 0 to 9, using a different number for
each index specified in the schema. These index files will also be located in
the dbfiles path.

   The database files will only be generated if you use the -D flag, and the
source code described above will only be generated if you specify the -C flag.
This ensures that you do not overwrite anything by accident.
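   The naming rules above can be summarised in a few lines of C++. This is only
a sketch under stated assumptions: the helper expectedDbFiles is hypothetical,
the index files are assumed to share the relation's base name, and '/' is
assumed as the path separator. Check the files DSC actually produces on your
system before relying on it.

```cpp
#include <string>
#include <vector>

// Lists the database files DSC is described as creating for one relation:
// one .db file, plus one .idN file per index (N from 0 to 9).
// Assumptions: index files use the relation's base name, and the path
// separator is '/'.
std::vector<std::string> expectedDbFiles(const std::string& dbPath,
                                         const std::string& relation,
                                         int numIndexes)
{
    std::vector<std::string> files;
    files.push_back(dbPath + "/" + relation + ".db");
    for (int i = 0; i < numIndexes && i < 10; ++i)
        files.push_back(dbPath + "/" + relation + ".id" + char('0' + i));
    return files;
}
```

   For the Artist relation, which has two indices, this yields one .db name and
two .id names under the dbfiles path.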


7.3    Derived relations


It is often the case that you would want to inherit a DSC relation into another

class so that you may augment its functionality.  To help with this, there is

another command line argument for DSC.


   - dsc -G base derived file

   This will create "file.h" and "file.cc" files containing a class named
derived which has a base class base. The class will appear as a skeleton class
based on your base class.
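   The exact contents of the generated skeleton are not reproduced in this
guide. As a rough sketch of the idea only, assume a generated class Artist with
public members artName and titles (a stand-in is declared below so that the
example compiles on its own). A derived class, here called MyArtist, can then
add behaviour without touching the generated code. Both class bodies are
hypothetical.

```cpp
#include <cstring>

// Stand-in for the class DSC would generate from the Artist relation
// (hypothetical; the real generated class has more members).
class Artist
{
public:
    char artName[100];
    long titles;
    Artist() : titles(0) { artName[0] = '\0'; }
};

// A derived class in the spirit of "dsc -G Artist MyArtist myartist".
class MyArtist : public Artist
{
public:
    // Added behaviour lives here, so regenerating Artist does not lose it.
    bool hasTitles() const { return titles > 0; }

    // Copy a new name into the fixed-size field, always leaving room for
    // the terminating null character.
    void rename(const char* newName)
    {
        std::strncpy(artName, newName, sizeof(artName) - 1);
        artName[sizeof(artName) - 1] = '\0';
    }
};
```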


7.4    Getting on with the Application


You now have everything that is required to start building your own application.

There are a few more command line options which may become useful down the

track as your application develops.  These are explained in detail in the next

section.
______________________________
   Footnote 2: Schema files should have the extension ".ds" for clarity.



8     DSC Reference Guide


8.1    Command Line Arguments


The command syntax for DSC is as follows :

     dsc [options] <schema name>

    The command line arguments to DSC are shown in figure 1.

    If no extension is given for the schema file, the extension .ds is used.


       Option                  Action
       ----------------------  ------------------------------------------
       -C                      Create generated source code
       -D                      Create database files
       -G base derived file    Generate a derived class
       -O <path>               Place generated database files in <path>
       -S <path>               Place generated source code in <path>
       -I <path>               Use schema files located in <path>


                   Figure 1: Command-line options to DSC


8.2    DSC Data Types


The types available in DSC are similar to those which are available in C++,
with three exceptions and several omissions. Table 2 shows all the legal types
in a DSC schema file, their C++ equivalents and their type identifiers, and
gives a short description of their properties.


8.3    DSC Syntax Definition


The syntax for schema description is modelled on the C and C++ languages.
There may be one or more relations described in a schema file, with each being
preceded by the relation keyword, and each enclosed in a pair of braces
(`{ }'). Table 3 shows the complete syntax for DSC in BNF.

    NOTE: This BNF is out of date. It is 3am and I have no plans to try
to persuade LaTeX what the current BNF looks like. I suggest reading
dsc.y. It makes for many hours of fun for the whole family. Note that
the version below does not include default arguments and constructors
(at least).



DSC Type   C++ Type            Id  Description
---------  ------------------  --  -----------------------------------------
long       long int             0  A long integer. The size of this type
                                   will vary, but is usually 32 bits.
ulong      unsigned long int    1  A long integer which may only take
                                   non-negative values.
short      short int            2  A short integer. The size of this type
                                   will vary between compilers.
ushort     unsigned short int   3  A short integer which may only contain
                                   non-negative values.
double     double               4  A double precision floating point number.
float      float                5  A single precision floating point number.
money      moneyType            6  A type defined to store currency
                                   information.
date       dateType             7  A type defined to store date information.
char       char                 8  A character, or 8-bit number.
unique     uniqueType           9  A type whose value is guaranteed to be
                                   instantaneously unique.
dbString   dbString            10  A resizable string class.
dbData     dbData              11  A generic resizable binary data object.
ichar      char                12  A case insensitive char.


               Table 2: Valid types and their integer identifiers



           schemaFile    ->   schemaFile relation
                         |    null

           relation      ->   relation ident structName fieldList

           structName    ->   [is] called ident
                         |    null

           fieldList     ->   { fieldList1 indexList }

           fieldList1    ->   fieldList1 type ident size ;
                         |    null

           type          ->   short | ushort | long | ulong
                         |    float | double | char
                         |    money | date | unique
                         |    dbString | dbData

           size          ->   [ number ]
                         |    null

           indexList     ->   indexList indexSpec ;
                         |    null

           indexSpec     ->   indexSpec2 , ident
                         |    null


                        Table 3: BNF Syntax for DSC



8.4    File Formats


DSC is responsible for creating only the header for the actual .db file
associated with each relation. The creation of b-trees and the storage of data
in the .db file are delegated to other parts of DiamondBase. Consequently this
section deals only with the header information for the data file.


8.4.1  List Headers


Each data file contains enough information to derive the structure of its
relation, including the names of fields and the list of indices. The
information is broken into two parts: the field information and the index
information. These are encapsulated in two classes, the fieldList and indexList
classes, which are given in figure 2 and figure 3.


struct fieldList
{
    int         numFields;  // The number of fields for this relation
    fieldInfo  *fields;     // A linked list of individual field information

    fieldList() { fields = 0; numFields = 0; }  // Initialise class members

    ~fieldList() { delete fields; }             // Destroy the list

    friend ostream&  operator << (ostream&, fieldList&);
    friend ofstream& operator << (ofstream&, fieldList&);
    friend ifstream& operator >> (ifstream&, fieldList&);
};


                Figure 2: Class description for the Field Lists


   As you will see, these are only place keepers for the heads of two linked
lists. Since they are both essentially the same, only the first will be
discussed.

   Firstly, the number of entries in the linked list is stored to make reading
the list from a file easier. The only other data member is a pointer to the
head of the list.

   Function-wise, there is a constructor which simply sets its data members to
zero and a destructor which causes the memory allocated for the list to be
deleted.

   Finally there are three operators for input/output. The first is an
insertion operator for normal output streams. This just causes the information
stored in the linked list to be placed on the stream given. The second
insertion operator



struct indexList
{
     int         numIndicies;  // The number of indices for this relation
     indexInfo  *indicies;     // A linked list of individual index information

     // Initialise class members
     indexList() { indicies = 0; numIndicies = 0; }

     // Destroy the list
     ~indexList() { delete indicies; }

     friend ostream&  operator << (ostream&, indexList&);
     friend ofstream& operator << (ofstream&, indexList&);
     friend ifstream& operator >> (ifstream&, indexList&);
};


                 Figure 3: Class description for the Index List


writes the binary data to an output file. Lastly there is an extraction operator
for retrieving the binary information from an input file.
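    The reason for storing the entry count ahead of the list can be shown with
a small sketch. It is not the DiamondBase source: the Entry structure and the
functions writeList and readList are hypothetical, and a std::vector stands in
for the linked list. The point is only that a reader which sees the count first
knows exactly how many entries to pull off the stream.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

// Simplified stand-in for one list entry (hypothetical layout).
struct Entry
{
    int type;
    int size;
};

// Write the number of entries first, then each entry in binary.
void writeList(std::ostream& out, const std::vector<Entry>& entries)
{
    int count = static_cast<int>(entries.size());
    out.write(reinterpret_cast<const char*>(&count), sizeof(count));
    for (int i = 0; i < count; ++i)
        out.write(reinterpret_cast<const char*>(&entries[i]), sizeof(Entry));
}

// Read the count back, then exactly that many entries. Reading stops
// early if the stream runs out.
std::vector<Entry> readList(std::istream& in)
{
    int count = 0;
    in.read(reinterpret_cast<char*>(&count), sizeof(count));
    std::vector<Entry> entries;
    for (int i = 0; i < count && in; ++i)
    {
        Entry e;
        in.read(reinterpret_cast<char*>(&e), sizeof(e));
        if (in)
            entries.push_back(e);
    }
    return entries;
}
```

    Writing raw structures like this ties the file to one machine's integer
size and byte order, which is worth keeping in mind when moving database files
between platforms.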


8.4.2   List Information


The actual information for each field or index is stored in the structures shown
in figure 4 and figure 5. There is little similarity between the two, so both
must be discussed.


    - The fieldInfo Structure

    Field information is located in the fieldInfo structure. These are chained
together in a linked list, the head of which is stored in the fieldList
structure.

    The structure contains the name of the field (as a character array), the
type of the data stored in that field (as an integer), the size of the type in
bytes and a pointer to the next fieldInfo structure in the list. The list of
valid types and their identifiers can be found in section 8.2.

    The constructor for the class creates a new fieldInfo object with the values
passed to it as parameters.

    The destructor deletes the space allocated for the field name and then
deletes the next fieldInfo structure in the list, if it exists.


    - The indexInfo Structure



   The indexInfo structure really only differs in the information it stores.
There is the index type (not currently used), an array of fields which are used
in the index (listed in decreasing sort precedence), and a pointer to the next
indexInfo structure.


8.4.3  Database File Headers


The structure of the file is shown in figure 6.


struct fieldInfo
{
    int         fldType;   // Type identifier (see section 8.2)
    int         fldSize;   // Size of the type in bytes
    char       *fldName;   // Name of the field
    fieldInfo  *nextFld;   // Next field in the list, or 0

    fieldInfo(char *name, int ty, int sz)
    {
         fldName = new char[strlen(name)+1];
         strcpy(fldName,name);
         fldType = ty;
         fldSize = sz;

         nextFld = 0;
    }

    ~fieldInfo()
    {
         delete [] fldName;     // Allocated with new[]
         if (nextFld)
             delete nextFld;    // Deletes the rest of the list in turn
    }

    friend ostream&  operator << (ostream&, fieldInfo&);
    friend ofstream& operator << (ofstream&, fieldInfo&);
};


               Figure 4: Structures for storing field information



struct indexInfo
{
     int         idxType;                         // Not currently used
     int         idxFields[MAX_FIELDS_IN_INDEX];  // Fields, most significant first
     indexInfo  *nextIdx;                         // Next index in the list, or 0

     indexInfo(int type, TIndicies indicies)
     {
         idxType = type;
         for(int i=0;i<MAX_FIELDS_IN_INDEX;i++)
         {
              idxFields[i] = indicies[i];
         }
         nextIdx = 0;
     }

     ~indexInfo()
     {
         if (nextIdx)
              delete nextIdx;    // Deletes the rest of the list in turn
     }

     friend ostream&  operator << (ostream&, indexInfo&);
     friend ofstream& operator << (ofstream&, indexInfo&);
};


               Figure 5: Structures for storing index information



                      +----------------------------------+
                      | Size of header                   |  4 bytes
                      +----------------------------------+
                      | fieldList struct                 |  8 bytes
                      +----------------------------------+
                      | 1 or more fieldInfo structures   |  16 bytes each
                      +----------------------------------+
                      | indexList structure              |  8 bytes
                      +----------------------------------+
                      | 0 or more indexInfo structures   |  12 bytes each
                      +----------------------------------+
                      | Data records                     |
                      |               :                  |
                      |               :                  |
                      +----------------------------------+


                  Figure 6: File structure for database files.
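   As a worked example of figure 6, the position of the first data record can
be computed from the number of fields and indices. This is a sketch only: the
function firstRecordOffset and the constant names are hypothetical, the sizes
are the ones quoted in the figure (they may differ on other platforms), and it
assumes the regions are contiguous with nothing else stored between them.

```cpp
// Sizes taken from figure 6; they may differ on other platforms.
const long SIZE_FIELD_BYTES = 4;    // the "Size of header" value itself
const long FIELD_LIST_BYTES = 8;
const long FIELD_INFO_BYTES = 16;
const long INDEX_LIST_BYTES = 8;
const long INDEX_INFO_BYTES = 12;

// Offset of the first data record, assuming the regions are contiguous
// and appear in the order shown in figure 6.
long firstRecordOffset(long numFields, long numIndices)
{
    return SIZE_FIELD_BYTES
         + FIELD_LIST_BYTES + numFields * FIELD_INFO_BYTES
         + INDEX_LIST_BYTES + numIndices * INDEX_INFO_BYTES;
}
```

   For the Artist relation of section 7.1 (four fields, two indices) this gives
4 + 8 + 4*16 + 8 + 2*12 = 108 bytes.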



9     File formats


9.1    Database Config file


This file just specifies the directory where the files related to a particular
relation reside. It contains lines in the form:


relationName=directory-path
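
A line in this form is easy to split in application code. The sketch below is
not taken from the DiamondBase source. The function parseConfigLine is
hypothetical, and it assumes that the first '=' separates the two parts and
that no whitespace trimming or comment handling is needed.

```cpp
#include <string>

// Splits "relationName=directory-path" at the first '='.
// Returns false if the line has no '=' or an empty relation name.
bool parseConfigLine(const std::string& line,
                     std::string& relation,
                     std::string& directory)
{
    std::string::size_type eq = line.find('=');
    if (eq == std::string::npos || eq == 0)
        return false;
    relation  = line.substr(0, eq);
    directory = line.substr(eq + 1);
    return true;
}
```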


9.2    recServer


The record server is a generic block storage system. Upon creation, a record
size is nominated, and space for 4096 records is allocated. The object
inheriting a recServer can then request a new record id, store the record at
that location or delete the record at that location.

    The actual layout of the file is documented in the next section. The file
basically consists of a header followed by alternating bit pools and record
storage areas. The file will expand in size to accommodate demand for records.
It is currently unable to shrink as demand for records decreases. There are
ways of implementing this but they are currently not used.

    When the database becomes 80% full, the number of free slots is increased
by 25% or by 1, whichever is bigger. To keep the file size to a minimum, the
record space after the last bitPool isn't grabbed from the file system until
records are actually stored there.

    Since the recServer is designed to be used as a storage facility for other
program modules, it has an offset which can be supplied at construction to
reserve a certain number of bytes at the beginning of the file for other
purposes.


9.2.1   Header


The header keeps track of the current status of the database, namely the size
of each record, the number of active records and the total storage capacity of
the database in records.


9.2.2   Bit Pools


Bitpools are essentially bitmaps which record which records are used in the
following record storage area. Each bitpool also has a header to record how
many bits are used. This allows a quick scan of bits to determine which record
slots are empty in the following record storage area. Bitpools are currently a
fixed size of 4096 slots, which are bit packed to occupy 512 bytes.
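
The bit packing can be sketched as follows. This is an illustration rather than
the recServer source: the BitPool structure, its member names and the choice of
bit order within a byte are all assumptions. It does show the two ideas
described above, a "used" count that lets a full pool be rejected without
scanning, and 4096 slots packed eight to a byte.

```cpp
const int SLOTS_PER_POOL = 4096;
const int POOL_BYTES     = SLOTS_PER_POOL / 8;   // 512

// Hypothetical in-memory image of one bit pool.
struct BitPool
{
    int           used;               // header: how many bits are set
    unsigned char bits[POOL_BYTES];   // one bit per record slot

    BitPool() : used(0)
    {
        for (int i = 0; i < POOL_BYTES; ++i)
            bits[i] = 0;
    }

    bool isUsed(int slot) const
    {
        return ((bits[slot / 8] >> (slot % 8)) & 1) != 0;
    }

    // Claim the first empty slot; returns -1 when the pool is full.
    int claimFirstFree()
    {
        if (used == SLOTS_PER_POOL)
            return -1;                // the header makes this check cheap
        for (int byte = 0; byte < POOL_BYTES; ++byte)
        {
            if (bits[byte] == 0xFF)
                continue;             // all eight slots in this byte taken
            for (int bit = 0; bit < 8; ++bit)
            {
                if (((bits[byte] >> bit) & 1) == 0)
                {
                    bits[byte] |= static_cast<unsigned char>(1 << bit);
                    ++used;
                    return byte * 8 + bit;
                }
            }
        }
        return -1;
    }

    // Mark a slot as empty again.
    void release(int slot)
    {
        if (isUsed(slot))
        {
            bits[slot / 8] &= static_cast<unsigned char>(~(1 << (slot % 8)));
            --used;
        }
    }
};
```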



9.2.3  Record Pools


Each record pool is a section of the file which is 4096 * lengthOfRecord bytes
long. There is no flag on the actual record to indicate whether it is used or
not. This is simply a storage area. The preceding bitPool must be interrogated
to determine the status of a particular record.
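
Given the alternating layout, the file position of a record follows from simple
arithmetic. The sketch below is an assumption-laden illustration, not the
recServer code: the function recordOffset is hypothetical, record ids are
assumed to count up from 0 across successive storage areas, and the header and
bit pool sizes are passed in as parameters because this section does not state
them exactly.

```cpp
const long RECORDS_PER_AREA = 4096;

// headerBytes covers everything before the first bit pool (including any
// offset reserved at construction). bitPoolBytes is the size of one bit
// pool including its own header. Both are parameters because their exact
// values are not given here.
long recordOffset(long recordId, long recordLength,
                  long headerBytes, long bitPoolBytes)
{
    long area      = recordId / RECORDS_PER_AREA;   // which storage area
    long slot      = recordId % RECORDS_PER_AREA;   // position inside it
    long areaBytes = RECORDS_PER_AREA * recordLength;

    return headerBytes
         + (area + 1) * bitPoolBytes   // bit pools up to and including ours
         + area * areaBytes            // whole storage areas before ours
         + slot * recordLength;
}
```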


9.2.4  bTree bucket layout


The bTree is where the real magic in the database happens. For a detailed view,
we suggest reading the comments in bucket.cc. Each bucket in the bTree consists
of a header and a data area. The header has pointers to the previous, next and
parent buckets as well as a flag to indicate interior or leaf node. Note that
the parent pointer may be incorrect if you were in the right hand side of the
parent bucket when it split. These pointers are repaired during tree traversal
for efficiency reasons. The header also stores the number of active keys in the
data area.

   The data area consists of alternating pointers and keys, with one more
pointer than there are keys. The pointers are long integers which either point
to child buckets (in an interior node) or to the record in the recServer (at a
leaf node). The key data is long word aligned to avoid nasty problems during
key comparison. This makes the index storage larger than necessary but is
faster. Intel architectures could be optimized as the processor is alignment
friendly.
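
   The cost of alignment, and its effect on how many keys a bucket holds, can
be estimated with a little arithmetic. This is a sketch under assumptions, not
code from bucket.cc: the function names are hypothetical, a long word is taken
to be 4 bytes, and the pointer size is a parameter because the size of a long
integer varies between platforms.

```cpp
// Round a key length up to the next multiple of 4 bytes ("long word"
// alignment, assuming a 4-byte long word).
long alignedKeyLength(long keyLength)
{
    return (keyLength + 3) / 4 * 4;
}

// How many keys fit in a data area of dataBytes, given that n keys need
// n + 1 pointers.
long keysPerBucket(long dataBytes, long keyLength, long pointerBytes)
{
    long pair = pointerBytes + alignedKeyLength(keyLength);
    if (dataBytes < pointerBytes)
        return 0;
    return (dataBytes - pointerBytes) / pair;
}
```

   For example, a 10 byte key is stored in 12 bytes. With 4 byte pointers and a
1024 byte data area, 63 keys fit: 63 keys take 756 bytes and the 64 pointers
take 256 bytes, for a total of 1012 bytes.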

   The bTree inherits a recServer to store its actual buckets. There is a global

header before the recServer header which records the unaligned length of the key

and where the root bucket is.

   I refuse to describe the full bTree manipulation details. I can recommend
Smith, "Data Structures and Algorithms" for a good discussion. If you want to
know how ours works, read Kevin's code. If there are sufficient enquiries we
will describe it, once we have looked at the code again.


9.3    Variable Length Fields


The format of the .str file used by the memServer for variable length fields
(types dbData and dbString) is described in section 5.



10      Miscellaneous


10.1     ToDo


    fflImplementing multi-user version.

      This is very high priority - and will probably involve some use of shared

      memory and/or a central locking process.

      This has been implemented for a single machine, multi client model.


    fflImplementing duplicate records

      Not likely!  We've come to the conclusion that the inclusion of a unique

      field in an index gives exactly what ducplicate indexes gives you except *
 *the

      semantics are far better.


    fflA query language


    - bTree reconstruction facility


    - Record dump facility

      This now exists in the db2txt program.  The reconstruction facility will be

      the reverse of this once written. It is next on the agenda.


    - Monitoring facility to count database operations


    - Database browser

      This has been made easier by the recent addition of diaGRel.


    - Interactive relation creation


    - Eliminate bTree indexing of completely unique fields


    - Better fan-out specification in database

      The bTree buckets are fixed size currently - and will take as many keys as

      will fit. The number of keys should be set and the size of the bTree bucket

      varied.


    - Efficient bTree deletion

      bTree nodes aren't merged when they drop below half full. They should be

      for efficiency reasons.


    - Database update tool

      When adding fields to a database, it would be nice to be able to upgrade

      the existing database automatically. This has to be done by hand currently.
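
   The fan-out and deletion items in the list above come down to simple
arithmetic, sketched here under stated assumptions: a 4-byte pointer slot, a
key length that is already long word aligned, and the hypothetical names
keysThatFit, dataBytesFor and belowHalfFull. None of these exist in
DiamondBase; the sketch only contrasts the current behaviour (bucket size
fixed, key count follows) with the proposed one (key count fixed, bucket size
follows) and shows the half-full test that would trigger a merge.

```cpp
#include <cassert>
#include <cstddef>

// Assumed size in bytes of one pointer slot in the bucket data area.
const std::size_t kPointerBytes = 4;

// Current behaviour: the data area size is fixed and takes as many keys as fit.
// n keys need n (pointer + key) pairs plus one trailing pointer.
std::size_t keysThatFit(std::size_t dataBytes, std::size_t alignedKeyBytes) {
    assert(dataBytes >= kPointerBytes);
    return (dataBytes - kPointerBytes) / (kPointerBytes + alignedKeyBytes);
}

// Proposed behaviour: the number of keys is chosen and the data area is sized to match.
std::size_t dataBytesFor(std::size_t fanOutKeys, std::size_t alignedKeyBytes) {
    return fanOutKeys * (kPointerBytes + alignedKeyBytes) + kPointerBytes;
}

// Deletion: a bucket holding fewer than half its maximum keys is a merge candidate.
bool belowHalfFull(std::size_t activeKeys, std::size_t fanOutKeys) {
    return activeKeys * 2 < fanOutKeys;
}
```

   With a 1024 byte data area and 12 byte aligned keys, this gives 63 keys per
bucket and leaves 12 bytes unused; choosing the key count first would avoid
that slack.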



10.2    Profiling data


We include some sample profiling data for those speed freaks out there who want

to improve performance. We found some very interesting things after our initial

profiling experiments - like the fact that our memory checking library was using

90% of the CPU time.

   The bcopy function was also overused quite a bit and this led to far too many

function calls where much better alternatives were available.  Use of bcopy has

been reduced to a minimum.

   We have attempted to speed up all the very frequently used functions and

eliminate unnecessary calls to them.

   Kevin is to provide this bit. [Well, I will one day when I can collect all the

data and get it into some format that might be vaguely useful for something

or someone and possibly formatted in a way that could fit here, in addition to

making this sentence shorter in some way or another].


References


 [1] C.J. Date, An Introduction to Database Systems, Volume 1, Fifth Edition,

     Addison-Wesley, 1992, pp 245-398.