Issue #010 May, 1996 Contents: Introduction to Templates Part 2 - Class Templates Introduction to Stream I/O Part 5 - Streambuf Using C++ as a Better C Part 10 - General Initializers Performance - Per-class New/Delete INTRODUCTION TO TEMPLATES PART 2 - CLASS TEMPLATES To continue our introduction of C++ templates, we'll be saying a few things about class templates in this issue. Templates are a part of the language still undergoing major changes, and it's tricky to figure out just what to say. But we'll cover some basics that are well accepted and in current usage. A skeleton for a class template, and definitions of a member function and a static data item, looks like this: template class A { void f(); static T x; }; template void A::f() { // stuff } template T A::x = 0; T is a placeholder for a template type argument, and is bound to that argument when the template is instantiated. For example, if I say: A a; then the type value of T is "double". The binding of template arguments and the generation of an actual class from a template is a process known as "instantiation". You can view a template as a skeleton or macro or framework. When specific types, such as double, are added to this skeleton, the result is an actual C++ class. Template arguments may also be constant expressions: template struct A { // stuff }; A<-37> a; This feature is useful in the case where you want to pass a size into the template. For example, a Vector template might accept a type argument that tells what type of elements will be operated on, and a size argument giving the vector length: template class Vector { // stuff }; Vector v; A template argument may have a default specified (this feature is not widely available as yet): template class Vector { // stuff }; Vector v1; // Vector Vector v2; // Vector Vector<> v3; // Vector To see how some of these basic ideas fit together, let's actually build a simple Vector template, with set() and get() functions: template class Vector { T vec[N]; public: void set(int pos, T val); T get(int pos); }; template void Vector::set(int pos, T val) { if (pos < 0 || pos >= N) ; // give error of some kind vec[pos] = val; } template T Vector::get(int pos) { if (pos < 0 || pos >= N) ; // give error of some kind return vec[pos]; } // driver program int main() { Vector v; int i = 0; double d = 0.0; // set locations in vector for (i = 0; i < 10; i++) v.set(i, double(i * i)); // get location values from vector for (i = 0; i < 10; i++) d = v.get(i); return 0; } Actual values are stored in a private vector of type T and length N. In a real Vector class we might overload operator[] to provide a natural sort of interface such as an actual vector has. What would happen if we said something like: Vector v; This is an example of code that is legal until the template is actually instantiated into a class. Because a member like: char vec[-1000]; is not valid (you can't have arrays of negative or zero size), this usage will be flagged as an error when instantiation is done. The process of instantiation itself is a bit tricky. If I have 10 translation units (source files), and each uses an instantiated class: Vector where does the code for the instantiated class's member functions go? The template definition itself resides most commonly in a header file, so that it can be accessed everywhere and because template code has some different properties than other source code. This is an extremely hard problem for a compiler to solve. One solution is to make all template functions inline and duplicate the code for them per translation unit. This results in very fast but potentially bulky code. Another approach, which works if you have control over the object file format and the linker, is to generate duplicate instantiations per object file and then use the linker to merge them. Yet another approach is to create auxiliary files or directories ("repositories") that have a memory of what has been instantiated in which object file, and use that state file in conjunction with the compiler and linker to control the instantiation process. There are also schemes for explicitly forcing instantiation to take place. We'll discuss these in a future issue. The instantiation issue is usually hidden from a programmer, but sometimes becomes visible, for example if the programmer notices that object file sizes seem bloated. INTRODUCTION TO STREAM I/O PART 5 - STREAMBUF In previous issues we talked about various ways of copying files using stream I/O, some of the ways of affecting I/O operations by specifying unit buffering or not and tying one stream to another, and so on. Another way of copying input to output using stream I/O is to say: #include int main() { int c; while ((c = cin.rdbuf()->sbumpc()) != EOF) cout.rdbuf()->sputc(c); return 0; } This scheme uses what are known as streambufs, underlying buffers used in the stream I/O package. An expression: cin.rdbuf()->sbumpc() says "obtain the streambuf pointer for the standard input stream (cin) and grab the next character from it and then advance the internal pointer within the buffer". Similarly, cout.rdbuf()->sputc(c) adds a character to the output buffer. Doing I/O in this way is lower-level than some other approaches, but correspondingly faster. If we summarize the four file-copying methods we've studied (see issues #008 and #009 for code examples of them), from slowest to fastest, they might be as follows. Copy a character at a time with >> and <<: cin.tie(0); cin.unsetf(ios::skipws); while (cin >> c) cout << c; Copy using get() and put(): ifstream ifs(argv[1], ios::in | ios::binary); ofstream ofs(argv[2], ios::out | ios::binary); while (ifs.get(c)) ofs.put(c); Copy with streambufs (above): while ((c = cin.rdbuf()->sbumpc()) != EOF) cout.rdbuf()->sputc(c); Copy with streambufs but explicit copying buried: ifstream ifs(argv[1], ios::in | ios::binary); ofstream ofs(argv[2], ios::out | ios::binary); ofs << ifs.rdbuf(); A table of relative times, for one popular C++ compiler, comes out like so: >>, << 100 get/put 72 streambuf 62 streambuf hidden 43 Actual times will vary for a given library. Perhaps the most critical factor is whether functions that are used in a given case are inlined or not. Note also that if you are copying binary files you need to be careful with the way copying is done. Why the time differences? All of these methods use streambufs in some form. But the slowest method, using >> and <<, also does additional processing. For example, it calls internal functions like ipfx() and opfx() to handle unit buffering, elision of whitespace on input, and so on. get/put also call these functions. The fastest two approaches do not worry about such processing, but simply allow one to manipulate the underlying buffer directly. They offer fewer services but are correspondingly faster. USING C++ AS A BETTER C PART 10 - GENERAL INITIALIZERS In C, usage like: int f() {return 37;} int i = 47; int j; for global variables is legal. Typically, in an object file and an executable program these types of declarations might be lumped into sections with names like "text", "data", and "bss", meaning "program code", "data with an initializer", and "data with no initializer". When a program is loaded by the operating system for execution, a common scheme will have the text and data stored within the binary file on disk that represents the program, and the bss section simply stored as an entry in a symbol table and created and zeroed dynamically when the program is loaded. There are variations on this scheme, such as shared libraries, that are not our concern here. Rather, we want to discuss the workings of an extension that C++ makes to this scheme, namely general initializers for globals. For example, I can say: int f() {return 37;} int i = 47; int j = f() + i; In some simple cases a clever compiler can compute the value that should go into j, but in general such values are not computable at compile time. Note also that sequences like: class A { public: A(); ~A(); }; A a; are legal, with the global "a" object constructed before the program "really" starts, and destructed "after" the program terminates. Since values cannot be computed at compile time, they must be computed at run time. How is this done? One way is to generate a dummy function per object file: int f() {return 37;} int i = 47; int j; // = f() + i; static void __startup() { j = f() + i; } and a similar function for shutdown as would be needed for calling destructors. Using a small tool that will modify binaries, and an auxiliary data structure generated by the compiler, it's possible to link all these __startup() function instances together in a linked list, that can be traversed when the program starts. Typically this is done by immediately generating a call from within main() to a C++ library function _main() that iterates over all the __startup() functions. On program exit, similar magic takes place, typically tied to exit() function processing. This approach is used in some compilers but is not required; the standard mandates "what" rather than "how". Some aspects of this processing have precedent in C. For example, when a program starts, standard I/O streams stdin, stdout, and stderr are established for doing I/O. Within a given translation unit (source file), objects are initialized in the order of occurrence, and destructed in reverse order (last in first out). No ordering is imposed between files. Some ambitious standards proposals have been made with regard to initialization ordering, but none have caught on. The draft standard says simply that all static objects in a translation unit (objects that persist for the life of the program) are zeroed, then constant initializers are applied (as in C), then dynamic general initializers are applied "before the first use of a function or object defined in that translation unit". Calling the function abort() defined in the standard library will terminate the program without destructors for global static objects being called. Note that some libraries, for example stream I/O, rely on destruction of global class objects as a hook for flushing I/O buffers. You should not rely on any particular order of initialization of global objects, and using a startup() function called from main(), just as in C, still can make sense as a program structuring mechanism for initializing global objects. PERFORMANCE - PER-CLASS NEW/DELETE Some types of applications tend to use many small blocks of space for allocating nodes for particular types of data structures, small strings, and so on. In issue #002 we talked about a technique for efficiently allocating many small strings. Another way of tackling this problem is to overload the new/delete operators on a per-class basis. That is, take over responsibility for allocating and deallocating the storage required by class objects. Here is an example of what this would look like for a class A: #include #include class A { int data; A* next; #ifdef USE_ND static A* freelist; // pointer to free list #endif public: A(); ~A(); #ifdef USE_ND void* operator new(size_t); // overloaded new() void operator delete(void*); // overloaded delete() #endif }; #ifdef USE_ND A* A::freelist = 0; inline void* A::operator new(size_t sz) { // get free node from freelist if any if (freelist) { A* p = freelist; freelist = freelist->next; return p; } // call malloc() otherwise return malloc(sz); } inline void A::operator delete(void* vp) { A* p = (A*)vp; // link freed node onto freelist p->next = freelist; freelist = p; } #endif A::A() {} A::~A() {} #ifdef DRIVER const int N = 1000; A* aptr[N]; int main() { int i; int j; // repeatedly allocate / deallocate A objects for (i = 1; i <= N; i++) { for (j = 0; j < N; j++) aptr[j] = new A(); for (j = 0; j < N; j++) delete aptr[j]; } return 0; } #endif We've also included a driver program. For this example, that recycles the memory for object instances, the new approach is about 4-5X faster than the standard approach. When new() is called for an A type, the overloaded function checks the free list to see if any old recycled instances are around, and if so one of them is used instead of calling malloc(). The free list is shared across all object instances (the freelist variable is static). delete() simply returns a no-longer-needed instance to the free list. This technique is useful only for dynamically-created objects. For static or local objects, the storage has already been allocated (on the stack, for example). We have again sidestepped the issue of whether a failure in new() should throw an exception instead of returning an error value. This is an area in transition in the language. There are other issues with writing your own storage allocator. For example, you have to make sure that the memory for an object is aligned correctly. A double of 8-byte length may need to be aligned, say, on a 4-byte boundary for performance reasons or to avoid addressing exceptions ("bus error - core dumped" on a UNIX system). Other issues include fragmentation and support for program threads. ACKNOWLEDGEMENTS Thanks to Nathan Myers, Eric Nagler, David Nelson, Terry Rudd, Jonathan Schilling, and Clay Wilson for help with proofreading. SUBSCRIPTION INFORMATION / BACK ISSUES To subscribe to the newsletter, send mail to majordomo@world.std.com with this line as its message body: subscribe c_plus_plus Back issues are available via FTP from: rmii.com /pub2/glenm/newslett or on the Web at: http://www.rmii.com/~glenm There is also a Java newsletter. To subscribe to it, say: subscribe java_letter using the same majordomo@world.std.com address. ------------------------- Copyright (c) 1996 Glen McCluskey. All Rights Reserved. This newsletter may be further distributed provided that it is copied in its entirety, including the newsletter number at the top and the copyright and contact information at the bottom. Glen McCluskey & Associates Professional C++ Consulting Internet: glenm@glenmccl.com Phone: (800) 722-1613 or (970) 490-2462 Fax: (970) 490-2463 FTP: rmii.com /pub2/glenm/newslett (for back issues) Web: http://www.rmii.com/~glenm