This article describes a new C++ string class, which is based on the C++ string class of an earlier article.
The new string class differs from the old string class in how strings are copied. The reason for this change is to make the behaviour of the string class more like the behaviour of Java's string class.
Behaviour of the earlier C++ string class:
    String s = "abc";
    String t = s;
    t << "def";   // appends "def" to the end of string t
    cout << s;    // prints out "abcdef"
    cout << t;    // prints out "abcdef"
Contrast this with the behaviour of the new C++ string class:
    String s = "abc";
    String t = s;
    t << "def";   // appends "def" to the end of string t
    cout << s;    // prints out "abc"
    cout << t;    // prints out "abcdef"
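The difference between the two behaviours can be sketched with a pair of toy classes. This is only an illustration under assumptions: SharedString and ValueString are hypothetical names, and std::shared_ptr and std::string stand in for the article's own storage, which is not reproduced here.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for the old behaviour: copies share one buffer.
class SharedString {
    std::shared_ptr<std::string> data_;
public:
    SharedString(const char* s) : data_(std::make_shared<std::string>(s)) {}
    SharedString& operator<<(const char* s) { data_->append(s); return *this; }
    const std::string& str() const { return *data_; }
};

// Hypothetical stand-in for the new behaviour: each copy owns its characters.
class ValueString {
    std::string data_;
public:
    ValueString(const char* s) : data_(s) {}
    ValueString& operator<<(const char* s) { data_.append(s); return *this; }
    const std::string& str() const { return data_; }
};

// Runs the article's example against both classes.
inline void copy_demo() {
    SharedString s1 = "abc";
    SharedString t1 = s1;            // t1 aliases s1's buffer
    t1 << "def";
    assert(s1.str() == "abcdef");    // the append is visible through s1

    ValueString s2 = "abc";
    ValueString t2 = s2;             // t2 gets its own copy
    t2 << "def";
    assert(s2.str() == "abc");       // s2 is unaffected
    assert(t2.str() == "abcdef");
}
```

Calling copy_demo() exercises both variants. The only difference between the two classes is whether a copy shares or duplicates the underlying characters.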
The result of this change is that the new C++ string class now maps onto Java's string class under a simple translation: as the Java example below shows, operator << becomes operator += and cout << becomes System.out.println.
Under this translation scheme, the previous example can be translated into Java code to produce identical output:
    String s = "abc";
    String t = s;
    t += "def";
    System.out.println(s);   // prints out "abc"
    System.out.println(t);   // prints out "abcdef"
The new string class that is described in this article uses operator << for string concatenation. In designing a C++ string class that is as close as possible to Java's string class, one would consider replacing operator << with operator +=.
Unfortunately C++'s operator += is right associative, which means that the following chain is grouped as x += ("123" += ("456" += "789")) and does not compile:

    String x;                        // x now holds ""
    x += "123" += "456" += "789";    // error

The chain has to be written either with explicit parentheses:

    String x;                             // x now holds ""
    ((x += "123") += "456") += "789";     // x now holds "123456789"

or as separate statements:

    String x;      // x now holds ""
    x += "123";    // x now holds "123"
    x += "456";    // x now holds "123456"
    x += "789";    // x now holds "123456789"

to get the desired effect. The operator << is left associative, so the example can be rewritten more compactly:
    String x;                        // x now holds ""
    x << "123" << "456" << "789";    // x now holds "123456789"
This is the reason that operator << is still used.
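The associativity argument can be tried out with a small sketch. The class Str below is hypothetical and uses std::string for storage. It offers both operators so that the two spellings can be compared side by side.

```cpp
#include <cassert>
#include <string>

// Hypothetical minimal class offering both spellings of "append".
class Str {
    std::string data_;
public:
    Str& operator+=(const char* s) { data_.append(s); return *this; }
    Str& operator<<(const char* s) { data_.append(s); return *this; }
    const std::string& str() const { return data_; }
};

// += groups right to left, so parentheses are needed to chain it.
inline std::string chain_with_plus_equals() {
    Str x;
    ((x += "123") += "456") += "789";
    return x.str();
}

// << groups left to right: (x << "123") is evaluated first and the
// Str& it returns receives the next append.
inline std::string chain_with_shift() {
    Str x;
    x << "123" << "456" << "789";
    return x.str();
}

inline void associativity_demo() {
    assert(chain_with_plus_equals() == "123456789");
    assert(chain_with_shift() == "123456789");
}
```

Both functions build the same string. The chain only works because each operator returns a reference to the string it was applied to.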
C++'s operator + is not used because it would require that a new string be returned, which would mean that in the following example:
    String a = "abcdefghijklmnopqrstuvwxyz.";
    String b = "banana.";
    String c = "carrot.";
The expression a+b+c would require the following operations to be carried out:
    String temp1;
    temp1 << a;       // appends "abcdefghijklmnopqrstuvwxyz." onto temp1
    temp1 << b;       // appends "banana." onto temp1
    String temp2;
    temp2 << temp1;   // appends "abcdefghijklmnopqrstuvwxyz.banana." onto temp2
    temp2 << c;       // appends "carrot." onto temp2
    String result = temp2;
This is an excessive amount of string concatenation, compared with the equivalent example involving operator <<:
    String result;
    result << a;   // appends "abcdefghijklmnopqrstuvwxyz." onto result
    result << b;   // appends "banana." onto result
    result << c;   // appends "carrot." onto result
Java achieves this same level of efficiency. With a, b and c defined as above, Java parses a+b+c as follows:
    StringBuffer temp = new StringBuffer();
    temp.append(a);   // appends "abcdefghijklmnopqrstuvwxyz." onto temp
    temp.append(b);   // appends "banana." onto temp
    temp.append(c);   // appends "carrot." onto temp
    String result = temp.toString();
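The difference in copying work between the operator + expansion and the operator << version can be made concrete by counting appended characters. The sketch below is an illustration only: CountingString is a hypothetical class, its operator + is written to follow the temp1/temp2 expansion shown earlier, and only characters passed through operator << are counted (copies made when returning a value are ignored).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical string that counts every character it appends.
struct CountingString {
    std::string data;
    static std::size_t copied;   // characters appended so far

    CountingString() = default;
    CountingString(const char* s) : data(s) {}

    CountingString& operator<<(const CountingString& rhs) {
        copied += rhs.data.size();
        data += rhs.data;
        return *this;
    }
};
std::size_t CountingString::copied = 0;

// operator + in the style of the expansion: a fresh temporary
// receives both operands.
inline CountingString operator+(const CountingString& l, const CountingString& r) {
    CountingString temp;
    temp << l << r;
    return temp;
}

inline void cost_demo() {
    const CountingString a = "abcdefghijklmnopqrstuvwxyz.";  // 27 characters
    const CountingString b = "banana.";                      // 7 characters
    const CountingString c = "carrot.";                      // 7 characters

    CountingString::copied = 0;
    CountingString sum = a + b + c;
    assert(CountingString::copied == 75);   // (27 + 7) + (34 + 7)

    CountingString::copied = 0;
    CountingString result;
    result << a << b << c;
    assert(CountingString::copied == 41);   // 27 + 7 + 7
    assert(sum.data == result.data);
}
```

Under these assumptions, the operator + version appends 75 characters to build a 41-character string, while the operator << version appends each character once.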
If operator + were present in the string class, it would allow for anonymous strings, which are unnamed strings built from other strings. If x, y and z are strings and foo is a function taking a string argument, then in the following expression:
    foo(x+y+z)
the string x+y+z is anonymous. Because operator + is not present, a temporary variable has to be introduced for storing the value of the argument to the function:
    String temp;
    temp << x << y << z;
    foo(temp);
or alternatively:
    String temp;
    foo(temp << x << y << z);
This is considered by the author to be so close to the ideal that anonymous strings are not needed. As stated above Java parses foo(x+y+z) as:
    foo((new StringBuffer()).append(x).append(y).append(z).toString());
Java therefore benefits from being more up-to-date in its fundamental design, in that it allows anonymous strings to be implemented efficiently. In spite of this, a later article presents a string system that has anonymous strings with an efficient implementation. The above remarks about the impossibility of an efficient system of anonymous strings should be taken as an idea that has been superseded by later research.
The size of string objects is limited to the value of the String_Internal::ARRAY_SIZE constant, which is currently set at 1000. This number could be increased, but only at the expense of memory. A better solution would be for String_Internal objects to be able to resize themselves as required. Such a solution is implemented in a later article.
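The trade-off of a fixed-size array can be illustrated with a small sketch. FixedBuffer is a hypothetical class, not the article's String_Internal. Reserving one slot for a terminating '\0' and returning false on overflow are both assumptions of this sketch, since the text does not say how the real class handles either.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical fixed-capacity storage; not the article's String_Internal.
class FixedBuffer {
public:
    static constexpr std::size_t ARRAY_SIZE = 1000;  // value quoted in the text

    FixedBuffer() : length_(0) { chars_[0] = '\0'; }

    // Appends s if it fits, leaving room for the terminator.
    // Returning false on overflow is an assumption of this sketch.
    bool append(const char* s) {
        const std::size_t n = std::strlen(s);
        if (n > ARRAY_SIZE - 1 - length_) return false;
        std::memcpy(chars_ + length_, s, n + 1);  // copies the '\0' too
        length_ += n;
        return true;
    }

    std::size_t length() const { return length_; }
    const char* c_str() const { return chars_; }

private:
    char chars_[ARRAY_SIZE];   // every object pays for the full array
    std::size_t length_;
};

inline void buffer_demo() {
    FixedBuffer buf;
    const bool fits = buf.append("abc");
    assert(fits);
    assert(buf.length() == 3);

    char big[FixedBuffer::ARRAY_SIZE];        // 999 'x' characters plus '\0'
    std::memset(big, 'x', sizeof big - 1);
    big[sizeof big - 1] = '\0';
    const bool overflow_ok = buf.append(big); // 3 + 999 does not fit
    assert(!overflow_ok);
    assert(std::strcmp(buf.c_str(), "abc") == 0);  // contents unchanged
}
```

Every FixedBuffer occupies the full array regardless of how short its contents are, which is the memory cost the paragraph above refers to.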
The following sources are intended for the GNU C++ compiler. Customised I/O routines are provided and cout and cerr are defined to be instances of the customised output class called File_Writer. C++'s streams are not used. The customised output routines internally call the C printf function and the upshot of this is that output calls can be interspersed with calls to the printf function in the client code.
string.hh
gccprefs.hh
writer.hh
reader.hh
t-io.cc
t-string.cc
string2.tar.gz
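The idea of an output class built on printf, as described above, can be sketched as follows. PrintfWriter is a hypothetical name and interface. It is not the File_Writer class from the sources listed above, and supporting only const char* and int is a simplification made for this sketch.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical printf-backed writer; not the article's File_Writer.
class PrintfWriter {
    std::FILE* file_;
public:
    explicit PrintfWriter(std::FILE* f) : file_(f) {}

    PrintfWriter& operator<<(const char* s) {
        std::fprintf(file_, "%s", s);
        return *this;
    }
    PrintfWriter& operator<<(int n) {
        std::fprintf(file_, "%d", n);
        return *this;
    }
};

// Writer output and plain printf calls go through the same stdout
// buffer, so they appear in program order.
inline void writer_demo() {
    PrintfWriter out(stdout);
    out << "count = " << 3 << "\n";
    std::printf("a line written with printf directly\n");
    PrintfWriter& same = (out << "done\n");
    assert(&same == &out);   // each << returns the writer, enabling chains
}
```

Because both paths end in the same C stdio stream, no extra synchronisation is needed between the writer and direct printf calls in this sketch.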