GNU   davin.50webs.com/research
Bringing to you notes for the ages



An improved string class

Abstract

This article describes a new C++ string class that is based on the old C++ string class of an earlier article.

1. Introduction

The new string class differs from the old string class in how strings are copied. The reason for this change is to make the behaviour of the string class more like the behaviour of Java's string class.


Behaviour of the earlier C++ string class:

String s = "abc";
String t = s;
t << "def"; // appends "def" to the end of string t
cout << s;  // prints out "abcdef"
cout << t;  // prints out "abcdef"

Contrast this with the behaviour of the new C++ string class:

String s = "abc";
String t = s;
t << "def"; // appends "def" to the end of string t
cout << s;  // prints out "abc"
cout << t;  // prints out "abcdef"
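The difference between the two behaviours can be sketched with two small stand-in classes. This is only an illustration under stated assumptions: the names SharedString and ValueString are hypothetical, the use of std::string and std::shared_ptr is a convenience that the article's own sources do not use, and the assumption that the old class shared one buffer between copies is inferred from the output shown above rather than from the old sources.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for the OLD behaviour: a copy shares the buffer.
class SharedString {
public:
    SharedString(const char* text)
        : buffer_(std::make_shared<std::string>(text)) {}
    // The implicit copy constructor copies the shared_ptr, so the copy and
    // the original refer to the same characters.
    SharedString& operator<<(const char* suffix) {
        buffer_->append(suffix);
        return *this;
    }
    const std::string& text() const { return *buffer_; }
private:
    std::shared_ptr<std::string> buffer_;
};

// Hypothetical stand-in for the NEW behaviour: a copy owns its characters.
class ValueString {
public:
    ValueString(const char* text) : buffer_(text) {}
    // The implicit copy constructor duplicates the characters.
    ValueString& operator<<(const char* suffix) {
        buffer_.append(suffix);
        return *this;
    }
    const std::string& text() const { return buffer_; }
private:
    std::string buffer_;
};

void demo_copy_semantics() {
    SharedString s1 = "abc";
    SharedString t1 = s1;
    t1 << "def";                       // also changes s1
    assert(s1.text() == "abcdef");
    assert(t1.text() == "abcdef");

    ValueString s2 = "abc";
    ValueString t2 = s2;
    t2 << "def";                       // leaves s2 alone
    assert(s2.text() == "abc");
    assert(t2.text() == "abcdef");
}
```

Calling demo_copy_semantics() from a main function reproduces both outcomes listed above.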

The result of this change is that the new C++ string class maps onto Java's string class under the following translation:

  1. Replace C++ expressions of the form s << t with Java expressions s += t.
  2. Replace C++ expressions involving cout with Java expressions involving System.out.println(...).

Under the above translation scheme, the previous example can be translated into Java code to produce identical output:

String s = "abc";
String t = s;
t += "def";
System.out.println(s); // prints out "abc"
System.out.println(t); // prints out "abcdef"

2. Isn't operator += more natural than operator <<?

The new string class that is described in this article uses operator << for string concatenation. In designing a C++ string class that is as close as possible to Java's string class, one would consider replacing operator << with operator += .


Unfortunately, C++'s operator += is right associative, which means that:

String x;                        // x now holds ""
x += "123" += "456" += "789";    // error: groups as x += ("123" += ("456" += "789"))

would need to be rewritten as:

String x;                           // x now holds ""
((x += "123") += "456") += "789";   // x now holds "123456789"

or the more long-winded:

String x;     // x now holds ""
x += "123";   // x now holds "123"
x += "456";   // x now holds "123456"
x += "789";   // x now holds "123456789"

to get the desired effect. The operator << is left associative, so the example can be rewritten more compactly:

String x;                       // x now holds ""
x << "123" << "456" << "789";   // x now holds "123456789"

This is the reason that operator << is still used.
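The associativity argument can be checked with a minimal sketch. The class MiniString below is hypothetical and wraps std::string purely for brevity; it is not the article's String class. Both operators return a reference to the left-hand object, so the only difference between them is how the compiler groups a chain.

```cpp
#include <cassert>
#include <string>

// Hypothetical class offering both operators for comparison.
class MiniString {
public:
    MiniString() {}
    MiniString& operator+=(const char* suffix) { text_ += suffix; return *this; }
    MiniString& operator<<(const char* suffix) { text_ += suffix; return *this; }
    const std::string& text() const { return text_; }
private:
    std::string text_;
};

void demo_associativity() {
    MiniString x;
    // x += "123" += "456";  // does not compile: groups as x += ("123" += "456")
    ((x += "123") += "456") += "789";   // explicit left-to-right grouping
    assert(x.text() == "123456789");

    MiniString y;
    y << "123" << "456" << "789";       // groups as ((y << "123") << "456") << "789"
    assert(y.text() == "123456789");
}
```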

3. Why is operator + not used to concatenate strings?

C++'s operator + is not used because it would require that a new string be returned, which would mean that in the following example:

String a = "abcdefghijklmnopqrstuvwxyz.";
String b = "banana.";
String c = "carrot.";

The expression a+b+c would require the following operations to be carried out:

String temp1;
temp1 << a;     // appends "abcdefghijklmnopqrstuvwxyz." onto temp1
temp1 << b;     // appends "banana." onto temp1

String temp2;
temp2 << temp1; // appends "abcdefghijklmnopqrstuvwxyz.banana." onto temp2
temp2 << c;     // appends "carrot." onto temp2

String result = temp2;

This is an excessive amount of string concatenation, compared with the equivalent example involving operator <<:

String result;
result << a;    // appends "abcdefghijklmnopqrstuvwxyz." onto result
result << b;    // appends "banana." onto result
result << c;    // appends "carrot." onto result
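The difference in work can be made concrete by counting appended characters. The sketch below is an assumption-laden illustration: CountingString, chars_copied and the naive operator + are all hypothetical, and the counter only tracks characters moved by operator <<, not copies made when a temporary is returned (which the compiler may elide).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts every character appended through operator<< (illustration only).
static std::size_t chars_copied = 0;

class CountingString {
public:
    CountingString() {}
    CountingString(const char* text) : text_(text) {}
    CountingString& operator<<(const CountingString& other) {
        chars_copied += other.text_.size();
        text_ += other.text_;
        return *this;
    }
    const std::string& text() const { return text_; }
private:
    std::string text_;
};

// Naive operator+ : every use builds a fresh temporary from both operands.
CountingString operator+(const CountingString& lhs, const CountingString& rhs) {
    CountingString temp;
    temp << lhs;
    temp << rhs;
    return temp;
}

void demo_concatenation_cost() {
    CountingString a = "abcdefghijklmnopqrstuvwxyz.";  // 27 characters
    CountingString b = "banana.";                      //  7 characters
    CountingString c = "carrot.";                      //  7 characters

    chars_copied = 0;
    CountingString sum = a + b + c;
    std::size_t plus_cost = chars_copied;     // 27 + 7, then 34 + 7

    chars_copied = 0;
    CountingString result;
    result << a << b << c;
    std::size_t append_cost = chars_copied;   // 27 + 7 + 7

    assert(sum.text() == result.text());
    assert(plus_cost == 75);
    assert(append_cost == 41);
}
```

The gap widens with longer chains, because each additional + re-appends everything accumulated so far.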

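Java achieves this same level of efficiency. With a, b and c defined as above, a Java compiler translates a+b+c into code equivalent to the following (compilers since Java 5 use the unsynchronised StringBuilder in place of StringBuffer):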

StringBuffer temp = new StringBuffer();
temp.append(a);   // appends "abcdefghijklmnopqrstuvwxyz." onto temp
temp.append(b);   // appends "banana." onto temp
temp.append(c);   // appends "carrot." onto temp
String result = temp.toString();

4. Anonymous Strings

If operator + were present in the string class, it would allow for anonymous strings, which are unnamed strings built from other strings. If x, y and z are strings and foo is a function taking a string argument, then in the following expression:

foo(x+y+z)

the string x+y+z is anonymous. Because operator + is not present, a temporary variable has to be introduced for storing the value of the argument to the function:

String temp;
temp << x << y << z;
foo(temp);

or alternatively:

String temp;
foo(temp << x << y << z);

This is considered by the author to be so close to the ideal that anonymous strings are not needed. As stated above, Java translates foo(x+y+z) as:

foo((new StringBuffer()).append(x).append(y).append(z).toString());

Java's more up-to-date fundamental design therefore allows anonymous strings to be implemented efficiently. In spite of this, a later article presents a string system that has anonymous strings with an efficient implementation. The above remarks on the impossibility of an efficient system of anonymous strings should be taken as an idea that has been superseded by later research.
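The named-temporary idiom from this section can be sketched as follows. The class Text and the body of foo are hypothetical placeholders; the point is only that operator << returns a reference to the temporary variable, so the whole chain can be passed directly as a function argument.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical string class with a chaining operator<<.
class Text {
public:
    Text() {}
    Text(const char* s) : chars_(s) {}
    Text& operator<<(const Text& other) { chars_ += other.chars_; return *this; }
    std::size_t length() const { return chars_.size(); }
    const std::string& chars() const { return chars_; }
private:
    std::string chars_;
};

// Hypothetical client function taking a string argument.
std::size_t foo(const Text& argument) { return argument.length(); }

void demo_named_temporary() {
    Text x = "x-";
    Text y = "y-";
    Text z = "z";

    Text temp;                                  // the one extra named variable
    std::size_t n = foo(temp << x << y << z);   // chain is built, then passed
    assert(n == 5);
    assert(temp.chars() == "x-y-z");            // temp keeps the built value
}
```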

5. A Limitation of the string Class

The size of a string object is limited to the value of the String_Internal::ARRAY_SIZE constant, which is currently set at 1000. This number could be increased, but only at the expense of memory. A better solution would be for String_Internal objects to re-size themselves as required. Such a solution is implemented in a later article.
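A fixed-capacity buffer of this kind can be sketched as below. Only the constant name ARRAY_SIZE and its value of 1000 come from the article; the class FixedBuffer, the decision to reserve one slot for a terminating '\0', and the choice to reject an over-long append rather than truncate are all assumptions made for illustration, since the article does not say how its sources handle overflow.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical fixed-capacity character buffer.
class FixedBuffer {
public:
    static const std::size_t ARRAY_SIZE = 1000;   // value quoted in the article

    FixedBuffer() : length_(0) { chars_[0] = '\0'; }

    // Appends suffix if it fits; otherwise returns false and changes nothing.
    // One slot is reserved for the terminating '\0' (an assumption).
    bool append(const char* suffix) {
        std::size_t extra = std::strlen(suffix);
        if (length_ + extra >= ARRAY_SIZE) return false;
        std::memcpy(chars_ + length_, suffix, extra + 1);   // copies the '\0' too
        length_ += extra;
        return true;
    }

    std::size_t length() const { return length_; }
    const char* c_str() const { return chars_; }

private:
    char chars_[ARRAY_SIZE];   // every object pays for the full array
    std::size_t length_;
};

void demo_fixed_capacity() {
    FixedBuffer b;
    for (int i = 0; i < 99; ++i) {
        bool ok = b.append("0123456789");        // 99 * 10 = 990 characters
        assert(ok);
    }
    assert(b.length() == 990);

    bool fits_ten = b.append("0123456789");      // 1000 characters plus '\0'
    assert(!fits_ten);
    assert(b.length() == 990);

    bool fits_nine = b.append("012345678");      // 999 characters plus '\0'
    assert(fits_nine);
    assert(b.length() == 999);
}
```

The sketch also shows the memory trade-off mentioned above: each object carries the whole array regardless of how short its contents are.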

6. The Sources

The following sources are intended for the GNU C++ compiler. Customised I/O routines are provided, and cout and cerr are defined to be instances of the customised output class called File_Writer. The standard C++ streams are not used. The customised output routines internally call the C printf function, so output calls can be interspersed with calls to printf in the client code.
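The idea of an output class layered on printf can be sketched as follows. MiniWriter is a hypothetical, much-reduced stand-in and not the article's File_Writer, whose interface is defined in writer.hh; the written() counter exists only so that the sketch can check itself.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical output class whose operator<< forwards to printf.
class MiniWriter {
public:
    MiniWriter() : written_(0) {}
    MiniWriter& operator<<(const char* text) {
        record(std::printf("%s", text));
        return *this;
    }
    MiniWriter& operator<<(int value) {
        record(std::printf("%d", value));
        return *this;
    }
    long written() const { return written_; }   // characters printed so far
private:
    // printf returns the number of characters written, or a negative value
    // on error; only successful writes are counted.
    void record(int result) { if (result > 0) written_ += result; }
    long written_;
};

void demo_interleaved_output() {
    MiniWriter out;
    out << "count=" << 3;                         // goes through printf
    std::printf(" (%s)", "direct printf call");   // same stdout buffer, order kept
    out << "\n";
    assert(out.written() == 8);   // "count=" (6) + "3" (1) + "\n" (1)
}
```

Because both paths write through the same C stdio buffer, the pieces appear in program order, which is the property the paragraph above relies on.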


  • String interface file: string.hh
  • Personal coding preferences: gccprefs.hh
  • Output routines: writer.hh
  • Parsing routines: reader.hh
  • Nonstandard I/O library:
      • Interface file: io.hh
      • Implementation file: io.cc
  • I/O tester module: t-io.cc
  • String tester module: t-string.cc
  • The complete archive: string2.tar.gz

Last modified: Sun Sep 25 16:11:32 NZDT 2016
© Copyright 1999-2016 Davin Pearson.