GNU   davin.50webs.com/research
Bringing to you notes for the ages

       Main Menu          Research Projects         Photo Album            Curriculum Vitae      The Greatest Artists
    Email Address       Computer Games          Web Design          Java Training Wheels      The Fly (A Story)   
  Political Activism   Scruff the Cat       My Life Story          Smoking Cessation          Other Links      
Debugging Macros     String Class I     Linked List System I Java for C Programmers Naming Convention
    String Class II         How I use m4              Strings III                 Symmetrical I/O             Linked Lists II     
Run-Time Type Info   Virtual Methods      An Array System        Science & Religion            Submodes       
  Nested Packages      Memory Leaks    Garbage Collection      Internet & Poverty      What is Knowledge?
Limits of Evolution   Emacs Additions      Function Plotter           Romantic Love        The Next Big Thing
    Science Fiction     Faster Compilation Theory of Morality         Elisp Scoping               Elisp Advice      
  S.O.G.M. Pattern       Safe Properties         School Bullying          Charisma Control          Life and Death    
     Splitting Java          Multiple Ctors       Religious Beliefs         Conversation 1           Conversation 2    
   J.T.W. Language    Emacs Additions II      Build Counter             Relation Plotter          Lisp++ Language  
  Memory Leaks II   Super Constructors CRUD Implementation Order a Website Form There Is An Afterlife
More Occam's Razor C to Java Translator Theory of Morality II


Efficient anonymous strings in C++

Abstract

This article presents a C++ string manipulation system that features anonymous strings with an efficient implementation. It is inspired by Java's string manipulation system and is largely compatible with it, so that the many Java programmers will find it easier to program in C++.

1.0 Definition

An anonymous string is one that does not have a name. For example, if a, b and c are strings, then the string a+b+c, built by concatenating these three strings, is anonymous.

2.0 Why I didn't implement anonymous strings in earlier systems

My earlier, limited understanding of C++ prevented me from implementing an efficient anonymous string system. Wherever an anonymous string could have been used, I instead declared a temporary variable and built the string in it. For example, if a, b and c are string objects and foo is a function that takes a string argument, then foo(a+b+c) would be rewritten as the more long-winded:

String temp;
temp << a;
temp << b;
temp << c;
foo(temp);

or the slightly shorter:

String temp;
foo(temp << a << b << c);

See an earlier article for a "proof" that efficient anonymous strings could not be implemented. That proof depended on my limited understanding of C++ at the time, and further research showed it to be invalid.

3.0 Splitting my string class into three classes

This article presents a system for string manipulation that is based on an earlier article. The system in that article had a single string class, whereas my new system splits this class into three classes for greater compatibility with Java.

See another article for a diagram showing how my string manipulation system forms a part of my broader non-standard I/O library.

3.1 Moving operator<< into String_Buffer

My previous string class had four operator << overloads for writing values of four built-in types:

// Inside class string...
virtual Writer& operator << (char ch);
virtual Writer& operator << (int i);
virtual Writer& operator << (double d);
virtual Writer& operator << (const char* s);

These four methods have been moved into the String_Buffer class, so that only String_Buffer objects can be appended to, while String objects are read-only. These functions have also been made non-virtual, in accordance with the efficiency-motivated design pattern established in a later article.

// Inside class String_Buffer...
String_Buffer& operator << (char ch);
String_Buffer& operator << (int i);
String_Buffer& operator << (double d);
String_Buffer& operator << (const char* s);
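To make the split concrete, here is a minimal sketch, assuming std::string as the underlying storage (the classes in string.hh manage their own memory, so this illustrates the idea rather than reproducing the article's code). String offers only read access; String_Buffer derives from it (an assumption) and adds the four non-virtual operator << overloads, each returning *this so that calls can be chained.

```cpp
#include <cassert>
#include <string>

// Hypothetical read-only String: no public member can change the text.
// std::string storage is an assumption made to keep the sketch short.
class String {
public:
    String() {}
    String(const char* s) : data_(s) {}
    const char* const_char_star() const { return data_.c_str(); }
protected:
    std::string data_;
};

// Hypothetical String_Buffer: the only class that has operator <<.
// The overloads are non-virtual and return *this so calls can be chained.
class String_Buffer : public String {
public:
    String_Buffer& operator<<(char ch)       { data_ += ch; return *this; }
    String_Buffer& operator<<(int i)         { data_ += std::to_string(i); return *this; }
    // std::to_string(double) prints six decimal places, e.g. "2.500000".
    String_Buffer& operator<<(double d)      { data_ += std::to_string(d); return *this; }
    String_Buffer& operator<<(const char* s) { data_ += s; return *this; }
};
```

In this sketch, sb << "x=" << 42 appends to a String_Buffer, and copying the buffer into a String produces an object with no operator << at all, so it can only be read.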

3.2 Moving set_char_at into String_Buffer

The method set_char_at has been moved from class String to String_Buffer, so that String objects are read-only and only String_Buffer objects can be written to. The method set_char_at(int i, char ch) writes ch to position i of the String_Buffer object *this.
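A minimal sketch of set_char_at on a hypothetical String_Buffer that stores its characters in a std::string. The bounds check, and throwing std::out_of_range when it fails, are assumptions: the article does not say how an invalid index is handled.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical String_Buffer showing only set_char_at;
// std::string storage is an assumption made for brevity.
class String_Buffer {
public:
    explicit String_Buffer(const char* s = "") : data_(s) {}

    // Writes ch to position i of *this. Rejecting out-of-range indices
    // with an exception is an assumption, not taken from the article.
    void set_char_at(int i, char ch) {
        if (i < 0 || static_cast<std::size_t>(i) >= data_.size())
            throw std::out_of_range("set_char_at: index out of range");
        data_[static_cast<std::size_t>(i)] = ch;
    }

    const char* const_char_star() const { return data_.c_str(); }

private:
    std::string data_;
};
```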

3.3 Renaming * to get_*

I have renamed two methods:

I did this so that Emacs can syntax-highlight all string methods in a different colour from the rest of the text.

4.0 A class for efficient anonymous strings

One day I discovered that operator + can be defined on strings in such a way that it gives rise to efficient anonymous strings, comparable in efficiency to Java's implementation of anonymous strings. The solution is to create a new class, which I call String_Buffer2, for manipulating anonymous strings. Two operator + functions are then defined like so:

// First + in a chain: creates the temporary buffer
String_Buffer2 operator + (const String& s1, const String& s2)
{
   String_Buffer2 result;
   result << s1;
   result << s2;
   return result;
}

// Each subsequent +: appends to the same temporary instead of copying it
String_Buffer2& operator + (const String_Buffer2& csb, const String& s)
{
   String_Buffer2& sb = const_cast<String_Buffer2&>(csb); // Ugly but essential cast
   sb << s;
   return sb;
}

With the above definitions in force, if a, b and c are strings (or can be implicitly converted to strings), an expression like a+b+c is evaluated as if it had been written:

String_Buffer2 temp;
temp << a;
temp << b;
temp << c;
String result = temp;

Compare the above C++ code with how a Java compiler translates the anonymous string a+b+c, and you will see that in terms of efficiency they are practically identical:

StringBuffer temp = new StringBuffer();
temp.append(a);
temp.append(b);
temp.append(c);
String result = temp.toString();

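Regarding the ugly but essential cast, Bjarne Stroustrup told me in an email that a const_cast is guaranteed to work unless the original object was declared const. That condition holds here: the object being modified is the temporary String_Buffer2 returned by value from the first operator +, which is not const.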
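The technique can be tried out with the following self-contained sketch. String and String_Buffer2 are hypothetical stand-ins built on std::string, with String_Buffer2 deriving from String so that the finished buffer converts to a String (an assumption about the class layout); the two operator + functions follow the definitions above. The first + creates a single temporary buffer, and every later + appends to that same temporary through the const_cast.

```cpp
#include <cassert>
#include <string>

// Hypothetical read-only String; std::string storage is an assumption.
class String {
public:
    String() {}
    String(const char* s) : data_(s) {}
    const char* const_char_star() const { return data_.c_str(); }
protected:
    std::string data_;
    friend class String_Buffer2;  // lets the buffer read other Strings' text
};

// Accumulates the pieces of an anonymous string such as a+b+c.
class String_Buffer2 : public String {
public:
    String_Buffer2& operator<<(const String& s) { data_ += s.data_; return *this; }
};

// First + in a chain: creates the one temporary buffer.
String_Buffer2 operator+(const String& s1, const String& s2) {
    String_Buffer2 result;
    result << s1;
    result << s2;
    return result;
}

// Every later +: appends to that same temporary. The temporary is not a
// const object, so writing through the const_cast is well defined.
String_Buffer2& operator+(const String_Buffer2& csb, const String& s) {
    String_Buffer2& sb = const_cast<String_Buffer2&>(csb);
    sb << s;
    return sb;
}
```

Overload resolution picks the second operator + for every + after the first, because binding const String_Buffer2& to the temporary is an exact match while the first operator would need a derived-to-base conversion. The reference it returns refers to a temporary that lives only until the end of the full expression, so the result should be copied into a String within that expression, as in String r = a + b + c;.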

4.1 Adding a to_string function for using anonymous strings with arbitrary types

To build an anonymous string out of arbitrary types, it is necessary to call a function like to_string that is analogous to Java's toString method. Inside the file io.hh I define the following template function for this purpose:

template<class T>
String to_string(const T& t)
{
   String_Buffer sb;
   sb << t;   // calls the operator << defined for T
   return sb;
}

EXAMPLE: If a, b and c are strings and t is an instance of arbitrary class T, then here is how to build an anonymous string and pass it to method foo:

foo(a+b+c+to_string(t));

To instantiate this template function, simply call it. Note that, unlike Java's toString method, the explicit call to to_string cannot be omitted. The to_string function internally calls operator << (Writer&, const T&), where T is the type of the argument passed to to_string, so operator << (Writer&, const T&) must be defined for class T for to_string to work.
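A self-contained sketch of to_string in use. The minimal String and String_Buffer classes (built on std::string) and the user type Point are hypothetical; in particular, the user's operator << here takes String_Buffer& rather than the article's Writer&, which this sketch does not model. The point it shows is that to_string only compiles for a type once that type supplies an operator << for the buffer.

```cpp
#include <cassert>
#include <string>

// Hypothetical minimal classes standing in for those in io.hh.
class String {
public:
    String() {}
    String(const char* s) : data_(s) {}
    const char* const_char_star() const { return data_.c_str(); }
protected:
    std::string data_;
};

class String_Buffer : public String {
public:
    String_Buffer& operator<<(const char* s) { data_ += s; return *this; }
    String_Buffer& operator<<(int i) { data_ += std::to_string(i); return *this; }
};

// Analogue of Java's toString(): writes t into a buffer with operator <<
// and returns the result as a String.
template<class T>
String to_string(const T& t)
{
    String_Buffer sb;
    sb << t;   // found by argument-dependent lookup when T is instantiated
    return sb;
}

// Hypothetical user type: it must provide its own operator <<.
struct Point { int x, y; };

String_Buffer& operator<<(String_Buffer& sb, const Point& p)
{
    return sb << "(" << p.x << ", " << p.y << ")";
}
```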

5.0 Problems with anonymous strings

What follows is a list of potential problems faced by the anonymous string system outlined in this article, together with solutions or workarounds where they exist.

5.1 Counter-intuitive behaviour

One problem is that, with the above definitions in force, the following counter-intuitive behaviour occurs:

String_Buffer2 b;
b << "apple";
String s = b + " banana";
cout << s << endl;  // outputs "apple banana" as expected
cout << b << endl;  // outputs "apple banana" when "apple" was expected

A solution to this behaviour is to make the String_Buffer2 constructor private, and to declare the operator + functions that create String_Buffer2 objects as friends, so that String_Buffer2 objects can only be created anonymously with operator +, as they should be.
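A sketch of the private-constructor fix, again using hypothetical std::string-based classes and a hypothetical private helper named append. The two operator + functions are friends of String_Buffer2 so they can still create buffers, and they are also defined at namespace scope so that ordinary lookup finds them.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical read-only String; std::string storage is an assumption.
class String {
public:
    String() {}
    String(const char* s) : data_(s) {}
    const char* const_char_star() const { return data_.c_str(); }
protected:
    std::string data_;
    friend class String_Buffer2;
};

class String_Buffer2 : public String {
    // Private: user code cannot write "String_Buffer2 b;".
    String_Buffer2() {}
    // Hypothetical helper, private as well.
    String_Buffer2& append(const String& s) { data_ += s.data_; return *this; }

    friend String_Buffer2 operator+(const String& s1, const String& s2);
    friend String_Buffer2& operator+(const String_Buffer2& csb, const String& s);
};

// Compile-time check that a buffer can no longer be default-constructed
// from outside the class and its friends.
static_assert(!std::is_default_constructible<String_Buffer2>::value,
              "String_Buffer2 objects are created only by operator +");

String_Buffer2 operator+(const String& s1, const String& s2) {
    String_Buffer2 result;            // allowed: this function is a friend
    result.append(s1).append(s2);
    return result;
}

String_Buffer2& operator+(const String_Buffer2& csb, const String& s) {
    return const_cast<String_Buffer2&>(csb).append(s);
}
```

With this sketch the example above, which begins with String_Buffer2 b;, no longer compiles. The fix is partial, though: the implicitly declared copy and move constructors remain public, so binding the result of a + b to a variable declared with auto still yields a named String_Buffer2.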

5.2 Can't suppress default semantics

Unfortunately, the following operators already have built-in meanings and cannot be redefined:

operator + (int,char*)
operator + (int,char)
operator + (char*,char)
operator + (char*,int)
operator + (char,char*)
operator + (char,int)
operator + (char,char)

The following operator has no built-in meaning but still cannot be defined, because an overloaded operator must have at least one operand of class or enumeration type:

operator + (char*,char*)

As a consequence, an expression such as:

String s = "hello, " + "world";

won't compile. It needs to be rewritten with an explicit call to the String constructor, like so:

String s = String() + "hello, " + "world";

If the values of both strings are known at compile time, they can instead be concatenated at compile time by writing the literals next to each other:

String s = "hello, " "world";

Therefore this string class is only needed for strings whose values are not known at compile time. Worse still, the following expression compiles but gives an unexpected result, because it performs pointer arithmetic rather than concatenation:

String s = "hello" + 2;
cout << s << endl;        // outputs "llo" bizarrely

Concatenating "hello" with the number 2 requires, as above, an explicit call to the String constructor:

String s = String() + "hello" + 2;
cout << s << endl;         // outputs "hello2" as expected
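The following sketch, using hypothetical std::string-based String and String_Buffer2 classes modelled on the definitions in section 4.0, reproduces both behaviours. Because String() + "hello" + 2 ends with an int, the sketch assumes an extra operator + (const String_Buffer2&, int) overload; the article's own library may support this through a different mechanism.

```cpp
#include <cassert>
#include <string>

// Hypothetical read-only String; std::string storage is an assumption.
class String {
public:
    String() {}
    String(const char* s) : data_(s) {}
    const char* const_char_star() const { return data_.c_str(); }
protected:
    std::string data_;
    friend class String_Buffer2;
};

// Hypothetical buffer with two illustrative append overloads.
class String_Buffer2 : public String {
public:
    String_Buffer2& append(const String& s) { data_ += s.data_; return *this; }
    String_Buffer2& append(int i) { data_ += std::to_string(i); return *this; }
};

String_Buffer2 operator+(const String& s1, const String& s2) {
    String_Buffer2 result;
    result.append(s1).append(s2);
    return result;
}

String_Buffer2& operator+(const String_Buffer2& csb, const String& s) {
    return const_cast<String_Buffer2&>(csb).append(s);
}

// Assumed extra overload so that an int can follow a string in a chain.
String_Buffer2& operator+(const String_Buffer2& csb, int i) {
    return const_cast<String_Buffer2&>(csb).append(i);
}
```

Here String bad = "hello" + 2; still advances a const char* two characters and yields "llo", whereas String() + "hello" + 2 selects the String overloads from the first + onwards and yields "hello2".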

Java has a similar problem. If a, b and c are of arbitrary types, then to pass the string built by concatenating a, b and c to a method foo, one should write:

foo("" + a + b + c);

Otherwise, if a and b are of type int and c is a String, then foo(a+b+c) passes not the concatenation of a, b and c, but the arithmetic sum a+b concatenated with c.

5.3 Limited string size

A limitation of two earlier string classes developed by the author (1 and 2) was that the maximum size of a string object was fixed by a compile-time constant. Increasing the value of this constant wasted memory, because every string object was allocated at that size.


The string class featured in this article grows its internal storage as the length of the string requires, so it is fast while using memory conservatively.
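A common way to grow storage as a string requires is geometric growth: when an append would overflow the buffer, the capacity is doubled and the contents are copied across. The sketch below is a hypothetical Growable_Buffer illustrating that strategy; the article does not say which growth policy string.hh uses. Doubling keeps the total copying over n appends proportional to n, and the capacity stays below twice the space actually needed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical growable character buffer (not the article's code).
class Growable_Buffer {
public:
    Growable_Buffer() : data_(new char[1]), size_(0), capacity_(1) { data_[0] = '\0'; }
    ~Growable_Buffer() { delete[] data_; }
    Growable_Buffer(const Growable_Buffer&) = delete;
    Growable_Buffer& operator=(const Growable_Buffer&) = delete;

    // Appends s; s must not point into this buffer's own storage.
    void append(const char* s) {
        std::size_t n = std::strlen(s);
        reserve(size_ + n + 1);                 // +1 for the terminating '\0'
        std::memcpy(data_ + size_, s, n + 1);   // copies the '\0' too
        size_ += n;
    }

    const char* c_str() const { return data_; }
    std::size_t size() const { return size_; }
    std::size_t capacity() const { return capacity_; }

private:
    // Doubles the capacity until it can hold `needed` bytes.
    void reserve(std::size_t needed) {
        if (needed <= capacity_) return;
        std::size_t new_cap = capacity_;
        while (new_cap < needed) new_cap *= 2;  // geometric growth
        char* bigger = new char[new_cap];
        std::memcpy(bigger, data_, size_ + 1);  // contents plus '\0'
        delete[] data_;
        data_ = bigger;
        capacity_ = new_cap;
    }

    char* data_;
    std::size_t size_;
    std::size_t capacity_;
};
```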

5.4 Not allowed to change the standard library's string class

In an email, Andrew Koenig told me that it is generally not acceptable to change the behaviour of the standard C++ string class, because the large body of existing code that uses the class, and assumes that it behaves in a certain way, would be broken by such a change.


The string manipulation classes presented in this article stand in their own right as a complete, Java-like string system for C++. I hope that other people will find my system useful in their own programs; otherwise it will languish with me as its only user!

6.0 Compatibility with C strings

My String class has two methods for converting String objects to a const char* or a char*, for passing them to C functions:

const char* const_char_star() const;
char*       char_star(int mem_size, char* s) const;

Here is how to use them:

void foo(String s)
{
   printf("%s", s.const_char_star());  // Function printf expects a const char*

   const int MAX_LENGTH = 200;
   char array[MAX_LENGTH];

   bar(s.char_star(MAX_LENGTH, array));  // Function bar expects a char*
}

Note that the pointer returned by const_char_star is only guaranteed to remain valid until the next string manipulation. If a long-lived pointer is needed, use char_star, which copies the characters into a caller-supplied array.
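A sketch of how the two methods could be implemented, assuming std::string storage and assuming that char_star truncates the text to fit mem_size and always null-terminates; the article does not specify what happens when the caller's array is too small.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical String with the two C-compatibility methods;
// the std::string storage is an assumption made for brevity.
class String {
public:
    String(const char* s) : data_(s) {}

    // Pointer into the String's own storage: valid only until the next
    // change to this String or its destruction.
    const char* const_char_star() const { return data_.c_str(); }

    // Copies at most mem_size - 1 characters into the caller's array s,
    // always '\0'-terminates it, and returns s. The copy stays valid for
    // as long as the caller's array does.
    char* char_star(int mem_size, char* s) const {
        if (mem_size <= 0) return s;   // no room even for the '\0'
        std::size_t limit = static_cast<std::size_t>(mem_size) - 1;
        std::size_t n = data_.size() < limit ? data_.size() : limit;
        std::memcpy(s, data_.data(), n);
        s[n] = '\0';
        return s;
    }

private:
    std::string data_;
};
```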

7.0 Emacs syntax highlighting

Add the following lines of Emacs Lisp code to your .emacs file to achieve correct syntax highlighting of the string classes:

;; The following code highlights the string classes:
(kill-local-variable 'c++-font-lock-extra-types)
(if (not (boundp 'c++-font-lock-extra-types))
   (setq c++-font-lock-extra-types nil))
(setq-default c++-font-lock-extra-types
      (append '("[A-Z]" "[A-Z0-9_]+[a-z][a-zA-Z0-9_]*")
              c++-font-lock-extra-types))

8.0 The source code

The following program listings are intended for the GNU C++ compiler, but will probably work on other compilers too.


  • String class
   string.hh
  • My non-standard I/O library:
   Interface file:io.hh
   Implementation file:io.cc
  • Tester module:
   t-string.cc
  • Java tester module:
   TString.java
  • The complete archive:
   io.tar.gz

Last modified: Sun Sep 25 16:11:42 NZDT 2016
© Copyright 1999-2016 Davin Pearson.