Courageous said:
That's one of the things I don't understand. In the large, string internalization compacts memory, because instead of paying for the bytes of the string each time it is instantiated, you get a pointer to a lexicographically identical string.
I guess this is as obviously compact on 64-bit systems as it is on 32-bit ones, but be that as it may...
C//
p.s. have implemented string internalization tables myself, for special projects.
As far as interning pertains to commonly used strings, I don't disagree with you at all, Courageous; I agree completely that interning can produce a great increase in the efficiency of both memory usage and certain computations involving the instances.
The main problem stems from the automatic interning of strings or the reckless interning of strings (essentially one and the same). This is where string interning fails: there may very well be thousands of strings in the intern table that have been used only once and will never be used again, yet they remain present (unless some kind of caching and weak-reference scheme is used, but that's another topic) until at least the end of the application. And those are just the strings that were blatantly created in the program; also stored in the table would be every string resulting from operations on a string. One example I can think of is profiling output, such as ("Running time: " + msRan + " ms"); in all likelihood, every timing would produce a different entry in the table.
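To make that concrete, here is a minimal sketch (Java assumed, since no language has been named in this thread; the class and method names are made up) of how interning a dynamically built profiling message plants a new, never-reused entry in the table on every call:

    // Hypothetical profiling helper: the message text differs on every call,
    // so interning it adds a one-off entry to the intern table each time.
    public class InternBloat {
        static String profile(long msRan) {
            return ("Running time: " + msRan + " ms").intern();
        }

        public static void main(String[] args) {
            for (long ms = 0; ms < 10000; ms++) {
                profile(ms); // 10,000 distinct interned strings, each used exactly once
            }
        }
    }

Run that loop and you have accumulated ten thousand entries that will never be looked up again.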
Also, although string internalization can greatly lower the memory usage of your application, even judicious usage can potentially hamper you. For example, say my application interns 100 kilobytes' worth of strings that it uses commonly throughout the program. I now have an extra 100 kilobytes added to my working set at all times. Consider further that at any one time I am only using five percent of those strings. Normally only 5 kilobytes would need to be present in memory at a time, since the garbage collector would have cleaned up the rest when the program was not using them and reallocated them when they were needed again; now all of the data must reside in memory for at least the life of the application. Essentially, my concern boils down to this: interning treads on the ground the GC walks best and asserts that you will do a better job than it, at least in this instance. I have no problem with that, as long as it's done carefully and backed up with statistics confirming the reasoning; and of course this objection only applies when interning is done solely in the name of memory conservation and efficiency, and not, for instance, for string-comparison efficiency gains or some other advantage interning offers.
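As an aside, the weak-reference idea I waved at earlier can be sketched roughly like this (again Java assumed; WeakInterner is a made-up name, not a standard class). An entry survives only while something else still holds the canonical string, so the GC stays free to reclaim the rest:

    import java.lang.ref.WeakReference;
    import java.util.WeakHashMap;

    // Sketch of an intern table whose entries the GC may reclaim once no
    // outside code still references the canonical string.
    public final class WeakInterner {
        private final WeakHashMap<String, WeakReference<String>> table =
                new WeakHashMap<String, WeakReference<String>>();

        public synchronized String intern(String s) {
            WeakReference<String> ref = table.get(s);
            String canonical = (ref != null) ? ref.get() : null;
            if (canonical == null) {
                canonical = s;
                table.put(canonical, new WeakReference<String>(canonical));
            }
            return canonical;
        }
    }

That keeps the deduplication and comparison benefits without permanently growing the working set, at the cost of a little synchronization and lookup overhead.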
Given all this, I've reached the same conclusion I had before: all things in moderation, lest they be misused through unintentional overuse. String interning suffers from the same caveats as any other feature and, just like everything else, needs to be used judiciously, with logic backing up the reasons for its use.