Saturday, April 15, 2023

Strings, Strang, Strung, Unicode, Shmoonicode, ASCII - The Wonderful Life of Strings

Stringssssssssss Thy Precious

 


Intro

Strings are fascinating entities in the world of programming. For C# and Java programmers they may seem easy to work with, but for those of us working in C and C++ they can be quite complex beasts. Think of them as chameleons on a plain-colored leaf: they look simple until they blend into the next background and deceive the observer.

To understand the habitat of strings, we must go back to their distant ancestors: the typewriters. These mechanical beasts ruled the days before their electronic kin came into being. At first they were purely mechanical, and the user's own keystrokes supplied the energy that swung the type bars and pressed the type onto the paper. With the advent of electric typewriters and teletypewriters, things changed. These machines were driven by an arcane power called electricity, which converted the user's keystrokes into electrical signals and carried those impulses to the arms that struck the type.

However, once you decouple one part of a machine from another, you need an intermediate form to carry information between them. To let the mechanical parts of the typewriter talk to the electrical parts, various character encodings were devised, and these eventually begat the unified ASCII standard. ASCII defines a 7-bit code for each character, which allows for 2^7 = 128 distinct codes: 95 printable characters (letters, digits, punctuation, and the space) plus 33 control codes such as carriage return and line feed. At the time this was sufficient, since English can be written with 26 uppercase and 26 lowercase letters, ten digits, and some punctuation.

But with the rise of computers and video displays, ASCII became not only the internal storage format but also the character set drawn on screen. Bare video displays benefited from lines, borders, and other graphical characters that made information look nicer. So the character set grew organically, adding 1 more bit and with it 128 more characters, many of them "special" line- and box-drawing symbols. There was never a single "extended ASCII", though: vendors defined competing code pages, such as IBM's code page 437, that filled those upper 128 slots differently.

Now, extended ASCII needs 8 bits to store each character, which is 1 byte on most architectures. If each character is 8 bits (1 byte) long, then for the programmer a "string" of n such characters is n bytes long, with each byte holding one ASCII-encoded character.

The evolution of strings is like the evolution of our species. They started out simple, but as more complex needs arose, they evolved and adapted to their environment. Their ancestors went from purely mechanical machines to electrically driven ones, and the encodings those machines needed eventually settled into the ASCII standard we know today.

Introducing the C String

If you're a programmer, you may have heard of the C string, the grandfather of many string implementations. But what exactly is it? In its simplest form, a C string is an array of characters (typically encoded in ASCII) terminated by a null character, '\0'. Think of it like a train: each character in the array is a carriage, and the null terminator is the caboose.

Here's an example: 

[cpp]char discworld[16] = "Discworld";[/cpp]

In this line of code, we define an array of type char with a length of 16. However, it can hold at most 15 visible characters, because one element must be reserved for the null terminator ('\0'). This is a crucial aspect of C strings: the null terminator is the only thing that marks where the string ends. It's like a period at the end of a sentence; without it, functions such as printf and strlen would keep reading past the end of the array into whatever memory follows, which is undefined behavior.

In a C string, the length of the char array that holds the characters must be at least one character larger than the string contained therein. However, it could be much larger if required - it's like a parking lot with only one car parked in it. The parking lot could hold many more cars, but it's currently only occupied by one.

Let's take a closer look at some examples!

Printing a string to the console:

#include <stdio.h>

int main() {
    char myString[10] = "Hello";

    /* %s prints characters until it reaches the null terminator */
    printf("%s\n", myString);

    return 0;
}
 

Concatenating two strings:

 
#include <stdio.h>
#include <string.h>

int main() {
    /* Room for "Hello", ", World", and the null terminator */
    char myString[20] = "Hello";
    const char *suffix = ", World";

    /* strcat overwrites the old terminator and writes a new one at the end */
    strcat(myString, suffix);

    printf("%s\n", myString);  /* prints: Hello, World */
    return 0;
}

Reversing a string:

#include <stdio.h>
#include <string.h>

int main() {
    char myString[10] = "Hello";
    size_t length = strlen(myString);

    /* Swap characters from both ends, moving toward the middle */
    for (size_t i = 0; i < length / 2; i++) {
        char temp = myString[i];
        myString[i] = myString[length - i - 1];
        myString[length - i - 1] = temp;
    }

    printf("%s\n", myString);  /* prints: olleH */
    return 0;
}

Comparing two strings:

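#include <stdio.h>
#include <string.h>

int main() {
    char first[10] = "Hello";
    char second[10] = "Help";

    /* strcmp returns 0 if the strings are equal, a negative value if
       first sorts before second, and a positive value if it sorts after */
    int result = strcmp(first, second);

    if (result == 0) {
        printf("The strings are equal\n");
    } else if (result < 0) {
        printf("\"%s\" comes before \"%s\"\n", first, second);
    } else {
        printf("\"%s\" comes after \"%s\"\n", first, second);
    }
    return 0;
}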
 

There you have it: a very quick rundown on strings. A longer (and more boring) post will follow, but I'll save that one for later!
