Most errors in source code survive only until the next compilation or the next unit test. However, did you know that a single undetected error can sometimes cost millions of US dollars? In this article, we will discuss such errors in the C programming language, their impact in the past and, most importantly, how to avoid them in the future.

Nowadays, many human activities are automated. Computerization offers many advantages; one of them is that humans make errors all too often, while programs supposedly do not. However, despite everyone's best efforts, the possibility of errors still exists: humans make errors, yes, and computer programs make errors, too. Worst of all, the same program under the same conditions will repeat the same error every time!

It is a mere annoyance when such an error makes a bird fly in the wrong direction in a smartphone game, but things change drastically in the fly-by-wire systems of commercial jets or in the controllers of nuclear power plants.

In this article, we will not talk about trivial errors that always lead to an incorrect result and have to be fixed anyway; this also includes syntax errors, i.e. when the program does not compile or run at all. Today we are going to talk about potentially dangerous places and code snippets.

Low-level programming errors

Perhaps the best-known error in low-level programming is the buffer overflow. Such errors are largely specific to unmanaged code; languages such as C# are not affected, at least outside unsafe sections of code. Generally, this error occurs when you try to fit something big into something small.

Let us start with an example straight away. Imagine that we are writing a simple authorization system in C. We will use GCC 5.4.0 under Linux in 64-bit mode.

#include <stdio.h>
#include <string.h>

int main(void) {
    char input[16], password[16];
    strcpy(password, "SecretPassword");
    // Let us suppose that this is the user input of some length.
    // strcpy() does not check the size of the destination buffer.
    strcpy(input, "IHaveNoIdea");
    printf("%s\n", (strncmp(input, password, 16) == 0) ? "Password accepted!" : "Access denied!");
    return 0;
}

The idea of the code should be simple. The “SecretPassword” is stored in the program itself, which is an extremely bad idea but is fine for demonstration purposes; real code would probably fetch a salted hash from a database instead. Next, we retrieve input from the user. Finally, we compare the input with the real password using strncmp, which is also not the best idea: strncmp typically stops at the first mismatching character, so its running time reveals how much of a guess was correct, which opens the door to a so-called timing side-channel attack.

After the code is executed, the program prints “Access denied!”, unless we change the user input string to “SecretPassword”, which passes the test as expected. But what happens if we try to input the long string “IHaveNoIdeaaaaaaIHaveNoIdeaaaaaa”? As you might have guessed, we will see “Password accepted!” on the screen, even though this string does not look like “SecretPassword” at all! What happened?

To investigate the issue, we will use the power of gdb, the GNU debugger. First, let us examine the normal chain of events. After setting a breakpoint just before the printf call, we ask gdb to show the first 32 bytes on top of the stack with the x/32xc $sp command.

0x7fffffffe110: 73 'I' 72 'H' 97 'a' 118 'v' 101 'e' 78 'N' 111 'o' 73 'I'
0x7fffffffe118: 100 'd' 101 'e' 97 'a' 0 '\000' 0 '\000' 0 '\000' 0 '\000' 0 '\000'

0x7fffffffe120: 83 'S' 101 'e' 99 'c' 114 'r' 101 'e' 116 't' 80 'P' 97 'a'
0x7fffffffe128: 115 's' 115 's' 119 'w' 111 'o' 114 'r' 100 'd' 0 '\000' 0 '\000'

This should be self-explanatory: we see the contents of both variables, input followed by password (the empty line between them was added just for readability). What about our malicious input?

0x7fffffffe110: 73 'I' 72 'H' 97 'a' 118 'v' 101 'e' 78 'N' 111 'o' 73 'I'
0x7fffffffe118: 100 'd' 101 'e' 97 'a' 97 'a' 97 'a' 97 'a' 97 'a' 97 'a'

0x7fffffffe120: 73 'I' 72 'H' 97 'a' 118 'v' 101 'e' 78 'N' 111 'o' 73 'I'
0x7fffffffe128: 100 'd' 101 'e' 97 'a' 97 'a' 97 'a' 97 'a' 97 'a' 97 'a'

As you can see, the long string did not fit into the 16-byte input buffer, so strcpy kept writing and the remaining characters overwrote the adjacent password buffer. Now the input and the password are identical, so there is no reason to deny authorization: from the program's point of view, everything is correct. The only problem is that the “password” entered is nowhere near “SecretPassword”!

If you think this kind of error is purely theoretical, let me remind you of a bug found relatively recently, in 2014, in the heartbeat extension of OpenSSL, better known as Heartbleed. In normal use, the heartbeat protocol works like this: the client makes up some word and sends it to the server together with its length X, and the server sends X bytes of that word back to the client. After receiving the word WORD and the length 4, the server memory may schematically look like this:

xxxx Received:  4 WORD xxxx xxxx SomeUserName xxxx SomeUserPassword xxxx

Here the server returns the four bytes WORD to the client. They match the original word, which means that the connection is alive. However, a malicious client may send the same four-byte word but claim a larger length, say 49. Then the server memory may look like this:

xxxx Received: 49 WORD xxxx xxxx SomeUserName xxxx SomeUserPassword xxxx

Since the server trusts the claimed length, it returns 49 bytes: WORD followed by everything after it, up to and including SomeUserPassword. Using this exploit, a client can retrieve up to about 64 KB of server memory per request, and that memory may contain sensitive data. Who said that troubles arise only when we try to push something large into something small? Pulling something large out of something small can be just as dangerous. Although the fix was a single simple check, the total damage caused by this little bug was enormous: estimates put it at around 500 million US dollars.

Integer arithmetic

In mathematics, you can always get the next integer by adding one to it: for 1 it is 2, for 37 it is 38, for 83534 it is 83535. This is not the case in programming, where integers are stored in a finite number of bits. For a 16-bit number, the value after 32767 may be 32768 (if the type is unsigned) or -32768 (if the type is signed and the value wraps around, which C does not guarantee: signed overflow in arithmetic is undefined behavior). Nowadays, 32-bit and 64-bit integers are common, with far greater ranges: about two billion in either direction for signed 32-bit values and about nine quintillion in either direction for signed 64-bit values. Is that enough? Usually it is, but it depends on what you want to store. Values still overflow sometimes, especially in conversions from floating-point values. When you create a variable that must hold a wide range of values, always check whether its type is large enough for every possible situation, and maybe for a couple of impossible ones as well.

Have you heard of the “Year 2038” problem, when a signed 32-bit count of seconds since January 1, 1970 overflows? It still seems distant. However, a time-related overflow has reportedly already struck once, causing the loss of contact with the Deep Impact spacecraft. According to speculation, the spacecraft tracked time in tenths of a second since January 1, 2000. On August 11, 2013, this 32-bit counter overflowed, resetting the time to January 1, 2000, which led to continuous rebooting of the spacecraft computers, and contact was lost for good. Fortunately, Deep Impact had already completed its primary mission by then. Note, however, that the exact cause of this malfunction is still unknown.

Integer overflow also contributed to the infamous Therac-25 radiation therapy accidents.

Floating-point arithmetic and precision losses

Floating-point arithmetic was designed to keep roughly the same relative precision over a very large range of values. In contrast with fixed-point arithmetic, floating-point numbers are most precise, in absolute terms, near zero and lose absolute precision as their magnitude grows. This is extremely useful for measuring real-life quantities: the same format can represent both very small and very large values, such as the distance between atoms in a molecule and the distance between galaxies, with the same relative precision. However, as with integers, their range is finite too. They are also prone to precision loss in various operations; in fact, floating-point values can sometimes lose precision extremely quickly! Here is an example.

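#include <stdio.h>

int main(void) {
    float res = 1.0f;
    printf("%10.8f\n", res);
    /* Mathematically (1 - 0.999) * 1000 == 1, but 0.999f is not exact in binary. */
    res = (res - 0.999f) * 1000.0f;
    printf("%10.8f\n", res);
    return 0;
}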

This little code snippet is supposed to print the number 1 twice. The first value is obvious; the second is calculated as (1 - 0.999) * 1000, i.e. 0.001 * 1000, which is 1. However, the program output is as follows:

1.00000000
0.99998713

As we can see, even two basic arithmetic operations can quickly lose precision. Is this loss acceptable or not? It is up to you to decide. Keep in mind, though, that a loss of precision of this kind once cost the lives of 28 American soldiers in Saudi Arabia on February 25, 1991, when a Patriot missile defense battery failed to intercept an incoming Scud missile. The problem was in how time was represented: the system counted time as an integer number of tenths of a second, starting from zero at system startup. Patriot systems were originally designed to intercept Soviet missiles and aircraft and were expected to run for only a few hours at a time to avoid detection. However, the battery in question had been running for more than one hundred hours without a reboot. To convert the tick count into seconds, the system multiplied it by 1/10, stored in a 24-bit fixed-point register; since 1/10 has no finite binary representation, it was truncated, and after 100 hours the accumulated error reached about one-third of a second. After detecting the incoming missile, the system predicted where to look for it next, but because of this error it looked in the wrong place and found nothing. It then concluded that the detection was a false alarm, but it clearly was not, and the missile hit its target.

To make matters worse, the software update that fixed the problem reached this battery the day after the attack.

Unauthorized code execution

We will start this one with an example right away. What do you think could possibly go wrong in this little C program?

#include <stdlib.h>

int main(void) {
    /* Runs "ls" through the shell, which looks it up via $PATH. */
    return system("ls");
}

It simply runs the ls program and exits (on Windows, the equivalent would be dir). Note that ls is executed with the same privileges as the main program itself: if the main program is run by an administrator, ls runs with maximum permissions too.

Is there any way to affect this program? Yes, there is. Let us run this program this way:

PATH="" ./system.out

This way, we override the environment variable $PATH with an empty string for this run. The program now outputs:

sh: ls: No such file or directory

Our program tried to find and execute ls, but no directory listed in $PATH contains it. In fact, with an empty $PATH the shell falls back to searching the current working directory, and there is no ls there either. Time to fix that! Let us create a simple text file named ls in this directory, make it executable with chmod +x ls, and give it the following contents:

echo "HACKED"

Now the next run with the altered $PATH prints the string HACKED. Of course, we can put any commands we want into this bogus file, and they would be executed with the same privileges as the main program.

Code injection is one of the most dangerous attack techniques. It can be used to inject malicious code, such as viruses, or to retrieve and manipulate information. One of the recent security breaches, reported by Equifax Inc. in September 2017, affected an estimated 143 million U.S. consumers. Equifax attributed it to a code injection vulnerability in Apache Struts (CVE-2017-5638), for which a fix had been available since March 2017.

Overall, you should be extremely careful whenever you communicate with the user, especially in code that acts as a “bridge” to another programming language; the prime example is SQL with its injections. The basic guideline for avoiding unauthorized code execution is to keep a few things in mind: the binary may be compromised, the program's environment may be altered, and every user should be treated as if captured by the enemy and trying to disrupt your system. Always check user input carefully, no matter how irrelevant it seems to be.

If you ask the user for a path inside some subdirectory, remember that he or she may enter “../../../../etc/shadow” or even create a symbolic link in the current directory pointing to the same /etc/shadow file.

If you print user-supplied data with printf/fprintf/sprintf, always make sure the call looks like printf("%s", buffer) and not printf(buffer): the latter can be exploited by embedding format specifiers such as %x or %n in the string, which let an attacker read from, or even write to, the program's memory.

If you pass user-supplied data to any kind of interpreter, always double-check it! Make sure that the user's name is not “Robert'); DROP TABLE students;--”, and if it is, that its quotes are escaped properly; better still, pass the value as a parameter of a prepared statement instead of splicing it into the query text. There are plenty of manuals and articles dedicated to SQL injection, so we will not go into details here, but we highly recommend reading about this topic if you work with SQL from C, C++ or virtually any other programming language.

Conclusion

To be honest, this topic is huge and complicated, and it is extremely hard, if not impossible, to cover everything in one article. Stay tuned for the second part, where we will discuss more dangerous techniques, as well as general guidelines on how to write safer code.