Today I Learned

Some of the things I've learned every day since Oct 10, 2016

Category Archives: engineering

180: Software Aging and Rejuvenation

Software aging is the degradation of a software system’s ability to perform correctly after running continuously for long periods of time. Common causes include memory bloating, memory leaks, and data corruption.

To prevent undesirable effects of software aging, software rejuvenation can be done proactively. This can take many forms, the most well-known of which is a simple system reboot. Other forms include flushing OS kernel tables, garbage collection, or Apache’s method of killing and then recreating processes after serving a certain number of requests.


168: Memory Leaks

In computing, a memory leak occurs when a program fails to release memory which is no longer required. For instance, a C program where the programmer dynamically allocates memory for a task but forgets to de-allocate it.

In certain circumstances, memory leakage may not cause big problems or might even be asymptomatic, like in programs with a little memory leakage which never actually run for long enough periods of time that memory usage becomes a problem. However, in worse cases memory leakage can lead to the complete failure of a program.

A couple types of programs at high risk for memory leakage are those in embedded systems (which can run continuously for years at a time, giving even small leaks the potential for being problematic) and those where memory is allocated extremely frequently for one-time tasks (such as rendering in a video or game).

While many languages like Java have built-in garbage collection which automatically cleans up unused or unaccessible memory, other languages like C and C++ require the programmer to be on top of making sure these leaks don’t happen in the first place. However, there are memory management libraries and memory debugging tools written to help with these languages. It should also be noted that automatic garbage collection comes with a performance overhead, so whether it’s desirable varies depending on the situation.

164: Instruction Pipelining

Instruction pipelining is a technique for increasing the throughput of a computer processor by executing instructions in parallel rather than sequentially.

Suppose you have several tasks to perform which each consist of stages s_1, \dots, s_n and each of these stages uses different resources than the others. One approach to completing a series of such tasks would be to complete one entirely before moving on to the next one, but since each stage of a task uses different resources, this approach leaves most resources sitting idle for unnecessary periods of time. To take advantage of this idleness, you can instead wait until stage 1 of task 1 is done, move it on to stage 2, and then while that is going just start task 2 on stage 1 using the stage 1 resources which are now available, continuing indefinitely in this manner.

A picture makes this idea clearer:


One sees that ideally, pipelining would increase the throughput of a processor by a factor of n, where n is the number of stages to an instruction. However, actually implementing pipelining is much more complicated than this abstraction, and the complications involved actually increase the latency of each individual instruction. Despite this increase in latency, if well-implemented pipelining still increases throughput by a significant factor. In fact, being clever about pipelining is usually the single most important factor in determining the CPI (cycles per instruction) of a processor.

163: Iron Law of Performance

In the study of computation, the Iron Law of Performance refers to the equation

\frac{\text{time}}{\text{program}} = \frac{\text{time}}{\text{cycle}} \cdot \frac{\text{cycles}}{\text{instruction}} \cdot \frac{\text{instructions}}{\text{program}},

which is just a way of roughly compartmentalizing the factors involved in determining how long it takes a program to execute.

While the first 2 terms depend on the instruction set architecture (ISA), microarchitecture, and base hardware technology involved, the 3rd depends mostly on things like the source code, the language the source code is written in, and the way the code is compiled.

160: Soak Testing (Software)

In software development, soak testing is a form of load testing where a system is exposed to some ‘normal’ (anticipated) load for a relatively long period of time to expose issues that would develop under these circumstances. For instance, certain problems like memory leaks in a system under normal usage might take a long time to have noticeable effects. The period of time required or desired varies depending on specifics, but some companies have been known to perform soak tests up to several months long. (Contrasting with the rapid deployment cycles of many systems.)

Ideally, one would be able to vary the load on the system in order to better simulate the fluctuations that it would encounter in deployment, instead of just using a flat average expected load. If this isn’t possible, one might instead set the flat load to one that’s higher than expected in order to get a better simulation. (“Passed tests give you no information.”)

158: Stress Testing (Software Development)

In software testing, stress testing is generally testing software by subjecting it to a ‘heavier load’ than it would ordinarily be expected to encounter during ‘normal’ usage. It differs from load testing in that its focus is less on controlled variations in load and more on chaotic, unpredictable causes of stress to a system.

This can be advantageous because it’s not always possible to forsee all the different circumstances in which the software may be used in the future. For instance:

  • Operating systems and middleware will eventually have to support software which hasn’t even been written yet.
  • Users may run the software on systems with less computational resources than those on which the software was tested in development.

In some cases, stress testing over a short period of time can also serve as a good simulation of ‘normal’ use over a long period of time, potentially exposing bugs with effects that take time to become noticeable. Additionally, stress testing is good for finding concurrency problems like race conditions and deadlocks.

146: Multiplexers

In circuit design, a multiplexer is a way of taking multiple inputs X_i and selecting which of those inputs to give as output based on the value of additional inputs S_j, called the select lines.

The simplest example which illustrates the idea is a multiplexer which selects between 2 binary inputs A and B based on the value of an additional binary input S. We could arbitrarily choose to give the value of A as the output if S = 0 and that of B as the output if S = 1.

The simplest boolean expression for the output Z which satisfies this requirement is

Z = A \cdot \bar{S} + B \cdot S.

It’s easy to see upon inspection that Z = A if S = 0 and Z = B otherwise, as required. (It should be noted, however, that building the multiplexer circuit as such, with just 2 AND gates, 1 OR gate, and 1 NOT gate would make the circuit susceptible to race conditions due to the physical realities of transistors.)

More generally, to make a multiplexer which selects output from n different inputs X_1, \dots, X_n, one needs to supply \lceil \log _2 n \rceil selector lines. The boolean expressions describing these multiplexers will get more complicated, of course, but the general idea is the same.

59: Bathtub Curve

In reliability engineering, a bathtub curve is a particular form (shape) which the hazard function of a system can take. Its namesake is the cross-section of the typical bathtub: relatively steep slopes on either side with a relatively flat section in between. This corresponds to:

  1. a decreasing failure rate at the beginning of the expected life of the system (due to ‘infant mortality’ or ‘early failures’), followed by
  2. a relatively constant failure rate in the middle of the expected life of the system, followed by
  3. an increasing failure rate towards the end of the expected life of the system (due to ‘wear-out’ failures).