Today I Learned

Some of the things I've learned every day since Oct 10, 2016

Category Archives: programming

170: Dangling Pointers

Dangling pointers are essentially pointers which are “left hanging” after the memory location they point to has become invalid. More specifically, a dangling pointer is a non-NULL pointer which points to memory which has become deallocated or otherwise has become invalid memory for the pointer to be pointing to.

For instance, an example in C (a language prone to this problem) taken from Wikipedia:

{
   char *dp = NULL;
   /* ... */
   {
       char c;
       dp = &c;
   } 
     /* c falls out of scope */
     /* dp is now a dangling pointer */
}

If a dangling pointer is attempted to be dereferenced, unpredictable (incorrect/garbage) behavior can result.

Advertisements

163: Iron Law of Performance

In the study of computation, the Iron Law of Performance refers to the equation

\frac{\text{time}}{\text{program}} = \frac{\text{time}}{\text{cycle}} \cdot \frac{\text{cycles}}{\text{instruction}} \cdot \frac{\text{instructions}}{\text{program}},

which is just a way of roughly compartmentalizing the factors involved in determining how long it takes a program to execute.

While the first 2 terms depend on the instruction set architecture (ISA), microarchitecture, and base hardware technology involved, the 3rd depends mostly on things like the source code, the language the source code is written in, and the way the code is compiled.

129: UUIDs

Universally Unique Identifier (UUID) is a 128-bit number used to uniquely identify things, whatever those things might be, in computer systems. The way they are represented in text varies, but they typically look something like

123e4567-e89b-12d3-a456-426655440000

or

00112233-4455-6677-8899-aabbccddeeff,

with 32 hexadecimal digits being broken into 5 groups by hyphens. The usefulness of UUIDs comes from the sheer improbability of the same UUID being generated 2 different times.

To give an idea of the improbability: the number of Version 4 UUIDs you would have to generate to have at least a 50% probability of a collision would be around 2.7 quintillion. If a computer generated 1 billion UUIDs every second, it would take 85 years just to compute that many, and even then the amount of physical storage required would be around 45 exabytes, much larger than the largest databases currently in existence.

This is to say that for now, UUIDs are a very good way of uniquely identifying things with almost impossible collisions.

127: Linters

Linters in the modern sense generally refer to programs which check code for potential errors (or even just style errors) in a static context (without running the code). Potential such errors might include syntactic errors, “variables being used before being set, division by zero, conditions that are constant, and calculations whose result is likely to be outside the range of values representable in the type used.” [Wikipedia]

The name comes from a program called Lint that performed this function for C code, but has since become a more general term.

Linting can be helpful for debugging interpreted languages like Python, since they don’t have a compilation phase where things like syntactic errors would be caught.

105: Loitering Objects (Java)

In Java (and possibly other languages, not sure), loitering objects refers to objects in memory which are no longer desired or will no longer be used in the future but still have pointers pointing to them, preventing garbage collection from freeing their place in memory. It’s generally good form to not waste memory in this fashion, but in extreme cases loitering objects can even lead to unwanted program termination due to a \texttt{java.lang.OutOfMemoryError}.

102: Kernel Space vs. User Space

Kernel space and user space are separate regions of virtual memory distinguished from each other in an operating system for purposes of security, stability, and centralizing control. Typically, user space is the space which user applications have direct access to and can work with, while kernel space is a privileged space which only especially stable and trustworthy programs, such as the kernel, can access. This barrier essentially insulates possibly insecure or unstable applications from power, preventing them from doing things like crashing the system or messing with other applications’ memory.

100: Primitive Types vs. Reference Types (Java)

In the Java programming language, values either have a primitive type or a reference type. The primitive types are:

  • boolean
  • byte
  • short
  • char
  • int
  • long
  • float
  • double

All other types not in this list are reference types.

A value with a primitive type stores the information determining the value itself, such as the binary code for a given integer or character. By contrast, a value with a reference type stores the address of the location in memory where this information can be found. For instance, after declaring and instantiating a variable

\texttt{String s = "string";}

the variable \texttt{s} doesn’t store the code for the String itself, but rather a ‘pointer’ to where this code can be found.

This has important consequences for how primitive values are assigned to variables or passed as parameters, as opposed to reference values. A primitive value passed or assigned will have the bits describing the value itself copied to the target, whereas a reference value will merely have the address of these bits copied.

87: Fuzzing (Software Testing)

In software testing, fuzzing is testing a program by running it on randomly-generated input. A fuzzer can generate input by mutating given examples of input, or by simply generating instances of input based on a model.

On the plus side, fuzzing can often find problems that the testers wouldn’t think to check for. At the same time, though, fuzzing often only finds very simple faults (e.g., input which produces ‘completely wrong’ behavior, but possibly not ‘deeper’ program faults). As always, fuzzing is not a substitute for other testing methods and should be viewed as a complement to them.

71: Ropes (Data Structure)

rope is a data structure which stores long strings as a type of binary tree rather than a ‘monolithic’ list of characters. The nodes of the tree contain weights, and the leaves of the tree contain small substrings of the larger string.

A rope is advantageous over the latter structure with respect to speed of destructive concatenation, insertion, deletion, as well as extra memory required during operations, but is disadvantageous in speed of splitting, appending, and extra memory required while the structure is not being operated on.

70: Corecursion

Corecursion, a sort of dual procedure to recursion, starts with a base case and builds upon it iteratively, as opposed to recursion which breaks down towards a base case.

A good example of corecursion is the evaluation of a stream, which starts from a base case (the first item in the stream as well as a method for computing the rest of the stream), and uses it to iteratively produce the rest of the infinite data structure.