Today I Learned

Some of the things I've learned every day since Oct 10, 2016

Category Archives: programming

184: TreeMap vs. HashMap vs. LinkedHashMap (Java)

In Java, TreeMap, HashMap, and LinkedHashMap are all implementations of the Map interface, and are all just different ways of mapping keys to values. Importantly, though, they differ in their lookup and insertion times, as well as ordering they maintain.

  • HashMap is implemented as an array of linked lists, and is the implementation of a hash table most people are probably most familiar with. It offers O(1)  lookup and insertion, but doesn’t maintain any ordering of its keys.
  • TreeMap is implemented as a red-black tree, and so offers O(\text{log} N ) lookup and insertion. However, its keys are ordered according to its keys’ implementation of the \texttt{Comparable} interface.
  • LinkedHashMap is implemented as a linked list and like HashMap offers O(1) time, but in addition maintains ordering of its keys according to the order in which they were added to the map.

When to use which? If you need to retrieve keys ordered by their implementation of \texttt{Comparable} (perhaps when the keys are Integers or Strings), TreeMap can do that. If you need to retrieve keys ordered by their insertion time (as in a caching system), LinkedHashMap can do that. If neither of these orderings is needed, HashMap is the best go-to, being typically faster with less overhead than the other two options.

Advertisements

182: final (Java)

In Java, the \texttt{final} keyword has several different uses depending on whether it’s used on a variable, method, or class. In all uses, it’s generally used to prevent something from being modified.

A variable declared \texttt{final} can’t have its value modified after declaration. When the variable is primitive, this is straightforward; \texttt{final int x = 5} will always have the value 5. If the variable is a reference type, however, it being final just means it will always point to the same object. The object itself, if mutable like a list, can still change.

A method declared \texttt{final} can’t be overridden.

A class declared \texttt{final} can’t be extended.

181: Vectors (Java)

The Vector class in Java is essentially a legacy version of ArrayList (ArrayList was actually the Java 1.2 Collections replacement for it) which by default synchronized each individual operation. While this makes the structure automatically thread-safe (unlike ArrayList), the better design is to lock sequences of operations rather than individual operations. The result is that in addition to not providing any control over synchronization, the Vector class performs considerably worse than alternatives, such as using Collections.synchronizedList to decorate a non-thread-safe collection such as ArrayList for concurrency. Thus it’s generally recommended that use of the Vectors class be avoided.

170: Dangling Pointers

Dangling pointers are essentially pointers which are “left hanging” after the memory location they point to has become invalid. More specifically, a dangling pointer is a non-NULL pointer which points to memory which has become deallocated or otherwise has become invalid memory for the pointer to be pointing to.

For instance, an example in C (a language prone to this problem) taken from Wikipedia:

{
   char *dp = NULL;
   /* ... */
   {
       char c;
       dp = &c;
   } 
     /* c falls out of scope */
     /* dp is now a dangling pointer */
}

If a dangling pointer is attempted to be dereferenced, unpredictable (incorrect/garbage) behavior can result.

163: Iron Law of Performance

In the study of computation, the Iron Law of Performance refers to the equation

\frac{\text{time}}{\text{program}} = \frac{\text{time}}{\text{cycle}} \cdot \frac{\text{cycles}}{\text{instruction}} \cdot \frac{\text{instructions}}{\text{program}},

which is just a way of roughly compartmentalizing the factors involved in determining how long it takes a program to execute.

While the first 2 terms depend on the instruction set architecture (ISA), microarchitecture, and base hardware technology involved, the 3rd depends mostly on things like the source code, the language the source code is written in, and the way the code is compiled.

129: UUIDs

Universally Unique Identifier (UUID) is a 128-bit number used to uniquely identify things, whatever those things might be, in computer systems. The way they are represented in text varies, but they typically look something like

123e4567-e89b-12d3-a456-426655440000

or

00112233-4455-6677-8899-aabbccddeeff,

with 32 hexadecimal digits being broken into 5 groups by hyphens. The usefulness of UUIDs comes from the sheer improbability of the same UUID being generated 2 different times.

To give an idea of the improbability: the number of Version 4 UUIDs you would have to generate to have at least a 50% probability of a collision would be around 2.7 quintillion. If a computer generated 1 billion UUIDs every second, it would take 85 years just to compute that many, and even then the amount of physical storage required would be around 45 exabytes, much larger than the largest databases currently in existence.

This is to say that for now, UUIDs are a very good way of uniquely identifying things with almost impossible collisions.

127: Linters

Linters in the modern sense generally refer to programs which check code for potential errors (or even just style errors) in a static context (without running the code). Potential such errors might include syntactic errors, “variables being used before being set, division by zero, conditions that are constant, and calculations whose result is likely to be outside the range of values representable in the type used.” [Wikipedia]

The name comes from a program called Lint that performed this function for C code, but has since become a more general term.

Linting can be helpful for debugging interpreted languages like Python, since they don’t have a compilation phase where things like syntactic errors would be caught.

105: Loitering Objects (Java)

In Java (and possibly other languages, not sure), loitering objects refers to objects in memory which are no longer desired or will no longer be used in the future but still have pointers pointing to them, preventing garbage collection from freeing their place in memory. It’s generally good form to not waste memory in this fashion, but in extreme cases loitering objects can even lead to unwanted program termination due to a \texttt{java.lang.OutOfMemoryError}.

102: Kernel Space vs. User Space

Kernel space and user space are separate regions of virtual memory distinguished from each other in an operating system for purposes of security, stability, and centralizing control. Typically, user space is the space which user applications have direct access to and can work with, while kernel space is a privileged space which only especially stable and trustworthy programs, such as the kernel, can access. This barrier essentially insulates possibly insecure or unstable applications from power, preventing them from doing things like crashing the system or messing with other applications’ memory.

100: Primitive Types vs. Reference Types (Java)

In the Java programming language, values either have a primitive type or a reference type. The primitive types are:

  • boolean
  • byte
  • short
  • char
  • int
  • long
  • float
  • double

All other types not in this list are reference types.

A value with a primitive type stores the information determining the value itself, such as the binary code for a given integer or character. By contrast, a value with a reference type stores the address of the location in memory where this information can be found. For instance, after declaring and instantiating a variable

\texttt{String s = "string";}

the variable \texttt{s} doesn’t store the code for the String itself, but rather a ‘pointer’ to where this code can be found.

This has important consequences for how primitive values are assigned to variables or passed as parameters, as opposed to reference values. A primitive value passed or assigned will have the bits describing the value itself copied to the target, whereas a reference value will merely have the address of these bits copied.