Today I Learned

Some of the things I've learned every day since Oct 10, 2016

Category Archives: data structures

An unrolled linked list is a data structure which is a sort of hybrid of an array and a linked list. Typically, a linked list node containing integer values would have attributes something like

int value

Node next.

The unrolled linked list simply replaces the single value of each node with an array of a fixed size, making its attributes look something like

Array values

int numberValues

Node next.

Below is a small unrolled linked list containing ordered integers.

Note that all node arrays have the same size, but not all (in fact none) of them are full. The way one decides to handle this depends on the application, but typically when one array fills up it can be split up into 2 new nodes. Point being that you can re-distribute elements between nodes however is convenient for the problem you’re solving.

One advantage of unrolled linked lists over regular linked lists is that they reduce the amount of memory overhead per element. (In a linked list, the overhead associated with the Node object typically far outweighs the amount of memory actually required to store the value itself.) Another is that in the case of ordered values, they make linear search faster than with a traditional linked list, up to a constant factor. The reason should hopefully be obvious.

213: B-trees

A B-tree is a type of balanced search tree which is well-suited and commonly used for storing large amounts of data, such as in a database system. It is described using a positive constant $m \in \mathbb{Z}$, which defines the maximum number of children a single node can have. (Take BSTs as a specific example with $m = 2$.) Using $m$ the tree is defined by the following properties:

• Every leaf node is at the same depth in the tree
• Where a node has $b$ children, it has $b - 1$ values. The values are used as ‘bookmarks’ sandwiched in between the children in a manner generalized from BSTs: the values are sorted, and a child in between 2 values contains all values between them in the tree.
• Every node except possibly the root is at least half-full, having between $\frac{m}{2}$ and $m$ values.
• Each leaf node obviously has no children, but can still hold multiple values up to some arbitrary maximum (usually defined by disk block size or some other sensible constraint, depending on the setting).

As an example, below is an example of a B-tree with $m = 5$ and where leaves have enough room for 3 values each:

Upon inspection one can see that the sorting invariant similar to that used in BSTs is satisfied, e.g. all leaf values in between 15 and 20 are found by going to the child to the left of ’20’ in the root node, then to the child to the right of ’15’.

For obvious reasons, B-trees result in log-time operations. Lookup is straightforward, although insertions and deletions are more involved and difficult to implement. However, B-trees offer advantages over other search trees such as being optimized for reading/writing large blocks of data without too much unnecessary disk access.

194: Array Stride

In an array, the stride is the space between adjacent elements. More precisely it is the number of memory locations in between the beginning addresses of 2 adjacent items in the array. An array with stride exactly equal to the size of its elements is said to have unit stride.

Stride can’t be smaller than the size of the elements in the array for obvious reasons, but can be bigger. In fact, despite the immediate reduction in space efficiency, it’s often useful to make the stride larger than the element size, e.g. for more efficiency in multi-dimensional arrays.

184: TreeMap vs. HashMap vs. LinkedHashMap (Java)

In Java, TreeMap, HashMap, and LinkedHashMap are all implementations of the Map interface, and are all just different ways of mapping keys to values. Importantly, though, they differ in their lookup and insertion times, as well as ordering they maintain.

• HashMap is implemented as an array of linked lists, and is the implementation of a hash table most people are probably most familiar with. It offers $O(1)$  lookup and insertion, but doesn’t maintain any ordering of its keys.
• TreeMap is implemented as a red-black tree, and so offers $O(\text{log} N )$ lookup and insertion. However, its keys are ordered according to its keys’ implementation of the $\texttt{Comparable}$ interface.
• LinkedHashMap is implemented as a linked list and like HashMap offers $O(1)$ time, but in addition maintains ordering of its keys according to the order in which they were added to the map.

When to use which? If you need to retrieve keys ordered by their implementation of $\texttt{Comparable}$ (perhaps when the keys are Integers or Strings), TreeMap can do that. If you need to retrieve keys ordered by their insertion time (as in a caching system), LinkedHashMap can do that. If neither of these orderings is needed, HashMap is the best go-to, being typically faster with less overhead than the other two options.

157: Dijkstra’s and Negative Weights

Dijkstra’s algorithm, commonly used to find the path of minimum weight to any given node from a starting node in a weighted graph, does not work if negative weights are allowed.

Intuitively, the reason is that Dijkstra’s makes the assumption while working through the graph that paths can only become ‘heavier’, and so while visiting a node does not consider neighbors which themselves have already been visited. For instance, consider running the algorithm on the following graph with A as the starting node:

Image: https://stackoverflow.com/q/13159337

The algorithm will look at as a neighbor of A and set its minimum known distance to 2. It will then look at B and set its to 5. It will then move on to visit C without any changes since C has no neighbors. It will then visit B, but will not consider the edge from to C because C has already been visited. So the algorithm will wrongly conclude that the shortest path to C from A is 2, not -5.

One adaptation which does work on certain types of graphs with negative weights is Johnson’s algorithm, which itself uses Dijkstra’s as a subroutine.

149: Floyd’s Cycle-Finding Algorithm

Suppose you want to detect a cycle in a linked list. More specifically, you want to find the node $N$ where the cycle begins. One way to do this is Floyd’s cycle-finding algorithm, sometimes referred to as the “Tortoise and the Hare” algorithm.

Informally, the way it works is:

• Initialize two pointers $P, Q$, with $P$ = next(head), $Q$ = next(next(head)).
• While $P \neq Q$, increment $P$ by 1 node and $Q$ by 2. ($P$ is the “slow” pointer and $Q$ is the “fast” one. At each step of this loop the distance between the two will increment by 1.)
• Initialize counter $n = 0$, and reset $P$ to point to the head of the list.
• While $P \neq Q$, increment each of $P, Q$ by 1 node and $n$ by 1.
• Return node $N$ which is $n$ nodes after the head of the list.

The runtime of this algorithm is $O(n + \lambda)$, where $\lambda$ is the period of the cycle being found.

123: Skip Lists (Intro)

skip list is a data structure similar to a linked list which stores an ordered sequence of elements and is built specifically for fast search within the sequence. The way it’s commonly implemented is as a hierarchy of linked lists, with the first list containing all the elements of the skip list, and each successive linked list containing a sparser subset of the elements in the previous one.

To search for an element within the skip list one then starts with the sparsest linked list, getting as ‘close’ to the desired element as possible, then moves directly down to that same position in the next linked list and continues searching until the item is found (or not).

This animation depicts the insertion of the element 80 into a skip list of positive integers (insertion being equivalent to searching):

Credit: Artyom Kalinin

In the average case, the skip list uses $O(n)$ space and takes just $O(\log{} n)$ time for searching operations.

112: Persistent Data Structures

A data structure is said to be persistent if whenever it is modified or updated, it preserves the ‘old’ version of itself. This results in a kind of immutability. A good example of this is the simple singly-linked list, which whenever added onto contains the old version of itself in the new version.