# Today I Learned

Some of the things I've learned every day since Oct 10, 2016

## 216: Universal Approximation Theorem (Neural Nets)

The Universal Approximation Theorem (due to Kurt Hornik and others) essentially states that you can approximate an arbitrary continuous function on a compact subset of $\mathbb{R}^n$ with a neural net to arbitrary precision. (This is similar to the way single-variable polynomials are universal approximators of continuous functions on $\mathbb{R}$.) In fact, you can do this with a net with just a single hidden layer. One possible statement of the theorem is below.

Theorem: Suppose $\sigma : \mathbb{R} \rightarrow \mathbb{R}$ is non-constant, bounded, non-decreasing, and continuous, and that $S \subset \mathbb{R}^d$ is closed and bounded (compact). Then for any continuous function $f : S \rightarrow \mathbb{R}$ and $\epsilon > 0$, there exists a 2-layer neural network of the form

$h(\mathbf{x}) = \sum _{j=1} ^k c_j \sigma(\mathbf{a}_j^T \mathbf{x} + b_j)$

for some real vectors $\mathbf{a}_j$ and scalars $b_j, c_j$, such that

$|h(\mathbf{x}) - f(\mathbf{x})| < \epsilon$

for all $\mathbf{x} \in S$.

Note that a nonlinear function such as $\sigma$ is not even necessary at the output layer for this to work.

More important to note is that while this theorem guarantees there exists a network that approximates any function, it says nothing about how we train this network or obtain its parameters. Hence why we don’t just use the above network to solve all our problems.

An unrolled linked list is a data structure which is a sort of hybrid of an array and a linked list. Typically, a linked list node containing integer values would have attributes something like

int value

Node next.

The unrolled linked list simply replaces the single value of each node with an array of a fixed size, making its attributes look something like

Array values

int numberValues

Node next.

Below is a small unrolled linked list containing ordered integers.

Note that all node arrays have the same size, but not all (in fact none) of them are full. The way one decides to handle this depends on the application, but typically when one array fills up it can be split up into 2 new nodes. Point being that you can re-distribute elements between nodes however is convenient for the problem you’re solving.

One advantage of unrolled linked lists over regular linked lists is that they reduce the amount of memory overhead per element. (In a linked list, the overhead associated with the Node object typically far outweighs the amount of memory actually required to store the value itself.) Another is that in the case of ordered values, they make linear search faster than with a traditional linked list, up to a constant factor. The reason should hopefully be obvious.

## 214: Orthogonal Matrices

In linear algebra, an orthogonal matrix is a square matrix $A$ such that

$AT^T = A^T A = I$.

Equivalently, the rows and columns of $A$ are, respectively, orthonormal vectors: vectors which are all unit length and orthogonal to one another.

Also equivalently, $A^{-1} = A^T$.

A fun fact about orthogonal matrices is that their determinants are always $\pm 1$.

## 213: B-trees

A B-tree is a type of balanced search tree which is well-suited and commonly used for storing large amounts of data, such as in a database system. It is described using a positive constant $m \in \mathbb{Z}$, which defines the maximum number of children a single node can have. (Take BSTs as a specific example with $m = 2$.) Using $m$ the tree is defined by the following properties:

• Every leaf node is at the same depth in the tree
• Where a node has $b$ children, it has $b - 1$ values. The values are used as ‘bookmarks’ sandwiched in between the children in a manner generalized from BSTs: the values are sorted, and a child in between 2 values contains all values between them in the tree.
• Every node except possibly the root is at least half-full, having between $\frac{m}{2}$ and $m$ values.
• Each leaf node obviously has no children, but can still hold multiple values up to some arbitrary maximum (usually defined by disk block size or some other sensible constraint, depending on the setting).

As an example, below is an example of a B-tree with $m = 5$ and where leaves have enough room for 3 values each:

Upon inspection one can see that the sorting invariant similar to that used in BSTs is satisfied, e.g. all leaf values in between 15 and 20 are found by going to the child to the left of ’20’ in the root node, then to the child to the right of ’15’.

For obvious reasons, B-trees result in log-time operations. Lookup is straightforward, although insertions and deletions are more involved and difficult to implement. However, B-trees offer advantages over other search trees such as being optimized for reading/writing large blocks of data without too much unnecessary disk access.

In Haskell, fromIntegral is a handy function for getting ints and floats (or other Num types) to play nice together. (Ints can automatically be converted to floats in some cases, such as evaluating the expression 3 + 0.0 which returns the float 3.0, but this is not always the case.)

The type declaration of fromIntegral is

fromIntegral :: (Num b, Integral a) => a -> b.

That is, as long as a is an Integral type (a number that acts like an integer), it returns it as the equivalent Num object, the type of which the compiler infers from context. This enables you to safely evaluate expressions like fromIntegral (length [0, 1, 2]) + 0.0, which again returns the float 3.0.

In Haskell, typeclasses are roughly the equivalent of interfaces in languages like Java.

Consider what happens when we inspect the type of the < operator with :t (<). We get

(<) :: Ord a => a -> a -> Bool.

The Ord a => bit coming before the type of the operator is called a class constraint, and just means that the type a which the less-than operator operates on must be in the Ordinal typeclass. That is, it must have some sort of interface for determining whether one element of the class is less than another.

A type can belong to many different typeclasses (similarly to interfaces in Java), so for other functions you might instead see multiple typeclass constraints for a function, e.g. (Ord a, Num a) => if the type a is required to be an ordinal number type.

Here are some common typeclasses in Haskell:

• Eq: types whose members can be tested against each other for equality
• Show: types whose members can be represented as strings
• Enum: types whose members can be enumerated (and thus can be used in list ranges)
• Num: types whose members can act like numbers

In Haskell, type variables are roughly the equivalent of generics in languages like Java. They enable a function to operate on and return different types, so long as the function doesn’t depend on any specific behavior of the types involved. (These are called polymorphic functions.)

For instance, if in GHCI we use :t head to examine the type of the head function which returns the first element of a list, we see it has the type head :: [a] -> a. a is a type variable, meaning it could be any type. This just means head takes a list with elements of any type a and returns a single element of that same type.

As another example, :t fst gives fst :: (a, b) -> a. Since pairs in Haskell aren’t homogenous, it’s possible that a pair’s first and second elements have different types a and b respectively (although they could be the same). fst just returns a single element of type a.

## 209: Intro To Haskell Types

In Haskell, everything’s an expression, and every expression has a type.

Haskell has a static type system, so it knows everything’s type at compile time. In GHCI, you can use :t expr to get the type of expr. For instance, the :t (True, 'a') gives (True, 'a') :: (Bool, Char) . (The notation ‘::’ is read as “has type”.)

Haskell also has type inference, meaning the Haskell compiler infers the type from context, unlike languages like Java where you have to explicitly state the type of everything. Despite this, you can still explicitly declare the type of functions, which is considered good practice. The type is typically declared on the line before the function definition, also using ‘::’ notation.

As an example, this line would go before a function removeNonUppercase which filters a string to include only uppercase characters:

removeNonUppercase :: [Char] -> [Char].

To clarify, [Char] -> [Char] is itself the type of the function: that of functions which map lists of Chars to lists of Chars.

As a less trivial example, the following could be the definition of a function which adds 3 numbers:

addThree :: Int -> Int -> Int -> Int

addThree x y z = x + y + z

The last Int refers to the return type and the first 3 are the input parameters. (The notation seems to imply currying, but I’ll have to get into the details of that another time.)

## 208: The Einstellung Effect

The Einstellung Effect is a cognitive phenomenon wherein an agent attempting to solve a problem fails to think outside the box, where the ‘box’ results from its previous experience solving problems.

More specifically, the agent working on a problem is predisposed to attempt solutions similar to those it’s had success with in the past — even though these may be sub-optimal or not viable solutions at all for the problem at hand — rather than go in with an “open mind” and experiment with other approaches. Previous experience is overly relied on or mistaken to be more useful than it is, so the area of the possible solution space the agent examines is narrow/specific to a fault.

Perhaps unsurprisingly, studies have implied the effect seems to increase with age in humans, which goes along with stereotypes of younger minds being more flexible and open to possibilities while older minds have a stronger tendency to “stay on the tracks”.

The Einstellung Effect is one of the obstacles to good problem-solving which can be effectively avoided in many cases by periodic reminders to the problem solver to simply take a step back and re-examine things.

• Unlike lists in Haskell, tuples are not homogenous structures and can contain elements of different types. That is, (5, "five") is a valid tuple.
• The type of a tuple depends on its number of elements. To illustrate this point, since lists are homogenous [(0, 1), (1, 0)] is a valid list, since all its elements are pairs (tuples with 2 elements), but [(0, 1), (1, 0, 0)] is not, since it contains both a pair and a triple.
• The type of a tuple also depends on the type of its elements. This means [(0, 1), ("one", 0)] is not a valid list either.
Two methods which are specific to 2-tuples are fst and snd, which respectively return the first and second elements of a pair.
Finally, a list method which involves tuples in its return type is zip. The expression zip a b, where a and b are lists, returns a list of 2-tuples containing the lists respective 1st elements, 2nd elements, and so on. For instance, zip [1, 2, 3] ["one", "two", "three"] returns the list [(1, "one"), (2, "two"), (3, "three")]. If the two lists aren’t the same length, the method behaves true to the lazy nature of Haskell and stops with the last element of the shorter list, meaning you can safely use zip when one of the lists is infinite.