The **Universal Approximation Theorem** (due to Kurt Hornik and others) essentially states that you can approximate an arbitrary continuous function on a compact subset of $\mathbb{R}^n$ with a neural net to arbitrary precision. (This is similar to the way single-variable polynomials are *universal approximators* of continuous functions on a closed interval $[a, b]$.) In fact, you can do this with a net with just a single hidden layer. One possible statement of the theorem is below.

**Theorem:** Suppose $\sigma : \mathbb{R} \to \mathbb{R}$ is non-constant, bounded, non-decreasing, and continuous, and that $K \subset \mathbb{R}^n$ is closed and bounded (compact). Then for *any* continuous function $f : K \to \mathbb{R}$ and $\varepsilon > 0$, there exists a 2-layer neural network of the form

$$F(\mathbf{x}) = \sum_{i=1}^{N} c_i \, \sigma(\mathbf{w}_i \cdot \mathbf{x} + b_i)$$

for some real vectors $\mathbf{w}_i \in \mathbb{R}^n$ and scalars $c_i, b_i \in \mathbb{R}$, such that

$$|F(\mathbf{x}) - f(\mathbf{x})| < \varepsilon$$

for all $\mathbf{x} \in K$.
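As a quick numerical illustration of the theorem's form (not a proof), the sketch below builds a network $\sum_i c_i \, \sigma(w_i x + b_i)$ for a one-dimensional target. The hidden weights and biases are drawn at random (an arbitrary choice for demonstration, not part of the theorem), and only the output coefficients $c_i$ are fit by least squares; the target $\sin(x)$ and the interval $[-3, 3]$ are likewise illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Target continuous function f on the compact set K = [-3, 3],
# sampled on a grid of evaluation points.
f = np.sin
x = np.linspace(-3.0, 3.0, 400)

# One hidden layer of N sigmoid units. The weights w_i and biases b_i
# are chosen randomly here (an illustrative choice); only the output
# coefficients c_i are fit, by linear least squares.
N = 200
w = rng.normal(scale=3.0, size=N)
b = rng.uniform(-3.0, 3.0, size=N)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# H[j, i] = sigma(w_i * x_j + b_i): hidden-unit activations at each grid point.
H = sigmoid(np.outer(x, w) + b)

# Solve for c minimizing ||H c - f(x)|| — the network F(x) = sum_i c_i sigma(...).
c, *_ = np.linalg.lstsq(H, f(x), rcond=None)
F = H @ c

max_err = np.max(np.abs(F - f(x)))
print(max_err)
```

With a few hundred random hidden units, the maximum error on the grid is typically far below $10^{-2}$, illustrating how cheaply a single hidden layer can approximate a smooth target once the output weights are chosen well.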

Note that a nonlinear function such as $\sigma$ is not even necessary at the output layer for this to work: the network's output is simply a linear combination of the hidden-unit activations.

More importantly, while this theorem guarantees there *exists* a network that approximates any such function, it says nothing about how to train that network or find its parameters. This is why we can't simply use the above network to solve all our problems.