Noise Foundry

Noise processes interest scientists and engineers because noise hinders technological progress. A long-standing research problem is to model noise processes and to develop techniques for analyzing and designing systems that handle noise robustly.

The statistical characteristics of a random data sequence depend on the generating procedure (a linearly-dependent random process) and on its input sequence. In a sequential machine (such as a digital computer), a random sequence of symbols (noise) generated by a linearly-dependent process with a Markovian or multinomial input is not, in general, itself Markovian or multinomial.
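
A minimal simulation sketch of this claim, with every specific assumed purely for illustration (an asymmetric two-state Markov source and an XOR-of-the-last-two-symbols machine): it estimates the output's conditional statistics at context depths one and two. If the output were Markovian, the extra symbol of context would add no information; here it does.

```python
import random
from collections import Counter

random.seed(0)

def markov_source(n, p01=0.1, p10=0.3):
    """Asymmetric two-state Markov chain over {0, 1}."""
    x, seq = 0, []
    for _ in range(n):
        seq.append(x)
        flip = p01 if x == 0 else p10
        if random.random() < flip:
            x = 1 - x
    return seq

def machine(xs):
    # Sequential machine whose output depends linearly (XOR) on the
    # last two input symbols.
    return [xs[i] ^ xs[i - 1] for i in range(1, len(xs))]

xs = markov_source(200_000)
ys = machine(xs)

# Count output transitions at context depths one and two.
c1, c2 = Counter(), Counter()
for i in range(2, len(ys)):
    c1[(ys[i - 1], ys[i])] += 1
    c2[(ys[i - 2], ys[i - 1], ys[i])] += 1

def p(counter, ctx):
    """Empirical P(next = 1 | context)."""
    total = counter[ctx + (0,)] + counter[ctx + (1,)]
    return counter[ctx + (1,)] / total

print("P(1 | prev=1)          =", round(p(c1, (1,)), 3))
print("P(1 | prev=1, pprev=0) =", round(p(c2, (0, 1)), 3))
print("P(1 | prev=1, pprev=1) =", round(p(c2, (1, 1)), 3))
```

The two depth-two estimates straddle the depth-one estimate, so one symbol of context does not summarize the output's history: the machine's output is not Markovian even though its input is.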

An interesting machine learning problem is to learn about a system by probing it with noise alone: the idea is to learn the critical properties of a function using noise as the input signal. A straightforward application is image compression. Consider a neural network trained as a visual representation that detects image features; its descriptors can be stored as a boolean network (a binary image encoding). How can we encode such a network (or an image) so as to approach the theoretical minimum?
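
As a point of reference for that theoretical minimum, here is a minimal sketch (on random toy data, not output from any actual network): under an i.i.d. pixel model, no lossless code can beat \( n \cdot H(p) \) bits on average, where \( p \) is the pixel occupancy rate. The true minimum depends on the actual generating process.

```python
import math
import random

random.seed(1)
# Toy 64x64 binary image with ~20% occupancy (illustrative data only).
image = [[1 if random.random() < 0.2 else 0 for _ in range(64)] for _ in range(64)]

n = sum(len(row) for row in image)
p = sum(map(sum, image)) / n

def h(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(f"occupancy p = {p:.3f}")
print(f"i.i.d. entropy bound: {n * h(p):.0f} bits vs {n} raw bits")
```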

The goal is to learn and encode the statistical properties of an unknown noise process (in its visual representation) so that it can be reconstituted without loss of information. This approach differs from popular engineered encoders such as PNG, JPEG2000, and SReC in that we encode the pixel occupancy function, factored on the learned statistical properties. It is lossless because the encoding is entropy-based.
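
A hedged illustration of the entropy-based, lossless aspect (the one-pixel context model below is illustrative, not the project's actual model): an ideal arithmetic coder spends \( -\log_2 P(\mathrm{bit} \mid \mathrm{context}) \) bits per pixel, and because the decoder updates the same adaptive counts in lockstep, no information is lost.

```python
import math
import random

random.seed(2)
W = H = 64
image = [[1 if random.random() < 0.15 else 0 for _ in range(W)] for _ in range(H)]
bits = [b for row in image for b in row]  # raster-order pixel stream

counts = {0: [1, 1], 1: [1, 1]}  # Laplace-smoothed counts per context
total = 0.0
prev = 0
for b in bits:
    c0, c1 = counts[prev]
    p = (c1 if b else c0) / (c0 + c1)
    total += -math.log2(p)   # ideal code length for this pixel
    counts[prev][b] += 1     # adapt the model; the decoder does the same
    prev = b

print(f"adaptive code length: {total:.0f} bits vs {len(bits)} raw bits")
```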

We find the smallest boolean circuit (AND gates only) that outputs the rows [as control signals] based on the columns [as states], given an unknown process that generates a binary image. A solution always exists: assign a literal to every row. To make this the smallest circuit, we derive the CNF representation using recursive reduction.
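
A toy sketch of the starting point, under the assumed interpretation that each row of the binary image is an output signal and each column index (in binary) is a state: the trivial "literal per row" solution lists, for every row, the exact column-states where it fires; recursive reduction then merges these minterms into fewer, wider terms (see the subcube sketch below).

```python
def trivial_cover(image):
    """For each row (output), list the column-states (minterms) where it is 1."""
    n_cols = len(image[0])
    width = max(1, (n_cols - 1).bit_length())  # bits needed to index a column
    cover = {}
    for r, row in enumerate(image):
        # One minterm (fully specified column index) per 1-entry in the row.
        cover[r] = [format(c, f"0{width}b") for c, bit in enumerate(row) if bit]
    return cover

image = [
    [1, 0, 1, 0],
    [0, 1, 1, 1],
]
for row, minterms in trivial_cover(image).items():
    print(f"row {row}: minterms {minterms}")
```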

We form a lattice over the binary image viewed as an n-cube, with \( 3^{n}-1 \) subcubes, and group the subcubes by their number of 1s. This leaves a standard bootstrapping problem that can be solved without prior information: the learned function/model cascade assumes only a stream of bits denoting the control signal (i.e., rows as outputs) and a causal relation from control to states (represented by the computed CNFs).
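
A small sketch of this lattice, assuming the usual cube notation in which each coordinate of a subcube is 0, 1, or '-' (free). That gives \( 3^{n} \) subcube patterns in total (\( 3^{n}-1 \) proper subcubes once the full cube itself is excluded). Grouping by the number of 1s is the same ordering used by classical Quine-McCluskey-style reduction, where two subcubes merge when they differ in exactly one fixed coordinate.

```python
from itertools import product
from collections import defaultdict

def subcubes(n):
    """All subcube patterns of the n-cube: each coordinate is 0, 1, or free."""
    return ("".join(c) for c in product("01-", repeat=n))

def merge(a, b):
    """Merge two subcubes differing in exactly one fixed 0/1 coordinate."""
    diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(diff) == 1 and {a[diff[0]], b[diff[0]]} == {"0", "1"}:
        i = diff[0]
        return a[:i] + "-" + a[i + 1:]
    return None

n = 3
groups = defaultdict(list)
for cube in subcubes(n):
    groups[cube.count("1")].append(cube)

for ones in sorted(groups):
    print(f"{ones} ones: {len(groups[ones])} subcubes")
print("merge('010', '011') ->", merge("010", "011"))  # -> '01-'
```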

The situation is similar to formulating a precise invariance-based learning model and parallels strong results in control theory. Beyond being a worthwhile signal-reconstitution problem, it also studies aspects of neural processing. A probabilistic encoder that estimates the occupancy of a matrix from conditional probabilities of its row and column sums can de-duplicate the data further.
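
One plausible reading of that encoder, sketched below under an assumed independence approximation (\( p_{ij} \approx r_i c_j / T \), where \( r_i \) and \( c_j \) are the row and column sums and \( T \) the total): predict each cell's occupancy from its row and column sums and charge the ideal code cost. A real encoder would also have to transmit the sums themselves.

```python
import math
import random

random.seed(3)
H, W = 32, 32
img = [[1 if random.random() < 0.25 else 0 for _ in range(W)] for _ in range(H)]

r = [sum(row) for row in img]                             # row sums
c = [sum(img[i][j] for i in range(H)) for j in range(W)]  # column sums
T = sum(r)

def clip(p, eps=1e-6):
    # The independence approximation can exceed 1, so clamp into (0, 1).
    return min(max(p, eps), 1 - eps)

cost = 0.0
for i in range(H):
    for j in range(W):
        p = clip(r[i] * c[j] / T) if T else clip(0.0)
        cost += -math.log2(p if img[i][j] else 1 - p)

print(f"predicted code length: {cost:.0f} bits vs {H * W} raw bits")
```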