# Learning around the Non Sequiturs

If Solomonoff Induction and its conceptual neighbors have not yet found application in enhancing human reasoning, there are definitely areas where they have potential value.  Automatic, unsupervised learning of sequential patterns is an intriguing area of application. It also fits closely with the sequence inferencing problem that is at the heart of algorithmic information theory.

Pragmatically, the problem of how children learn the interchangeability of words that is the basic operation of grammaticality is one area where this kind of system might be useful. Given a sequence of words or symbols, what sort of information is available for figuring out the grammatical groupings? Not much beyond memories of repetitions, often inferred implicitly.

Could we apply some variant of Solomonoff Induction at this point? Recall that we want to find the most compact explanation for the observed symbol stream. Recall also that the form of the explanation is a computer program of some sort that consists of logical functions. It turns out that creating a program that, for every possible sequence, finds the absolutely most compact program is uncomputable. The notion of what is “uncomputable” (or incomputable) is a mathematical result that has to do with how many different potential programs must be investigated to try to find the shortest one. If that number grows faster than the length of a program, it becomes uncomputable. Being uncomputable is not a death sentence, however. We can come up with approximate methods that try to follow the same procedure because any method that incrementally compresses the explanatory program will get closer to the hypothetical best program.

Sequitur by Nevill-Manning and Witten is an example of a procedure that approximates Algorithmic Information Theory optimization for string sequences. In the Sequitur algorithm, when repetitions occur in a sequence, a new rule is created that encompasses the repeated grouping. New rules can be abandoned as well when they no longer cover more than one occurrence due to some exemplars being absorbed by other examples. A slightly more sophisticated version of Sequitur involves creating new rules only when the number of bits required to represent both the rules and the encoded string is lower than without the rule. This approach comes even closer to Solomonoff Induction but requires more calculation of probabilities and the information content of the rules and sequences.

I’ve applied this latter approach in various contexts, including using the method to try to explain human guessing performance on predicting symbol sequences. Neural correlates of this capability likely exist since there are abstractly related capabilities that are used for edge detection in visual processing (an edge is a compact “guess” drawn from a noisy region of a visual field). Finding a grammar inferencing brain region (likely in Wernicke’s area) but that is used for a range of sequencing problems would support the argument that Solomonoff-like capabilities have a biological reality.