Be Persistent and Evolve

If we think about the evolution of living things we generally start from the idea that evolution requires replicators, variation, and selection. But what if we loosened that up to the more everyday semantics of the word “evolution” when we talk about the evolution of galaxies or of societies or of crystals? Each changes, grows, contracts, and has some kind of persistence that is mediated by a range of internal and external forces. For crystals, the availability of heat and access to the necessary chemicals is key. For galaxies, elements and gravity and nuclear forces are paramount. In societies, technological invention and social revolution overlay the human replicators and their biological evolution. Should we make a leap and just declare that there is some kind of impetus or law to the universe such that when there are composable subsystems and composition constraints, there will be an exploration of the allowed state space for composition? Does this add to our understanding of the universe?

Wong et al. say exactly that in “On the roles of function and selection in evolving systems” in PNAS. The paper reminds me of the various efforts to explain genetic information growth given raw conceptions of entropy and, indeed, some of those papers appear in the cites. It was once considered an intriguing problem how organisms become increasingly complex in the face of, well, the grinding dissolution of entropy. It wasn’t really that hard for most scientists: Earth receives an enormous load of solar energy that supports the push of informational systems towards negentropy. But, to the earlier point about composability and constraints, the energy is in a proportion that supports the persistence of systems that are complex.… Read the rest

Find the Alien

Assembly Theory (AT) (original paper) is some new theoretical chemistry that tries to assess the relative complexity of the molecular underpinnings of life, even when the chemistry might be completely alien. For instance, if we send a probe to a Jovian moon and there are new microscopic creatures in the ocean, how will we figure that out? In AT, it is assumed that all living organisms require a certain complexity in order to function, since that is a minimal requirement for life on Earth. The chemists experimentally confirmed that mass spectrometry is a fairly reliable way of differentiating living things and their byproducts from other substances by their molecular complexity. Of course, they only have Earthly living things to test, but they had no false positives in their comparison set of samples, though some substances like beer scored unusually high in the spectral analysis. The theory is that when a mass spec ionizes a sample and routes it through magnetic and electric fields, the complexity of the original molecules is represented in the complexity of the spray of molecular masses recorded by the detectors.

But what is “complexity” exactly? There are a great number of candidates, as Seth Lloyd notes in this little round-up paper that I linked to previously. Complexity intuitively involves something like a trade-off between randomness and uniformity, but also reflects internal repetition with variety. There is a mathematical formalism that in full attribution is “Solomonoff-Chaitin-Kolmogorov Complexity”—but we can just call it algorithmic complexity (AC) for short—that has always been an idealized way to think about complexity: take the smallest algorithm (in terms of bits) that can produce a pattern and the length of the algorithm in bits is the complexity.… Read the rest
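As a toy illustration of that definition (a sketch of the idea only, not anything from the post or from Assembly Theory), compare the length of a regular pattern with the length of a short program that can regenerate it, versus an irregular string that has no obviously shorter description than itself:

```python
# Toy illustration of algorithmic complexity: a regular pattern has a
# generating "program" far shorter than the pattern itself, while an
# irregular string's best known description is just the literal string.

regular = "ab" * 1000                       # 2,000 characters of pure pattern
irregular = "menlqphsfyjubaoitwzrvcgxdkbw"  # no evident generating rule

program_for_regular = '"ab" * 1000'      # a tiny description of the pattern
program_for_irregular = repr(irregular)  # best known description: the literal itself

assert eval(program_for_regular) == regular  # the short program really does produce it

print(len(regular), len(program_for_regular))      # huge string, tiny program
print(len(irregular), len(program_for_irregular))  # string and description about the same size
```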

Bereitschaftspotential and the Rehabilitation of Free Will

The question of whether we, as people, have free will or not is both abstract and occasionally deeply relevant. We certainly act as if we have something like libertarian free will, and we have built entire systems of justice around this idea, where people are responsible for choices they make that result in harms to others. But that may be somewhat illusory for several reasons. First, if we take a hard deterministic view of the universe as a clockwork-like collection of physical interactions, our wills are just a mindless outcome of a calculation of sorts, driven by a wetware calculator with a state completely determined by molecular history. Second, there has been, until very recently, some experimental evidence that our decision-making occurs before we achieve a conscious realization of the decision itself.

But this latter claim appears to be without merit, as reported in this Atlantic article. Instead, what was previously believed to be signals of brain activity that were related to choice (Bereitschaftspotential) may just be associated with general waves of neural activity. The new experimental evidence puts the timing of action in line with conscious awareness of the decision. More experimental work is needed—as always—but the tentative result suggests a more tightly coupled pairing of conscious awareness with decision making.

Indeed, this newer experimental result gets closer to my suggested model of how modular systems, combined with perceptual and environmental uncertainty, can produce what is effectively free will (or at least a functional model for a compatibilist position). Jettisoning the Chaitin-Kolmogorov complexity part of that argument and just focusing on the minimal requirements for decision making in the face of uncertainty, we know we need a thresholding apparatus that fires various responses given a multivariate statistical topology.… Read the rest
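Here is a minimal sketch of what such a thresholding apparatus could look like, under my own simplifying assumptions (a drift-diffusion-style toy, not a model from the post or the cited experiments): noisy evidence accumulates until it crosses one of two response thresholds, so both the response and its timing vary with the uncertainty.

```python
import random

def threshold_decision(bias: float = 0.1, threshold: float = 5.0,
                       noise: float = 1.0, max_steps: int = 1000) -> tuple[str, int]:
    """Accumulate noisy evidence until a response threshold is crossed.

    Toy accumulator only: 'bias' is the average evidence per step and
    'noise' stands in for perceptual and environmental uncertainty.
    """
    evidence = 0.0
    for step in range(1, max_steps + 1):
        evidence += bias + random.gauss(0.0, noise)
        if evidence >= threshold:
            return "respond A", step
        if evidence <= -threshold:
            return "respond B", step
    return "no decision", max_steps

# With weak bias and high noise, both the outcome and its timing vary run to run.
print(threshold_decision(bias=0.1, noise=1.0))
```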

Free Will and Algorithmic Information Theory (Part II)


So we get some mild form of source determinism out of Algorithmic Information Complexity (AIC), but we haven’t addressed the form of free will that deals with moral culpability at all. That free will requires that we, as moral agents, are capable of making choices that have moral consequences. Another way of saying it is that given the same circumstances we could have done otherwise. After all, all we have is a series of if/then statements that must be implemented in wetware and they still respond to known stimuli in deterministic ways. Just responding in model-predictable ways to new stimuli doesn’t amount directly to making choices.

Let’s expand the problem a bit, however. Instead of a lock-and-key recognition of integer “foodstuffs” we have uncertain patterns of foodstuffs and fallible recognition systems. Suddenly we have a probability problem with P(food|n) [or even P(food|q(n)) where q is some perception function] governed by Bayesian statistics. Clearly we expect evolution to optimize towards better models, though we know that all kinds of historical and physical contingencies may derail perfect optimization. Still, if we did have perfect optimization, we know what that would look like for certain types of statistical patterns.
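To make the probabilistic framing concrete, here is a hedged sketch of the Bayes update for P(food|q(n)) with a fallible detector; the prior and the hit and false-alarm rates below are invented toy numbers, not anything from the post:

```python
def posterior_food(prior_food: float, p_signal_given_food: float,
                   p_signal_given_not_food: float) -> float:
    """Bayes' rule for P(food | signal) with a fallible detector.

    'signal' stands in for q(n), the output of an imperfect perception
    function; the likelihoods encode its hit and false-alarm rates.
    """
    p_signal = (p_signal_given_food * prior_food
                + p_signal_given_not_food * (1.0 - prior_food))
    return p_signal_given_food * prior_food / p_signal

# Toy numbers (assumptions, not from the post): food is rare, the detector
# fires on 90% of real food and on 10% of non-food.
print(posterior_food(prior_food=0.05, p_signal_given_food=0.9,
                     p_signal_given_not_food=0.1))  # roughly 0.32
```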

What is an optimal induction machine? AIC and variants have been used to define that machine. First, we have Solomonoff induction from around 1960. But we also have Jorma Rissanen’s Minimum Description Length (MDL) theory from 1978 that casts the problem more in terms of continuous distributions. Variants are available, too, from Minimum Message Length, to Akaike’s Information Criterion (AIC, confusingly again), Bayesian Information Criterion (BIC), and on to Structural Risk Minimization via Vapnik-Chervonenkis learning theory.
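As a concrete illustration of one of these criteria in action, here is a hedged sketch of polynomial order selection with BIC; the data, the degree grid, and the quadratic ground truth are my own toy assumptions, not an example from the post:

```python
import numpy as np

def bic_for_polynomial(x: np.ndarray, y: np.ndarray, degree: int) -> float:
    """Bayesian Information Criterion for a least-squares polynomial fit.

    BIC = n * log(RSS / n) + k * log(n): extra coefficients (k) must buy
    enough reduction in residual error (RSS) to be worth their penalty.
    """
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    n, k = len(y), degree + 1
    rss = float(np.sum(residuals ** 2))
    return n * np.log(rss / n) + k * np.log(n)

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 50)
y = 1.0 + 2.0 * x - 3.0 * x ** 2 + rng.normal(0, 0.1, size=x.size)  # true degree is 2

scores = {d: bic_for_polynomial(x, y, d) for d in range(1, 7)}
print(min(scores, key=scores.get))  # should typically select 2, not the largest degree
```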

All of these theories involve some kind of trade-off between model parameters, the relative complexity of model parameters, and the success of the model on the trained exemplars.… Read the rest

The Goldilocks Complexity Zone

Since my time in the early 90s at the Santa Fe Institute, I’ve been fascinated by the informational physics of complex systems. What are the requirements of an abstract system that is capable of complex behavior? How do our intuitions about complex behavior or form match up with mathematical approaches to describing complexity? For instance, we might consider a snowflake complex, but it is also regular in its structure, driven by an interaction between crystal growth and the surrounding air. The classic examples of coastlines and fractal self-symmetry also seem complex but are not capable of complex behavior.

So what is a good way of thinking about complexity? There is actually a good range of ideas about how to characterize complexity. Seth Lloyd rounds up many of them, here. The intuition that drives many of them is that complexity seems to be associated with distributions of relationships and objects that are somehow juxtaposed between a single state and a uniformly random set of states. Complex things, be they living organisms or computers running algorithms, should exist in a Goldilocks zone when each part is examined and those parts are somehow summed up into a single measure.

We can easily construct a complexity measure that captures some of these intuitions. Let’s look at three strings of characters:

x = aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

y = menlqphsfyjubaoitwzrvcgxdkbwohqyxplerz

z = the fox met the hare and the fox saw the hare

Now we would likely all agree that y and z are more complex than x, and I suspect most would agree that y looks like gibberish compared with z. Of course, y could be a sequence of weirdly coded measurements or something, or encrypted such that the message appears random.… Read the rest
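One crude, computable proxy for these intuitions, offered as a sketch rather than the measure the post goes on to construct, is compressed length: feed the three strings to an off-the-shelf compressor and compare the results.

```python
import zlib

# Compressed length as a stand-in for complexity. This is a rough proxy only,
# and for strings this short the compressor's fixed overhead blurs the picture.
samples = {
    "x": "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
    "y": "menlqphsfyjubaoitwzrvcgxdkbwohqyxplerz",
    "z": "the fox met the hare and the fox saw the hare",
}

for name, s in samples.items():
    compressed = len(zlib.compress(s.encode("utf-8"), level=9))
    print(f"{name}: {len(s)} chars -> {compressed} bytes compressed")

# Expect x to collapse dramatically and y to barely compress; z benefits from
# the repetition of 'the', 'fox', and 'hare' once the strings are long enough.
```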

Entanglement and Information

Research can flow into interesting little eddies that cohere into larger circulations that become transformative phase shifts. That happened to me this morning between a morning drive in the Northern California hills and departing for lunch at one of our favorite restaurants in Danville.

The topic I’ve been working on since my retirement is whether there are preferential representations for optimal automated inference methods. We have this grab-bag of machine learning techniques that use differing data structures but that all implement some variation on fitting functions to data exemplars; at the most general they all look like some kind of gradient descent on an error surface. Getting the right mix of parameters, nodes, etc. falls to some kind of statistical regularization or bottlenecking for the algorithms. Or maybe you perform a grid search in the hyperparameter space, narrowing down the right mix. Or you can throw up your hands and try to evolve your way to a solution, suspecting that there may be local optima that are distracting the algorithms from global success.
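For concreteness, here is a minimal sketch of the grid-search-plus-regularization workflow described above, using a toy ridge-regression example with a held-out validation set (the data and the lambda grid are invented, not from the post):

```python
import numpy as np

def ridge_fit(X: np.ndarray, y: np.ndarray, lam: float) -> np.ndarray:
    """Closed-form ridge regression: solve (X^T X + lam*I) w = X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def validation_error(X_tr, y_tr, X_va, y_va, lam: float) -> float:
    w = ridge_fit(X_tr, y_tr, lam)
    return float(np.mean((X_va @ w - y_va) ** 2))

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
true_w = np.concatenate([np.ones(3), np.zeros(7)])   # only 3 informative features
y = X @ true_w + rng.normal(0, 0.5, size=200)
X_tr, X_va, y_tr, y_va = X[:150], X[150:], y[:150], y[150:]

# A one-dimensional "grid search" over the regularization strength: the
# hyperparameter is chosen by held-out error, not by fit to the training set.
grid = [0.01, 0.1, 1.0, 10.0, 100.0]
best = min(grid, key=lambda lam: validation_error(X_tr, y_tr, X_va, y_va, lam))
print("selected lambda:", best)
```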

Yet, algorithmic information theory (AIT) gives us, via Solomonoff, a framework for balancing the parameterization of an inference algorithm against the error rate on the training set. But, first, it’s all uncomputable and, second, the AIT framework just uses strings of binary as the coded Turing machines, so I would have to flip 2^N bits and test each representation to get anywhere with the theory. Yet, I and many others have had incremental success at using variations on this framework, whether via Minimum Description Length (MDL) principles, its first cousin Minimum Message Length (MML), or other statistical regularization approaches that are somewhat proxies for these techniques.… Read the rest

Machine Learning and the Coming Robot Apocalypse

Slides from a talk I gave today on current advances in machine learning are available in PDF, below. The agenda is pretty straightforward: starting with some theory about overfitting based on algorithmic information theory, we proceed on through a taxonomy of ML types (not exhaustive), then dip into ensemble learning and deep learning approaches. An analysis of the difficulty and types of performance we get from various algorithms and problems is presented. We end with a discussion of whether we should be frightened about the progress we see around us.

Note: click on the gray square if you don’t see the embedded PDF…browsers vary.Read the rest

Learning around the Non Sequiturs

If Solomonoff Induction and its conceptual neighbors have not yet found application in enhancing human reasoning, there are definitely areas where they have potential value.  Automatic, unsupervised learning of sequential patterns is an intriguing area of application. It also fits closely with the sequence inferencing problem that is at the heart of algorithmic information theory.

Pragmatically, the problem of how children learn the interchangeability of words, which is the basic operation of grammaticality, is one area where this kind of system might be useful. Given a sequence of words or symbols, what sort of information is available for figuring out the grammatical groupings? Not much beyond memories of repetitions, often inferred implicitly.

Could we apply some variant of Solomonoff Induction at this point? Recall that we want to find the most compact explanation for the observed symbol stream. Recall also that the form of the explanation is a computer program of some sort that consists of logical functions. It turns out that creating a procedure that, for every possible sequence, finds the absolutely most compact program is uncomputable. The notion of what is “uncomputable” (or incomputable) here is a mathematical result tied to the halting problem: to be certain you had found the shortest program you would need to test every shorter candidate, and there is no general way to decide whether an arbitrary candidate will ever halt and produce the sequence. Being uncomputable is not a death sentence, however. We can come up with approximate methods that follow the spirit of the procedure, because any method that incrementally compresses the explanatory program gets closer to the hypothetical best program.

Sequitur by Nevill-Manning and Witten is an example of a procedure that approximates Algorithmic Information Theory optimization for string sequences.… Read the rest
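To show the flavor of this kind of grammar-based compression, here is a much-simplified, offline sketch in the spirit of Sequitur (closer to Re-Pair: repeatedly replace the most frequent adjacent pair with a new rule); it omits Sequitur’s online processing and its digram-uniqueness and rule-utility invariants, and reuses the fox-and-hare sentence from the earlier post as input.

```python
from collections import Counter

def grammar_compress(seq: list[str]) -> tuple[list[str], dict[str, tuple[str, str]]]:
    """Greedy grammar-based compression: repeatedly replace the most frequent
    adjacent pair with a fresh nonterminal. A simplified sketch only, not the
    Sequitur algorithm itself.
    """
    rules: dict[str, tuple[str, str]] = {}
    rule_id = 0
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count < 2:
            break  # no pair repeats, so no rule would shorten the sequence
        nonterminal = f"R{rule_id}"
        rule_id += 1
        rules[nonterminal] = (a, b)
        new_seq: list[str] = []
        i = 0
        while i < len(seq):
            if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
                new_seq.append(nonterminal)
                i += 2
            else:
                new_seq.append(seq[i])
                i += 1
        seq = new_seq
    return seq, rules

tokens = "the fox met the hare and the fox saw the hare".split()
compressed, grammar = grammar_compress(tokens)
print(compressed)  # repeated word groups collapse into nonterminals
print(grammar)     # e.g. a rule covering ('the', 'fox') and one for ('the', 'hare')
```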