Indifference and the Cosmos

I am a political independent, though that does not mean that I vote willy-nilly. I have, in fact, been reliably center-left for most of my adult life, save one youthfully rebellious moment when I voted Libertarian, more as a statement than a commitment to the principles of libertarianism per se. I regret that vote now, given additional exposure to the party and the kinds of people it attracts. To me, the extremes of the American political system are built around radical positions, and the increasingly noxious conspiracy theories and unhinged rhetoric are nothing like the cautious, problem-solving utopia that might make me politically happy, or at least wince less.

Some might claim I am indifferent. I would not argue with that. In the face of revolution, I would require a likely impossible proof of a better outcome before committing. How can we possibly see into such a permeable and contingent future, or weigh the goods and harms in the face of the unknown? This idea of indifference, as a tempering of our epistemic insights, serves as the basis for an essential idea in probabilistic reasoning, where it even has a name: the principle of indifference, or, variously, and in contradistinction to Leibniz’s principle of sufficient reason, the principle of insufficient reason.

So how does indifference work in probabilistic reasoning? Consider a Bayesian formulation: we inductively guess based on a combination of a priori probabilities and a posteriori evidence. What is the likelihood of the next word in an English sentence being “is”? Indifference suggests that we treat each word as likely as any other, but we know straight away that “is” occurs much more often than “Manichaeistic” in English texts because we can count words.… Read the rest
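A minimal sketch of that contrast, with a made-up toy corpus standing in for real English text: under indifference every word in the vocabulary gets the same prior, while counting gives “is” the weight the evidence supports.

```python
from collections import Counter

# Toy corpus; the words and counts are invented purely for illustration.
corpus = "the cat is on the mat and the dog is in the yard".split()
counts = Counter(corpus)
vocab = sorted(counts)

# Principle of indifference: with no evidence, every word is equally likely.
p_indifferent = {w: 1 / len(vocab) for w in vocab}

# A posteriori: relative frequencies counted from the observed text.
total = sum(counts.values())
p_empirical = {w: counts[w] / total for w in vocab}

print(p_indifferent["is"], p_empirical["is"])  # uniform prior vs. counted evidence
```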

Incompressibility and the Mathematics of Ethical Magnetism

One of the most intriguing aspects of the current U.S. border crisis is the way that human rights and American decency get articulated in the public sphere of discourse. The initial pull is raw emotion and empathy; then there are counterweights, where the long-term consequences of existing policies are weighed against their exigent effects; and then there are crackpot theories of “crisis actors” and whatnot as bizarro-world distractions. But, if we accept the general thesis of our Enlightenment values carrying us ever forward into increasing rights for all, reduced violence and war, and the closing of the curtain on the long human history of despair, poverty, and hunger, we must also ask more generally how this comes to be. Steven Pinker certainly has rounded up some social theories, but what kind of meta-ethics might be at work that seems to push human civilization towards these positive outcomes?

Per the last post, I take the position that we can potentially formulate meaningful sentences about what “ought” to be done, and that those sentences are, in fact, meaningful precisely because they are grounded in the semantics we derive from real-world interactions. How does this work? Well, we can invoke the so-called Cornell Realists’ argument that the semantics of a word like “ought” is not as flexible as Moore’s Open Question argument suggests. Indeed, if we instead look at the natural world and the theories that we have built up about it (generally “scientific theories” but, also, perhaps “folk scientific ideas” or “developing scientific theories”), certain concepts take on the character of being so-called “joints of reality.” That is, they are less changeable than other concepts and become referential magnets that have an elite status among the concepts we use for the world.… Read the rest

Entanglement and Information

Research can flow into interesting little eddies that cohere into larger circulations that become transformative phase shifts. That happened to me this morning between a morning drive in the Northern California hills and departing for lunch at one of our favorite restaurants in Danville.

The topic I’ve been working on since my retirement is whether there are preferential representations for optimal automated inference methods. We have this grab-bag of machine learning techniques that use differing data structures but that all implement some variation on fitting functions to data exemplars; at the most general they all look like some kind of gradient descent on an error surface. Getting the right mix of parameters, nodes, etc. falls to some kind of statistical regularization or bottlenecking for the algorithms. Or maybe you perform a grid search in the hyperparameter space, narrowing down the right mix. Or you can throw up your hands and try to evolve your way to a solution, suspecting that there may be local optima that are distracting the algorithms from global success.
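As a toy illustration of those last two moves, here is a minimal sketch, with invented data, a single-weight model, and a made-up parameter grid: gradient descent on a regularized squared-error surface, wrapped in a grid search over the regularization strength.

```python
import numpy as np

# Fit y = w * x by gradient descent on a regularized squared-error surface,
# then grid-search the regularization strength on held-out data. The data,
# the one-weight model, and the grid are all invented for illustration.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 3.0 * x + rng.normal(scale=0.5, size=200)
x_train, y_train = x[:100], y[:100]
x_val, y_val = x[100:], y[100:]

def fit(lam, steps=500, lr=0.01):
    w = 0.0
    for _ in range(steps):
        # Gradient of mean squared error plus an L2 penalty on the weight.
        grad = -2 * np.mean((y_train - w * x_train) * x_train) + 2 * lam * w
        w -= lr * grad
    return w

def val_error(w):
    return float(np.mean((y_val - w * x_val) ** 2))

# Grid search in the (one-dimensional) hyperparameter space.
grid = [0.0, 0.01, 0.1, 1.0, 10.0]
best_lam = min(grid, key=lambda lam: val_error(fit(lam)))
print("best lambda:", best_lam)
```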

Yet algorithmic information theory (AIT) gives us, via Solomonoff, a framework for balancing the parameterization of an inference algorithm against the error rate on the training set. But, first, it’s all uncomputable and, second, the AIT framework just uses strings of binary as the coded Turing machines, so I would have to flip 2^N bits and test each representation to get anywhere with the theory. Still, I and many others have had incremental success at using variations on this framework, whether via Minimum Description Length (MDL) principles, its first cousin Minimum Message Length (MML), or other statistical regularization approaches that are somewhat proxies for these techniques.… Read the rest
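A rough sense of how that MDL-style trade-off plays out, sketched under invented data and a crude coding scheme (a fixed 16 bits per parameter plus a Gaussian code for the residuals; only the relative lengths matter): extra parameters are only worth their bits if they buy enough compression of the errors.

```python
import numpy as np

# A crude two-part MDL proxy: total description length = bits to state the
# model parameters + bits to encode the residual errors under a Gaussian code.
# Data, the 16-bits-per-parameter assumption, and the polynomial model family
# are all invented for illustration.
rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 50)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(scale=0.1, size=50)

def description_length(degree, bits_per_param=16):
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    sigma2 = max(float(np.mean(residuals**2)), 1e-12)
    model_bits = bits_per_param * (degree + 1)
    data_bits = 0.5 * len(x) * np.log2(2 * np.pi * np.e * sigma2)
    return model_bits + data_bits

for degree in range(6):
    print(degree, round(description_length(degree), 1))
# The true quadratic should sit at or near the minimum of the total length.
```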

Learning around the Non Sequiturs

Even if Solomonoff Induction and its conceptual neighbors have not yet found application in enhancing human reasoning, there are definitely areas where they have potential value. Automatic, unsupervised learning of sequential patterns is an intriguing area of application. It also fits closely with the sequence inferencing problem that is at the heart of algorithmic information theory.

Pragmatically, the problem of how children learn the interchangeability of words, which is the basic operation of grammaticality, is one area where this kind of system might be useful. Given a sequence of words or symbols, what sort of information is available for figuring out the grammatical groupings? Not much beyond memories of repetitions, often inferred implicitly.

Could we apply some variant of Solomonoff Induction at this point? Recall that we want to find the most compact explanation for the observed symbol stream. Recall also that the form of the explanation is a computer program of some sort that consists of logical functions. It turns out that creating a program that, for every possible sequence, finds the absolutely most compact program is uncomputable. “Uncomputable” (or incomputable) here is a precise mathematical result: there is no general procedure for deciding whether each candidate program will ever halt and reproduce the sequence, so no algorithm can be guaranteed to find the shortest one. Being uncomputable is not a death sentence, however. We can come up with approximate methods that follow the same spirit, because any method that incrementally compresses the explanatory program gets closer to the hypothetical best program.
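One cheap way to see the approximation idea in action, sketched here with an off-the-shelf compressor standing in for the uncomputable ideal (the example strings are invented): a general-purpose compressor’s output length serves as a computable proxy for the length of the shortest explanatory program, and more structured streams yield shorter proxies.

```python
import random
import zlib

# Compressed length as a rough, computable proxy for the length of the
# shortest program that reproduces the string. Example data is invented.
def proxy_complexity(s: str) -> int:
    return len(zlib.compress(s.encode()))

repetitive = "the cat sat on the mat " * 40
scrambled = "".join(random.sample(repetitive, len(repetitive)))

print(proxy_complexity(repetitive), proxy_complexity(scrambled))
# The repetitive stream compresses far better: more structure, shorter "program".
```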

Sequitur by Nevill-Manning and Witten is an example of a procedure that approximates Algorithmic Information Theory optimization for string sequences.… Read the rest
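For flavor, here is a much-simplified, offline sketch of the grammar-building idea: greedy digram substitution, closer in spirit to Re-Pair than to the real online Sequitur, which enforces digram uniqueness and rule utility as symbols arrive. Repeated pairs are folded into rules, shrinking the sequence into a small grammar.

```python
from collections import Counter

def simple_grammar(tokens):
    """Greedy digram substitution: repeatedly replace the most frequent
    adjacent pair with a new rule. A much-simplified, offline sketch, not
    the actual Sequitur algorithm."""
    rules = {}
    seq = list(tokens)
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break
        name = f"R{len(rules)}"
        rules[name] = pair
        new_seq, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                new_seq.append(name)  # fold the repeated pair into the rule
                i += 2
            else:
                new_seq.append(seq[i])
                i += 1
        seq = new_seq
    return seq, rules

seq, rules = simple_grammar("a b c a b c a b d".split())
print(seq)    # ['R1', 'R1', 'R0', 'd']
print(rules)  # {'R0': ('a', 'b'), 'R1': ('R0', 'c')}
```

The compressed sequence plus its rule set is the "explanation": the shorter the pair, the closer we are, in this rough sense, to the compact program the theory asks for.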