Perhaps idiosyncratic to some is my focus in the previous post on the theoretical background to machine learning that derives predominantly from algorithmic information theory and, in particular, Solomonoff’s theory of induction. I do note that there are other theories that can be brought to bear, including Vapnik’s Structural Risk Minimization and Valiant’s PAC-learning theory. Moreover, perceptrons, vector quantization methods, and so forth derive from completely separate principles that can then be cast into more fundamental problems in information geometry and physics.
Artificial General Intelligence (AGI) is then perhaps the hard problem on the horizon, one for which I disclaim significant progress over the past twenty years or so. That is not to say that I am not an enthusiastic student of the topic and field, just that I don’t see risk levels from intelligent AIs rising to what we should consider a real threat. This topic of how to grade threats deserves deeper treatment, of course, and is at the heart of everything from so-called “nanny state” interventions in food and product safety to how to construct policy around global warming. Luckily–and unlike both those topics–killer AIs don’t threaten us at all quite yet.
But what about simply characterizing what AGIs might look like and how we can even tell when they arise? Mildly interesting is Shane Legg and Joel Veness’ idea of an Artificial Intelligence Quotient, or AIQ, that they expand on in An Approximation of the Universal Intelligence Measure. This measure is derived from, voilà, exactly the kind of algorithmic information theory (AIT) and compression arguments that I led with in the slide deck. Is this the only theory around for AGI? Pretty much, but different perspectives tend to lead to slightly different focuses. For instance, there is little need to discuss AIT when dealing with Deep Learning Neural Networks. We just instead discuss statistical regularization and bottlenecking, which can be thought of as proxies for model compression.
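The flavor of the Legg–Veness approach can be gestured at with a toy Monte Carlo sketch. This is my own drastic simplification, not their actual setup (they sample environments as programs for a Brainfuck-like reference machine); the point is only the simplicity-biased weighting, where environments with shorter descriptions are exponentially more probable:

```python
import random

# Toy sketch of an AIQ-style estimate (my own simplification, not the
# Legg-Veness reference-machine setup). An "environment" here is reduced
# to a payoff function whose hidden target is derived from a k-bit
# description, with k sampled so that P(k) is proportional to 2^-k,
# echoing the 2^-K(mu) weighting in the universal intelligence measure.

def sample_environment(rng):
    # Geometric draw: shorter descriptions are exponentially more likely.
    k = 1
    while rng.random() < 0.5:
        k += 1
    target = rng.randrange(2 ** k)
    def env(action):
        # Reward 1 if the agent's action matches the hidden target.
        return 1.0 if action % (2 ** k) == target else 0.0
    return env

def aiq(agent, trials=10000, seed=0):
    # Monte Carlo average of reward over simplicity-biased environments.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        env = sample_environment(rng)
        total += env(agent())
    return total / trials

# Even a trivial agent that always answers 0 scores well above zero,
# because simple environments dominate the prior.
score = aiq(lambda: 0)
```

The trivial agent's nonzero score is the interesting wrinkle: a universal measure rewards doing well on the many simple environments first, and only then on the long tail of complex ones.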
So how can intelligent machines be characterized by something like AIQ? Well, the conclusion won’t be surprising. Intelligent machines are those machines that function well in terms of achieving goals over a highly varied collection of environments. This allows for tractable mathematical treatments insofar as the complexity of the landscapes can be characterized, but doesn’t really give us a good handle on what the particular machines might look like. They can still be neural networks or support vector machines, or maybe even something simpler, and through some selection and optimization process have the best performance over a complex topology of reward-driven goal states.
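That verbal characterization has a compact form in Legg and Hutter’s universal intelligence measure, of which AIQ is the practical approximation (written here from memory; see their paper for the exact formulation):

```latex
\Upsilon(\pi) = \sum_{\mu \in E} 2^{-K(\mu)} \, V^{\pi}_{\mu}
```

where $E$ ranges over computable environments, $K(\mu)$ is the Kolmogorov complexity of environment $\mu$, and $V^{\pi}_{\mu}$ is the expected cumulative reward of agent $\pi$ in $\mu$. Simple environments dominate the sum, but a high score still demands competence across a great many of them–which is just the “highly varied collection of environments” restated in symbols.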
So still no reason to panic, but some interesting ideas that shed greater light on the still mysterious idea of intelligence and the human condition.
Donald Davidson argued that descriptive theories of semantics suffered from untenable complications that could, in turn, be solved by a holistic theory of meaning. Holism, in this sense, is due to the dependency of words and phrases as part of a complex linguistic interchange. He proposed “triangulation” as a solution, where we zero in on a tentatively held belief about a word based on other beliefs about oneself, about others, and about the world we think we know.
This seems daringly obvious, but it is merely the starting point for the hard work of specifying what mechanisms and steps are involved in fixing the meaning of words through triangulation. There are certainly some predispositions that are innate and fit nicely with triangulation. These are subsumed under the Principle of Charity and even the notion of the Intentional Stance in how we regard others like us.
Fixing meaning via model-making has some curious results. The language used to discuss aesthetics and art tends to borrow from other fields (“The narrative of the painting,” “The functional grammar of the architecture.”) Religious and spiritual terminology often has extremely porous models: I recently listened to Episcopalians discuss the meaning of “grace” for almost an hour with great glee but almost no progress; it was the belief that they were discussing something of ineffable greatness that was moving to them. Even seemingly simple scientific ideas become elaborately complex for both children and adults: we begin with atoms as billiard balls that mutate into mini solar systems that become vibrating clouds of probabilistic wave-particles around groups of properties in energetic suspension by virtual particle exchange.
Can we apply more formal models to the task of staking out this method of triangulation? For Davidson, language was both compositional and holistic, so it stands to reason that optimizing each vector of the triangulation can be rephrased as maximizing the agreement between the existing belief and new beliefs about terms and meaning, the models we hold about others’ beliefs about the terms, and any empirical facts or related desiderata that are in play. And here we may have an application of Solomonoff Induction, again, as an extension to Bayesian model-making. How do I choose to order the meaning signals from each of my belief sources? Under what circumstances do I reorder them or abandon an existing model in an aha moment? If the meta-model for ordering and triangulation is a striving for parsimony, then radical revisionism by reorganizing the underlying explanatory model is optimal when it follows Solomonoff-like principles.
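A crude way to make the parsimony intuition concrete is an MDL-flavored scoring sketch. This is my own toy formalization, not anything in Davidson: candidate models of a word’s meaning get a Solomonoff-style prior of 2 to the minus their description length, multiplied by how well they fit the signals from each belief source:

```python
import math

# Toy sketch of "triangulation as parsimony" (my own formalization).
# A candidate model of a word's meaning is scored by a Solomonoff-style
# prior, 2^-(description length in bits), times how well it explains
# the signals from three belief sources: my own usage, others' usage,
# and observations of the world.

def posterior_score(model_length_bits, fits):
    # fits: probability each belief-source signal assigns to the model.
    prior = 2.0 ** -model_length_bits
    likelihood = math.prod(fits)
    return prior * likelihood

# Two hypothetical models of a word: a short one that explains the
# evidence loosely, and a much longer one that explains it almost exactly.
simple = posterior_score(4, fits=[0.6, 0.5, 0.7])
elaborate = posterior_score(12, fits=[0.95, 0.9, 0.95])

# The short model wins here; an "aha" reorganization happens only when
# accumulating evidence makes the longer description pay for itself.
best = "simple" if simple > elaborate else "elaborate"
```

The aha moment, on this picture, is just the crossover point where the likelihood gains of a reorganized model finally outweigh its complexity penalty.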
“Optimality” might be straining credulity here–especially given the above description of arguments about the meaning of “grace”–but there may be a modified sense of the word in that the mathematical purity of a Solomonoff result is implemented in cognition as a kind of heuristic that tends towards good results in the face of extremely noisy signals.