Category: Cognitive

Multitudes and the Mathematics of the Individual

The notion that there is a path from reciprocal altruism to big brains and advanced cognitive capabilities leads us to ask whether we can create “effective” procedures that shed additional light on the suppositions that are involved, and their consequences. Any skepticism about some virulent kind of scientism then gets whisked away by the imposition of a procedure combined with an earnest interest in careful evaluation of the outcomes. That may not be enough, but it is at least a start.

I turn back to Marcus Hutter, Solomonoff, and Chaitin-Kolmogorov at this point. I’ll be primarily referencing Hutter’s Universal Algorithmic Intelligence: A Mathematical Top-Down Approach in what follows. And what follows is an attempt to break down how three separate factors related to intelligence can be explained through mathematical modeling. The first and the second are covered in Hutter’s paper, but the third may represent a new contribution, though perhaps an obvious one that still needs detail work to support it properly.

First, then, we start with a core requirement of any goal-seeking mechanism: the ability to predict patterns in the environment external to the mechanism. This has been well covered since the 1960s, when Solomonoff formalized arguments implicit in Kolmogorov’s algorithmic information theory (AIT), arguments subsequently expanded on by Greg Chaitin. In essence, given a range of possible models represented by bit sequences of computational states, the shortest sequence that predicts the observed data is also the optimal predictor for any future data produced by the same underlying generator function. The shortest sequence is not computable, but we can keep searching for shorter programs and come up with unique optimizations for specific data landscapes. And that should sound familiar because it recapitulates Occam’s Razor and, in a subset of cases, Epicurus’ Principle of Multiple Explanations. This represents the floor-plan of inductive inference, but it is only the first leg of the stool.
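The Occam flavor of this can be sketched as a toy program search. To be clear about what is invented here: the hypothesis family (repeating bit patterns), its description lengths, and the data are all stand-ins of my own choosing, since Solomonoff’s mixture ranges over all programs and is uncomputable.

```python
# Toy minimal-description-length inference: among a fixed family of candidate
# generators, prefer the shortest description that reproduces the observed
# bits, then use it to predict the next bit. (Solomonoff induction proper
# sums over all programs and is uncomputable; this finite family of
# repeating patterns is an invented stand-in for illustration.)

def candidates(max_period=4):
    # Each hypothesis: (description_length_in_bits, generator_function).
    # A repeating pattern of period k costs roughly k bits to describe.
    for k in range(1, max_period + 1):
        for p in range(2 ** k):
            pattern = [(p >> i) & 1 for i in range(k)]
            yield k, (lambda pat: lambda n: pat[n % len(pat)])(pattern)

def best_model(data):
    # The shortest description that exactly reproduces the data wins (Occam).
    fits = [(dl, g) for dl, g in candidates()
            if all(g(i) == b for i, b in enumerate(data))]
    return min(fits, key=lambda t: t[0]) if fits else None

dl, g = best_model([0, 1, 0, 1, 0, 1])
print(dl)    # 2 bits: the period-2 pattern beats longer equivalent patterns
print(g(6))  # 0: the predicted next bit
```

Note that longer patterns (period 4, say) also fit the data exactly; the Occam preference is entirely in the `min` over description length.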

We should expect that evolutionary optimization might work according to this abstract formulation, but reality always intrudes. Instead, evolution is saddled by historical contingency that channels its movements through the search space. Biology ain’t computer science, in short, if for no other reason than that it is tied to the physics and chemistry of the underlying operating system. Still, the desire is there to identify such provable optimality in living systems, because evolution is an optimizing engine, if not exactly an optimal one.

So we come to the second factor: optimality is not induction alone. Optimality is the interaction between the predictive mechanism and the environment. The “mechanism” might very well provide optimal or near-optimal predictions of the past through a Solomonoff-style model, but acting on those predictions introduces perturbations to the environment itself. Hutter elegantly simplifies this added complexity by abstracting the environment as a computing machine (a logical device; we assume here that the universe behaves deterministically even where it may have chaotic aspects) and running the model program at a slightly slower rate than the environmental program (it lags). Optimality is then a utility measure that combines prediction with resource allocation according to some objective function.
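A minimal sketch of this agent/environment framing, with every dynamic invented for illustration (Hutter’s formulation is far more general): both sides are deterministic programs, the agent’s internal model lags the environment by one step, and the score is accumulated utility rather than prediction accuracy alone.

```python
# Agent and environment as two deterministic programs; the agent plans
# against a lagged copy of the environment's state, and optimality is the
# utility accumulated under an objective function. All dynamics and the
# objective below are invented toy choices.

def environment(state, action):
    # Deterministic update: the agent's action perturbs the environment.
    return (state * 3 + action) % 17

def utility(state):
    # Invented objective: steer the environment's state toward 8.
    return -abs(state - 8)

def agent_policy(model_state):
    # Choose the action whose predicted outcome maximizes utility,
    # using the (lagged) internal model of the environment.
    return max((0, 1), key=lambda a: utility(environment(model_state, a)))

env_state, model_state, total = 5, 0, 0
for _ in range(20):
    action = agent_policy(model_state)  # decided from last step's state
    env_state = environment(env_state, action)
    total += utility(env_state)
    model_state = env_state             # the model catches up one step later
print(total)  # accumulated utility; never positive under this objective
```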

But what about the third factor that I promised and is missing? We get back to Fukuyama and the sociobiologists with this one: social interaction is the third factor. The exchange of information and the manipulation of the environment by groups of agents diffuses decision theory over inductive models of environments into a group of “mechanisms” that can, for example, transmit the location of optimal resource availability among the clan as a whole, increasing the utility of the individual agents with little cost to others. It seems appealing to expand Hutter’s model to include a social model, an agent model, and an environment within the purview of the mathematics. We might also get to the level where the social model overrides the agent model for a greater average utility, or where non-environmental signals from the social model interrupt the functioning of the agent model, representing an irrational abstraction with group-selective payoff.
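One way to make the claimed payoff concrete is a toy simulation, with an invented resource landscape: each agent samples a few sites, and the clan either keeps its finds private or broadcasts the single best one. The numbers are arbitrary; the point is only that a cheap signal raises average utility.

```python
import random

# Toy model of the social factor: agents forage on a random resource
# landscape either solitarily or with a single broadcast of the clan's
# best-known site. Landscape, sample sizes, and clan size are invented.
random.seed(1)
landscape = [random.random() for _ in range(100)]  # resource value per site

def mean_payoff(n_agents, share):
    # Each agent keeps the best of five randomly sampled sites.
    found = [max(random.sample(range(100), 5), key=lambda i: landscape[i])
             for _ in range(n_agents)]
    if share:
        # One broadcast: everyone adopts the clan's best-known site.
        best = max(found, key=lambda i: landscape[i])
        found = [best] * n_agents
    return sum(landscape[i] for i in found) / n_agents

print(mean_payoff(10, share=False))  # average utility, solitary search
print(mean_payoff(10, share=True))   # average utility, with sharing
```

Averaged over many runs, the sharing clan’s mean payoff dominates the solitary one, since the broadcast effectively gives every agent the best of fifty samples rather than five.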

Bostrom on the Hardness of Evolving Intelligence

At 38,000 feet somewhere above Missouri, returning from a one-day trip to Washington, D.C., it is easy to take Nick Bostrom’s point that bird flight is not the end-all of what is possible for airborne objects and mechanical contrivances like airplanes in his paper, How Hard is Artificial Intelligence? Evolutionary Arguments and Selection Effects. His efforts to bound and distinguish the evolution of intelligence as either Hard or Not-Hard run up against significant barriers, however. As a practitioner of the art, I find that drawing similarities between a purely physical phenomenon like flying and something as complex as human intelligence falls flat.

But Bostrom is not taking flying as more than a starting point for arguing that there is an engineer-able possibility for intelligence. And that possibility might be bounded by a number of current and foreseeable limitations, not least of which is that computer simulations of evolution require a certain amount of computing power and representational detail in order to be a sufficient simulation. His conclusion is that we may need as much as another 100 years of improvements in computing technology just to get to a point where we might succeed at a massive-scale evolutionary simulation (I’ll leave to the reader to investigate his additional arguments concerning convergent evolution and observer selection effects).

Bostrom dismisses as pessimistic the assumption that a sufficient simulation would, in fact, require a highly detailed emulation of some significant portion of the real environment and the history of organism-environment interactions:

A skeptic might insist that an abstract environment would be inadequate for the evolution of general intelligence, believing instead that the virtual environment would need to closely resemble the actual biological environment in which our ancestors evolved … However, such extreme pessimism seems unlikely to be well founded; it seems unlikely that the best environment for evolving intelligence is one that mimics nature as closely as possible. It is, on the contrary, plausible that it would be more efficient to use an artificial selection environment, one quite unlike that of our ancestors, an environment specifically designed to promote adaptations that increase the type of intelligence we are seeking to evolve.

Unfortunately, I don’t see any easy way to bound the combined complexity of the needed substrate for evolutionary action (be it artificial organisms or just artificial neuronal networks) and the complexity of defining the necessary artificial environment for achieving the requested goal. If anything, this makes the problem at least as hard, and perhaps harder, in that we can define a physical system much more easily than an abstract adaptive landscape designed to “promote…abstract reasoning and general problem-solving skills.”

Randomness and Meaning

The impossibility of the Chinese Room has implications across the board for understanding what meaning means. Mark Walker’s paper “On the Intertranslatability of all Natural Languages” describes how the translation of words and phrases may be achieved:

  1. Through a simple correspondence scheme (word for word)
  2. Through “syntactic” expansion of the languages to accommodate concepts that have no obvious equivalence (“optometrist” => “doctor for eye problems”, etc.)
  3. Through incorporation of foreign words and phrases as “loan words”
  4. Through “semantic” expansion where the foreign word is defined through its coherence within a larger knowledge network.

An example for (4) is the word “lepton” where many languages do not have a corresponding concept and, in fact, the concept is dependent on a bulwark of advanced concepts from particle physics. There may be no way to create a superposition of the meanings of other words using (2) to adequately handle “lepton.”

These problems present again for trying to understand how children acquire meaning in learning a language. As Walker points out, learning a second language must involve the same kinds of steps as learning translations, so any simple correspondence theory has to be supplemented.

So how do we make adequate judgments about meanings and so rapidly learn words, often initially with a coarse granularity but later with increasingly sharp levels of focus? What procedure is required for expanding correspondence theories to operate in larger networks? Methods like Latent Semantic Analysis and Random Indexing show how this can be achieved in ways that are illuminating about human cognition. In each case, the methods provide insights into how relatively simple transformations of terms and their occurrence contexts can be viewed as providing a form of “triangulation” about the meaning of words. And, importantly, this level of triangulation is sufficient for these methods to do very human-like things. Both methods can pass the TOEFL exam, for instance, and Latent Semantic Analysis is at the heart of automatic essay grading approaches that have sufficiently high success rates that they are widely used by standardized test makers.

How do they work? I’ll just briefly describe Random Indexing, since I recently presented the concept at the Big Data Science meetup at SGI in Fremont, California. In Random Indexing, we simply create a randomized sparse vector for each word we encounter in a large collection of texts. The vector can be binary as a first approximation, so something like:

The: 0000000000000100000010000000000000000001000000000000000…

quick: 000100000000000010000000000001000000000110000000000000…

fox: 0000000000000000000000100000000000000000000000000100100…

Now, as we encounter a given word in the text, we simply add up the random vectors for the words around it to create a new “context” vector that is still sparse, but less so than its component parts. What is interesting about this approach is that if you consider the vectors as representing points in a hyperspace with the same dimensionality as the vectors are long, then words that have similar meanings tend to cluster in that space. Latent Semantic Analysis achieves a similar clustering using some rather complex linear algebra (a singular value decomposition). Related linear algebra, an eigenvector computation, is also at the heart of Google’s PageRank algorithm, though operating on link structure rather than word co-occurrences.
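The procedure just described can be sketched in a few dozen lines. The dimensionality, sparsity, window size, and tiny corpus below are toy choices of mine; real systems use thousands of dimensions and very large corpora.

```python
import random
from collections import defaultdict

# Random Indexing sketch: each word gets a fixed sparse random index vector;
# a word's context vector is the sum of the index vectors of its neighbors.
DIM, NONZERO, WINDOW = 256, 6, 2
random.seed(0)

def index_vector():
    # Sparse binary vector: NONZERO ones scattered among DIM positions.
    v = [0] * DIM
    for i in random.sample(range(DIM), NONZERO):
        v[i] = 1
    return v

text = ("the quick brown fox jumps over the lazy dog "
        "the quick red fox runs past the lazy cat").split()

index = defaultdict(index_vector)         # word -> fixed random vector
context = defaultdict(lambda: [0] * DIM)  # word -> accumulated context

for pos, word in enumerate(text):
    start, stop = max(0, pos - WINDOW), min(len(text), pos + WINDOW + 1)
    for j in range(start, stop):
        if j != pos:  # add up the neighbors' index vectors
            context[word] = [a + b for a, b in zip(context[word], index[text[j]])]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

# Words seen in similar contexts drift together in the space; this tiny
# corpus is only suggestive of the clustering seen at scale.
print(cosine(context["fox"], context["dog"]))
```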

So how do we solve the TOEFL test using an approach like Random Indexing? A large collection of texts is analyzed to create a Random Index, and then for a sample question like:

In line 5, the word “pronounced” most closely means

  1. evident
  2. spoken
  3. described
  4. unfortunate

The question and its surrounding passage are converted into a context vector using the same random index, and then the answer vectors are compared to see which is closest in the index space. This is remarkably inexpensive to compute, requiring just an inner product between the context vectors for question and answer.
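The scoring step is just a few lines. The six-dimensional vectors below are hypothetical placeholders for the context vectors a trained Random Index would supply; only the inner-product comparison is the point.

```python
# Score each candidate answer by inner product with the question's context
# vector; the highest score wins. These short vectors are invented
# stand-ins for real (much longer) index vectors.
question_vec = [2, 0, 5, 1, 0, 3]  # context vector built from the question

answers = {
    "evident":     [1, 0, 4, 1, 0, 2],
    "spoken":      [0, 3, 0, 0, 2, 1],
    "described":   [1, 1, 1, 1, 1, 1],
    "unfortunate": [0, 0, 0, 2, 3, 0],
}

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

best = max(answers, key=lambda w: dot(question_vec, answers[w]))
print(best)  # "evident": the candidate vector most aligned with the question
```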

A method for compact coding using Algorithmic Information Theory can also be used to achieve similar results, demonstrating the wide applicability of context-based analysis to understanding how intertranslatability and language learning depend on the rich contexts of word usage.

The Comets of Literary Cohesion

Every few years, with the hyperbolic regularity of Kohoutek’s orbit, I return to B.R. Myers’ 2001 Atlantic essay, A Reader’s Manifesto, where he plays the enfant terrible against the titans of serious literature. With savagery Myers tears out the elliptical heart of Annie Proulx and then beats regular holes in Cormac McCarthy and Don DeLillo in a conscious mockery of the strained repetitiveness of their sentences.

I return to Myers because I currently have four novels in process. I return because I hope to be saved from the delirium of the postmodern novel that wants to be written merely because there is nothing really left to write about, at least not without a self-conscious wink:

But today’s Serious Writers fail even on their own postmodern terms. They urge us to move beyond our old-fashioned preoccupation with content and plot, to focus on form instead—and then they subject us to the least-expressive form, the least-expressive sentences, in the history of the American novel. Time wasted on these books is time that could be spent reading something fun.

Myers’ essay hints at what he sees as good writing, quoting Nabokov, referencing T.S. Eliot, and analyzing the controlled lyricism of Saul Bellow. Evaporating the boundaries between the various “brows” and accepting that action, plot, and invention are acceptable literary conceits also marks Myers’ approach to literary analysis.

It is largely an atheoretic analysis but there is a hint at something more beneath the surface when Myers describes the disdain of European peasants for the transition away from the inscrutable Latin masses and benedictions and into the language of the common man: “Our parson…is a plain honest man… But…he is no Latiner.” Myers counts the fascination with arabesque prose, with labeling it as great even when it lacks content, as derived from the same fascination that gripped the peasants: majesty is inherent in obscurity. Anyone who has struggled with trying to translate foreign prose or tried to transcribe music from one instrument to another rapidly understands why the problems are unassailable cliffs to the outsider. So it is with literary prose. The less I understand, the more I feel it.

But what more is there to this? We break now away from literary criticism and to the psychology of text comprehension itself, bobbing and weaving a bit to avoid falling into the cliché of the postmodern novel. First, we know that reading comprehension is affected by two obvious factors: (a) our background knowledge of the topic, and (b) the cohesion of the text. We intuitively understand (a) when it comes to scientific texts. If we have a degree in the topic we have more background knowledge than if we don’t, for instance. (b) requires defining “cohesion” a bit. Cohesion can be measured by looking at repeated nouns and bridging concepts from one paragraph to another. Highly cohesive texts tie concepts together across sentences and paragraphs, reinforcing the relationships that are expressed in one sentence with those in others, forming semantic bridges to enhance the text. Less cohesive texts are more scatter-shot, leaving the reader to infer the relevant bridging principles.
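A crude operationalization of (b), for concreteness: score cohesion as the average content-word overlap between adjacent sentences. The stopword list and the Jaccard-style overlap below are invented simplifications of what instruments like Coh-Metrix actually measure.

```python
# Cohesion as mean lexical overlap between adjacent sentences: repeated
# nouns and concepts act as the "semantic bridges" described above.
STOP = {"the", "a", "an", "is", "are", "and", "of", "to", "in", "it"}

def content_words(sentence):
    return {w.strip(".,").lower() for w in sentence.split()} - STOP

def cohesion(sentences):
    overlaps = []
    for s1, s2 in zip(sentences, sentences[1:]):
        w1, w2 = content_words(s1), content_words(s2)
        overlaps.append(len(w1 & w2) / max(1, len(w1 | w2)))
    return sum(overlaps) / len(overlaps)

high = ["The enzyme binds the substrate.",
        "The substrate changes shape when the enzyme binds.",
        "That shape change releases the product."]
low = ["The enzyme binds the substrate.",
       "Rainfall varied across the region.",
       "Markets closed higher on Tuesday."]
print(cohesion(high) > cohesion(low))  # True: bridged text scores higher
```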

The interaction between (a) and (b) could hardly be more interesting. When (a) is high, readers learn better and retain more when (b) is low. Repeat: low cohesion texts are better for high-knowledge learners. If one knows a lot, one gets easily bored by all the carefully chained concepts of high cohesion texts. Or, as a friend once said about software and hardware manuals, “They are for the weak of mind.” The reverse is also true: if you know little about a subject, the semantic bridges help get you from knowledge Midgard to Asgard, but they are just road noise for the knowledgeable.

It is aspirational, then, when obscurantist language and metaphors pile up like proton parts in the Large Hadron Collider. The authors are asking their readers to search for the God Participle. They want to be Latiners. They want Cormac McCarthy to transform the bloody mess of westward expansion into Exodus. They want it because they want low cohesion texts and the feeling of sailing through the vaulted ceilings of ancient cathedrals, like Leary on acid, like penitentes against the whip, like gurus stinking of enlightenment. Then the gentle readers can finally bask among the deconstructed mists after the dream has faded, waiting for the next cycle of literary critics to anoint the next round of elaborated prose and, as Kohoutek returns, so the gentle rush of spring will come again to the countryside.

Eusociality, Errors, and Behavioral Plasticity

I encountered an error in E.O. Wilson’s The Social Conquest of Earth where Wilson intended to assert an alternative to “kin selection” but instead repeated “multilevel selection,” which is precisely what he wanted to draw a distinction with. I am sympathetic, however, if for no other reason than I keep finding errors and issues with my own books and papers.

The critical technical discussion from Nature concerning the topic is available here. As technical discussions go, the issues debated are fraught with details, like how halictid bees appear to live socially but are in fact solitary animals that co-exist in tunnel arrangements.

Despite the focus on “spring-loaded traits” as determiners for haplodiploid animals like bees and wasps, the problem of big-brained behavioral plasticity keeps coming up in Wilson’s book. Humanity is a pinnacle because of taming fire, because of the relative levels of energy available in animal flesh versus plant matter, and because of our ability to outrun prey over long distances (yes, our identity emerges from marathon running). But these are solutions that correlate with the rapid growth of our craniums.

So if behavioral plasticity is so very central to who we are, we are faced with an awfully complex problem in trying to simulate that behavior. We can expect that there must be phalanxes of genes involved in setting our developmental path (our nature and the substrate for our nurture). We should, indeed, expect that almost no cognitive capacity is governed by a small set of genes, and that all the relevant genes work in networks through polygeny, epistasis, and related effects (pleiotropy). And we can expect no easy answers as a result, except to assert that AI is exactly as hard as we should have expected, and progress will be inevitably slow in understanding the mind, the brain, and the way we interact.

Mental Religious Math

The Los Angeles Times reports on an article from Science showing that analytical thinking and religious belief may be inversely related. That may not be news to some, but at least one of the example problems (which religious participants answered incorrectly at higher rates than the non-religious) is quite interesting:

A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost?

Here the initial intuitive leap is that $1.00 plus 10 cents gives precisely $1.10, so the ball must cost $0.10 or 10 cents. It seems easy and clear, but the math undermines that result:

Bat + Ball = $1.10

Bat = Ball + $1.00

(Ball + $1.00) + Ball = $1.10

2*Ball + $1.00 = $1.10

2*Ball = $0.10

Ball = $0.05

Bat = $1.05

This result makes sense if one considers that the cost of the bat must be greater than $1.00 because it is $1.00 more than the ball (and not $1.00 itself). But it is obscured by the initial intuitive leap based on simple subtraction.
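For the skeptical, the same algebra can be checked numerically in a few lines:

```python
# Solve 2 * ball + 1.00 = 1.10 directly and confirm both constraints.
ball = (1.10 - 1.00) / 2
bat = ball + 1.00
print(f"{ball:.2f} {bat:.2f}")  # 0.05 1.05
print(f"{bat + ball:.2f}")      # 1.10  (they sum to $1.10)
print(f"{bat - ball:.2f}")      # 1.00  (the bat costs $1.00 more)
```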

But that example is a pretty hard one to reconcile with historical and faith-based judgments. I doubt there is good reason to suspect that mathematical prowess and intuitions about the costs of things correlate with faith decisions, because I suspect the faithful believe what they believe for other reasons. Specifically, most religious faithful believe because they were told to believe by their parents or community. That fact doesn’t translate into mathematical prowess or ineptness. They may score more poorly on tests like these because of other individual differences, for example, that those who are predisposed toward critical analytical skills are less likely to have come from highly religious backgrounds, and vice versa.

Interestingly, the subsequent reported experiments appear to show that religious belief might be mutable by more analytical tasks, which is a different sort of argument altogether, but one that shows that belief is more variable than our folk psychology often seems to suggest.

Fiction and Empathy

The New York Times reviews the neuroscience associated with reading fictional accounts, concluding that the brain states of readers show similar activation patterns to people experiencing the events described in the book. This, in turn, enhances and improves our own “theory of mind” about others when we read about social interactions:

[I]ndividuals who frequently read fiction seem to be better able to understand other people, empathize with them and see the world from their perspective. This relationship persisted even after the researchers accounted for the possibility that more empathetic individuals might prefer reading novels.

Steven Pinker’s The Better Angels of Our Nature suggests that the advent of the printing press is where we see the start of a shift in European societies’ attitudes about violence. The spread of reading and the growth of political satire correlate with reductions in state violence through the 19th Century and into the 20th (yes, he argues the 20th shows a reduction in violence, despite our intuitions about WWI and WWII). Think Voltaire. Think All Quiet on the Western Front.

AntiTerran Metatextuality

Intertextuality is a loaded word. It covers allusion and parody and reference. For some authors, it is the motivation to write, from Umberto Eco’s semiotic indulgences to Nabokov’s vast, layered palimpsest in Ada. I create deliberate allusions to Genesis in Teleology and references to Nabokov’s Ada in Signals and Noise.

The opposite of intertextuality might be centrality or concreteness, but it might also be the extension of the literature or artwork as references in other works that extend or reimagine the original work, creating a literary chain of sorts. Your intertextual references are referenced by my metatextual extensions. Outertextuality? Whatever the term, we get a kind of referential network that builds on an artificial landscape, the lives of imagined characters, and the universe of ideas that they inhabit.

Dieter Zimmer, who appears to have done the German translation of Ada, has a brilliant example of metatextuality in his Geography of AntiTerra. With methodical precision, he translates the textual descriptions into a map of the imagined world, a kind of fan cartography that solidifies the strange geography into a complete realization. I’m reminded of the Elven dictionaries in The Silmarillion or the detailed online fan fiction from adoring readers of current bestsellers.

I think there is likely a strong connection between the psychology of religious belief and the same motivators towards metatextuality. Imagined worlds are always interesting and plotted. Even when characters are harmed or injured, we feel only fleeting sensitivity to the idea of their injury. Moreover, the intertextuality is a network of coherence-supplying support for the narrative’s epistemology. The more detail, the greater the sense of clarity of the imagined world, and the more buy-in as to the reality of the mysteries described therein.

Interestingly, there is both supporting and counter-evidence for this idea.  The previously discussed work on apophenia leads the way, but we can drill in even more closely on these notions by looking at experimental methods that show relationships between New Age belief and schizotypic personality indicators (although not traditional religious belief, interestingly), as well as the evidence that semantic association is greater among schizotypic personalities.  Building that palimpsest of associations is carefully-controlled madness.

Creativity and Proximate Causation

Combining aspects of the previous posts, what proximate mechanisms might be relevant to the notion of artistic fitness? Scott Barry Kaufman rounds up some of the most interesting recent research and thinking on this topic in his post, Must One Risk Madness to Achieve Genius?

Touching on work by luminaries like Susan Blackmore and others, Scott drives from personality assessment concepts down through the role of dopamine in trying to identify whether there is a spectrum of observable traits that are linked to creativity and artistic achievement.


Daniel Nettle and Helen Clegg found that apophenia was positively related to a higher number of sexual partners for both men and women, and this relationship was explained by artistic creative activity. Similarly, in a more recent study conducted by Helen Clegg, Daniel Nettle, and Dorothy Miell, they found that more successful male artists (who are presumably higher in apophenia) had more sexual partners than less successful male artists.

Apophenia means seeing patterns in the environment where none may be present, a central theme in my second novel, Signals and Noise.

We can hypothesize also, based on the distribution from schizophrenia through schizotypy, through to “normal,” that there must be a large complement of interacting genes involved in these traits. This is supported by the evidence of genetic predispositions for schizophrenia, for instance, but also by the frustrating lack of success in identifying the genes that are involved.  This distribution may, in fact, be one of the most critical aspects of what it means to be human:

Were it not for those “disordered” genes, you wouldn’t have extremely creative, successful people.  Being in the absolute middle of every trait spectrum, not too extreme in any one direction, makes you balanced, but rather boring.  The tails of the spectrum, or the fringe, is where all the exciting stuff happens.  Some of the exciting stuff goes uncontrolled and ends up being a psychological disorder, but some of those people with the traits that define Bipolar Disorder, Schizophrenia, ADHD, and other psychological conditions, have the fortunate gift of high cognitive control paired with those traits, and end up being the creative geniuses that we admire, aspire to be like, and desperately need in this world.

Evolutionary Oneirology

I was recently contacted by a startup that is developing a dream-recording app. The startup wants to automatically extract semantic relationships and correlate the narratives that dreamers type into their phones. I assume that the goal is to help the user try to understand their dreams. But why should we assume that dreams are understandable? We now know that waking cognition is unreliable, that meta-cognitive strategies influence decision making, that base rate fallacies are the norm, that perceptions are shaped by apophenia, that framing and language choices dominate decision-making under ambiguity, and that moral judgments are driven by impulse and feeling rather than any rational calculus.

Yet there are some remarkable consistencies about dream content that have led to elaborate theorization down through the ages. Dreams, by being cryptic, want to be explained. But the content of dreams, when sorted out, leads us less to Kekulé’s rings or to Freud and Jung, and more to asking why there is so much anxiety present in dreams. The Evolutionary Theory of Dreaming by Finnish researcher Antti Revonsuo tries to explain the overrepresentation of threats and fear in dreams by suggesting that the brain is engaged in a process of reliving conflict events as a form of implicit learning. Evidence in support of this theory includes experimental observations that threatening dreams increase in frequency for children who experienced trauma in childhood, combined with the cross-cultural persistence of threatening dream content (and likely cross-species, as anyone who has watched a cat twitch in deep sleep suspects). To date, however, the question of whether these dream cycles result in learning or improved responses to future conflict remains unanswered.

I turned down consulting for the startup because of time constraints, but the topic of dream anxiety comes back to me every few years when I startle out of one of those recurring dreams where I have not studied for the final exam and am desperately pawing through a sheaf of empty paper trying to find my notes. I apparently still haven’t learned enough about deadlines, just like my ancient ancestors never learned enough about saber-toothed tiger stalking patterns.