Category: Cognitive Science

Evolutionary Optimization and Environmental Coupling

Carl Shulman and Nick Bostrom take up anthropic reasoning in “How Hard is Artificial Intelligence? Evolutionary Arguments and Selection Effects” (Journal of Consciousness Studies, 2012, 19:7-8), focusing on how models suggesting that human-level intelligence should be easy to automate are built upon assumptions about what “easy” means, assumptions that arise from observational bias (we assume we are intelligent, so the observation of intelligence seems likely).

Yet the analysis of this presumption is blocked by a prior consideration: given that we are intelligent, we should be able to achieve artificial, simulated intelligence. If this is not, in fact, true, then determining whether the assumption that our own intelligence is highly probable is warranted becomes irrelevant, because we may not be able to demonstrate that artificial intelligence is achievable anyway. On this point, the authors are dismissive of any requirement to simulate the environment against which organisms and species are optimized:

In the limiting case, if complete microphysical accuracy were insisted upon, the computational requirements would balloon to utterly infeasible proportions. However, such extreme pessimism seems unlikely to be well founded; it seems unlikely that the best environment for evolving intelligence is one that mimics nature as closely as possible. It is, on the contrary, plausible that it would be more efficient to use an artificial selection environment, one quite unlike that of our ancestors, an environment specifically designed to promote adaptations that increase the type of intelligence we are seeking to evolve (say, abstract reasoning and general problem-solving skills as opposed to maximally fast instinctual reactions or a highly optimized visual system).

Why is this “unlikely”? The argument is that there are classes of mental function that can be compartmentalized away from the broader, known evolutionary provocateurs. For instance, the Red Queen dynamic of sexual selection in the face of significant parasitism is dismissed as merely a distraction from real intelligence:

And as mentioned above, evolution scatters much of its selection power on traits that are unrelated to intelligence, such as Red Queen’s races of co-evolution between immune systems and parasites. Evolution will continue to waste resources producing mutations that have been reliably lethal, and will fail to make use of statistical similarities in the effects of different mutations. All these represent inefficiencies in natural selection (when viewed as a means of evolving intelligence) that it would be relatively easy for a human engineer to avoid while using evolutionary algorithms to develop intelligent software.

Inefficiencies? Really? We know that sexual dimorphism and competition are central to the evolution of complex species. Even the growth of brain size and creative capabilities is likely tied to sexual competition, so why should we think they can be uncoupled? Instead, we are left with a blocker to the core argument: simulated evolution may, in fact, not be capable of producing sufficient complexity to yield intelligence as we know it without, in turn, a sufficiently complex simulated fitness function to evolve against. Observational effects aside, if we don’t get this right, we need not worry about whether there are ten or ten billion planets suitable for life out there.

Active Deep Learning

Deep Learning methods that use auto-associative neural networks for pre-training (with bottlenecking to encourage generalization) have recently been shown to perform as well as, and sometimes better than, human beings at certain tasks like image categorization. But what is missing from the proposed methods? There seems to be a range of challenges that revolve around temporal novelty and sequential activation/classification problems like those that occur in natural language understanding. The most recent achievements are oriented more toward relatively static data presentations.

Jürgen Schmidhuber revisits the history of connectionist research (dating to the 1800s!) in his October 2014 technical report, Deep Learning in Neural Networks: An Overview. It is a comprehensive effort to document the history of this reinvigorated area of AI research. What is old is new again, enhanced by achievements in computing that allow for larger and larger scale simulation.

The conclusions section has an interesting suggestion: what is missing so far is the sensorimotor activity loop that allows for active interrogation of the data source. Human vision roams over images, while DL systems ingest the entire scene at once. Real neural systems also operate under energy constraints that suppress neural function away from the active neural clusters.

The Deep Computing Lessons of Apollo

With the Apollo 11 mission’s 45th anniversary upon us, and occasional planning and dreaming about a manned mission to Mars, the role of information technology comes into focus again. The next great mission will include a phalanx of computing resources: sensors, radars, hyperspectral cameras, laser rangefinders, and information fusion, visualization, and analysis tools to knit together everything needed for the astronauts to succeed. Some of these capabilities will be autonomous, predictive, and knowledgeable.

But it all began with the Apollo Guidance Computer, or AGC, the rather sophisticated, for its time, computer that ran the trigonometric and vector calculations for the original moonshot. The AGC was startlingly simple in many ways, built exclusively from NOR gates implementing Arithmetic Logic Unit-like functionality, shifts, and register opcodes, combined with core memory (tiny ferromagnetic loops) in both RAM and ROM forms (the latter hand-woven by factory workers).

Using NOR gates to create the entire logic of the central processing unit is guided by a few simple principles. A NOR gate combines NOT and OR functionality and has the following truth table:

INPUT1  INPUT2  OUTPUT
  0       0       1
  0       1       0
  1       0       0
  1       1       0

The NOT-OR logic can be read as “if INPUT1 or INPUT2 is set to 1, then the OUTPUT should be 1, but then take the logical inversion (NOT) of that”. And, amazingly, circuits built from NORs can create any Boolean logic. NOT A is just NOR(A,A), as the following table shows:

  A   NOR(A,A)
  0       1
  1       0

AND and OR can similarly be constructed by layering NORs together, as the sketch below illustrates. For Apollo, using just a single type of integrated circuit, which packaged NORs into chips, improved reliability.
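To make the layering concrete, here is a minimal sketch, in Python rather than gate-level hardware, of NOT, OR, and AND built from a single NOR primitive, matching the truth tables above:

# A NOR primitive and the NOT, OR, and AND gates derived from it.
def NOR(a, b):
    return 0 if (a or b) else 1

def NOT(a):
    return NOR(a, a)            # NOT A = NOR(A, A)

def OR(a, b):
    return NOT(NOR(a, b))       # A OR B = NOT(A NOR B)

def AND(a, b):
    return NOR(NOT(a), NOT(b))  # De Morgan: A AND B = NOT(NOT A OR NOT B)

# Exhaustive check against Python's own Boolean operators.
for a in (0, 1):
    for b in (0, 1):
        assert NOT(a) == int(not a)
        assert OR(a, b) == int(bool(a or b))
        assert AND(a, b) == int(bool(a and b))
print("NOT, OR, and AND verified from NOR alone")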

This level of simplicity has another important theoretical consequence that bears on the transition from simple guidance systems to potentially intelligent technologies for future Mars missions: a single layer of Boolean functions can only compute simple things. As you layer on functions you gain complexity, but complexity bounded by the depth of the logical function network. In fact, it can be proved that there are functions representable by a network of depth k that a network of depth k-1 can only represent if it has exponentially many hidden units relative to the input size.

This is a startling theoretical result, and it motivates much of the deep learning research: functions for classifying Martian hyperspectral imagery need deep networks precisely because the complexity of the classification task rules out shallower ones. Today we mostly use simplified artificial neural nodes rather than Boolean primitives, but the motivation is the same.

But back to the crawling that precedes the running: besides basic logical operations, how can we do something more usefully complex using NORs? Here’s an example logic circuit from circuitstoday.com that shows an adder circuit for summing bits:

where each of the little half-moons is a NOR with its inputs on the left and its output on the right. A and B are the inputs, S is the sum output, and C is the “carry” to the next significant bit. By chaining these stages together, we can add arbitrarily large binary representations of integers, with a circuit depth of 7 per 2-bit adder.
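As a hedged illustration of the same idea in code, and not a transcription of the circuitstoday.com schematic, here is a one-bit full adder expressed purely in terms of the NOR primitive above, chained into a ripple-carry adder:

# XOR built from five NORs (one standard construction), then a full adder.
def NOR(a, b):
    return 0 if (a or b) else 1

def XOR(a, b):
    n1 = NOR(a, b)
    n2 = NOR(a, n1)
    n3 = NOR(b, n1)
    xnor = NOR(n2, n3)
    return NOR(xnor, xnor)                                    # invert XNOR to get XOR

def full_adder(a, b, carry_in):
    partial = XOR(a, b)
    s = XOR(partial, carry_in)                                # sum bit
    g = NOR(NOR(a, a), NOR(b, b))                             # a AND b
    p = NOR(NOR(carry_in, carry_in), NOR(partial, partial))   # carry_in AND partial
    carry_out = NOR(NOR(g, p), NOR(g, p))                     # g OR p
    return s, carry_out

def add(x, y, width=8):
    # Ripple-carry: feed each stage's carry into the next more significant bit.
    carry, total = 0, 0
    for i in range(width):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= bit << i
    return total

assert add(45, 27) == 72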

Trees of Lives

With a brief respite between vacationing in the canyons of Colorado and leaving tomorrow for Australia, I’ve open-sourced an eight-year-old computer program for converting one’s DNA sequences into an artistic rendering. The inputs to the program are the allelic patterns from standard DNA analysis services that use the Short Tandem Repeat Polymorphisms of forensic analysis, as well as poetry reflecting one’s ethnic heritage. The output is generative art: a tree that overlays the sequences with the poetry, against a background rendered from the sequences.

Generative art is perhaps one of the greatest aesthetic achievements of the late 20th Century. It is, fundamentally, a recognition that the core of our humanity can be understood and converted into meaningful aesthetic products; it is the parallel of effective procedures in cognitive science, and it developed in lock-step with the constructive efforts to reproduce and simulate human cognition.

To use Tree of Lives, install Java 1.8, unzip the package, and edit the supplied markconfig.txt to enter your STRs and their allele variant numbers, in sequence, on line 15 of the configuration file. Lines 16 and beyond hold lines of poetry that will be rendered on the limbs of the tree. Other configuration parameters (colors, paths, and so on) can be discerned by examining com.treeoflives.CTreeConfig.java. Execute the program with:

java -cp treeoflives.jar:iText-4.2.0-com.itextpdf.jar com.treeoflives.CAlleleRenderer markconfig.txt

Inching Towards Shannon’s Oblivion

Following Bill Joy’s concerns over the future world of nanotechnology, biological engineering, and robotics in 2000’s Why the Future Doesn’t Need Us, it has become fashionable to worry over “existential threats” to humanity. Nuclear power and weapons used to be dreadful enough, and clearly remain in the top five, but these rapidly developing technologies, asteroids, and global climate change have joined Oppenheimer’s misquoted “destroyer of all things” in portending our doom. Here are Max Tegmark, Stephen Hawking, and others in the Huffington Post warning again about artificial intelligence:

One can imagine such technology outsmarting financial markets, out-inventing human researchers, out-manipulating human leaders, and developing weapons we cannot even understand. Whereas the short-term impact of AI depends on who controls it, the long-term impact depends on whether it can be controlled at all.

I almost always begin my public talks on Big Data and intelligent systems with a presentation on industrial revolutions that progresses through Robert Gordon’s phases and then highlights Paul Krugman’s argument that Big Data and the intelligent systems improvements we are seeing potentially represent the next industrial revolution. I am usually less enthusiastic about the timeline than nonspecialists, but after giving a talk at PASS Business Analytics on Friday in San Jose, I stuck around to listen in on a highly technical talk concerning statistical regularization and deep learning, and I found myself enthused about the topic once again.

Deep learning uses artificial neural networks to classify information, but it is distinct from traditional ANNs in that the systems are pre-trained using auto-encoders to acquire a general knowledge of the data domain. To be clear, though, most of the problems that have been tackled are “subsymbolic” image recognition and speech problems. Still, the improvements have been fairly impressive given some pretty simple ideas. First, the pre-training is accompanied by systematic bottlenecking of the number of nodes available for learning. Second, the amount that each node fires is kept low, to avoid overfitting to nodes with dominating magnitudes. Together, these let the auto-encoders learn the patterns without supervision, after which the network can be trained faster and more easily to associate those patterns with classes. A toy sketch of those two ideas follows.
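The sketch below is only an illustration of the bottleneck-plus-sparsity idea, not a reproduction of any particular published system; the 8-to-3-unit sizes, learning rate, and penalty weight are arbitrary choices for the example:

# A single-hidden-layer auto-encoder with a bottleneck and an L1 penalty
# that keeps hidden activations small, trained by plain gradient descent.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class SparseAutoencoder:
    def __init__(self, n_visible, n_hidden, sparsity_weight=1e-3):
        # n_hidden < n_visible is the bottleneck that forces a compressed code.
        self.W1 = rng.normal(0.0, 0.1, (n_visible, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_hidden, n_visible))
        self.b2 = np.zeros(n_visible)
        self.lam = sparsity_weight

    def train_step(self, X, lr=0.5):
        n, d = X.shape
        H = sigmoid(X @ self.W1 + self.b1)   # compressed code
        R = sigmoid(H @ self.W2 + self.b2)   # reconstruction of the input
        loss = np.mean((R - X) ** 2) + self.lam * np.mean(np.abs(H))

        # Backpropagate the reconstruction error and the sparsity penalty.
        dZ2 = 2.0 * (R - X) / (n * d) * R * (1.0 - R)
        dW2, db2 = H.T @ dZ2, dZ2.sum(axis=0)
        dH = dZ2 @ self.W2.T + self.lam * np.sign(H) / H.size
        dZ1 = dH * H * (1.0 - H)
        dW1, db1 = X.T @ dZ1, dZ1.sum(axis=0)

        for p, g in ((self.W1, dW1), (self.b1, db1), (self.W2, dW2), (self.b2, db2)):
            p -= lr * g
        return loss

# Toy usage: squeeze 8-dimensional one-hot vectors through a 3-unit bottleneck.
X = np.eye(8)
ae = SparseAutoencoder(n_visible=8, n_hidden=3)
for step in range(5000):
    loss = ae.train_step(X)
print("final reconstruction + sparsity loss:", round(float(loss), 4))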

I still have my doubts concerning the threat timeline, however. For one, these are mostly sub-symbolic systems that are not capable of the kinds of self-directed system modifications that many fear could lead to exponential self-improvement. Second, the tasks that are seeing improvements are not new, just relatively well-known classification problems. Finally, the improvements, while impressive, are incremental. There is probably a meaningful threat profile that can be converted into a decision tree for when action is needed. For global climate change there are consensus estimates about sea-level change, for instance. For Evil AI, I think we need to wait for a single act of out-of-control machine intelligence before spending excessively on containment, policy, or regulation. In the meantime, though, keep a close eye on your laptop.

And then there’s the mild misanthropy of Claude Shannon, possibly driven by living too long in New Jersey:

I visualize a time when we will be to robots what dogs are to humans, and I’m rooting for the machines.

Parsimonious Portmanteaus

Meaning is a problem. We think we might know what something means, but we keep being surprised by the facts, research, and logical difficulties that surround the notion of meaning. Putnam’s Representation and Reality runs through a few different ways of thinking about meaning, though without reaching any definitive conclusions beyond what meaning can’t be.

Children are a useful touchstone concerning meaning because we know that they acquire linguistic skills and, consequently, at least an operational understanding of meaning. How they do so is rather interesting: first, presume that whole objects are the first topics for naming; next, assume that syntactic differences signal semantic differences (“the dog” refers to the class of dogs while “Fido” refers to an instance); finally, prefer interpretations in which linguistic differences point to semantic differences. Paul Bloom slices and dices the research in his Précis of How Children Learn the Meanings of Words, calling into question many core assumptions about the learning of words and meaning.

These preferences become useful if we want to formulate an algorithm that assigns meaning to objects or groups of objects. Probabilistic Latent Semantic Analysis (PLSA), for example, assumes that words are signals from underlying probabilistic topic models and then derives those models by estimating all of the probabilities from the available signals. The outcome lacks labels, however: the “meaning” is expressed purely in terms of co-occurrences of terms. Reconciling an approach like PLSA with the observations about children’s meaning acquisition presents some difficulties. The process seems too slow, for example, which was always a complaint about connectionist artificial neural network architectures as well. As Bloom points out, kids make few errors concerning meaning, and when they do, they rapidly compensate. A minimal sketch of PLSA’s estimation procedure follows.
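The sketch below fits PLSA by expectation-maximization over a small term-document count matrix; the toy corpus, topic count, and iteration count are invented purely for illustration:

# PLSA: P(word | doc) = sum over topics z of P(z | doc) * P(word | z),
# with both distributions estimated by EM from raw co-occurrence counts.
import numpy as np

rng = np.random.default_rng(1)

def plsa(counts, n_topics, n_iter=200):
    n_docs, n_words = counts.shape
    p_z_d = rng.random((n_docs, n_topics)); p_z_d /= p_z_d.sum(1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words)); p_w_z /= p_w_z.sum(1, keepdims=True)
    for _ in range(n_iter):
        # E-step: responsibility of each topic for each (doc, word) pair.
        joint = p_z_d[:, None, :] * p_w_z.T[None, :, :]      # (docs, words, topics)
        post = joint / joint.sum(axis=2, keepdims=True)
        weighted = counts[:, :, None] * post                 # expected counts
        # M-step: re-estimate P(word | topic) and P(topic | doc).
        p_w_z = weighted.sum(axis=0).T
        p_w_z /= p_w_z.sum(axis=1, keepdims=True)
        p_z_d = weighted.sum(axis=1)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    return p_z_d, p_w_z

# Toy corpus: rows are documents, columns are counts of the vocabulary terms.
vocab = ["bank", "loan", "teller", "river", "water", "shore"]
counts = np.array([[4, 3, 2, 0, 0, 1],
                   [5, 2, 3, 0, 1, 0],
                   [3, 0, 0, 4, 3, 2],
                   [2, 0, 1, 5, 2, 3]], dtype=float)
p_z_d, p_w_z = plsa(counts, n_topics=2)
for z, dist in enumerate(p_w_z):
    print("topic", z, "->", [vocab[i] for i in np.argsort(dist)[::-1][:3]])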

I’ve previously proposed a model for lexical acquisition that uses a coding hierarchy based on co-occurrence or other features. As new terms are observed, the hierarchy is built, in an unsupervised manner, by making local swaps and consolidations based on minimum description length principles. It thus bears a close relationship to Nevill-Manning’s SEQUITUR approach to sequence learning. One limitation of the approach is that, in a tree-like grammar, the complexity of examining all possible re-arrangements of the grammar when new symbols arrive would place a massive burden on any cognitive correlates we might claim exist; the system therefore restricts itself to local swaps and consolidations.
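For flavor only, and emphatically not the model just described, here is a toy, offline Re-Pair-style compressor in the same grammar-based family as SEQUITUR: it repeatedly replaces the most frequent adjacent pair of symbols with a new rule symbol, shortening the description of the sequence:

# Replace the most frequent adjacent pair with a new rule until no pair repeats.
from collections import Counter

def compress(seq, min_count=2):
    rules, next_id = {}, 0
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < min_count:
            break
        symbol = "R%d" % next_id
        next_id += 1
        rules[symbol] = pair
        out, i = [], 0
        while i < len(seq):                      # rewrite the sequence, left to right
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(symbol)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules

seq, rules = compress(list("the_dog_and_the_dot"))
print(seq)
print(rules)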

It’s worth considering how such an approach might solve the cluster labeling problem. If we cluster things together based on the parsimonious coding approach, the objects and their grammatical coordinations move higher up the tree. What is missing is a preference for adding new, distinctive terms that differentiate one grouping from another. For instance, in the toy sample given in my paper, “Financial Institution” or “Retail Bank” is not applied to the appropriate bank cluster, nor is “River Bank” applied to the other bank cluster. Instead we are just left with the shared context terms. I think this might be correctable in a larger grouping, however, by allowing a distinguishing series of portmanteaus to be constructed by composition from nearby (in the semantic region) concepts. So, as the co-occurrences of bank, teller, ATM, and loan pile up and get coded into groupings, the nearby finance, bank, retail bank, investment bank grouping is used to create a common portmanteau from the most distinctive terms in the set, chosen so that they best distinguish it from the river semantic set.

In Like Flynn

The exceptionally interesting James Flynn explains the cognitive history of the past century and what it means in terms of human intelligence in this TED talk:

What does the future hold? While we might decry the “twitch” generation and their inundation by social media, gaming stimulation, and instant interpersonal engagement, the recently observed slowing of the Flynn Effect might give way to another ramp-up over the next 100 years.

Perhaps most intriguing is the discussion of the ability to think in terms of hypotheticals as a core component of ethical reasoning. Ethics is about gaming out outcomes and also about empathizing with others. The influence of media as a delivery mechanism for narratives about others emerged just as those changes in cognitive capabilities were beginning to mature in the 20th Century. Widespread media had a compounding effect on the core abstract-thinking capacity, and with the expansion of smartphones and information flow, we may be only a few generations away from the necessary ingredients for good ethical reasoning being widespread even in hard-to-reach areas of the world.

Contingency and Irreducibility

Thomas Nagel returns to defend his doubt concerning the completeness, if not the efficacy, of materialism in the explanation of mental phenomena in the New York Times. He quickly lays out the possibilities:

  1. Consciousness is an easy product of neurophysiological processes
  2. Consciousness is an illusion
  3. Consciousness is a fluke side-effect of other processes
  4. Consciousness is a divine property supervened on the physical world

Nagel concludes that all four are incorrect and that a naturalistic explanation is possible that isn’t “merely” (1), but that is at least (1) plus something more. I previously commented on the argument here, but the refinement of the specifications requires a more targeted response.

Let’s call Nagel’s new perspective Theory 1+ for simplicity. What form might 1+ take? For Nagel, the notion seems to be a combination of Chalmers-style qualia and a deep appreciation for the contingencies that factor into the personal evolution of individual consciousness. The latter is certainly redundant, in that individuality must be absolutely tied to personal experiences and narratives.

We might be able to get some traction on this concept by looking to biological evolution, though “ontogeny recapitulates phylogeny” is about as close as we can get to the topic, because any kind of evolutionary psychology must look for patterns that reinforce the interpretation of basic aspects of cognitive evolution (sex, reproduction, etc.) rather than explore the more numinous aspects of conscious development. So we might instead look for parallel theories that focus on the uniqueness of outcomes and that reify temporal evolution without reference to controlling biology, and we arrive at ideas like uncomputability as a backstop. More specifically, we can explore ideas like computational irreducibility to support the development of Nagel’s new theory: insofar as the environment lapses toward weak predictability, a consciousness that self-observes, regulates, and builds many complex models and metamodels is superior to one that does not.

I think we already knew that, though. Perhaps Nagel has been too much a philosopher and too little involved in the sciences that surround and innervate modern theories of learning and adaptation to see the movement toward the exits?