Startup Next

I’m thrilled to announce my new startup, Like Human. The company is focused on making significant new advances to the state of the art in cognitive computing and artificial intelligence. We will remain a bit stealthy for another six months or so and then will open up shop for early adopters.

I’m also pleased to share with you Like Human’s logo that goes by the name Logo McLogoface, or LM for short. LM combines imagery from nuclear warning signs, Robby the Robot from Forbidden Planet, and Leonardo da Vinci’s Vitruvian Man. I think you will agree about Mr. McLogoface’s agreeability:


You can follow developments at @likehumancom on Twitter, and I will make a few announcements here as well.

Euhemerus and the Bullshit Artist

Sailing down through the Middle East, past the monuments of Egypt and the wild African coast, and then on into the Indian Ocean, past Arabia Felix, Euhemerus came upon an island. Maybe he came upon it. Maybe he sailed. He was perhaps—yes, perhaps; who can say?—sailing for Cassander in deconstructing the memory of Alexander the Great. And that island, Panchaea, held a temple of Zeus with a written history of the deeds of men who became the Greek gods.

They were elevated, they became fixed in the freckled amber of ancient history, their deeds escalated into myths and legends. And, likewise, the ancient tribes of the Levant brought their El and Yah-Wah, and Asherah and Baal, and then the Zoroastrians influenced the diaspora in refuge in Babylon, until they returned and had found dualism, elemental good and evil, and then reimagined their original pantheon down through monolatry and into monotheism. These great men and women were reimagined into something transcendent and, ultimately, barely understandable.

Even the rational Yankee in Twain’s Connecticut Yankee in King Arthur’s Court realizes almost immediately why he would soon rule over the medieval world as he is declared a wild dragon when presented to the court. He waits for someone to point out that he doesn’t resemble a dragon, but the medieval mind does not seem to question the reasonableness of the mythic claims, even in the presence of evidence.

So it goes with the human mind.

And even today we have Fareed Zakaria justifying his use of the term “bullshit artist” for Donald Trump. Trump’s logorrhea is punctuated by so many incomprehensible and contradictory statements that it becomes a mythic whirlwind. He lets slip, now and again, that his method is deliberate:

DT: Therefore, he was the founder of ISIS.

HH: And that’s, I’d just use different language to communicate it, but let me close with this, because I know I’m keeping you long, and Hope’s going to kill me.

DT: But they wouldn’t talk about your language, and they do talk about my language, right?

Bullshit artist is the modern way of saying what Euhemerus was trying to say in his fictional “Sacred History.” Yet we keep getting entranced by these coordinated maelstroms of utter crap, from World Net Daily to Infowars to Fox News to Rush Limbaugh. Only the old Stephen Colbert could contend with it through his own bullshit mythical inversion. Mockery seems the right approach, but it doesn’t seem to have a great deal of impact on the conspiratorial mind.

Motivation, Boredom, and Problem Solving

In the New York Times Stone column, James Blachowicz of Loyola challenges the assumption that the scientific method is uniquely distinguishable from other ways of thinking and problem solving we regularly employ. In his example, he lays out how writing poetry involves some kind of alignment of words that conform to the requirements of the poem. Whether actively aware of the process or not, the poet is solving constraint satisfaction problems concerning formal requirements like meter and structure, linguistic problems like parts-of-speech and grammar, semantic problems concerning meaning, and pragmatic problems like referential extension and symbolism. Scientists do the same kinds of things in fitting a theory to data. And, in Blachowicz’s analysis, there is no special distinction between scientific method and other creative methods like the composition of poetry.

We can easily see how this extends to ideas like musical composition and, indeed, extends with even more constraints that range from formal through to possibly the neuropsychology of sound. I say “possibly” because there remains uncertainty on how much nurture versus nature is involved in the brain’s reaction to sounds and music.

In terms of a computational model of this creative process, if we presume that there is an objective function that governs possible fits to the given problem constraints, then we can clearly optimize towards a maximum fit. For many of the constraints there are, however, discrete parameterizations (which part of speech? which word?) that are not like curve fitting to scientific data. In fairness, discrete parameters occur there, too, especially in meta-analyses of broad theoretical possibilities (loop quantum gravity vs. string theory? What will we tell the children?). The discrete parameterizations blow up the search space with their combinatorics, demonstrating on the one hand why we are so damned amazing, and on the other hand why a controlled randomization method like evolutionary epistemology’s blind search and selective retention gives us potential traction in the face of this curse of dimensionality. The blind search is likely weakened for active human engagement, though. Certainly the poet or the scientist would agree; they are using learned skills, maybe some intellectual talent of unknown origin, and experience on how to traverse the wells of improbability in finding the best fit for the problem. This certainly resembles pre-training in deep learning, though on a much more pervasive scale, including feedback from categorical model optimization into the generative basis model.
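To make the blind search and selective retention idea concrete, here is a toy sketch and nothing more. Everything specific in it is an assumption made up for illustration: the tiny vocabulary, the syllable counts, the 5-7-5 target, and the rule of three words per line. It only shows how random variation plus "keep whatever fits at least as well" can satisfy a discrete formal constraint without any gradient to follow.

```python
import random

# Hypothetical toy vocabulary: word -> syllable count (assumed for illustration).
VOCAB = {"rain": 1, "river": 2, "memory": 3, "stone": 1, "falling": 2,
         "quietly": 3, "autumn": 2, "light": 1, "remembering": 4, "sea": 1}
TARGET = (5, 7, 5)    # syllables wanted per line (a haiku-like formal constraint)
WORDS_PER_LINE = 3    # assumed fixed slot count; keeps the search space finite


def fit(poem):
    """Objective function: 0 is a perfect fit, more negative is worse."""
    return -sum(abs(sum(VOCAB[w] for w in line) - target)
                for line, target in zip(poem, TARGET))


def blind_search(steps=20000, seed=0):
    rng = random.Random(seed)
    words = list(VOCAB)
    poem = [[rng.choice(words) for _ in range(WORDS_PER_LINE)] for _ in TARGET]
    for _ in range(steps):
        if fit(poem) == 0:
            break
        # Blind variation: overwrite one randomly chosen slot with a random word.
        candidate = [line[:] for line in poem]
        i, j = rng.randrange(len(TARGET)), rng.randrange(WORDS_PER_LINE)
        candidate[i][j] = rng.choice(words)
        # Selective retention: keep the variant only if it fits at least as well.
        if fit(candidate) >= fit(poem):
            poem = candidate
    return poem


poem = blind_search()
print(fit(poem), poem)
```

Only the formal (meter-like) constraint is modeled here. Grammar, meaning, and symbolism would each add further terms to `fit`, which is where the combinatorics start to bite and where learned skill would replace the blind part of the search.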

But does this extend outwards to other ways in which we form ideas? We certainly know that motivated reasoning is involved in key aspects of our belief formation, which plays strongly into how we solve these constraint problems. We tend to actively look for confirmations and avoid disconfirmations of fit. We positively bias recency of information, or repeated exposures, and tend to only reconsider in much slower cycles.

Also, as the constraints of certain problem domains become, in turn, extensions that can result in change—where there is a dynamic interplay between belief and success—the fixity of the search space itself is no longer guaranteed. Broad human goals like the search for meaning are an example of that. In come complex human factors, like how boredom correlates with motivation and ideological extremism (overview, here, journal article, here).

This latter data point concerning boredom crosses from mere bias that might preclude certain parts of a search space into motivation that focuses it, and that optimizes for novelty seeking and other behaviors.

Soul Optimization

I just did a victory lap around wooden columns in my kitchen and demanded high-fives all around: Against Superheroes is done. Well, technically it just topped the first hurdle. Core writing is complete at 100,801 words. I will now do two editorial passes and then send it to my editor for clean-up. Finally, I’ll get some feedback from my wife before sending it out for independent review.

I try to write according to a daily schedule but I have historically been an inconsistent worker. I track everything using a spreadsheet and it doesn’t look pretty:


Note the long gaps. The gaps are problematic for several reasons, not the least of which is that I have to go back and read everything again to return to form. The gaps arrive with excuses, then get amplified by more excuses, then get massaged into to-do lists, and then always get resolved by unknown forces. Maybe they are unknowable.

The one consistency that I have found is that I always start strong and finish strong, bursts of enthusiasm for the project arriving with runner’s high on the trail, or while waiting in traffic. The plot thickets open to luxuriant fields. When I’m in the gap periods I distract myself too easily, finding the deep research topics an easy way to justify an additional pause of days, then weeks, sometimes months.

I guess I should resolve to find my triggers and work to overcome these tendencies, but I’m not certain that it matters. There is no rush, and those exuberant starts and ends are perhaps enough of a reward that no deeper optimization of my soul is needed.

Quantum Field Is-Oughts

Sean Carroll’s Oxford lecture on Poetic Naturalism is worth watching (below). In many ways it just reiterates several common themes. First, it reinforces the is-ought barrier between values and observations about the natural world. It does so with particular depth, though, by identifying how coarse-grained theories at different levels of explanation can be equally compatible with quantum field theory. Second, and related, he shows how entropy is an emergent property of atomic theory and the interactions of quantum fields (that we think of as particles much of the time) and, importantly, that we can project the same notion of boundary conditions that result in entropy into the future resulting in a kind of effective teleology. That is, there can be some boundary conditions for the evolution of large-scale particle systems that form into configurations that we can label purposeful or purposeful-like. I still like the term “teleonomy” to describe this alternative notion, but the language largely doesn’t matter except as an educational and distinguishing tool against the semantic embeddings of old scholastic monks.

Finally, the poetry aspect resolves in value theories of the world. Many are compatible with descriptive theories, and our resolution of them is through opinion, reason, communications, and, yes, violence and war. There is no monopoly of policy theories, religious claims, or idealizations that hold sway. Instead we have interests and collective movements, and the above, all working together to define our moral frontiers.


Local Minima and Coatimundi

Even given the basic conundrum of how deep learning neural networks might cope with temporal presentations or linear sequences, there is another oddity to deep learning that only seems obvious in hindsight. One of the main enhancements to traditional artificial neural networks is a phase of unsupervised pre-training that forces each layer to try to create a generative model of the input pattern. The deep learning networks then learn a discriminative model after the initial pre-training is done, focusing on the error relative to classification versus simply recognizing the phrase or image per se.

Why this makes a difference has been the subject of some investigation. In general, there is an interplay between the smoothness of the error function and the ability of the optimization algorithms to cope with local minima. Visualize it this way: for any machine learning problem that needs to be solved, there are answers and better answers. Take visual classification. If the system (or you) gets shown an image of a coatimundi and a label that says coatimundi (heh, I’m running in New Mexico right now…), learning that image-label association involves adjusting weights assigned to different pixels in the presentation image down through multiple layers of the network that provide increasing abstractions about the features that define a coatimundi. And, importantly, that define a coatimundi versus all the other animals and non-animals.

These weight choices define an error function that is the optimization target for the network as a whole, and this error function can have many local minima. That is, by enhancing the weights supporting a coati versus a dog or a raccoon, the algorithm inadvertently leans towards a non-optimal assignment for all of them by focusing instead on a balance between them that is predestined by the previous dog and raccoon classifications (or, in general, the order of presentation).

Improvements require “escaping” these local optima in favor of a global solution that accords the best overall outcome to all the animals and a minimization of the global error. And pre-training seems to do that. It likely moves each discriminative category closer to the global possibilities because those global possibilities are initially encoded by the pre-training phase.

This has the added benefit of regularizing or smoothing out the noise that is inherent in any real data set. Indeed, the two approaches appear to be closely allied in their impact on the overall machine learning process.
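The local minima picture can be shown with a one-dimensional toy, offered as an analogy rather than a model of a real network. The error function below is invented for illustration, and the "pre-trained" starting point is simply hand-picked to sit in the better basin; no actual pre-training happens. What the sketch does show is that plain gradient descent ends up wherever its starting point's basin leads.

```python
def error(w):
    """Hypothetical non-convex error surface with two minima."""
    return w ** 4 - 3 * w ** 2 + w


def gradient(w):
    return 4 * w ** 3 - 6 * w + 1


def descend(w, learning_rate=0.01, steps=2000):
    """Plain gradient descent: it can only roll downhill from where it starts."""
    for _ in range(steps):
        w -= learning_rate * gradient(w)
    return w


# Arbitrary start: settles in the shallow local minimum near w = 1.13.
w_random_start = descend(2.0)
# Start assumed to be "placed" by pre-training: reaches the deeper minimum near w = -1.30.
w_pretrained_start = descend(-0.5)

print(w_random_start, error(w_random_start))
print(w_pretrained_start, error(w_pretrained_start))
```

The two runs use the same optimizer and the same error surface. Only the starting point differs, which is the sense in which pre-training is thought to help: it changes where the discriminative phase begins.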

Dates, Numbers, and Canadian Makings

On the 21st of June, 1997, which was the solstice, my wife and I married. We celebrated that date again today, but it is not the solstice again due to astronomical drift around the calendar. And I also crossed the border of 100,000 words on Against Superheroes, moving towards resolution of a novel that could, conceivably, have no ending. There are always more mythologies to be explored.

Just last week I was in Banff, Canada, sitting quietly with my bear spray and a little titanium cook pot. I didn’t have to deploy the mace, and was relieved I also didn’t have to endure twelve hours of wolf stalking like this Canadian woman.

And while I was north of the US border, I learned that a Canadian animated film I was involved with was released to Amazon Prime video. I am just an Executive Producer of the film, which means that I had no creative input, but I am really pleased with the film. Ironically, I couldn’t watch this Canadian product while in Canada, just an hour from the studio that produced it. But rest assured that Christmas will be saved in the end!

New Behaviorism and New Cognitivism

Deep Learning now dominates discussions of intelligent systems in Silicon Valley. Jeff Dean’s discussion of its role in the Alphabet product lines and initiatives shows the dominance of the methodology. Pushing the limits of what Artificial Neural Networks have been able to do has been driven by certain algorithmic enhancements and the ability to process weight training algorithms at much higher speeds and over much larger data sets. Google even developed specialized hardware to assist.

Broadly, though, we see mostly pattern recognition problems like image classification and automatic speech recognition being impacted by these advances. Natural language parsing has also recently had some improvements from Fernando Pereira’s team. The incremental improvements using these methods should not be minimized but, at the same time, the methods don’t emulate key aspects of what we observe in human cognition. For instance, the networks train incrementally and lack the kinds of rapid transitions that we observe in human learning and thinking.

In a strong sense, the models that Deep Learning uses can be considered Behaviorist in that they rely almost exclusively on feature presentation with a reward signal. The internal details of how modularity or specialization arise within the network layers are interesting but secondary to the broad use of back-propagation or Gibbs sampling combined with autoencoding. This is a critique that goes back to the early days of connectionism, of course, and why it was somewhat sidelined after an initial heyday in the late eighties. Then came statistical NLP, then came hybrid methods, then a resurgence of corpus methods, all the while with image processing getting more and more into the hand-crafted modular space.

But we can see some interesting developments that start to stir more Cognitivism into this stew. Recurrent Neural Networks provided interesting temporal behavior that might be lacking in some feedforward NNs, and Long Short-Term Memory (LSTM) NNs help to overcome some specific limitations of recurrent NNs like the disconnection between temporally-distant signals and the reward patterns.
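For readers who want to see why an LSTM can hold on to a temporally distant signal, here is a minimal scalar sketch of the standard cell update (forget, input, and output gates plus a candidate value). The weights and biases are hand-set assumptions, not trained values, and the single-number state is a simplification of the vector-valued cells used in practice.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One step of a scalar LSTM cell; params maps gate name -> (w, u, b)."""
    def pre(gate):
        w, u, b = params[gate]
        return w * x + u * h_prev + b

    f = sigmoid(pre("forget"))        # how much of the old cell state to keep
    i = sigmoid(pre("input"))         # how much new information to admit
    o = sigmoid(pre("output"))        # how much of the cell state to expose
    g = math.tanh(pre("candidate"))   # the new information itself
    c = f * c_prev + i * g            # additive update of the memory cell
    h = o * math.tanh(c)
    return h, c


def run(params, steps=50):
    """Start with a stored signal c = 1.0, then feed 50 steps of empty input."""
    h, c = 0.0, 1.0
    for _ in range(steps):
        h, c = lstm_step(0.0, h, c, params)
    return c


# Hand-set (assumed) parameters: forget gate biased open, input gate biased shut.
remembering = {"forget": (0.5, 0.5, 10.0), "input": (0.5, 0.5, -10.0),
               "output": (0.5, 0.5, 0.0), "candidate": (0.5, 0.5, 0.0)}
# Same cell, but with no bias holding the forget gate open.
leaky = dict(remembering, forget=(0.5, 0.5, 0.0))

c_kept = run(remembering)
c_lost = run(leaky)
print(c_kept, c_lost)
```

With the forget gate held near one, the stored value survives fifty steps almost untouched; without that bias it decays geometrically. In a trained network the gates learn when to do which, and that is what connects a late reward to an early event.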

Still, the modularity and rapid learning transitions elude us. While these methods are enhancing the ability to learn the contexts around specific events (and even the unique variability of contexts), that learning still requires many exposures to get right. We might consider our language or vision modules to be learned over evolutionary history and so not expect learning within a lifetime from scratch to result in similarly structured modules, but the differences remain not merely quantitative but significantly qualitative. A New Cognitivism requires more work to rise from this New Behaviorism.

Evolving Visions of Chaotic Futures

Most artificial intelligence researchers think it unlikely that a robot apocalypse or some kind of technological singularity is coming anytime soon. I’ve said as much, too. Guessing about the likelihood of distant futures is fraught with uncertainty; current trends are almost impossible to extrapolate.

But if we must, what are the best ways for guessing about the future? In the late 1950s the Delphi method was developed. Get a group of experts on a given topic and have them answer questions anonymously. Then iteratively publish back the group results and ask for feedback and revisions. Similar methods have been developed for face-to-face group decision making, like Kevin O’Connor’s approach to generating ideas in The Map of Innovation: generate ideas and give participants votes equaling a third of the number of unique ideas. Keep iterating until there is a consensus. More broadly, such methods are called “nominal group techniques.”
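The vote-allocation rule is simple enough to sketch. The only detail taken from the description above is that each participant gets votes equal to a third of the number of unique ideas. The elimination rule (keep only as many top-voted ideas as there are votes per person), the tie-breaking by original order, and the fixed preference rankings are all assumptions added here to make the iteration concrete.

```python
from collections import Counter


def vote_round(ideas, rankings):
    """Each participant spends len(ideas) // 3 votes on their favorite remaining ideas."""
    votes_each = max(1, len(ideas) // 3)
    tally = Counter()
    for ranking in rankings:
        favorites = [idea for idea in ranking if idea in ideas][:votes_each]
        tally.update(favorites)
    # Assumed elimination rule: keep the top-voted ideas, ties broken by original order.
    ordered = sorted(ideas, key=lambda idea: (-tally[idea], ideas.index(idea)))
    return ordered[:votes_each]


def iterate_to_consensus(ideas, rankings):
    history = []
    while len(ideas) > 1:
        ideas = vote_round(ideas, rankings)
        history.append(ideas)
    return ideas[0], history


# Hypothetical ideas and participant preferences, most preferred first.
ideas = list("ABCDEFGHI")
rankings = [list("ABCDEFGHI"), list("BADCEFGHI"), list("BCAEDFGHI")]

winner, history = iterate_to_consensus(ideas, rankings)
print(winner, history)
```

Real sessions let people change their minds between rounds, which this sketch leaves out by freezing the rankings. That revision step is what Delphi-style feedback adds.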

Most recently, the notion of prediction markets has been applied to internal and external decision making. In prediction markets, a similar voting strategy is used but based on either fake or real money, forcing participants towards a risk-averse allocation of assets.

Interestingly, we know that optimal inference based on past experience can be codified using algorithmic information theory, but the fundamental problem with any kind of probabilistic argument is that much change that we observe in society is non-linear with respect to its underlying drivers and that the signals needed are imperfect. As the mildly misanthropic Nassim Taleb pointed out in The Black Swan, the only place where prediction takes on smooth statistical regularity is in Las Vegas, which is why one shouldn’t bother to gamble. Taleb’s approach is to look instead at minimizing the impact of shocks (or hedging them in financial markets).

But maybe we can learn something from philosophical circles. For instance, Evolutionary Epistemology (EE), as formulated by Donald Campbell, Sir Karl Popper, and others, posits that central to knowledge formation is blind variation and selective retention. Combined with optimal induction, this leads to random processes being injected into any kind of predictive optimization. We do this in evolutionary algorithms like Genetic Algorithms, Evolutionary Programming, Genetic Programming, and Evolution Strategies, as well as in related approaches like Simulated Annealing. But EE also suggests that there are several levels of learning by variation/retention, from the phylogenetic learning of species through to the mental processes of higher organisms. We speculate and trial-and-error continuously, repeating loops of what-ifs in our minds in an effort to optimize our responses in the future. It’s confounding as hell but we do remarkable things that machines can’t yet do like folding towels or learning to bake bread.

This noosgeny-recapitulates-ontogeny-recapitulates-phylogeny (just made that up) can be exploited in a variety of ways for abductive inference about the future. We can, for instance, use evolutionary optimization with a penalty for complexity that simulates the informational trade-off of AIT-style inductive optimality. Further, the noosgeny component (by which I mean the internalized mental trial-and-error) can reduce phylogenetic waste in simulations by providing speculative modeling that retains the “parental” position on the fitness landscape before committing to a next generation of potential solutions, allowing for further probing of complex adaptive landscapes.
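One possible reading of that last idea, sketched with heavy assumptions: candidates are polynomial coefficient lists, the score is fit error plus a fixed charge per nonzero coefficient (a crude stand-in for an AIT-style complexity penalty), and the "noosgeny" step is modeled as trying several speculative variants of the same parent before committing to any of them. The data, the penalty size, and the mutation rates are all invented for illustration.

```python
import random

XS = [x / 2 for x in range(-4, 5)]
YS = [2 * x + 1 for x in XS]   # hypothetical observations drawn from y = 2x + 1
PENALTY = 0.05                 # assumed cost per nonzero coefficient


def predict(coeffs, x):
    return sum(c * x ** k for k, c in enumerate(coeffs))


def score(coeffs):
    """Lower is better: mean squared error plus a complexity charge."""
    mse = sum((predict(coeffs, x) - y) ** 2 for x, y in zip(XS, YS)) / len(XS)
    complexity = sum(1 for c in coeffs if c != 0.0)
    return mse + PENALTY * complexity


def mutate(coeffs, rng):
    """Blind variation: nudge one coefficient, or occasionally delete it."""
    child = coeffs[:]
    k = rng.randrange(len(child))
    if rng.random() < 0.2:
        child[k] = 0.0
    else:
        child[k] += rng.gauss(0.0, 0.3)
    return child


def evolve(generations=300, probes=8, seed=1):
    rng = random.Random(seed)
    parent = [rng.uniform(-1, 1) for _ in range(4)]   # up to a cubic
    trace = [score(parent)]
    for _ in range(generations):
        # Speculative probing: several what-ifs from the same parental position.
        trials = [mutate(parent, rng) for _ in range(probes)]
        best = min(trials, key=score)
        # Selective retention: commit only if a probe beats the parent.
        if score(best) < score(parent):
            parent = best
        trace.append(score(parent))
    return parent, trace


solution, trace = evolve()
print(solution, trace[0], trace[-1])
```

Because the parent is replaced only by a strictly better probe, the score never gets worse from one generation to the next. Whether the penalty actually drives the higher-order coefficients to zero depends on the seed and the penalty size, so no particular outcome is claimed here.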