# The Goldilocks Complexity Zone

Since my time in the early 90s at the Santa Fe Institute, I’ve been fascinated by the informational physics of complex systems. What are the requirements of an abstract system that is capable of complex behavior? How do our intuitions about complex behavior or form match up with mathematical approaches to describing complexity? For instance, we might consider a snowflake complex, but it is also regular in its structure, driven by an interaction between crystal growth and the surrounding air. The classic examples of coastlines and fractal self-similarity also seem complex but are not capable of complex behavior.

So what is a good way of thinking about complexity? There is actually a good range of ideas about how to characterize complexity. Seth Lloyd rounds up many of them here. The intuition that drives many of them is that complexity seems to be associated with distributions of relationships and objects that sit somewhere between a single state and a uniformly random set of states. Complex things, be they living organisms or computers running algorithms, should exist in a Goldilocks zone when each part is examined and those parts are somehow summed up to a single measure.

We can easily construct a complexity measure that captures some of these intuitions. Let’s look at three strings of characters:

x = aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

y = menlqphsfyjubaoitwzrvcgxdkbwohqyxplerz

z = the fox met the hare and the fox saw the hare

Now we would likely all agree that y and z are more complex than x, and I suspect most would agree that y looks like gibberish compared with z. Of course, y could be a sequence of weirdly coded measurements or something, or encrypted such that the message appears random. Let’s ignore those possibilities for our initial attempt at defining a complexity measure. We can see right away that an approach using basic information theory doesn’t help much. Algorithmic informational complexity will be highest for y, as will entropy:

$H(s)=-\sum_{i=0}^{i=|s|}P(s_i)\log P(s_i)$

for each sequence composed out of an alphabet with counts, s. Taking the natural logarithm (the default unit of R’s “entropy” package; base 2 would rescale every value by the same constant), we get H(x) = 0, H(y) = 3.199809, and H(z) = 2.3281. Here’s some sample R code using the “entropy” package if you want to calculate yourself:

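> library(entropy)  # install.packages("entropy") if needed
> z = "the fox met the hare and the fox saw the hare"
> zt = table(strsplit(z, '')[[1]])  # per-character counts
> entropy(zt, method="ML")          # in nats; add unit="log2" for bits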



Note that the alphabet of each string is slightly different, but the missing characters between them don’t matter since their probabilities are 0.

We can just arbitrarily scale entropy by the maximum entropy possible for the same length string like this:

$H_m(s)=\frac{-\sum_{i=0}^{i=|s|}P(s_i)\log P(s_i)}{-\sum_{i=0}^{i=|s|}\frac{1}{|s|}\log(\frac{1}{|s|})}$

$H_m(s)=\frac{\sum_{i=0}^{i=|s|}P(s_i)\log P(s_i)}{\log(\frac{1}{|s|})}$

This is somewhat like channel efficiency in communications theory, I think. And then just turn this into a parabolically-scaled measure that centers at 0.5:

$C(s)=\frac{1}{(1/2-H_m(s))^2+\epsilon}$

where $\epsilon$ is a small positive constant that caps the peak value at $1/\epsilon$.
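To make the three steps concrete, here is a small Python sketch (standard library only) that computes H, H_m, and C for the three strings. It assumes natural logarithms, which reproduce the entropy values quoted above, and an arbitrary choice of $\epsilon = 0.01$; the function names are mine.

```python
import math
from collections import Counter

def entropy(s):
    """Shannon entropy (in nats) of the character frequencies in s."""
    n = len(s)
    return sum((c / n) * math.log(n / c) for c in Counter(s).values())

def normalized_entropy(s):
    """H_m: entropy divided by log|s|, the maximum for a string of this length."""
    if len(s) < 2:
        return 0.0
    return entropy(s) / math.log(len(s))

def complexity(s, eps=0.01):
    """C(s): largest (1/eps) when H_m is exactly 1/2. eps is an assumed value."""
    return 1.0 / ((0.5 - normalized_entropy(s)) ** 2 + eps)

x = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
y = "menlqphsfyjubaoitwzrvcgxdkbwohqyxplerz"
z = "the fox met the hare and the fox saw the hare"

for name, s in (("x", x), ("y", y), ("z", z)):
    print(name, round(entropy(s), 4), round(normalized_entropy(s), 4),
          round(complexity(s), 2))
```

Under these assumptions z lands closest to the middle of the range and gets the highest score, while x and y both score low, which is at least the ordering our intuition wanted.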

But this calculation is only considering the individual character frequencies, not the composition of the characters into groupings. So we can consider pairs of characters in this same calculation, or triples, etc. And also, just looking at these n-gram sequences doesn’t capture potentially longer range repetitious structures. So we can gradually ladle on grammars as the counting mechanism. Now, if our measure of complexity is really going to capture what we intuitively consider to be complex, all of these different levels of connections within the string or other organized piece of information must be present.
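As a rough illustration of the n-gram step only (not the grammar-based counting), the same normalized entropy can be taken over overlapping pairs or triples of characters. This Python sketch assumes overlapping n-grams and normalizes by the log of the number of n-grams in the string; both choices are mine.

```python
import math
from collections import Counter

def ngram_entropy(s, n=1):
    """Entropy (nats) of the overlapping n-grams of s."""
    grams = [s[i:i + n] for i in range(len(s) - n + 1)]
    total = len(grams)
    return sum((c / total) * math.log(total / c) for c in Counter(grams).values())

def normalized_ngram_entropy(s, n=1):
    """Entropy divided by log(number of n-grams), its maximum for this string."""
    total = len(s) - n + 1
    if total < 2:
        return 0.0
    return ngram_entropy(s, n) / math.log(total)

y = "menlqphsfyjubaoitwzrvcgxdkbwohqyxplerz"
z = "the fox met the hare and the fox saw the hare"

# Compare how the two strings behave as the window widens.
for n in (1, 2, 3):
    print(n, round(normalized_ngram_entropy(y, n), 3),
          round(normalized_ngram_entropy(z, n), 3))
```

Every pair of characters in y is unique, so its pair entropy is pinned at the maximum, while the repeated words in z keep it below that ceiling.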

This general program is present in every one of Seth Lloyd’s complexity metrics in various ways and even comes into play in discussions of consciousness, though many use mutual information rather than entropy per se. Here’s Max Tegmark using a variation on Giulio Tononi’s Phi concept from Integrated Information Theory to demonstrate that integration is a key component of consciousness and how that might be calculated for general physical systems.

# Entanglement and Information

Research can flow into interesting little eddies that cohere into larger circulations that become transformative phase shifts. That happened to me this morning between a morning drive in the Northern California hills and departing for lunch at one of our favorite restaurants in Danville.

The topic I’ve been working on since my retirement is whether there are preferential representations for optimal automated inference methods. We have this grab-bag of machine learning techniques that use differing data structures but that all implement some variation on fitting functions to data exemplars; at the most general they all look like some kind of gradient descent on an error surface. Getting the right mix of parameters, nodes, etc. falls to some kind of statistical regularization or bottlenecking for the algorithms. Or maybe you perform a grid search in the hyperparameter space, narrowing down the right mix. Or you can throw up your hands and try to evolve your way to a solution, suspecting that there may be local optima that are distracting the algorithms from global success.

Yet, algorithmic information theory (AIT) gives us, via Solomonoff, a framework for balancing parameterization of an inference algorithm against the error rate on the training set. But, first, it’s all uncomputable and, second, the AIT framework just uses strings of binary as the coded Turing machines, so I would have to flip 2^N bits and test each representation to get anywhere with the theory. Yet, I and many others have had incremental success at using variations on this framework, whether via Minimum Description Length (MDL) principles, its first cousin Minimum Message Length (MML), or other statistical regularization approaches that are somewhat proxies for these techniques. But we almost always choose a model (ANNs, compression lexicons, etc.) and then optimize the parameters around that framework. Can we do better? Is there a preferential model for time series versus static data? How about for discrete versus continuous?
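To make the parameters-versus-error trade-off concrete, here is a minimal Python sketch of a two-part, MDL-style score. It picks the order of a character-level Markov model for a non-empty string by adding a crude model cost (half of log2 n bits per free parameter, a common approximation) to the cost of encoding the string under that model. The model family, the cost formula, and all names are my own illustrative choices, not anything from Solomonoff’s framework itself.

```python
import math
from collections import Counter, defaultdict

def two_part_code_length(s, k):
    """Bits for an order-k Markov model of s plus s encoded with that model."""
    a, n = len(set(s)), len(s)
    # Model cost: (a - 1) free probabilities for each of a**k possible
    # contexts, each charged (1/2) * log2(n) bits.
    model_bits = (a ** k) * (a - 1) * 0.5 * math.log2(n)
    # Data cost: the first k symbols are sent at a flat log2(a) bits each...
    data_bits = k * math.log2(a)
    # ...and the rest at -log2 of their maximum-likelihood conditional
    # probability given the preceding k symbols.
    counts = defaultdict(Counter)
    for i in range(k, n):
        counts[s[i - k:i]][s[i]] += 1
    for followers in counts.values():
        total = sum(followers.values())
        for c in followers.values():
            data_bits += c * math.log2(total / c)
    return model_bits + data_bits

def best_order(s, max_k=3):
    """Order with the shortest total description (lowest order wins ties)."""
    return min(range(max_k + 1), key=lambda k: two_part_code_length(s, k))

periodic = "ab" * 20
z = "the fox met the hare and the fox saw the hare"
for name, s in (("periodic", periodic), ("z", z)):
    print(name, best_order(s),
          [round(two_part_code_length(s, k), 1) for k in range(3)])
```

The periodic string pays for one extra level of context because that makes the data nearly free, whereas a 45-character sentence cannot afford the parameters of even an order-1 model under this scoring.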

So while researching model selection in this framework, I came upon a mention of Shannon’s information theory and its application to quantum decoherence. Of course I had to investigate. And here is the most interesting thing I’ve seen in months from the always interesting Max Tegmark at MIT:

Particles entangle and then quantum decoherence causes them to shed entropy into one another during interaction. But most interesting is the quantum Bayes’ theorem section around 00:35:00, where Shannon entropy as a classical measure of improbability gets applied to the quantum indeterminacy through this decoherence process.

I’m pretty sure it sheds no particular light on the problem of model selection but when cosmology and machine learning issues converge it gives me mild shivers of joy.

# A Soliloquy for Volcanoes and Nearest Neighbors

A German kid caught me talking to myself yesterday. It was my fault, really. I was trying to break a hypnotic trance-like repetition of exactly what I was going to say to the tramper’s hut warden about two hours away. OK, more specifically, I had left the Waihohonu camp site in Tongariro National Park at 7:30AM and was planning to walk out that day. To put this into perspective, it’s 28.8 km (17.9 miles) with elevation changes of around 900m, including a ridiculous final assault above Red Crater at something like 60 degrees along a stinking volcanic ridge line. And, to make things extra lovely, there was hail, then snow, then torrential downpours punctuated by hail again (a lovely tramp in the New Zealand summer), all in a full pack.

But anyway, enough bragging about my questionable judgement. I was driven by thoughts of a hot shower and the duck à l’orange at Chateau Tongariro while my hands numbed to unfeeling arresting myself with trekking poles down through muddy canyons. I was talking to myself. I was trying to stop repeating to myself why I didn’t want the campsite I had reserved for the night. This is the opposite of glorious runner’s high. This is when all the extra blood from one’s brain is obsessed with either making leg muscles go or watching how the feet will fall. I also had the hood of my rain fly up over my little Marmot ball cap. I was in full regalia, too, with the shifting rub of my Gore-Tex rain pants a constant presence throughout the day. I didn’t notice him easing up on me as I carried on about one-shot learning as some kind of trance-breaking ritual.

We exchanged pleasantries and he meandered on. With his tiny little day pack it was clear he had just come up from the car park at Mangatepopo for a little jaunt. Eurowimp. I caught up with him later slathering some kind of meat product on white bread trailside and pushed by, waiting on my own lunch of jerky, chili-tuna, crackers, and glorious spring water, gulp after gulp, an hour onward. He didn’t bring up the glossolalic soliloquy incident.

My mantra was simple: artificial neural networks, including deep learning approaches, require massive learning cycles and huge numbers of exemplars to learn. In a classic test, thousands of handwritten digit images (0 to 9) are categorized as to which number they are. Deep learning systems have gotten to 99% accuracy on that problem, actually besting average human performance. Yet they require a huge training corpus to pull this off, combined with many CPU hours to optimize the models on that corpus. We humans can do much better than that with our neural systems.

So we get this recently lauded effort, One-Shot Learning of Visual Concepts, that uses an extremely complicated Bayesian mixture modeling approach that combines stroke exemplars together for trying to classify foreign and never-before-seen characters (like Bengali or Ethiopic) after only one exposure to the stimulus. In other words, if I show you some weird character with some curves and arcs and a vertical bar in it, you can find similar ones in a test set quite handily, but machines really can’t. A deep learning model could be trained on every possible example known in a long, laborious process, but when exposed to a new script like Amharic or a Cherokee syllabary, the generalizations break down. A simple comparison approach is to use a nearest neighbor match or vote. That is, simply create vectors of the image pixels starting at the top left and compare the distance between the new image vector and the example using an inner vector product. Similar things look the same and have similar pixel patterns, right? Well, except they are rotated. They are shifted. They are enlarged and shrunken.
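The pixel-vector comparison, and the way a simple shift defeats it, can be sketched in a few lines of Python. The tiny 5x5 "images", the labels, and the use of cosine similarity (a normalized inner product) are my own illustrative assumptions.

```python
import math

def to_vector(image):
    """Flatten rows of '#'/'.' into a 0/1 pixel vector, top-left first."""
    return [1.0 if ch == "#" else 0.0 for row in image for ch in row]

def cosine_similarity(u, v):
    """Inner product of u and v, scaled by their lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest_label(query, examples):
    """Label of the stored example whose pixels best match the query."""
    q = to_vector(query)
    return max(examples,
               key=lambda label: cosine_similarity(q, to_vector(examples[label])))

examples = {
    "bar":  ["#....", "#....", "#....", "#....", "#...."],
    "dash": [".....", ".....", "#####", ".....", "....."],
}

# The same vertical bar, moved three pixels to the right.
shifted_bar = ["...#.", "...#.", "...#.", "...#.", "...#."]

print(nearest_label(examples["bar"], examples))  # the stored bar matches itself
print(nearest_label(shifted_bar, examples))      # the shifted bar does not
```

The shifted bar shares no pixels with the stored bar but overlaps the dash in one place, so the nearest neighbor comes back as the wrong class.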

And then it hit me that the crazy-complex stroke model could be simplified quite radically by simply building a similar collection of stroke primitives as splines and then looking at the K nearest neighbors in the stroke space. So a T is two strokes drawn from the primitives collection with a central junction and the horizontal lying atop the vertical. This builds on the stroke-based intuition of the paper’s authors (basically, all written scripts have strokes as a central feature and we as writers and readers understand the line-ness of them from experience with our own script).
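Here is a very rough Python sketch of what nearest neighbors in a stroke space might look like. To keep it short it uses straight line segments instead of splines, describes each stroke by its direction and midpoint relative to the character’s bounding box, and compares characters by the best one-to-one pairing of strokes. All of these representation choices are hypothetical stand-ins; nothing here comes from the paper.

```python
import math
from collections import Counter
from itertools import permutations

def stroke_features(strokes):
    """Describe each straight stroke by direction and midpoint, relative to
    the character's centre and size, so shifts and rescalings cancel out."""
    points = [p for stroke in strokes for p in stroke]
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    cx, cy = (min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2
    size = max(max(xs) - min(xs), max(ys) - min(ys)) or 1.0
    features = []
    for (x0, y0), (x1, y1) in strokes:
        dx, dy = (x1 - x0) / size, (y1 - y0) / size
        if (dx, dy) < (0, 0):  # ignore which way the stroke was drawn
            dx, dy = -dx, -dy
        mx, my = ((x0 + x1) / 2 - cx) / size, ((y0 + y1) / 2 - cy) / size
        features.append((dx, dy, mx, my))
    return features

def character_distance(a, b):
    """Smallest total feature distance over all pairings of strokes."""
    fa, fb = stroke_features(a), stroke_features(b)
    if len(fa) != len(fb):
        return math.inf
    return min(sum(math.dist(s, t) for s, t in zip(fa, perm))
               for perm in permutations(fb))

def knn_label(query, examples, k=1):
    """Majority label among the k examples closest to the query."""
    ranked = sorted(examples, key=lambda ex: character_distance(query, ex[1]))
    return Counter(label for label, _ in ranked[:k]).most_common(1)[0][0]

examples = [
    ("T", [((0, 2), (2, 2)), ((1, 2), (1, 0))]),
    ("L", [((0, 2), (0, 0)), ((0, 0), (2, 0))]),
]
# A T drawn twice as large and somewhere else on the page.
big_shifted_t = [((10, 9), (14, 9)), ((12, 9), (12, 5))]
print(knn_label(big_shifted_t, examples))
```

Unlike the raw pixel comparison, this matches the shifted and enlarged T to the stored T. Rotation is still not handled, and the brute-force pairing only makes sense for characters with a handful of strokes.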

I may have to try this out. I should note, also in critique of this antithesis of runner’s high (tramping doldrums?), that I was also deeply concerned that there were so many damn contending voices and thoughts racing around my head in the face of such incredible scenery. Why did I feel the need to distract my mind from its obsessions over something so humanly trivial? At least, I suppose, the distraction was interesting enough that it was worth the effort.

# The Retiring Mind, Part 1: Clouds

I’m setting my LinkedIn and Facebook status to retired on 11/30 (a month later than planned, alas). Retired isn’t completely accurate since I will be in the earliest stage of a new startup in cognitive computing, but I want to bask ever-so-briefly in the sense that I am retired, disconnected from the circuits of organizations, and able to do absolutely nothing from day-to-day if I so desire.

(I’ve spent some serious recent cycles trying to combine Samuel Barber’s “Adagio for Strings” as an intro to the Grateful Dead’s “Terrapin Station”…on my Line6 Variax. Modulate B-flat to C, then D, then E. If there is anything more engaging for a retiring mind, I can’t think of it.)

I recently pulled the original kitenga.com server off a shelf in my garage because I had a random Kindle Digital Publisher account that I couldn’t find the credentials for and, in a new millennium catch-22, I couldn’t ask for a password reset because it had to go to that old email address. I swapped hard drives between a few Linux pizza-box servers and messed around with old BIOS and boot settings, and was finally able to get the full mail archive off the drive. In the process I had to rediscover all the arcane bits of Dovecot and mail.rc and SMTP configurations, and a host of other complexities. After not finding what I needed there, alas, I compressed the mail collection and put it on Dropbox.

I also retired a Mac Mini, shipping it off to a buy-back place for a few hundred bucks in Amazon credit. It had been a Subversion server that followed-up for kitenga.com, holding more than ten years of intellectual property in stasis. And I mean everything: business records, PowerPoints, source code, release packages, artwork, manuscripts, music. The archives were recorded to a USB drive and then tarred and dropped into Dropbox. A few of the more personal archive collections were transformed into Git repositories and stored on a OneDrive account.

And the new startup will exclusively use Microsoft Office 365 for email, calendaring, and productivity (yes, I tried Google Docs). Yammer will help with internal knowledge management. Atlassian’s Confluence, JIRA, and Bitbucket will support code development. Lync and Skype are collaboration tools. Products will launch in Amazon EC2 instances. Financials, HR, and talent acquisition will go to WorkDay. Then we have LegalZoom for legalities, USPTO.gov for trademarks and patents, GoDaddy for domain registration, iPage for WordPress hosting, and so on. And the absolutely critical 1Password for keeping all these credentials straight across dozens of web properties and online systems (I have 178 separate logins stored in 1Password!), with the 1Password archive encrypted in Dropbox and accessible from phones or laptops as needed.

What an incredible change. In just a few years we have erased or reduced to a trickle the frictional costs of doing a modern software business. Even travel has become easier with TripIt Pro. I just forward any itinerary I get from any airline or online booking service and it gets incorporated into a master itinerary. I check in for flights online and the boarding passes appear on my Apple Watch for scanning. I’m taking off for two weeks of backpacking and trail running in New Zealand as some kind of psychological commitment to the concept of retirement so travel optimization is weighing on me right now.

Cord-cutting for cable and landline (except broadband) is coming soon. Television is bad enough that surfing it should not be an option. Also, one of the interesting consequences of cloud everything (including installed software assets in the Apple App Store, Steam, music, movies, etc.) is that the compute platform can be swapped as needed. I keep disposing of compute platforms and I’m now down to just an iPhone 6 and 2015 MacBook with a curved 34” LG 4K display. The MacBook might get swapped for a next-gen Air within a year, or something else (I’ve tried every gen of iPads and also forced myself to live with a Microsoft Surface Pro 3, but ended up selling each because of non-use). If and when I swap platforms, it just takes a day or so to get everything synced up and working again.

The flexibility of the operations back-end of this new startup world demonstrates an odd fact about Silicon Valley: we are getting close to being able to turn ideas directly into tangible products with little or no capital investment. Our OPEXs become predictable and manageable (\$12/month per user, for instance). We have no CAPEX. With Obamacare even the mind-numbing opaqueness of the health insurance market breaks open for independent contractors and contributors.

It’s feeling very warm and comfortable in the clouds.

# Neutered Inventiveness

I just received an award from my employer for getting more than five patents through the patent committee this year. Since I’m a member of the committee, it was easy enough. Just kidding: I was not, of course, allowed to vote on my own patents. The award I received leaves a bit to be desired, however. First, I have to say that it is a well-crafted glass block about 4″ x 3″ and has the kind of heft to it that would make it invaluable as a weapon in a game of Clue. That being said, I give you Exhibits 1 and 2:

Exhibit 1 is a cell-phone snap through the glass surface of my award at Leonardo da Vinci’s famous Vitruvian Man, so named because it was a tribute to the architect Vitruvius—or so Wikipedia tells me. Exhibit 2 is an image of the original sketch by da Vinci, also borrowed from Wikipedia.

And now, with only minimal scrutiny, my dear reader can see the fundamental problem in the borrowing and translation of old Vitruvius. While Vitruvius was deeply enamored of a sense of symmetry to the human body, and da Vinci took that sense of wonder as a basis for drawing his figure, we can rightly believe that the presence of all anatomical parts of the man was regarded as essential for the accurate portrayal of man’s elaborate architecture.

My inventions now seem somehow neutered and my sense of wonder castrated by this lesser man, no matter what the intent of the good people in charge of the production of the award. I reflect on their motivations in light of recent arguments concerning the proper role of the humanities in our modern lives. I have consulted with my wife, an expert on a range of obscure matters concerning art history, mythology, pagan traditions, and other scholarly things that enrich our lives but are sometimes hard to assign tangible value. She insists that penises should never be removed—nor inserted—just to make a point.

Further reflection suggests that the very choice of Vitruvian Man really wasn’t a very good one. How about this?

Shaft intact and all, it represents inventiveness far better than old Vitruvius’ meditations on the architecture of the body and the world.

# Magic in the Age of Unicorns

Ah, Sili Valley, my favorite place in the world but also a place (or maybe a state of mind) that has the odd quality of being increasingly revered for abstractions that bear only cursory similarities to reality. Isn’t that always the way of things? Here’s The Guardian analyzing startup culture. The picture in the article is especially amusing to me since my first startup (freshly spun out of Xerox PARC) was housed on Jay Street just across 101 from Intel’s Santa Clara campus (just to the right in the picture). In the evening, as traffic jammed up on the freeway, I watched a hawk hunt in the cloverleaf interchange of the Great America Parkway/101 intersection. It was both picturesque and unrelenting in its cruelty. And then, many years later, I would pitch in the executive center of the tall building alongside Revolution Analytics (now gone to Microsoft).

Everything changes so fast, then changes again. If it is a bubble, it is a more beautiful bubble than before, where it isn’t enough to just stand up a website, but there must be unusual change and disruption. Even the unicorns must pop those bubbles.

I will note that I am returning to the startup world in a few weeks. Startup next will, I promise, change everything!

# The IQ of Machines

Perhaps idiosyncratic to some is my focus in the previous post on the theoretical background to machine learning that derives predominantly from algorithmic information theory and, in particular, Solomonoff’s theory of induction. I do note that there are other theories that can be brought to bear, including Vapnik’s Structural Risk Minimization and Valiant’s PAC-learning theory. Moreover, perceptrons and vector quantization methods and so forth derive from completely separate principles that can then be cast into more fundamental problems in information geometry and physics.

Artificial General Intelligence (AGI) is then perhaps the hard problem on the horizon, and one where I doubt there has been significant progress in the past twenty years or so. That is not to say that I am not an enthusiastic student of the topic and field, just that I don’t see risk levels from intelligent AIs rising to what we should consider a real threat. This topic of how to grade threats deserves deeper treatment, of course, and is at the heart of everything from so-called “nanny state” interventions in food and product safety to how to construct policy around global warming. Luckily, and unlike both those topics, killer AIs don’t threaten us at all quite yet.

But what about simply characterizing what AGIs might look like and how we can even tell when they arise? Mildly interesting is Shane Legg and Joel Veness’ idea of an Artificial Intelligence Quotient or AIQ that they expand on in An Approximation of the Universal Intelligence Measure. This measure is derived from, voilà, exactly the kind of algorithmic information theory (AIT) and compression arguments that I led with in the slide deck. Is this the only theory around for AGI? Pretty much, but different perspectives tend to lead to slightly different focuses. For instance, there is little need to discuss AIT when dealing with Deep Learning Neural Networks. We just instead discuss statistical regularization and bottlenecking, which can be thought of as proxies for model compression.

So how can intelligent machines be characterized by something like AIQ? Well, the conclusion won’t be surprising. Intelligent machines are those machines that function well in terms of achieving goals over a highly varied collection of environments. This allows for tractable mathematical treatments insofar as the complexity of the landscapes can be characterized, but doesn’t really give us a good handle on what the particular machines might look like. They can still be neural networks or support vector machines, or maybe even something simpler, and through some selection and optimization process have the best performance over a complex topology of reward-driven goal states.
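For reference, the measure that AIQ approximates is, as I understand Legg and Hutter’s definition, a complexity-weighted sum of an agent’s expected reward over environments:

$\Upsilon(\pi)=\sum_{\mu\in E}2^{-K(\mu)}V_\mu^\pi$

Here $\pi$ is the agent, $E$ is a class of computable environments, $K(\mu)$ is the Kolmogorov complexity of environment $\mu$, and $V_\mu^\pi$ is the expected total reward that $\pi$ earns in $\mu$. Simple environments get the most weight, and since $K$ is uncomputable, any practical score has to sample environments from some reference machine and average the results.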

So still no reason to panic, but some interesting ideas that shed greater light on the still mysterious idea of intelligence and the human condition.

# Machine Learning and the Coming Robot Apocalypse

Slides from a talk I gave today on current advances in machine learning are available in PDF, below. The agenda is pretty straightforward: starting with some theory about overfitting based on algorithmic information theory, we proceed on through a taxonomy of ML types (not exhaustive), then dip into ensemble learning and deep learning approaches. An analysis of the difficulty and types of performance we get from various algorithms and problems is presented. We end with a discussion of whether we should be frightened about the progress we see around us.

Note: click on the gray square if you don’t see the embedded PDF…browsers vary.

# Intelligence Augmentation and a Frictionless Economy

The ever-present Tom Davenport weighs in in the Harvard Business Review on the topic of artificial intelligence (AI) and its impact on knowledge workers of the future. The theme is intelligence augmentation (IA) where knowledge workers improve their productivity and create new business opportunities using technology. And those new opportunities don’t displace others, per se, but introduce new efficiencies. This was also captured in the New York Times in a round-up of the role of talent and service marketplaces that reduce the costs of acquiring skills and services, creating more efficient markets and removing sources of friction in economic interactions.

I’ve noticed the proliferation of services for connecting home improvement contractors to customers lately, and have benefited from them in several renovation/construction projects I have ongoing. Meanwhile, Amazon Prime has absorbed an increasingly large portion of our shopping, even cutting out Whole Foods runs, with often next day deliveries. Between pricing transparency and removing barriers (delivery costs, long delays, searching for reliable contractors), the economic impacts might be large enough to be considered a revolution, though perhaps a consumer revolution rather than a worker productivity one.

Here’s the concluding paragraph from an IEEE article I just wrote that will appear in the San Francisco Chronicle in the near future:

One of the most interesting risks also carries with it the potential for enhanced reward. Don’t they always? That is, some economists see economic productivity largely stabilizing if not stagnating. Industrial revolutions driven by steam engines, electrification, telephony, and even connected computing led to a radical reshaping of our economy in the past and to leaps in the productivity of workers, but there is no clear candidate for those kinds of changes in the near future. Big data feeding into more intelligent systems may be the driver for the next economic wave, though revolutions are always messier than anyone expected.

But maybe it will be simpler and less messy than I imagine, just intelligence augmentation helping with our daily engagement with a frictionless economy.