Category: AI

The Retiring Mind, Part 1: Clouds

I’m setting my LinkedIn and Facebook status to retired on 11/30 (a month later than planned, alas). Retired isn’t completely accurate since I will be in the earliest stage of a new startup in cognitive computing, but I want to bask ever-so-briefly in the sense that I am retired, disconnected from the circuits of organizations, and able to do absolutely nothing from day to day if I so desire.

(I’ve spent some serious recent cycles trying to combine Samuel Barber’s “Adagio for Strings” as an intro to the Grateful Dead’s “Terrapin Station”…on my Line6 Variax. Modulate B-flat to C, then D, then E. If there is anything more engaging for a retiring mind, I can’t think of it.)

I recently pulled the original server off a shelf in my garage because I had a random Kindle Digital Publisher account that I couldn’t find the credentials for and, in a new-millennium Catch-22, I couldn’t ask for a password reset because it had to go to that old email address. I swapped hard drives between a few Linux pizza-box servers, messed around with old BIOS and boot settings, and was finally able to get the full mail archive off the drive. In the process I had to rediscover all the arcane bits of Dovecot, mail.rc, and SMTP configurations, and a host of other complexities. After not finding what I needed there, alas, I compressed the mail collection and put it on Dropbox.

I also retired a Mac Mini, shipping it off to a buy-back place for a few hundred bucks in Amazon credit. It had been a Subversion server holding more than ten years of intellectual property in stasis. And I mean everything: business records, PowerPoints, source code, release packages, artwork, manuscripts, music. The archives were copied to a USB drive and then tarred and dropped into Dropbox. A few of the more personal archive collections were converted into Git repositories and stored on a OneDrive account.

And the new startup will use Microsoft Office 365 exclusively for email, calendaring, and productivity (yes, I tried Google Docs). Yammer will help with internal knowledge management. Atlassian’s Confluence, JIRA, and Bitbucket will support code development. Lync and Skype will handle collaboration. Products will launch on Amazon EC2 instances. Financials, HR, and talent acquisition will go to Workday. Then we have LegalZoom for legalities, for trademarks and patents, GoDaddy for domain registration, iPage for WordPress hosting, and so on. And the absolutely critical 1Password keeps all these credentials straight across dozens of web properties and online systems (I have 178 separate logins stored in 1Password!), with the 1Password archive encrypted in Dropbox and accessible from phones or laptops as needed.

What an incredible change. In just a few years we have erased, or reduced to a trickle, the frictional costs of running a modern software business. Even travel has become easier with TripIt Pro. I just forward any itinerary I get from any airline or online booking service and it gets incorporated into a master itinerary. I check in for flights online and the boarding passes appear on my Apple Watch for scanning. I’m taking off for two weeks of backpacking and trail running in New Zealand as some kind of psychological commitment to the concept of retirement, so travel optimization is weighing on me right now.

Cord-cutting for cable and landline (except broadband) is coming soon. Television is bad enough that surfing it should not be an option. Also, one of the interesting consequences of cloud everything (including installed software assets in the Apple App Store, Steam, music, movies, etc.) is that the compute platform can be swapped as needed. I keep disposing of compute platforms and I’m now down to just an iPhone 6 and a 2015 MacBook with a curved 34” LG 4K display. The MacBook might get swapped for a next-gen Air within a year, or something else (I’ve tried every generation of iPad and also forced myself to live with a Microsoft Surface Pro 3, but ended up selling each because of non-use). If and when I swap platforms, it just takes a day or so to get everything synced up and working again.

The flexibility of the operations back-end of this new startup world demonstrates an odd fact about Silicon Valley: we are getting close to being able to turn ideas directly into tangible products with little or no capital investment. Our OPEX becomes predictable and manageable ($12/month per user, for instance). We have no CAPEX. With Obamacare, even the mind-numbing opaqueness of the health insurance market breaks open for independent contractors and contributors.

It’s feeling very warm and comfortable in the clouds.

The IQ of Machines

Perhaps idiosyncratic to some is my focus in the previous post on the theoretical background to machine learning that derives predominantly from algorithmic information theory and, in particular, Solomonoff’s theory of induction. I do note that there are other theories that can be brought to bear, including Vapnik’s Structural Risk Minimization and Valiant’s PAC-learning theory. Moreover, perceptrons, vector quantization methods, and so forth derive from completely separate principles that can then be cast into more fundamental problems in information geometry and physics.
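Solomonoff induction itself is incomputable, but its core move, weighting every hypothesis that is consistent with the data by 2 raised to minus its description length and then predicting with the weighted mixture, can be sketched over a deliberately tiny hypothesis class. In the Python below, the hypothesis class (binary patterns of up to 8 bits, repeated forever) and the assumed code length of 2n bits for an n-bit pattern are illustrative choices of mine, not part of Solomonoff’s formulation.

```python
from itertools import product


def patterns(max_len):
    """Yield every binary pattern of length 1..max_len (our toy 'programs')."""
    for n in range(1, max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)


def predict_next(observed, max_len=8):
    """Return P(next bit) under a length-weighted mixture of periodic hypotheses.

    Each hypothesis repeats one pattern forever. Its prior is 2**(-2n), i.e. an
    assumed code length of 2n bits, which keeps the total prior mass below 1.
    """
    weights = {"0": 0.0, "1": 0.0}
    for pattern in patterns(max_len):
        n = len(pattern)
        # Unroll the pattern far enough to cover the data plus one more bit.
        stream = pattern * (len(observed) // n + 2)
        if stream.startswith(observed):  # hypothesis survives the evidence
            weights[stream[len(observed)]] += 2.0 ** (-2 * n)
    total = weights["0"] + weights["1"]
    if total == 0:
        raise ValueError("no hypothesis in the class explains the data")
    return {bit: w / total for bit, w in weights.items()}


# The short period-2 hypothesis "01" dominates, so "0" is strongly favored.
print(predict_next("010101"))
```

Longer patterns that also fit the data still get a vote, just an exponentially smaller one; that is the compression-as-Occam’s-razor argument against overfitting in miniature.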

Artificial General Intelligence (AGI) is then perhaps the hard problem on the horizon, and one on which I see little significant progress in the past twenty years or so. That is not to say that I am not an enthusiastic student of the topic and field, just that I don’t see risk levels from intelligent AIs rising to what we should consider a real threat. This topic of how to grade threats deserves deeper treatment, of course, and is at the heart of everything from so-called “nanny state” interventions in food and product safety to how to construct policy around global warming. Luckily, and unlike both those topics, killer AIs don’t threaten us at all quite yet.

But what about simply characterizing what AGIs might look like and how we can even tell when they arise? Mildly interesting is Shane Legg and Joel Veness’ idea of an Artificial Intelligence Quotient, or AIQ, which they expand on in An Approximation of the Universal Intelligence Measure. This measure is derived from, voilà, exactly the kind of algorithmic information theory (AIT) and compression arguments that I lead with in the slide deck. Is this the only theory around for AGI? Pretty much, but different perspectives tend to lead to slightly different emphases. For instance, there is little need to discuss AIT when dealing with Deep Learning Neural Networks. We instead discuss statistical regularization and bottlenecking, which can be thought of as proxies for model compression.

So how can intelligent machines be characterized by something like AIQ? Well, the conclusion won’t be surprising. Intelligent machines are those machines that function well in terms of achieving goals over a highly varied collection of environments. This allows for tractable mathematical treatments insofar as the complexity of the landscapes can be characterized, but doesn’t really give us a good handle on what the particular machines might look like. They can still be neural networks or support vector machines, or maybe even something simpler, and through some selection and optimization process have the best performance over a complex topology of reward-driven goal states.
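For reference, the quantity that AIQ approximates is Legg and Hutter’s universal intelligence of an agent π. It sums the agent’s expected reward over a class E of computable environments, weighting each environment μ by 2 raised to minus its Kolmogorov complexity K(μ), so simple environments count most but every environment counts a little:

```latex
\Upsilon(\pi) \;=\; \sum_{\mu \in E} 2^{-K(\mu)} \, V_{\mu}^{\pi}
```

Here V_μ^π is the expected total reward that π earns in μ. The sum over E is the “highly varied collection of environments,” and the 2^{-K(μ)} weight is where the compression argument enters. Because K is incomputable, the AIQ approach substitutes program length on a fixed reference machine and estimates the sum by sampling environment programs; the specifics of that reference machine are in Legg and Veness’ paper, not in this sketch.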

So still no reason to panic, but some interesting ideas that shed greater light on the still mysterious idea of intelligence and the human condition.

Machine Learning and the Coming Robot Apocalypse

Slides from a talk I gave today on current advances in machine learning are available in PDF, below. The agenda is pretty straightforward: starting with some theory about overfitting based on algorithmic information theory, we proceed through a taxonomy of ML types (not exhaustive), then dip into ensemble learning and deep learning approaches. An analysis of the difficulty and types of performance we get from various algorithms and problems is presented. We end with a discussion of whether we should be frightened about the progress we see around us.


Download the PDF file.

Intelligence Augmentation and a Frictionless Economy

The ever-present Tom Davenport weighs in, in the Harvard Business Review, on the topic of artificial intelligence (AI) and its impact on the knowledge workers of the future. The theme is intelligence augmentation (IA), where knowledge workers improve their productivity and create new business opportunities using technology. And those new opportunities don’t displace others, per se, but introduce new efficiencies. This was also captured in a New York Times round-up of talent and service marketplaces that reduce the costs of acquiring skills and services, creating efficiencies and disintermediating sources of friction in economic interactions.

I’ve noticed the proliferation lately of services connecting home improvement contractors to customers, and have benefited from them in several renovation/construction projects I have ongoing. Meanwhile, Amazon Prime has absorbed an increasingly large portion of our shopping, even cutting out Whole Foods runs, often with next-day delivery. Between pricing transparency and removing barriers (delivery costs, long delays, searching for reliable contractors), the economic impacts might be large enough to be considered a revolution, though perhaps a consumer revolution rather than a worker productivity one.

Here’s the concluding paragraph from an IEEE article I just wrote that will appear in the San Francisco Chronicle in the near future:

One of the most interesting risks also carries with it the potential for enhanced reward. Don’t they always? That is, some economists see economic productivity largely stabilizing if not stagnating. Industrial revolutions driven by steam engines, electrification, telephony, and even connected computing led to a radical reshaping of our economy in the past and to leaps in the productivity of workers, but there is no clear candidate for those kinds of changes in the near future. Big data feeding into more intelligent systems may be the driver for the next economic wave, though revolutions are always messier than anyone expects.

But maybe it will be simpler and less messy than I imagine, just intelligence augmentation helping with our daily engagement with a frictionless economy.

Evolutionary Optimization and Environmental Coupling

Carl Shulman and Nick Bostrom argue about anthropic principles in “How Hard is Artificial Intelligence? Evolutionary Arguments and Selection Effects” (Journal of Consciousness Studies, 2012, 19:7-8). They focus on how specific models suggesting that human-level intelligence should be easy to automate rest on assumptions about what “easy” means, assumptions shaped by observational bias (we assume we are intelligent, so the observation of intelligence seems likely).

Yet the analysis of this presumption is blocked by a prior consideration: given that we are intelligent, we should be able to achieve artificial, simulated intelligence. If this is not, in fact, true, then determining whether the assumption that our own intelligence is highly probable is warranted becomes irrelevant, because we may not be able to demonstrate that artificial intelligence is achievable anyway. On this point, the authors are dismissive of any requirement to simulate the environment that is a prerequisite for organismal and species optimization against that environment:

In the limiting case, if complete microphysical accuracy were insisted upon, the computational requirements would balloon to utterly infeasible proportions. However, such extreme pessimism seems unlikely to be well founded; it seems unlikely that the best environment for evolving intelligence is one that mimics nature as closely as possible. It is, on the contrary, plausible that it would be more efficient to use an artificial selection environment, one quite unlike that of our ancestors, an environment specifically designed to promote adaptations that increase the type of intelligence we are seeking to evolve (say, abstract reasoning and general problem-solving skills as opposed to maximally fast instinctual reactions or a highly optimized visual system).

Why is this “unlikely”? The argument is that there are classes of mental function that can be compartmentalized away from the broader, known evolutionary provocateurs. For instance, the Red Queen argument concerning sexual optimization in the face of significant parasitism is dismissed as merely a distraction from real intelligence:

And as mentioned above, evolution scatters much of its selection power on traits that are unrelated to intelligence, such as Red Queen’s races of co-evolution between immune systems and parasites. Evolution will continue to waste resources producing mutations that have been reliably lethal, and will fail to make use of statistical similarities in the effects of different mutations. All these represent inefficiencies in natural selection (when viewed as a means of evolving intelligence) that it would be relatively easy for a human engineer to avoid while using evolutionary algorithms to develop intelligent software.

Inefficiencies? Really? We know that sexual dimorphism and competition are essential to the evolution of advanced species. Even the growth of brain size and creative capabilities is likely tied to sexual competition, so why should we think that they can be uncoupled? Instead, we are left with a blocker to the core argument: simulated evolution may not, in fact, be capable of producing sufficient complexity to yield intelligence as we know it without, in turn, a sufficiently complex simulated fitness function to evolve against. Observational effects aside, if we don’t get this right, we need not worry about whether there are ten or ten billion planets suitable for life out there.

Active Deep Learning

Deep Learning methods that use auto-associative neural networks to pre-train (with bottlenecking methods to ensure generalization) have recently been shown to perform as well as, and even better than, human beings at certain tasks like image categorization. But what is missing from the proposed methods? There seems to be a range of challenges that revolve around temporal novelty and sequential activation/classification problems like those that occur in natural language understanding. The most recent achievements are oriented more toward relatively static data presentations.

Jürgen Schmidhuber revisits the history of connectionist research (dating to the 1800s!) in his October 2014 technical report, Deep Learning in Neural Networks: An Overview. It is a comprehensive effort at documenting the history of this reinvigorated area of AI research. What is old is new again, enhanced by advances in computing that allow for larger and larger scale simulations.

The conclusions section has an interesting suggestion: what is missing so far is the sensorimotor activity loop that allows for the active interrogation of the data source. Human vision roams over images, while DL systems ingest the entire scene. And real neural systems have energy constraints that lead to suppression of neural function away from the active neural clusters.

The Deep Computing Lessons of Apollo

With the arrival of the Apollo 11 mission’s 45th anniversary, and occasional planning and dreaming about a manned mission to Mars, the role of information technology comes into focus again. The next great mission will include a phalanx of computing resources, sensors, radars, hyperspectral cameras, laser rangefinders, and information fusion visualization and analysis tools to knit together everything needed for the astronauts to succeed. Some of these capabilities will be autonomous, predictive, and knowledgeable.

But it all began with the Apollo Guidance Computer, or AGC, the rather sophisticated for-its-time computer that ran the trigonometric and vector calculations for the original moonshot. The AGC was startlingly simple in many ways, built exclusively from NOR gates that implemented Arithmetic Logic Unit-like functionality, shifts, and register opcodes, combined with core memory (tiny ferromagnetic rings) in both RAM and ROM forms (the ROM, known as core rope memory, was hand-woven by workers at Raytheon).

Using NOR gates to create the entire logic of the central processing unit is guided by a few simple principles. A NOR gate combines both NOT and OR functionality together and has the following logical functionality:

INPUT1 | INPUT2 | OUTPUT
-------|--------|-------
0      | 0      | 1
0      | 1      | 0
1      | 0      | 0
1      | 1      | 0

The NOT-OR logic can be read as “if INPUT1 or INPUT2 is set to 1, then the OUTPUT should be 1, but then take the logical inversion (NOT) of that”. And, amazingly, circuits built from NORs can create any Boolean logic. NOT A is just NOR(A,A), which you can see from the following table:

A | NOR(A,A)
--|---------
0 | 1
1 | 0

AND and OR can similarly be constructed by layering NORs together. For Apollo, the use of just a single type of integrated circuit that packaged NORs into chips improved reliability.
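The layering can be checked mechanically. This Python sketch builds NOT, OR, and AND from a single nor function, using NOR(A,A) for NOT as above plus De Morgan’s law for AND. The function names are illustrative, and the gate counts (one NOR for NOT, two for OR, three for AND) describe this particular construction, not the AGC’s actual wiring.

```python
from itertools import product


def nor(a: int, b: int) -> int:
    """The only primitive: output 1 only when both inputs are 0."""
    return 0 if (a or b) else 1


def not_(a: int) -> int:
    return nor(a, a)               # NOR(A, A) = NOT A (1 gate)


def or_(a: int, b: int) -> int:
    return not_(nor(a, b))         # invert NOR to recover OR (2 gates)


def and_(a: int, b: int) -> int:
    return nor(not_(a), not_(b))   # De Morgan: A AND B = NOT(NOT A OR NOT B) (3 gates)


# Exhaustively check every derived gate against Python's own bit operators.
for a, b in product((0, 1), repeat=2):
    assert not_(a) == 1 - a
    assert or_(a, b) == (a | b)
    assert and_(a, b) == (a & b)
print("NOT, OR and AND all rebuilt from NOR")
```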

This level of simplicity has another important theoretical consequence that bears on the transition from simple guidance systems to potentially intelligent technologies for future Mars missions: a single layer of Boolean functions can only compute simple things. As you layer on functions you get increased complexity, but complexity that is bounded by the depth of the logical function network. In fact, it can be proved that there are functions a depth-k network can represent compactly that a depth-(k-1) network can represent only if it has exponentially many hidden units relative to the input size.
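Parity is the textbook example behind depth-separation results of this kind. As an illustration rather than a proof: written as a depth-2 OR-of-ANDs formula, n-bit parity needs one AND term for every odd-weight input, 2^(n-1) terms in all, and none can be merged because flipping any single bit flips the output. A deep chain of two-input XOR gates needs only n-1 gates. The Python below simply counts both; the function names are hypothetical.

```python
from itertools import product


def parity(bits):
    return sum(bits) % 2


def dnf_terms(n):
    """Terms in the depth-2 (OR-of-ANDs) formula for n-bit parity.

    Every odd-parity input needs its own AND term; no two terms can be
    merged because changing any single bit changes the output.
    """
    return sum(1 for bits in product((0, 1), repeat=n) if parity(bits) == 1)


def xor_chain_gates(n):
    """Two-input XOR gates in a deep chain: ((x1 ^ x2) ^ x3) ^ ... ^ xn."""
    return n - 1


for n in (2, 4, 8, 16):
    print(f"n={n:2d}  depth-2 terms={dnf_terms(n):6d}  chain gates={xor_chain_gates(n)}")
```

The shallow formula doubles with every added input while the deep chain grows by one gate, which is the flavor of the depth argument, though the real theorems concern bounded-depth circuits in general rather than this one pair of constructions.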

This is a startling theoretical discovery, and it motivates much of the deep learning research: functions for classifying Martian hyperspectral imagery need deep networks precisely because the complexity of the classification task rules out shallower ones. Today we mostly use simplified artificial neurons to do this rather than Boolean primitives, but the motivations are the same.

But back to the crawling that predates the running: besides basic logical operations, how can we do something more usefully complex using NORs? Here’s an example logic circuit that adds bits together:

Each of the little half-moons is a NOR gate, with its inputs on the left and its output on the right. A and B are the inputs, S is the sum output, and C is the “carry” to the next significant bit. By chaining these together, the circuit can add arbitrarily large binary representations of integers, with a circuit depth of 7 per 2-bit adder.
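The same idea scales up to arithmetic. The Python sketch below composes a one-bit full adder, and then a ripple-carry adder for integers of any width, using nothing but a nor function. It is written for clarity rather than gate economy, so its gate count and depth will not match the circuit described above, and the helper names are hypothetical.

```python
def nor(a: int, b: int) -> int:
    return 0 if (a or b) else 1


def not_(a):
    return nor(a, a)


def or_(a, b):
    return not_(nor(a, b))


def and_(a, b):
    return nor(not_(a), not_(b))


def xor(a, b):
    # (A OR B) AND NOT (A AND B), still built only from NOR
    return and_(or_(a, b), not_(and_(a, b)))


def full_adder(a, b, carry_in):
    """Return (sum_bit, carry_out) for one bit position."""
    partial = xor(a, b)
    sum_bit = xor(partial, carry_in)
    carry_out = or_(and_(a, b), and_(partial, carry_in))
    return sum_bit, carry_out


def ripple_add(x: int, y: int, width: int = 16) -> int:
    """Add two non-negative integers bit by bit, least significant bit first."""
    carry, total = 0, 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total | (carry << width)  # keep the final carry as an extra bit


print(ripple_add(1969, 45))  # 2014
```

Each bit position waits on the carry from the one below it, which is why a ripple-carry adder’s depth grows linearly with word width.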

Inching Towards Shannon’s Oblivion

Following Bill Joy’s concerns over the future world of nanotechnology, biological engineering, and robotics in his 2000 essay Why the Future Doesn’t Need Us, it has become fashionable to worry over “existential threats” to humanity. Nuclear power and weapons used to be dreadful enough, and clearly remain in the top five, but these rapidly developing technologies, asteroids, and global climate change have joined Oppenheimer’s misquoted “destroyer of all things” in portending our doom. Here’s Max Tegmark, Stephen Hawking, and others in the Huffington Post warning again about artificial intelligence:

One can imagine such technology outsmarting financial markets, out-inventing human researchers, out-manipulating human leaders, and developing weapons we cannot even understand. Whereas the short-term impact of AI depends on who controls it, the long-term impact depends on whether it can be controlled at all.

I almost always begin my public talks on Big Data and intelligent systems with a presentation on industrial revolutions that progresses through Robert Gordon’s phases and then highlights Paul Krugman’s argument that Big Data and the intelligent systems improvements we are seeing potentially represent a next industrial revolution. I am usually less enthusiastic about the timeline than nonspecialists, but after giving a talk on Friday at PASS Business Analytics in San Jose, I stuck around to listen in on a highly technical talk concerning statistical regularization and deep learning, and I found myself enthused about the topic once again.

Deep learning uses artificial neural networks to classify information, but it differs from traditional ANNs in that the systems are pre-trained using auto-encoders to acquire general knowledge about the data domain. To be clear, though, most of the problems tackled so far are “subsymbolic” ones in image recognition and speech. Still, the improvements have been fairly impressive and rest on some pretty simple ideas. First, the pre-training is accompanied by systematic bottlenecking of the number of nodes that can be used for learning. Second, the amount that each node fires is kept low to avoid overfitting to nodes with dominating magnitudes. Together, these constraints let the auto-encoders learn the patterns without labeled training data, after which the networks can be trained faster and more easily to associate those patterns with classes.
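The second idea, keeping each hidden unit’s firing rate low, is often implemented as a sparsity penalty. One common formulation from the sparse-autoencoder literature (not necessarily the one used in the talk) compares each hidden unit’s average activation over a batch, rho_hat_j, with a small target rate rho using a Bernoulli KL divergence, and adds the sum to the reconstruction loss. A minimal pure-Python sketch, in which the target rate of 0.05 and the toy activations are assumptions for illustration:

```python
import math


def kl_sparsity_penalty(activations, rho=0.05):
    """Sum over hidden units of KL(rho || rho_hat_j).

    activations: list of examples, each a list of hidden-unit outputs in (0, 1),
    e.g. sigmoid activations. rho is the target average firing rate.
    """
    n_examples = len(activations)
    n_hidden = len(activations[0])
    penalty = 0.0
    for j in range(n_hidden):
        rho_hat = sum(example[j] for example in activations) / n_examples
        rho_hat = min(max(rho_hat, 1e-8), 1 - 1e-8)  # avoid log(0)
        penalty += (rho * math.log(rho / rho_hat)
                    + (1 - rho) * math.log((1 - rho) / (1 - rho_hat)))
    return penalty


quiet = [[0.04, 0.06], [0.06, 0.04]]  # each unit averages 0.05: on target
busy = [[0.90, 0.80], [0.70, 0.60]]   # units fire most of the time
print(kl_sparsity_penalty(quiet))     # approximately zero
print(kl_sparsity_penalty(busy))      # large positive penalty
```

Because the penalty grows sharply as a unit’s average activation drifts above the target, gradient descent pushes most units toward silence on any given input, which is one concrete way to keep “the amount that each node fires” low.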

I still have my doubts concerning the threat timeline, however. For one, these are mostly sub-symbolic systems that are not capable of the kinds of self-directed system modifications that many fear can lead to exponential self-improvement. Second, the tasks that are seeing improvements are not new, just relatively well-known classification problems. Finally, the improvements, while impressive, are incremental. There is probably a meaningful threat profile that can be converted into a decision tree for when action is needed. For global climate change, for instance, there are consensus estimates about sea level changes. For Evil AI, I think we need to wait for a single act of machine intelligence out of control before spending excessively on containment, policy, or regulation. In the meantime, though, keep a close eye on your laptop.

And then there’s the mild misanthropy of Claude Shannon, possibly driven by living too long in New Jersey:

I visualize a time when we will be to robots what dogs are to humans, and I’m rooting for the machines.