Tagged: Jurgen Schmidhuber

Active Deep Learning

Deep Learning methods that use auto-associative neural networks for pre-training (with bottlenecks to force generalization) have recently been shown to perform as well as, and even better than, human beings at certain tasks like image categorization. But what is missing from the proposed methods? There seems to be a range of challenges that revolve around temporal novelty and sequential activation/classification problems like those that occur in natural language understanding. The most recent achievements are oriented around relatively static data presentations.
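As a toy sketch of the bottleneck idea (the dimensions, data, and learning rate here are illustrative assumptions, not drawn from any system the post discusses), an auto-associative network squeezes its input through a narrow hidden layer and learns to reconstruct it:

```python
import numpy as np

# Toy auto-associative (autoencoder) network: 8-d inputs are forced
# through a 3-unit bottleneck and reconstructed. All sizes and
# hyperparameters are illustrative choices.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))                 # toy data
W_enc = rng.normal(scale=0.1, size=(8, 3))    # encoder weights
W_dec = rng.normal(scale=0.1, size=(3, 8))    # decoder weights

def forward(X):
    H = np.tanh(X @ W_enc)    # 3-d bottleneck code
    return H, H @ W_dec       # reconstruction

lr = 0.01
for _ in range(5000):
    H, X_hat = forward(X)
    err = X_hat - X                           # reconstruction error
    # Gradient steps through the decoder, then the tanh encoder.
    W_dec -= lr * H.T @ err / len(X)
    W_enc -= lr * X.T @ ((err @ W_dec.T) * (1 - H**2)) / len(X)

_, X_hat = forward(X)
mse = float(np.mean((X_hat - X) ** 2))        # should beat predicting zeros
```

Because the 3-unit code cannot memorize 8-dimensional noise, the network is pushed toward a compressed, generalizing representation; that is the role the bottleneck plays in pre-training.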

Jürgen Schmidhuber revisits the history of connectionist research (dating to the 1800s!) in his October 2014 technical report, Deep Learning in Neural Networks: An Overview. This is one comprehensive effort at documenting the history of this reinvigorated area of AI research. What is old is new again, enhanced by achievements in computing that allow for larger and larger scale simulation.

The conclusions section has an interesting suggestion: what is missing so far is the sensorimotor activity loop that allows for active interrogation of the data source. Human vision roams over images, while DL systems ingest the entire scene at once. And real neural systems have energy constraints that lead to the suppression of neural function away from the active neural clusters.

Novelty in the Age of Criticism

Lower Manhattan panorama because I am in Jersey City, NJ as I write this, with an awesomely aesthetic view.

Gary Cutting from Notre Dame and the New York Times knows how to incite an intellectual riot, as demonstrated by his most recent The Stone piece, Mozart vs. the Beatles. “High art” is superior to “low art” because of its “stunning intellectual and emotional complexity.” He sums up:

My argument is that this distinctively aesthetic value is of great importance in our lives and that works of high art achieve it much more fully than do works of popular art.

But what makes up these notions of complexity and distinctive aesthetic value? One might try to enumerate those values in a list. Or one might instead claim that time serves as a sieve for the values that Cutting claims make one work of art superior to another, leaving open the possibility that the enumerated list is incomplete but still a useful retrospective system of valuation.

I previously argued in a 1994 paper (published in 1997), Complexity Formalisms, Order and Disorder in the Structure of Art, that simplicity and random chaos exist in a careful balance in art, a balance that reflects the underlying grammatical systems we use to predict the environment. And Jürgen Schmidhuber took the approach further by applying algorithmic information theory to novelty-seeking behavior that leads, in turn, to aesthetically pleasing models. The reflection of this behavioral optimization in our sideline preoccupations emerges as art, with the ultimate causation machine of evolution driving the proximate consequences for men and women.
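A crude way to see the order/chaos balance is to use an off-the-shelf compressor as a (very loose) stand-in for algorithmic complexity; this is an illustrative proxy, not Schmidhuber's formal measure of compression progress:

```python
import random
import zlib

# zlib's compressed size as a rough proxy for algorithmic complexity:
# pure repetition compresses almost completely, pure noise barely at
# all, and a mix of structure and disorder lands in between.
random.seed(0)

simple = b"ab" * 500                                        # pure order
noise = bytes(random.randrange(256) for _ in range(1000))   # pure chaos
mixed = b"ab" * 250 + bytes(random.randrange(256) for _ in range(500))

def ratio(data: bytes) -> float:
    """Compressed size as a fraction of original size (lower = simpler)."""
    return len(zlib.compress(data, 9)) / len(data)

ratios = [ratio(simple), ratio(mixed), ratio(noise)]  # ascending complexity
```

On this measure, "interesting" structure sits between the trivially ordered and the incompressibly random, which is the balance I argued art occupies.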

But let’s get back to the flaw I see in Cutting’s argument that, in turn, fits better with Schmidhuber’s approach: much of what is important in art is cultural novelty. Picasso is not aesthetically superior to the detailed hyper-reality of Dutch Masters, for instance, but is notable for his cultural deconstruction of the role of art as photography and reproduction took hold. And the simplicity and unstructured chaos of the Abstract Expressionists is culturally significant as well. Importantly, changes in technology are essential to changes in artistic outlook, from the aforementioned role of photography in diminishing the aesthetic value of hand renderings to the application of electronic instruments in Philip Glass symphonies. Is Mozart better than Glass or Stravinsky? Using this newer standard for aesthetics, no, because Mozart was working skillfully (and perhaps brilliantly) but within the harmonic model of Classical composition and Classical forms. He was one of many. But Wagner or Debussy changed the aural landscape, by comparison, and by the time of tone rows and aleatoric composition, conventional musical aesthetics were largely abandoned, if only fleetingly.

Modernism and postmodernism in prose and poetry follow similar trajectories, but I think there may have been a countervailing force to novelty seeking in much prose literature. That force is the requirement for narrative stories about human experiences, which is not a critical component of music or visual art. Human experience has a temporal flow and spatial unity. When novelists break these requirements in complex ways, the writing becomes increasingly difficult to comprehend (perhaps a bit like aleatoric music?), so novelists more often cling to convention while using other prose tools and stylistic fireworks to enhance the reader's aesthetic valuations. Novelty hits less often, but when it does, it poses greater challenges. Poetry has, by comparison, been more experimental in both form and concept.

And architecture? Cutting’s Chartres versus Philip Johnson?

So, returning to Cutting, I have largely been arguing about the difficulty of calling one piece of what Cutting might declare high art aesthetically superior to another piece of high art. But my point is that if we use cultural novelty as the primary yardstick, then we need to reorder the valuations. Early rock and roll pioneers, early blues artists, early modern jazz impresarios—all the legends we can think of—get top billing alongside Debussy. Heavy metal, rap, and electronica inventors live proudly with the Baroque masters. They will likely survive the test-of-time criterion, too, because of the invention of recording technologies, which were not available to the Baroque composers.

Singularity and its Discontents

If a machine-based process can outperform a human being, is it significant? That weighty question hung in the background as I reviewed Jürgen Schmidhuber’s work on traffic sign classification. Similar results have emerged from IBM’s Watson competition and even on the TOEFL test. In each case, machines beat people.

But is that fact significant? There are a couple of ways we can look at these kinds of comparisons. First, we can draw analogies to other human capabilities that machines surpassed and note that, in those cases, outperforming humans was not overly profound. The wheel quickly outperformed human legs for moving heavy objects. The cup outperformed the hands for drinking water. This then invites the realization that extending these physical comparisons leads to extraordinary juxtapositions: the airplane really outperformed human legs for transport, etc. And this, in turn, justifies the claim that since we are now just beginning to outperform human mental processes, we can only expect exponential improvements moving forward.

But this may be a category mistake in more than the obvious differentiator of the mental and the physical. Instead, the category mismatch is between levels of complexity. The number of parts in a Boeing 747 is about 6 million versus one moving human as the baseline (we could enumerate the cells and organelles, etc., but then we would need to enumerate the crystal lattices of the aircraft alloys, so that level of granularity is a wash). The number of memory addresses in a big server computer is 64 x 10^9 or higher, with disk storage in the terabytes (10^12). Meanwhile, the human brain has 100 x 10^9 neurons and 10^14 connections. So, with just 2 orders of magnitude between computers and brains versus 6 between humans and planes, we find ourselves approaching Kurzweil’s argument that we have to wait until 2040. I’m more pessimistic and figure 2080, but then no one expects the Spanish Inquisition, either, to quote those esteemed philosophers, Monty Python.
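The back-of-envelope gap can be checked directly; the figures are the rough illustrative ones quoted above, not measurements:

```python
import math

# Orders-of-magnitude gaps using the rough figures from the text.
parts_747 = 6e6            # parts in a Boeing 747
human_baseline = 1         # one moving human
brain_connections = 1e14   # synaptic connections in a human brain
server_bytes = 1e12        # TB-scale disk storage on a big server

gap_physical = math.log10(parts_747 / human_baseline)      # ~6.8 orders
gap_mental = math.log10(brain_connections / server_bytes)  # 2.0 orders
```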

We might move that date back even further, though, because we still lack a theory of the large-scale control of the collected software modules needed to operate on that massive neural simulation. At least Schmidhuber’s work used an artificial neural network. The others were looser about any affiliation to actual human information processing, though the LSI work is mathematically similar to some kinds of ANNs in terms of outcomes.

So if analogies only serve to support a mild kind of techno-optimism, we can still think about the problem in other ways by inverting the comparisons or emphasizing the risk of superintelligent machines. Thus is born the existential risk school of technological singularities. But such concerns and planning don’t really address the question of whether superintelligent machines are actually possible, or whether current achievements are significant.

And that brings us to the third perspective: the focus on competitive outcomes in AI research leads to only mild advances in the state of the art, but does lead to important social outcomes. These are Apollo moon shots, in other words. Even absent significant scientific advances, they stir the mind and the soul. That may transform the mild techno-optimism into moderate techno-optimism. And that’s OK, because the alternative is stationary fear.