Tweak, Memory

Artificial Neural Networks (ANNs) were, from early on in their formulation as Threshold Logic Units (TLUs) or Perceptrons, mostly focused on non-sequential decision-making tasks. With the invention of back-propagation training methods, the application to static presentations of data became somewhat fixed as a methodology. During the 90s Support Vector Machines became the rage and then Random Forests and other ensemble approaches held significant mindshare. ANNs receded into the distance as a quaint, historical approach that was fairly computationally expensive and opaque when compared to the other methods.

But Deep Learning has brought the ANN back through a combination of improvements, both minor and major. The most important enhancements include pre-training of the networks as auto-encoders prior to pursuing error-based training using back-propagation or  Contrastive Divergence with Gibbs Sampling. The critical other enhancement derives from Schmidhuber and others work in the 90s on managing temporal presentations to ANNs so the can effectively process sequences of signals. This latter development is critical for processing speech, written language, grammar, changes in video state, etc. Back-propagation without some form of recurrent network structure or memory management washes out the error signal that is needed for adjusting the weights of the networks. And it should be noted that increased compute fire-power using GPUs and custom chips has accelerated training performance enough that experimental cycles are within the range of doable.

Note that these are what might be called “computer science” issues rather than “brain science” issues. Researchers are drawing rough analogies between some observed properties of real neuronal systems (neurons fire and connect together) but then are pursuing a more abstract question as to how a very simple computational model of such neural networks can learn.… Read the rest

Curiouser and Curiouser

georgeJürgen Schmidhuber’s work on algorithmic information theory and curiosity is worth a few takes, if not more, for the researcher has done something that is both flawed and rather brilliant at the same time. The flaws emerge when we start to look deeply into the motivations for ideas like beauty (is symmetry and noncomplex encoding enough to explain sexual attraction? Well-understood evolutionary psychology is probably a better bet), but the core of his argument is worth considering.

If induction is an essential component of learning (and we might suppose it is for argument’s sake), then why continue to examine different parameterizations of possible models for induction? Why be creative about how to explain things, like we expect and even idolize of scientists?

So let us assume that induction is explained by the compression of patterns into better and better models using an information theoretic-style approach. Given this, Schmidhuber makes the startling leap that better compression and better models are best achieved by information harvesting behavior that involves finding novelty in the environment. Thus curiosity. Thus the implementation of action in support of ideas.

I proposed a similar model to explain aesthetic preferences for mid-ordered complex systems of notes, brush-strokes, etc. around 1994, but Schmidhuber’s approach has the benefit of not just characterizing the limitations and properties of aesthetic systems, but also justifying them. We find interest because we are programmed to find novelty, and we are programmed to find novelty because we want to optimize our predictive apparatus. The best optimization is actively seeking along the contours of the perceivable (and quantifiable) universe, and isolating the unknown patterns to improve our current model.… Read the rest