Inferred Modular Superparrots

The buzz about ChatGPT and related efforts has been surprisingly resistant to the standard deflationary pressure of the Gartner hype cycle. Quantum computing fizzled into the trough of disillusionment but appears to be climbing toward the plateau of productivity, with recent expansions of the number of practical qubits available from IBM and Origin in China, as well as additional government funding driven by national security interests and fears. But ChatGPT has attracted more sustained attention because people can play with it easily, without needing to understand something like Shor’s algorithm for factoring integers. Instead, you just feed it a prompt and are amazed that it writes so well. And the related image generators are delightful and may represent a true displacement of creative professionals even at this early stage, with video hallucinators evolving rapidly too.

But are Large Language Models (LLMs) like ChatGPT doing much more than stitching together recorded fragments of text ingested from an internet-scale corpus? Are they inferring patterns in any way beyond being stochastic parrots? And why would scaling up a system result in qualitatively new capabilities, if there are any at all?

Some new work covered in Quanta Magazine offers intriguing suggestions that there is a bit more going on in LLMs, although the subtitle contains the word “understanding,” which I think is premature. At heart is the idea that as networks scale up, given ordering rules that are not highly uniform or correlated, they tend to break up into collections of distinct subnetworks (substitute “graphs” for “networks” if you are a specialist). The theory, then, is that ingesting a sufficient magnitude of text into a sufficiently large network, together with the error minimization involved in tuning that network to match output to input, also segregates groupings that the Quanta author and researchers at Princeton and DeepMind refer to as skills.… Read the rest
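
A toy illustration of that graph intuition (not the researchers’ actual model): edges preferentially form between nodes that share a hidden “skill” label, and off-the-shelf community detection then recovers increasingly clean modules as the graph scales. The probabilities p_within and p_between are arbitrary values chosen for the demo.

```python
# Toy sketch: like-with-like attachment fragments a growing random
# graph into distinct communities (the "skills" analogy).
import random
import networkx as nx

def skill_graph(n_nodes, n_skills, p_within=0.05, p_between=0.002, seed=0):
    rng = random.Random(seed)
    skills = {i: rng.randrange(n_skills) for i in range(n_nodes)}
    g = nx.Graph()
    g.add_nodes_from(skills)
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            p = p_within if skills[i] == skills[j] else p_between
            if rng.random() < p:
                g.add_edge(i, j)
    return g

for n in (100, 400, 1600):
    g = skill_graph(n, n_skills=8)
    parts = nx.community.greedy_modularity_communities(g)
    print(n, len(parts), round(nx.community.modularity(g, parts), 3))
```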

Triangulation Machinery, Poetry, and Politics

I was reading Muriel Rukeyser’s poetry and marveling at some of the lucid yet novel constructions she employs. I was trying to avoid the grueling work of comparing and contrasting Biden’s speech on the anniversary of January 6th, 2021 with the responses from various Republican defenders of Trump. Both pulled into focus the effect of semantic and pragmatic framing as part of the poetic and political processes, respectively. Sorry, Muriel, I just compared your work to the slow boil of democracy.

Reaching in interlaced gods, animals, and men.
There is no background. The figures hold their peace
In a web of movement. There is no frustration,
Every gesture is taken, everything yields connections.

There is a theory about how language works that I’ve discussed here before. In this theory, due primarily to Donald Davidson, the meaning of words and phrases is tied directly to a shared interrogation of what each person is trying to convey. Imagine a child observing a dog while a parent says “dog,” and the parent is fairly consistent with that usage across several different breeds that are presented to the child. The child may overuse the word, calling a cat a dog at some point, at which point the parent corrects the child with “cat,” and the child proceeds along through this interrogatory process, triangulating in on the meaning of dog versus cat. Triangulation is Davidson’s term, reflecting the three parties involved: two people and the thing or idea they are discussing. In the case of human children, we also know that there are some innate preferences the child will apply during the triangulation process, like preferring “whole object” semantics to atomized ones, and assuming different words mean different things even when applied to the same object: so “canine” and “dog” must refer to the same object in slightly different ways since they are differing words, and indeed they do: dog IS-A canine but not vice-versa.… Read the rest
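
One crude way to see the triangulation loop mechanically is a cross-situational learning sketch, where a mutual-exclusivity bias stands in for the “different words mean different things” preference. The scenes, labels, and scoring below are invented for illustration and are not drawn from Davidson.

```python
# Minimal cross-situational word learning: the "child" accumulates
# word-object co-occurrence counts, and mutual exclusivity steers a
# new word toward objects that are not already strongly named.
from collections import defaultdict

counts = defaultdict(lambda: defaultdict(int))  # word -> object -> count

def best_referent(word):
    refs = counts[word]
    return max(refs, key=refs.get) if refs else None

def hear(word, objects_in_view):
    named = {best_referent(w) for w in counts if w != word}
    candidates = [o for o in objects_in_view if o not in named] or objects_in_view
    for obj in candidates:
        counts[word][obj] += 1

# The parent's consistent usage across scenes drives the triangulation.
for word, scene in [("dog", ["beagle"]), ("dog", ["terrier"]),
                    ("cat", ["tabby", "beagle"])]:
    hear(word, scene)
print(best_referent("dog"), best_referent("cat"))  # -> beagle tabby
```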

Distributed Contexts in the Language Game

The meaning of words and phrases can be a bit hard to pin down. Indeed, the meaning of meaning itself is problematical. I can point to a dictionary and say, well, there is where we keep the meanings of things, but that is just a record of the way in which we use the language. I’m personally fond of a kind of philosophical perspective on this matter of meaning that relies on a form of holism. That is, words and their meanings are defined by our usages of them, our historical interactions with them in different contexts, and subtle distinctive cues that illuminate how words differ and compare. Often, but not always, the words are tied to things in the world, as well, and therefore have a fastness that resists distortions and distinctions.

This is, of course, a critical area of inquiry when trying to create intelligent machines that deal with language. How do we imbue the system with meaning, represent it within the machine, and apply it to novel problems in a way that shows intelligent behavior? If we can do that at all, we must be capturing some semblance of intelligence in a fairly rigorous way, since we are simulating it with logical steps.

The history of philosophical and linguistic interest in these topics is fascinating, ranging from Wittgenstein’s notion of a language game that builds up rules of use to Firth’s formalization of the collocation of words as critical to meaning. In artificial intelligence, this concept of collocation has been expanded further to include the interchangeability of contexts. Thus, boat and ship occur in more similar contexts than boat and bank.
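
A minimal sketch of that interchangeability idea over an invented six-line corpus: build co-occurrence vectors from a small context window, then compare them with cosine similarity.

```python
# Words used in interchangeable contexts end up with similar
# co-occurrence vectors, so "boat" lands nearer "ship" than "bank".
import numpy as np

corpus = [
    "the boat sailed across the harbor",
    "the ship sailed across the sea",
    "the boat docked in the harbor",
    "the ship docked at the port",
    "the bank approved the loan",
    "the bank holds the money",
]
vocab = sorted({w for line in corpus for w in line.split()})
index = {w: i for i, w in enumerate(vocab)}
co = np.zeros((len(vocab), len(vocab)))
for line in corpus:
    words = line.split()
    for i, w in enumerate(words):
        for c in words[max(0, i - 2):i + 3]:  # +/- 2-word context window
            if c != w:
                co[index[w], index[c]] += 1

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(co[index["boat"]], co[index["ship"]]))  # higher
print(cosine(co[index["boat"]], co[index["bank"]]))  # lower
```

Running a truncated SVD over a matrix like co is one way to get the dimensionality reduction discussed next.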

A general approach to acquiring these contexts is based on the idea of dimensionality reduction in various forms.… Read the rest

Intelligent Borrowing

There has been a continuous bleed of biological, philosophical, linguistic, and psychological concepts into computer science since the 1950s. Artificial neural networks were inspired by real ones. Simulated evolution was designed around metaphorical patterns of natural evolution. Philosophical, linguistic, and psychological ideas transferred as knowledge representation and grammars, both natural and formal.

Since computer science is a uniquely synthetic kind of science and not quite a natural one, borrowing and applying metaphors seems to be part of the normal mode of advancement in this field. There is a purely mathematical component to the field in the fundamental questions around classes of algorithms and what is computable, but there are also highly synthetic issues that arise from architectures that are contingent on physical realizations. Finally, the application to simulating intelligent behavior relies largely on three separate modes of operation:

  1. Hypothesize about how intelligent beings perform such tasks
  2. Import metaphors based on those hypotheses
  3. Given initial success, use considerations of statistical features and their mappings to improve on the imported metaphors (and, rarely, improve with additional biological insights)

So, for instance, we import a simplified model of neural networks as connected sets of weights representing some kind of variable activation or inhibition potentials combined with sudden synaptic firing. Abstractly, we already have an interesting kind of transfer function that takes a set of input variables and maps them nonlinearly to the output variables. It’s interesting because being nonlinear means it can potentially compute very difficult relationships between the inputs and outputs.
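
In code the imported metaphor is compact. A sketch, with a sigmoid chosen arbitrarily as the nonlinearity:

```python
# A layer is a weighted sum pushed through a nonlinearity; the
# sigmoid stands in for the neuron's activation/inhibition behavior.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def layer(x, W, b):
    return sigmoid(W @ x + b)  # nonlinear map from inputs to outputs

rng = np.random.default_rng(0)
x = rng.normal(size=4)                          # input variables
W, b = rng.normal(size=(3, 4)), rng.normal(size=3)
print(layer(x, W, b))                           # three unit outputs in (0, 1)
```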

But we see limitations immediately, and these are observed in the history of the field. For instance, a single layer of these simulated neurons isn’t expressive enough to compute functions whose classes are not linearly separable (the classic XOR problem), so we add a few layers and then more and more.… Read the rest
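
The canonical demonstration is XOR: no single linear-threshold unit can compute it, but one hidden layer of two units can. A sketch with hand-set rather than learned weights:

```python
# XOR = OR(x) AND NOT AND(x), built from two threshold layers.
import numpy as np

def step(z):
    return (z > 0).astype(int)

def two_layer_xor(x):
    h = step(np.array([[1, 1], [1, 1]]) @ x + np.array([-0.5, -1.5]))  # [OR, AND]
    return step(np.array([1, -1]) @ h - 0.5)

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, two_layer_xor(np.array(x)))  # -> 0, 1, 1, 0
```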

Forever Uncanny

Quanta has a fair round-up of recent advances in deep learning. Most interesting is the recent performance on natural language understanding tests that is close to or exceeds mean human performance. Inevitably, John Searle’s Chinese Room argument is brought up, though the author of the Quanta article suggests that inferring the Chinese translational rule book from the data itself is slightly different from the original thought experiment. In the Chinese Room there is a person who knows no Chinese but has a collection of translational reference books. She receives texts through a slot and dutifully looks up the translation of the text and passes out the result. “Is this intelligence?” is the question, and it serves as a challenge to the Strong AI hypothesis. With statistical machine translation methods (and their alternative mechanistic implementation, deep learning), the rule books have been inferred by looking at translated texts (“parallel” texts, as we say in the field). By looking at a large enough corpus of parallel texts, greater coverage of translated variants is achieved, as well as some inference of pragmatic issues in translation and corner cases.
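
For a feel of how a “rule book” can be inferred from parallel texts alone, here is a toy expectation-maximization loop in the style of IBM Model 1, run over an invented three-sentence corpus; real statistical MT systems were vastly larger but shared this core move.

```python
# Infer word-translation probabilities from sentence-aligned pairs:
# the E-step distributes each target word's count over candidate
# source words; the M-step renormalizes. The lexicon emerges from data.
from collections import defaultdict

parallel = [
    ("the cat sleeps", "le chat dort"),
    ("the dog sleeps", "le chien dort"),
    ("the cat eats", "le chat mange"),
]
pairs = [(s.split(), t.split()) for s, t in parallel]
t_prob = defaultdict(lambda: 1.0)  # unnormalized init for p(target | source)

for _ in range(20):
    counts, totals = defaultdict(float), defaultdict(float)
    for src, tgt in pairs:
        for tw in tgt:
            norm = sum(t_prob[(sw, tw)] for sw in src)
            for sw in src:
                c = t_prob[(sw, tw)] / norm  # expected alignment count
                counts[(sw, tw)] += c
                totals[sw] += c
    for (sw, tw), c in counts.items():
        t_prob[(sw, tw)] = c / totals[sw]

for sw in ("cat", "dog", "sleeps"):
    best = max((tw for s, tw in t_prob if s == sw),
               key=lambda tw: t_prob[(sw, tw)])
    print(sw, "->", best)  # -> chat, chien, dort
```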

As a practical matter, it should be noted that modern, professional translators often use translation memory systems that contain idiomatic—or just challenging—phrases that they can reference when translating new texts. The understanding resides in the original translator’s head, we suppose, and in the correct application of the rule to the new text by checking for applicability according to, well, some other criteria that the translator brings to bear on the task.
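
A translation memory lookup is easy to sketch. The stored segments below are invented, and Python’s difflib is just a convenient stand-in for a real fuzzy matcher:

```python
# Fuzzy-match a new source segment against stored pairs and surface
# the best prior translation for the human translator to adapt.
import difflib

memory = {  # hypothetical stored segments: source -> prior translation
    "kick the bucket": "casser sa pipe",
    "read the fine print": "lire les petits caracteres",
}

def suggest(segment, cutoff=0.6):
    hits = difflib.get_close_matches(segment, list(memory), n=1, cutoff=cutoff)
    return (hits[0], memory[hits[0]]) if hits else None

print(suggest("kicked the bucket"))  # -> ('kick the bucket', 'casser sa pipe')
```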

In the General Language Understanding Evaluation (GLUE) tests described in the Quanta article, the systems are inferring how to answer Wh-style queries (who, what, where, when, and how) as well as identifying similar texts.… Read the rest

The Elusive in Art and Artificial Intelligence

Deep Dream (deepdreamgenerator.com) of my elusive inner Van Gogh.

How exactly deep learning models do what they do is at least elusive. Take image recognition as a task. We know that there are decision-making criteria inferred by the hidden layers of the networks. In Convolutional Neural Networks (CNNs), we have further knowledge that locally-receptive fields (or their simulated equivalent) provide a collection of filters that emphasize image features in different ways, from edge detection to rotation-invariant reductions, prior to being subjected to a learned categorizer. Yet the dividing lines between a chair and a small loveseat, or between two faces, are hidden within some non-linear equation composed of these field representations with weights tuned by exemplar presentation.
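
The locally-receptive-field piece, at least, is not elusive. A sketch of a single hand-set filter (a Sobel-style kernel) sliding over a toy image:

```python
# Each output pixel depends only on a small neighborhood of the
# input; this kernel responds strongly to vertical edges.
import numpy as np

def conv2d(image, kernel):
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.zeros((6, 6))
image[:, 3:] = 1.0                                       # dark-to-light edge
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
print(conv2d(image, sobel_x))                            # peaks along the edge
```

In a CNN these kernels are learned rather than hand-set, and the decision criteria live in how thousands of them combine.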

This elusiveness was at least part of the reason that neural networks and, generally, machine learning-based approaches have had a complicated position in AI research; if you can’t explain how they work, or even fairly characterize their failure modes, maybe we should work harder to understand the support for those decision criteria rather than just build black boxes to execute them?

So when groups use deep learning to produce visual artworks like the recently auctioned work sold by Christie’s for USD 432K, we can be reassured that the murky issue of aesthetics in art appreciation is at least paired with elusiveness in the production machine.

Or is it?

Let’s take Wittgenstein’s ideas about aesthetics as a perhaps slightly murky point of comparison. In Wittgenstein, we are almost always looking at what are effectively games played between and among people. In language, the rules are shared in a culture, a community, and even between individuals. These are semantic limits, dialogue considerations, standardized usages, linguistic pragmatics, expectations, allusions, and much more.… Read the rest

Black and Gray Boxes with Autonomous Meta-Cognition

Vijay Pande of VC Andreessen Horowitz (who passed on my startups twice but, hey, it’s just business!) has a relevant article in The New York Times concerning fears of the “black box” of deep learning and related methods: is the lack of explainability and the limited capacity for interrogating the underlying decision-making a deal-breaker for applications in critical areas like medical diagnosis or parole decisions? His point is simple, and related to the previous post’s suggestion of the potential limitations of our capacity to truly understand many aspects of human cognition. Even the doctor may only be able to point to a nebulous collection of clinical experiences when it comes to certain observational aspects of their job, like reading images for indicators of cancer. At least the algorithm has been trained on a significantly larger collection of data than the doctor could ever encounter in a professional lifetime.

So the human is almost as much a black box (maybe a gray box?) as the algorithm. One difference that needs to be considered, however, is that the deep learning algorithm might make unexpected errors when confronted with unexpected inputs. The classic example from the early history of artificial neural networks involved a DARPA test of detecting military tanks in photographs. The apocryphal, now legendary, formulation of the story is that there was a difference in cloud cover between the tank images and the non-tank images. The end result was that the system performed spectacularly on the training and test data sets but then failed miserably on new data that lacked the cloud-cover factor. I recalled this slightly differently recently and substituted film grain for the cloudiness. In any case, it became a discussion point about the limits of data-driven learning that showed how radically incorrect solutions could be created without a careful understanding of how the systems work.… Read the rest
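
The parable is easy to reproduce in miniature. In the sketch below, an invented “brightness” feature stands in for the cloud cover and is the only signal correlated with the label; a linear probe aces the matched test set and collapses the moment the accidental correlation is reversed.

```python
# Train on data where brightness is confounded with the label, then
# test with the confound intact versus reversed.
import numpy as np

rng = np.random.default_rng(1)

def scenes(n, bright_means_tank=True):
    labels = rng.integers(0, 2, n)                      # 1 = tank present
    brightness = labels if bright_means_tank else 1 - labels
    x = rng.normal(size=(n, 20)) + brightness[:, None]  # confound in every pixel
    return x, labels

x_train, y_train = scenes(500)
w = np.linalg.lstsq(x_train, y_train, rcond=None)[0]    # linear "classifier"

def accuracy(x, y):
    return np.mean((x @ w > 0.5) == y)

x_a, y_a = scenes(200)                                  # confound intact
x_b, y_b = scenes(200, bright_means_tank=False)         # confound reversed
print(accuracy(x_a, y_a), accuracy(x_b, y_b))           # near 1.0, then near 0.0
```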

Deep Simulation in the Southern Hemisphere

I’m unusually behind in my postings due to travel. I’ve been prepping for, and am now deep inside, a fresh pass through New Zealand after two years away. The complexity of the place seems to have a certain draw for me that has lured me back, yet again, to backcountry tramping amongst the volcanoes and glaciers, and to leisurely beachfront restaurants painted with eruptions of summer flowers fueled by the regular rains.

I recently wrote a technical proposal that rounded up a number of the most recent advances in deep learning neural networks. In each case, as with Google’s transformer architecture, there is a modest enhancement based on the recognition of a deficit in the performance of one of the two broad types of networks, recurrent and convolutional.

An old question is whether we learn anything about human cognition if we just simulate it using some kind of automatically learning mechanism. That is, if we use a model acquired through some kind of supervised or unsupervised learning, can we say we know anything about the original mind and its processes?

We can at least say that the learning methodology appears to be capable of achieving the technical result we were looking for. But it also might mean something a bit different: that there is not much more interesting going on in the original mind. In this radical corner sits the idea that cognitive processes in people are tactical responses left over from early human evolution. All you can learn from them is that they may be biased and tilted towards that early human condition, but beyond that things just are the way they turned out.

If we take this position, then, we might have to discard certain aspects of the social sciences.… Read the rest

Ambiguously Slobbering Dogs

I was initially dismissive of this note from Google Research on improving machine translation via Deep Learning Networks by adding in a sentence-level network. My goodness, they’ve rediscovered anaphora and co-reference resolution! Next thing they will try is some kind of network-based slot-filler ontology to carry gender metadata. But their goal was to add a framework to their existing recurrent neural network architecture that would support a weak, sentence-level resolution of translational ambiguities while still allowing the TPU/GPU accelerators they have created to function efficiently. It’s a hack, but one that potentially solves yet another corner of the translation problem and might result in a few percent further improvements in the quality of the translation.

But consider the following sentences:

The dog had the ball. It was covered with slobber.

The dog had the ball. It was thinking about lunch while it played.

In these cases, the anaphora gets resolved by semantics, and the resolution seems a largely automatic and subconscious process to us as native speakers. If we had to translate these into a second language, however, we would be able to articulate specific reasons for correctly assigning the “It” to the ball in the first pair. Well, it might be possible for the dog to be covered with slobber, but we would guess the sentence writer would intentionally avoid that ambiguity. The second pair could conceivably be ambiguous if, in the broader context, the ball were some intelligent entity controlling the dog. Still, when our guesses are limited to the sentence pairs in isolation, we would assign the obvious interpretations. Moreover, we can resolve giant, honking passage-level ambiguities with ease, where the author is showing off by not resolving the co-referents until obscenely late in the text.… Read the rest
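
If you wanted to mimic that semantic resolution crudely, a selectional-preference table is the classic hack. The candidates, predicates, and plausibility numbers below are entirely invented for illustration:

```python
# Score each candidate antecedent of "it" by semantic compatibility
# with the predicate; hand-set values stand in for world knowledge.
compatibility = {
    ("ball", "covered with slobber"): 0.9,
    ("dog", "covered with slobber"): 0.4,   # possible, but dispreferred
    ("ball", "thinking about lunch"): 0.01,
    ("dog", "thinking about lunch"): 0.9,
}

def resolve(predicate, candidates):
    return max(candidates, key=lambda c: compatibility[(c, predicate)])

print(resolve("covered with slobber", ["dog", "ball"]))  # -> ball
print(resolve("thinking about lunch", ["dog", "ball"]))  # -> dog
```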

Twilight of the Artistic Mind

Kristen Stewart, of Twilight fame, co-authored a paper on using deep learning neural networks in the new movie that she is directing. The basic idea is very old, but the details and scale are more recent. If you take an artificial neural network and have it autoencode the input stream with bottlenecking, you can then submit any stimulus and get some reflection of the training in the output. The output can be quite surreal, too, because the effect of bottlenecking combined with other optimizations results in an exaggeration of the features that define the input data set. If the input is images, the output will contain echoes of those images.
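
As a sketch of what autoencoding with bottlenecking means mechanically (a bare linear autoencoder, not the paper’s method):

```python
# Squeeze the data through a 3-unit code and train to reconstruct;
# the output comes to echo the training set's dominant structure.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 16)) / 4  # low-rank "images"
W_enc = rng.normal(size=(16, 3)) * 0.1    # encoder into the bottleneck
W_dec = rng.normal(size=(3, 16)) * 0.1    # decoder back out

for _ in range(3000):         # plain gradient descent on reconstruction error
    Z = X @ W_enc             # compressed code
    err = Z @ W_dec - X       # reconstruction minus input
    W_dec -= 1e-4 * Z.T @ err
    W_enc -= 1e-4 * X.T @ (err @ W_dec.T)
print(np.mean(err ** 2))      # shrinks as the bottleneck captures the structure
```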

For Stewart’s effort, the goal was to transfer her highly stylized concept art into the movie scene. So they trained the network on her concept image and then submitted frames from the film to the network. The result reflected aspects of the original stylized image and the input image, not surprisingly.

There has been a long meditation on the unique status of art and music as a human phenomenon since the beginning of the modern era. The efforts at actively deconstructing the expectations of art play against a background of conceptual genius or divine inspiration. The abstract expressionists and the aleatoric composers show this as a radical 20th Century urge to re-imagine what art might be when freed from the strictures of formal ideas about subject, method, and content.

Is there any significance to the current paper? Not a great deal. The bottom line was that there was a great deal of tweaking to achieve a result that was subjectively pleasing and fit with the production goals of the film.… Read the rest