Inferred Modular Superparrots

The buzz about ChatGPT and related efforts has been surprisingly resistant to the standard deflationary pressure of the Gartner hype cycle. Quantum computing definitely fizzled but appears to be moving towards the plateau of productivity with recent expansions in the number of practical qubits from IBM and Origin in China, as well as additional government funding driven by national security interests and fears. But ChatGPT attracted more sustained attention because people can play with it easily without needing to understand something like Shor’s algorithm for factoring integers. Instead, you just feed it a prompt and are amazed that it writes so well. And related image generators are delightful (as above) and may represent a true displacement of creative professionals even at this early stage, with video hallucinators evolving rapidly too.

But are Large Language Models (LLMs) like ChatGPT doing much more than stitching together recorded fragments of text ingested from an internet-scale corpus? Are they inferring patterns that are in any way beyond just being stochastic parrots? And why would scaling up a system result in qualitatively new capabilities, if there are any at all?

Some new work covered in Quanta Magazine offers some intriguing suggestions that there is a bit more going on in LLMs, although the subtitle contains the word “understanding,” which I think is premature. At heart is the idea that as networks scale up under ordering rules that are not highly uniform or correlated, they tend to break up into collections of distinct subnetworks (substitute “graphs” for “networks” if you are a specialist). The theory, then, is that the ingest of a sufficient magnitude of text into a sufficiently large network, together with the error minimization involved in tuning that network to match output to input, also segregates groupings that the Quanta author and researchers at Princeton and DeepMind refer to as skills.… Read the rest
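To make the graph intuition a little more concrete, here is a minimal sketch (my own toy construction, not the model from the Quanta article or the underlying papers) showing that a large, sparsely and randomly connected network tends to fall apart into many distinct clusters; the node counts and edge density below are arbitrary assumptions.

```python
# Toy illustration only: sparse random graphs fragment into many components.
import random

def count_clusters(n_nodes, avg_degree, seed=0):
    """Build a sparse random graph and count its connected components
    with a simple union-find structure."""
    rng = random.Random(seed)
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    # Sample enough random edges to hit the target average degree.
    n_edges = int(avg_degree * n_nodes / 2)
    for _ in range(n_edges):
        union(rng.randrange(n_nodes), rng.randrange(n_nodes))

    return len({find(x) for x in range(n_nodes)})

# Below the connectivity threshold, bigger graphs just mean more distinct clusters.
for n in (100, 1000, 10000):
    print(n, "nodes ->", count_clusters(n, avg_degree=0.8), "clusters")
```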

Be Persistent and Evolve

If we think about the evolution of living things we generally start from the idea that evolution requires replicators, variation, and selection. But what if we loosened that up to the more everyday semantics of the word “evolution” that we use when we talk about the evolution of galaxies or of societies or of crystals? Each changes, grows, contracts, and has some kind of persistence that is mediated by a range of internal and external forces. For crystals, the availability of heat and access to the necessary chemicals are key. For galaxies, elements and gravity and nuclear forces are paramount. In societies, technological invention and social revolution overlay the human replicators and their biological evolution. Should we make a leap and just declare that there is some kind of impetus or law to the universe such that when there are composable subsystems and composition constraints, there will be an exploration of the allowed state space for composition? Does this add to our understanding of the universe?

Wong et al. say exactly that in “On the roles of function and selection in evolving systems” in PNAS. The paper reminds me of the various efforts to explain genetic information growth given raw conceptions of entropy and, indeed, some of those papers appear in the citations. It was once considered an intriguing problem how organisms become increasingly complex in the face of, well, the grinding dissolution of entropy. It wasn’t really that hard for most scientists: Earth receives an enormous load of solar energy that supports the push of informational systems towards negentropy. But, to the earlier point about composability and constraints, that energy arrives in a proportion that supports the persistence of complex systems.… Read the rest

Find the Alien

Assembly Theory (AT) (original paper) is some new theoretical chemistry that tries to assess the relative complexity of the molecular underpinnings of life, even when the chemistry might be completely alien. For instance, if we send a probe to a Jovian moon and there are new microscopic creatures in the ocean, how will we figure that out? In AT, it is assumed that all living organisms require a certain minimal molecular complexity in order to function, since that is what we observe for life on Earth. The chemists experimentally confirmed that mass spectrometry is a fairly reliable way of differentiating the complexity of living things and their byproducts from other substances. Of course, they only have Earthly living things to test, but they had no false positives in their comparison set of samples, though some substances, like beer, scored unusually high in their spectral analysis. The theory is that when a mass spectrometer ionizes a sample and routes it through magnetic and electric fields, the complexity of the original molecules is represented in the complexity of the spray of molecular masses recorded by the detectors.
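As a rough illustration of the assembly idea, here is a toy sketch that estimates how many pairwise “join” operations are needed to build a character string when previously built fragments can be reused. It is a greedy upper bound over strings, not the authors’ algorithm, and real Assembly Theory operates on molecular bonds rather than characters.

```python
# Toy stand-in for the assembly index: greedy pairwise merges (BPE-style)
# give an upper bound on the number of reusable join steps for a string.
from collections import Counter

def greedy_assembly_steps(s):
    tokens = list(s)
    rules = set()  # each distinct merge rule stands for one reusable join
    while len(tokens) > 1:
        pairs = Counter(zip(tokens, tokens[1:]))
        best = max(pairs, key=pairs.get)   # most frequent adjacent pair
        rules.add(best)
        merged, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == best:
                merged.append(tokens[i] + tokens[i + 1])
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return len(rules)

# Repetitive structures need few joins; irregular ones need many.
for s in ("abababababababab", "abcdefghijklmnop"):
    print(s, greedy_assembly_steps(s))
```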

But what is “complexity” exactly? There are a great number of candidates, as Seth Lloyd notes in this little round-up paper that I linked to previously. Complexity intuitively involves something like a trade-off between randomness and uniformity, but also reflects internal repetition with variety. There is a mathematical formalism that in full attribution is “Solomonoff-Chaitin-Kolmogorov Complexity”—but we can just call it algorithmic complexity (AC) for short—that has always been an idealized way to think about complexity: take the smallest algorithm (in terms of bits) that can produce a pattern, and the length of that algorithm in bits is the complexity.… Read the rest
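Algorithmic complexity itself is uncomputable, so a common practical stand-in is compressed length: a general-purpose compressor gives an upper bound on how short a description of a string can be. The strings below are my own illustration, not Lloyd’s, and note that by this measure pure noise scores highest, which is exactly why Lloyd’s round-up lists other candidates that try to capture the “between order and randomness” intuition.

```python
# Compressed length as a crude, computable proxy for algorithmic complexity.
# The example strings are invented for illustration.
import random
import zlib

def compressed_len(s: str) -> int:
    return len(zlib.compress(s.encode("utf-8"), 9))

n = 4000
uniform = "a" * n                                  # pure uniformity
repetitive = ("abracadabra" * (n // 11 + 1))[:n]   # repetition with variety
rng = random.Random(42)
noise = "".join(rng.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(n))

for name, s in [("uniform", uniform), ("repetitive", repetitive), ("random", noise)]:
    print(f"{name:10s} raw={len(s)}  compressed={compressed_len(s)}")
```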

Entanglements: Collected Short Works

Now available in Kindle, softcover, and hardcover versions, Entanglements assembles a decade of short works by author, scientist, entrepreneur, and inventor Mark William Davis.

The fiction includes an intimate experimental triptych on the evolution of sexual identities. A genre-defying poetic meditation on creativity and environmental holocaust competes with conventional science fiction about quantum consciousness and virtual worlds. A postmodern interrogation of the intersection of storytelling and film rounds out the collected works as a counterpoint to an introductory dive into the ethics of altruism.

The nonfiction is divided into topics ranging from literary theory to philosophical concerns of religion, science, and artificial intelligence. Legal theories are magnified to examine the meaning of liberty and autonomy. A qualitative mathematics of free will is developed over the course of two essays and contextualized as part of the algorithm of evolution. What meaning really amounts to is always a central concern, whether discussing politics, culture, or ideas.

The works show the author’s own evolution in his thinking about our entanglement with reality as driven by underlying metaphors that transect science, reason, and society. For Davis, metaphors and the constellations of words that help frame them are the raw materials of thought, and their evolution and refinement are the central narrative of our growth as individuals in a webwork of societies and systems.

Entanglements is for readers who are in love with ideas and the networks of language that support and innervate them. It is a metalinguistic swim along a polychromatic reef of thought where fiction and nonfictional analysis coexist like coral and fish in a greater ecosystem.

Mark William Davis is the author of three dozen scientific papers and patents in cognitive science, search, machine translation, and even the structure of art.… Read the rest

A Learning Smorgasbord

Compliments of a discovery by Futurism, the paper The Autodidactic Universe by a smorgasbord of contemporary science and technology thinkers caught my attention for several reasons. First was Jaron Lanier as a co-author. I knew Jaron’s dad, Ellery, when I was a researcher at NMSU’s now defunct Computing Research Laboratory. Ellery had returned to school to get his psychology PhD during retirement. In an odd coincidence, my brother had also rented a trailer next to the geodesic dome that Jaron helped design and that Ellery lived in, after my brother became emancipated in his teens. Ellery may have been his landlord, but I am not certain of that.

The paper is an odd piece of kit that I read over two days in fits and spurts with intervening power lifting interludes (I recently maxed out my Bowflex and am considering next steps!). It initially has the feel of physicists trying to reach into machine learning as if the domain specialists clearly missed something that the hardcore physical scientists have known all along. But that concern dissipated fairly quickly and the paper settled into showing isomorphisms between various physical theories and the state evolution of neural networks. OK, no big deal. Perhaps they were taken by the realization that the mathematics of tensors was a useful way to describe network matrices and gradient descent learning. They then riffed on that and looked at the broader similarities between the temporal evolution of learning and quantum field theory, approaches to quantum gravity, and cosmological ideas.
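For readers who want the “state evolution of neural networks” part made concrete, here is a minimal gradient-descent sketch in plain numpy: a toy linear model whose weight matrix traces a trajectory through its state space as the loss is minimized. The data and model are invented for illustration and are not drawn from the paper.

```python
# Toy example of network "state evolution" under gradient descent; not from the paper.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                 # inputs
true_W = np.array([[1.5], [-2.0], [0.5]])
y = X @ true_W + 0.1 * rng.normal(size=(200, 1))

W = np.zeros((3, 1))                          # initial network state
lr = 0.05
for step in range(200):
    grad = X.T @ (X @ W - y) / len(X)         # gradient of mean squared error
    W -= lr * grad                            # one step of the state's time evolution
    if step % 50 == 0:
        loss = float(np.mean((X @ W - y) ** 2))
        print(f"step {step:3d}  loss {loss:.4f}  W {W.ravel().round(2)}")
```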

The paper, being a smorgasbord, then investigates the time evolution of graphs through the lens of graph theory. The core realization, as I gleaned it, is that some graphs are more complex (visually, as well as in the diversity of connectivity within the graph), while others are pointlessly uniform or empty.… Read the rest
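One crude way to put a number on “diversity of connectivity” is the entropy of a graph’s degree distribution: uniform or empty graphs score zero, while irregular ones score higher. This particular measure and the example graphs are my own illustration, not the metric used in the paper.

```python
# Degree-distribution entropy as one crude proxy for connectivity diversity.
# The graphs below are invented examples, not the paper's constructions.
import math
import random
from collections import Counter

def degree_entropy(edges, n_nodes):
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    counts = Counter(degree.get(i, 0) for i in range(n_nodes))
    probs = [c / n_nodes for c in counts.values()]
    return -sum(p * math.log2(p) for p in probs)

n = 100
empty = []
ring = [(i, (i + 1) % n) for i in range(n)]          # uniform: every node has degree 2
rng = random.Random(1)
sparse_random = [(rng.randrange(n), rng.randrange(n)) for _ in range(150)]

for name, edges in [("empty", empty), ("ring", ring), ("random", sparse_random)]:
    print(f"{name:8s} degree entropy = {degree_entropy(edges, n):.2f}")
```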

One Shot, Few Shot, Radical Shot

Exunoplura is back up after a sad excursion through the challenges of hosting providers. To be blunt, they mostly suck. Between systems that just don’t work right (SSL certificate provisioning in this case) and bad to counterproductive support experiences, it’s enough to make one want to host it oneself. But hosting is mostly, as they say of war, long boring periods punctuated by moments of terror as things go frustratingly sideways. But we are back up again after two hosting provider side-trips!

Honestly, I’d like to see an AI agent effectively navigate through these technological challenges. Where even human performance is fleeting and imperfect, the notion that an AI could learn how to deal with the uncertain corners of the process strikes me as currently unthinkable. But there are some interesting recent developments worth noting and discussing in the journey towards what is named “general AI” or a framework that is as flexible as people can be, rather than narrowly tied to a specific task like visually inspecting welds or answering a few questions about weather, music, and so forth.

First, there is the work by the OpenAI folks on massive language models being tested against one-shot or few-shot learning problems. In each of these learning problems, the number of presentations of the training cases is limited, rather than presenting huge numbers of exemplars and “fine tuning” the response of the model. What is a language model? Well, it varies across different approaches, but typically it is a weighted context of words of varying length, with the weights reflecting the probabilities of those words in those contexts over a massive collection of text corpora. For the OpenAI model, GPT-3, the total number of parameters (the weights that encode those words/contexts and their regularities) is an astonishing 175 billion, using some 45 TB of text to train the model.… Read the rest
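A tiny sketch of the “weighted context of words” idea: a bigram model that estimates the probability of the next word given the previous one from raw counts. GPT-3 learns these regularities with billions of neural weights rather than a count table, and the miniature corpus here is made up purely for illustration.

```python
# Minimal bigram language model: P(next word | previous word) from counts.
# The corpus is a made-up toy example.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate the fish".split()

counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1            # how often nxt follows prev

def next_word_probs(prev):
    total = sum(counts[prev].values())
    return {w: c / total for w, c in counts[prev].items()}

print(next_word_probs("the"))   # {'cat': 0.5, 'mat': 0.25, 'fish': 0.25}
```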

Metaphors as Bridges to the Future

David Lewis’s (I’m coming to accept this new convention with s-ending possessives!) solution to Putnam’s semantic indeterminacy is that we have a network of concepts that interrelate in a manner that is consistent under probing. We know from cognitive psychology that, as we read, texts that bridge unfamiliar concepts from paragraph to paragraph help us settle those ideas into the network, sometimes tentatively, and sometimes requiring some kind of theoretical reorganization as we learn more. Then there are some concepts that have special referential magnetism and are piers for the bridges.

You can see these same kinds of bridging semantics being applied in the quest to solve some of our most difficult and unresolved scientific conundrums. Quantum physics has presented strangeness from its very beginning, and the various interpretations of that strangeness and efforts to reconcile the strange with our everyday logic remain incomplete. So it is not surprising that efforts to unravel the strange in quantum physics often appeal to Einstein’s descriptive approach to deciphering the strange problems of electromagnetic wave propagation that ultimately led to Special and then General Relativity.

Two recent approaches that borrow from the Einstein model are Carlo Rovelli’s Relational Quantum Mechanics and David Albert’s How to Teach Quantum Mechanics. Both are quite explicit in drawing comparisons to the relativity approach: Einstein, in merging space and time, and in realizing that accelerated and gravitational frames of reference were indistinguishable, introduced an explanation that defied our expectations of ordinary, Newtonian physical interactions. Time was no longer a fixed universal but became locked to observers and their relative motion, and to space itself.

Yet the two quantum approaches are decidedly different, as well. For Rovelli, there is no observer-independent state to quantum affairs.… Read the rest

Theoretical Reorganization

Sean Carroll of Caltech takes on the philosophy of science in his paper, Beyond Falsifiability: Normal Science in a Multiverse, as part of a larger conversation on modern theoretical physics and experimental methods. Carroll breaks down the problems of Popper’s falsification criterion and arrives at a more pedestrian Bayesian formulation for how to view science. Theories arise, theories get their priors amplified or deflated, that prior support changes due to coherence with other theories and considerations (often, for Carroll), and, in the best case, the posterior support improves with better experimental data.
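Here is a bare-bones sketch of that Bayesian picture: two rival theories start with priors, and each new piece of evidence rescales them according to how well they predicted it. The theories, priors, and likelihoods are invented numbers, not anything from Carroll’s paper.

```python
# Toy Bayesian update over two rival theories; all numbers are invented.

# Prior degrees of belief (they sum to 1 here).
priors = {"theory_A": 0.7, "theory_B": 0.3}

# P(observed evidence | theory): how well each theory predicted the data.
likelihoods = {"theory_A": 0.2, "theory_B": 0.6}

evidence = sum(priors[t] * likelihoods[t] for t in priors)   # P(data)
posteriors = {t: priors[t] * likelihoods[t] / evidence for t in priors}

print(posteriors)   # theory_B gains support despite its lower prior
```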

Continuing with the previous posts’ work on expanding Bayes via AIT considerations, the non-continuous changes to a group of scientific theories that arrive with new theories or data require some better model than just adjusting priors. How exactly does coherence play a part in theory formation? If we treat each theory as a binary string that encodes a Turing machine, then the best theory, inductively, is the shortest machine that accepts the data. But we know that there is no machine that can compute that shortest machine, so there needs to be an algorithm that searches through the state space to try to locate the minimal machine. Meanwhile, the data may be varying and the machine may need to incorporate other machines that help improve the coverage of the original machine or are driven by other factors, as Carroll points out:

We use our taste, lessons from experience, and what we know about the rest of physics to help guide us in hopefully productive directions.

The search algorithm is clearly not just brute force in examining every micro variation in the consequences of changing bits in the machine. Instead, large reusable blocks of subroutines get reparameterized or reused with variation.… Read the rest

Free Will and Algorithmic Information Theory (Part II)

Bad monkey

So we get some mild form of source determinism out of Algorithmic Information Complexity (AIC), but we haven’t addressed the form of free will that deals with moral culpability at all. That free will requires that we, as moral agents, are capable of making choices that have moral consequences. Another way of saying it is that given the same circumstances we could have done otherwise. After all, all we have is a series of if/then statements that must be implemented in wetware and they still respond to known stimuli in deterministic ways. Just responding in model-predictable ways to new stimuli doesn’t amount directly to making choices.

Let’s expand the problem a bit, however. Instead of a lock-and-key recognition of integer “foodstuffs” we have uncertain patterns of foodstuffs and fallible recognition systems. Suddenly we have a probability problem with P(food|n) [or even P(food|q(n)) where q is some perception function] governed by Bayesian statistics. Clearly we expect evolution to optimize towards better models, though we know that all kinds of historical and physical contingencies may derail perfect optimization. Still, if we did have perfect optimization, we know what that would look like for certain types of statistical patterns.
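A toy version of the P(food|q(n)) setup might look like the following, where an agent observes a noisy perception of an integer stimulus and infers whether it is food via Bayes’ rule. The prior, the hidden “evens are food” rule, and the noise model are all invented to make the example concrete.

```python
# Toy Bayesian perception of integer "foodstuffs"; the rule and noise model are invented.
import random

rng = random.Random(0)
STIMULI = list(range(10))
IS_FOOD = {n: (n % 2 == 0) for n in STIMULI}       # hidden rule: evens are food
PRIOR_FOOD = sum(IS_FOOD.values()) / len(STIMULI)

def q(n):
    """Fallible perception: reports the true stimulus 80% of the time."""
    return n if rng.random() < 0.8 else rng.choice(STIMULI)

def p_obs_given_food(obs, food):
    """Likelihood of perceiving obs given whether the true stimulus is food."""
    candidates = [n for n in STIMULI if IS_FOOD[n] == food]
    # Mix of "perceived correctly" mass and uniform confusion mass.
    per_n = [(0.8 if n == obs else 0.0) + 0.2 / len(STIMULI) for n in candidates]
    return sum(per_n) / len(candidates)

def p_food_given_obs(obs):
    num = p_obs_given_food(obs, True) * PRIOR_FOOD
    den = num + p_obs_given_food(obs, False) * (1 - PRIOR_FOOD)
    return num / den

for obs in (2, 3):
    print(f"P(food | perceived {obs}) = {p_food_given_obs(obs):.2f}")
```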

What is an optimal induction machine? AIC and variants have been used to define that machine. First, we have Solomonoff induction from around 1960. But we also have Jorma Rissanen’s Minimum Description Length (MDL) theory from 1978 that casts the problem more in terms of continuous distributions. Variants are available, too, from Minimum Message Length, to Akaike’s Information Criterion (AIC, confusingly again), Bayesian Information Criterion (BIC), and on to Structural Risk Minimization via Vapnik-Chervonenkis learning theory.

All of these theories involve some kind of trade-off between the number of model parameters, the relative complexity of those parameters, and the success of the model on the training exemplars.… Read the rest
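A compact sketch of that shared trade-off: fit polynomials of increasing degree to noisy data and score them with the Bayesian Information Criterion, which penalizes each additional parameter in proportion to log n. The data, and the choice of BIC rather than MDL, MML, AIC, or structural risk minimization, are just for illustration.

```python
# Complexity/fit trade-off via BIC on toy polynomial models; data are invented.
import numpy as np

rng = np.random.default_rng(3)
n = 60
x = np.linspace(-1, 1, n)
y = 1.0 + 2.0 * x - 1.5 * x**2 + 0.2 * rng.normal(size=n)   # true model is quadratic

for degree in range(1, 7):
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    sigma2 = np.mean(residuals**2)
    k = degree + 1                                  # number of fitted parameters
    bic = n * np.log(sigma2) + k * np.log(n)        # Gaussian-error BIC (up to a constant)
    print(f"degree {degree}: BIC = {bic:7.1f}")     # the minimum lands near degree 2
```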

Free Will and Algorithmic Information Theory

I was recently looking for examples of applications of algorithmic information theory, also commonly called algorithmic information complexity (AIC). After all, for a theory to be sound is one thing, but when it is sound and valuable it moves to another level. So, first, let’s review the broad outline of AIC. AIC begins with the problem of randomness, specifically random strings of 0s and 1s. We can readily see that given any sort of encoding in any base, strings of characters can be reduced to a binary sequence. Likewise integers.

Now, AIC states that there are often many Turing machines that could generate a given string and, since we can represent those machines also as a bit sequence, there is at least one machine that has the shortest bit sequence while still producing the target string. In fact, if that shortest machine is as long as the string itself or a bit longer (given some machine-encoding overhead), then the string is said to be AIC-random. In other words, no compression of the string is possible.
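A brute-force toy makes the idea tangible: restrict “programs” to a (seed, repeat count) pair whose output is the seed repeated, and search every seed shorter than the target. If nothing shorter reproduces the string, it is incompressible within this toy language; real AIC quantifies the same thing over Turing machines, which is exactly what makes the true minimum uncomputable.

```python
# Toy shortest-description search: "programs" are (seed, repeat_count) pairs.
# This is an illustration of the concept, not a real AIC computation.

def shortest_repeat_program(target):
    """Return the cheapest (cost, seed, count) whose expansion equals target, or None."""
    best = None
    for length in range(1, len(target)):          # only seeds shorter than the target
        seed = target[:length]
        if len(target) % length == 0 and seed * (len(target) // length) == target:
            cost = length + 1                     # seed size plus one repeat counter
            if best is None or cost < best[0]:
                best = (cost, seed, len(target) // length)
    return best

for s in ("010101010101", "011010011001"):
    result = shortest_repeat_program(s)
    if result:
        cost, seed, count = result
        print(f"{s}: generated by ({seed!r} * {count}), description size ~{cost}")
    else:
        print(f"{s}: no shorter generator in this toy language (incompressible here)")
```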

Moreover, we can generalize this generator machine idea to claim that given some set of strings that represent the data of a given phenomenon (let’s say natural occurrences), the smallest generator machine that covers all the data is a “theoretical model” of the data and the underlying phenomenon. An interesting outcome of this theory is that it can be shown that there is, in fact, no algorithm (or meta-machine) that can find the smallest generator for any given sequence. This is related to Turing’s halting problem and to incompleteness results.

In terms of applications, Gregory Chaitin, who is one of the originators of the core ideas of AIC, has proposed that the theory sheds light on questions of meta-mathematics and specifically that it demonstrates that mathematics is a quasi-empirical pursuit capable of producing new methods rather than being idealistically derived from analytic first-principles.… Read the rest