
Randomness and Meaning

The impossibility of the Chinese Room has implications across the board for understanding what meaning means. Mark Walker’s paper “On the Intertranslatability of all Natural Languages” describes how the translation of words and phrases may be achieved:

  1. Through a simple correspondence scheme (word for word)
  2. Through “syntactic” expansion of the languages to accommodate concepts that have no obvious equivalence (“optometrist” => “doctor for eye problems”, etc.)
  3. Through incorporation of foreign words and phrases as “loan words”
  4. Through “semantic” expansion where the foreign word is defined through its coherence within a larger knowledge network.

An example of (4) is the word “lepton”: many languages have no corresponding concept and, in fact, the concept depends on a foundation of advanced concepts from particle physics. There may be no way to create a superposition of the meanings of other words using (2) that adequately handles “lepton.”

These problems arise again when we try to understand how children acquire meaning in learning a language. As Walker points out, learning a second language must involve the same kinds of steps as learning translations, so any simple correspondence theory has to be supplemented.

So how do we make adequate judgments about meanings and so rapidly learn words, often initially with a coarse granularity but later with increasingly sharp levels of focus? What procedure is required for expanding correspondence theories to operate in larger networks? Methods like Latent Semantic Analysis and Random Indexing show how this can be achieved in ways that are illuminating about human cognition. In each case, the methods provide insights into how relatively simple transformations of terms and their occurrence contexts can be viewed as providing a form of “triangulation” about the meaning of words. And, importantly, this level of triangulation is sufficient for these methods to do very human-like things. Both methods can achieve passing scores on the synonym questions of the TOEFL exam, for instance, and Latent Semantic Analysis is at the heart of automatic essay grading approaches whose success rates are high enough that they are widely used by standardized test makers.

How do they work? I’ll just briefly describe Random Indexing, since I recently presented the concept at the Big Data Science meetup at SGI in Fremont, California. In Random Indexing, we simply create a randomized sparse vector for each word we encounter in a large collection of texts. The vector can be binary as a first approximation, so something like:

The: 0000000000000100000010000000000000000001000000000000000…

quick: 000100000000000010000000000001000000000110000000000000…

fox: 0000000000000000000000100000000000000000000000000100100…

Now, as I encounter a given word in the text, I just add up the random vectors for the words around it to create a new “context” vector that is still sparse, but less so than its component parts. What is interesting about this approach is that if you consider the vectors as points in a hyperspace with as many dimensions as the vectors are long, then words that have similar meanings tend to cluster in that space. Latent Semantic Analysis achieves a similar clustering using some rather complex linear algebra (a singular value decomposition of the term-by-document matrix). A related eigenvector computation is also at the heart of Google’s PageRank algorithm, though it operates on link structure rather than word co-occurrences.
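To make the bookkeeping concrete, here is a minimal Python sketch of the idea just described. It is not the code from the talk: the vector length, the number of 1s per vector, the window size, and all function names are assumptions chosen for illustration, and the sparse vectors are stored as sets of positions rather than as long strings of 0s and 1s.

```python
import random
from collections import defaultdict

DIM = 512      # length of every vector (assumed for illustration)
NONZERO = 8    # how many positions are set to 1 in an index vector (assumed)
WINDOW = 2     # how many words on each side count as context (assumed)


def index_vector(word, dim=DIM, nonzero=NONZERO):
    """Return the positions of the 1s in the word's sparse random vector."""
    rng = random.Random(word)  # seeding with the word keeps its vector fixed
    return frozenset(rng.sample(range(dim), nonzero))


def context_vectors(tokens, window=WINDOW, dim=DIM):
    """Sum the index vectors of each word's neighbors into a context vector."""
    contexts = defaultdict(lambda: [0] * dim)
    for i, word in enumerate(tokens):
        start, stop = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j == i:
                continue  # a word is not part of its own context
            for pos in index_vector(tokens[j], dim):
                contexts[word][pos] += 1
    return contexts


def cosine(u, v):
    """Normalized inner product; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = (sum(a * a for a in u) * sum(b * b for b in v)) ** 0.5
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    tokens = "the quick brown fox jumps over the lazy dog".split()
    ctx = context_vectors(tokens)
    print(cosine(ctx["quick"], ctx["brown"]))
```

Words that appear among the same neighbors accumulate the same random vectors, which is why their context vectors end up pointing in similar directions. A real index would be built from millions of words rather than one sentence.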

So how do we solve the TOEFL test using an approach like Random Indexing? A large collection of texts is analyzed to create a Random Index; then, for a sample question like:

In line 5, the word “pronounced” most closely means

  1. evident
  2. spoken
  3. described
  4. unfortunate

The question and its accompanying text are converted into a context vector using the same random vectors as the index, and then the answer vectors are compared to see which is closest in the index space. This is remarkably inexpensive to compute, requiring just an inner product between the context vectors for the question and each answer.
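The comparison step can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a real TOEFL solver: the toy corpus, the dimensions, and the helper names are all hypothetical, the question is reduced to looking up the target word's context vector, and the inner product is normalized so that frequent words do not win by sheer size.

```python
import random
from collections import defaultdict

DIM, NONZERO, WINDOW = 512, 8, 1  # all assumed values for illustration


def index_vector(word):
    """Positions of the 1s in the word's fixed sparse random vector."""
    rng = random.Random(word)
    return rng.sample(range(DIM), NONZERO)


def build_index(sentences):
    """Map each word to the sum of its neighbors' random vectors."""
    contexts = defaultdict(lambda: [0] * DIM)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            neighbors = tokens[max(0, i - WINDOW):i] + tokens[i + 1:i + 1 + WINDOW]
            for other in neighbors:
                for pos in index_vector(other):
                    contexts[word][pos] += 1
    return contexts


def similarity(u, v):
    """Normalized inner product; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = (sum(a * a for a in u) * sum(b * b for b in v)) ** 0.5
    return dot / norm if norm else 0.0


def closest_answer(target, choices, contexts):
    """Pick the choice whose context vector is nearest the target's."""
    zero = [0] * DIM  # words never seen in the corpus get an empty context
    query = contexts.get(target, zero)
    return max(choices, key=lambda c: similarity(query, contexts.get(c, zero)))


# Hypothetical toy corpus, constructed so that "pronounced" and "evident"
# occur between the same neighboring words.
TOY_CORPUS = [
    "a very pronounced effect",
    "a very evident effect",
    "he had spoken words",
    "she described the scene",
    "an unfortunate accident happened",
]

if __name__ == "__main__":
    index = build_index(TOY_CORPUS)
    choices = ["evident", "spoken", "described", "unfortunate"]
    print(closest_answer("pronounced", choices, index))
```

The toy corpus is rigged so that the right answer shares its neighbors with the target word. With real text the separation is statistical rather than exact, which is why a large collection is needed.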

A method for compact coding using Algorithmic Information Theory can also be used to achieve similar results, demonstrating how widely context-based analysis applies to understanding how intertranslatability and language learning depend on the rich contexts of word usage.
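The specific compact coding method is not spelled out here, so the following is only a stand-in: the normalized compression distance, a well-known compression-based similarity measure, approximated with Python's standard `zlib` compressor. The function names are assumptions for illustration, and a general-purpose compressor is only a rough proxy for the ideal (uncomputable) description lengths of Algorithmic Information Theory.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (a crude code length)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: small when x helps compress y."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    # Hypothetical context samples gathered for two words.
    a = b"the doctor examined the patient in the clinic " * 10
    b = b"the physician examined the patient in the clinic " * 10
    print(ncd(a, b))
```

The intuition matches the vector methods above: if the contexts collected for two words compress well together, the words are being used in similar ways.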

Radical Triangulation

Donald Davidson argued that descriptive theories of semantics suffered from untenable complications that could, in turn, be solved by a holistic theory of meaning. Holism, in this sense, arises from the dependency of words and phrases on their role in a complex linguistic interchange. He proposed “triangulation” as a solution, where we zero in on a tentatively held belief about a word based on other beliefs about oneself, about others, and about the world we think we know.

This seems glaringly obvious, but it is merely the starting point for the hard work of determining what mechanisms and steps are involved in fixing the meaning of words through triangulation. There are certainly some innate predispositions that fit nicely with triangulation. These are subsumed under the Principle of Charity and even the notion of the Intentional Stance in how we regard others like us.

Fixing meaning via model-making has some curious results. The language used to discuss aesthetics and art tends to borrow from other fields (“the narrative of the painting,” “the functional grammar of the architecture”). Religious and spiritual terminology often has extremely porous models: I recently listened to Episcopalians discuss the meaning of “grace” for almost an hour with great glee but almost no progress; it was the belief that they were discussing something of ineffable greatness that was moving to them. Even seemingly simple scientific ideas become elaborately complex for both children and adults: we begin with atoms as billiard balls that mutate into mini solar systems that become vibrating clouds of probabilistic wave-particles around groups of properties in energetic suspension by virtual particle exchange.

Can we apply more formal models to the task of staking out this method of triangulation? For Davidson, language was both compositional and holistic, so it stands to reason that optimizing each vector of the triangulation can be rephrased as maximizing the agreement between the existing belief and new beliefs about terms and meaning, the models we hold about others’ beliefs about the terms, and any empirical facts or related desiderata that are in play. And here we may have an application of Solomonoff Induction, again, as an extension to Bayesian model-making. How do I choose to order the meaning signals from each of my belief sources? Under what circumstances do I reorder them or abandon an existing model in an aha moment? If the meta-model for ordering and triangulation is a striving for parsimony, then radical revisionism by reorganizing the underlying explanatory model is optimal when it follows Solomonoff-like principles.
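For reference, the standard form of Solomonoff’s universal prior is given first below, where U is a universal prefix machine, the sum runs over programs p whose output begins with the observed sequence x, and |p| is the program length in bits. The second line is a minimum description length reading of “striving for parsimony,” with H a candidate model of meaning, D the observed usage, and L a code length; that reading is an assumed gloss for illustration, not something taken from Davidson.

```latex
M(x) = \sum_{p \,:\, U(p) = x\ast} 2^{-|p|}

H^{\ast} = \arg\min_{H} \bigl[ L(H) + L(D \mid H) \bigr]
```

On this reading, an aha moment corresponds to finding a new H whose total description length is shorter than the old model plus its accumulated exceptions.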

“Optimality” might be straining credulity here, especially given the above description of arguments about the meaning of “grace,” but there may be a modified sense of the word in that the mathematical purity of a Solomonoff result is implemented in cognition as a kind of heuristic that tends towards good results in the face of extremely noisy signals.