
Randomness and Meaning

The impossibility of the Chinese Room has implications across the board for understanding what meaning means. Mark Walker’s paper “On the Intertranslatability of all Natural Languages” describes how the translation of words and phrases may be achieved:

  1. Through a simple correspondence scheme (word for word)
  2. Through “syntactic” expansion of the languages to accommodate concepts that have no obvious equivalence (“optometrist” => “doctor for eye problems”, etc.)
  3. Through incorporation of foreign words and phrases as “loan words”
  4. Through “semantic” expansion where the foreign word is defined through its coherence within a larger knowledge network.

An example of (4) is the word “lepton,” for which many languages have no corresponding concept; in fact, the concept depends on a scaffolding of advanced concepts from particle physics. There may be no way to use (2) to create a superposition of the meanings of other words that adequately handles “lepton.”

The same problems arise when we try to understand how children acquire meaning while learning a language. As Walker points out, learning a second language must involve the same kinds of steps as learning translations, so any simple correspondence theory has to be supplemented.

So how do we make adequate judgments about meanings and learn words so rapidly, often initially with a coarse granularity but later with increasingly sharp focus? What procedure is required for expanding correspondence theories to operate in larger networks? Methods like Latent Semantic Analysis and Random Indexing show how this can be achieved in ways that are illuminating about human cognition. In each case, the methods provide insights into how relatively simple transformations of terms and their occurrence contexts can be viewed as providing a form of “triangulation” about the meaning of words. And, importantly, this level of triangulation is sufficient for these methods to do very human-like things. Both methods can pass the TOEFL synonym test, for instance, and Latent Semantic Analysis is at the heart of automatic essay grading approaches with success rates high enough that they are widely used by standardized test makers.
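To make the “triangulation” idea concrete, here is a minimal Latent Semantic Analysis sketch in pure Python. It is an illustration under stated assumptions, not a reference implementation: the four toy documents and every function name are hypothetical, and instead of the sparse truncated SVD a real LSA system would use, it finds the top k right singular vectors by power iteration with deflation on AᵀA, which is adequate for a tiny matrix like this one. “car” and “automobile” never share a document, so their raw count vectors are orthogonal, but both co-occur with “engine” and “wheel,” and in the reduced space they point the same way.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v] if n else v

def cosine(u, v):
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu and nv else 0.0

def term_document_matrix(docs):
    """Rows are terms, columns are documents, entries are raw term counts."""
    vocab = sorted({w for d in docs for w in d.split()})
    return {t: [d.split().count(t) for d in docs] for t in vocab}

def top_right_singular_vectors(rows, n_docs, k, iters=200):
    """Top-k eigenvectors of A^T A (the right singular vectors of A) by power iteration with deflation."""
    # Gram matrix M = A^T A: small (documents x documents) and symmetric.
    m = [[sum(r[i] * r[j] for r in rows.values()) for j in range(n_docs)]
         for i in range(n_docs)]
    found = []
    for _ in range(k):
        v = normalize([1.0] * n_docs)
        for _ in range(iters):
            v = normalize([dot(row, v) for row in m])
        eigenvalue = dot(v, [dot(row, v) for row in m])  # Rayleigh quotient, equals sigma^2
        found.append(v)
        # Deflation: subtract the component just found so the next pass finds the next one.
        m = [[m[i][j] - eigenvalue * v[i] * v[j] for j in range(n_docs)]
             for i in range(n_docs)]
    return found

def lsa_term_vectors(docs, k=2):
    """Return (latent, raw) term vectors; latent rows are A V_k, which equals U_k Sigma_k."""
    rows = term_document_matrix(docs)
    basis = top_right_singular_vectors(rows, len(docs), k)
    latent = {t: [dot(counts, v) for v in basis] for t, counts in rows.items()}
    return latent, rows

docs = ["car engine wheel", "automobile engine wheel", "banana fruit", "apple fruit"]
latent, raw = lsa_term_vectors(docs)
print(cosine(raw["car"], raw["automobile"]))        # 0.0: the two words never share a document
print(cosine(latent["car"], latent["automobile"]))  # close to 1.0 after the projection
```

The shared contexts do the work here: the reduced space keeps the dominant “vehicle” and “fruit” directions and discards the distinction between documents that differ only in which synonym they used.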

How do they work? I’ll just briefly describe Random Indexing, since I recently presented the concept at the Big Data Science meetup at SGI in Fremont, California. In Random Indexing, we simply create a randomized sparse vector for each word we encounter in a large collection of texts. The vector can be binary as a first approximation, so something like:

The: 0000000000000100000010000000000000000001000000000000000…

quick: 000100000000000010000000000001000000000110000000000000…

fox: 0000000000000000000000100000000000000000000000000100100…

Now, each time I encounter a given word in the text, I add up the random vectors for the words around it to build that word’s “context” vector, which is still sparse, but less so than its component parts. What is interesting about this approach is that if you treat the vectors as points in a hyperspace with one dimension per vector component, words with similar meanings tend to cluster in that space. Latent Semantic Analysis achieves a similar clustering using somewhat more complex linear algebra: a truncated singular value decomposition of a term-document matrix. A closely related computation, finding the dominant eigenvector of a large matrix, is also at the heart of Google’s PageRank algorithm, though operating on link structure rather than word co-occurrences.
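The accumulation step just described can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the dimensionality (1,000), the number of nonzero entries per index vector (10), the window (2 words on either side), and the function names are all assumptions chosen for a toy corpus. It follows the binary first approximation above, storing each index vector as the positions of its 1s; many Random Indexing implementations instead use sparse ternary vectors with randomly placed +1 and −1 entries.

```python
import math
import random
from collections import defaultdict

def random_index_vector(rng, dim, nonzeros):
    """Positions of the 1s in a sparse binary random vector of length `dim`."""
    return rng.sample(range(dim), nonzeros)

def random_indexing(sentences, dim=1000, nonzeros=10, window=2, seed=0):
    """Map each word to a context vector: the sum of its neighbours' index vectors."""
    rng = random.Random(seed)
    index = {}                                # word -> fixed random index vector (sparse)
    context = defaultdict(lambda: [0] * dim)  # word -> accumulated dense context vector
    for sentence in sentences:
        tokens = sentence.lower().split()
        for tok in tokens:
            if tok not in index:
                index[tok] = random_index_vector(rng, dim, nonzeros)
        for i, tok in enumerate(tokens):
            ctx = context[tok]
            # Add the index vector of every word within `window` positions of this one.
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    for pos in index[tokens[j]]:
                        ctx[pos] += 1
    return dict(context)

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

corpus = ["The cat sat on the mat", "The dog sat on the mat"]
vectors = random_indexing(corpus)
print(round(cosine(vectors["cat"], vectors["dog"]), 3))  # 1.0: identical contexts
print(round(cosine(vectors["cat"], vectors["mat"]), 3))  # lower: only partly shared contexts
```

Because the index vectors are fixed once and simply summed, the index can be updated incrementally as new text arrives, with no matrix decomposition step.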

So how do we solve the TOEFL test using an approach like Random Indexing? A large collection of texts is analyzed to create a Random Index; then, for a sample question like:

In line 5, the word “pronounced” most closely means

  1. evident
  2. spoken
  3. described
  4. unfortunate

The question word and the question text are converted into a context vector using the same random index vectors, and then the vectors for the answers are compared to see which is closest in the index space. This is remarkably inexpensive to compute, requiring just one inner product between the question’s context vector and each answer’s context vector.
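The comparison step can be sketched as follows, using cosine similarity (an inner product of length-normalized vectors). The three-dimensional vectors here are made up purely for illustration, so the outcome shows only the mechanics; a real run would use context vectors built from a large corpus, for example by Random Indexing. The function name and the optional `passage_words` parameter are hypothetical.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def answer_synonym_question(vectors, target, choices, passage_words=()):
    """Pick the choice whose vector is closest (by cosine) to the question's vector.

    The question vector is the target word's vector plus those of any passage
    words found in the index; words missing from the index are skipped.
    """
    words = [w for w in (target, *passage_words) if w in vectors]
    dim = len(next(iter(vectors.values())))
    question = [sum(vectors[w][d] for w in words) for d in range(dim)]
    scores = {c: cosine(question, vectors[c]) for c in choices if c in vectors}
    return max(scores, key=scores.get), scores

# Made-up vectors for illustration only, not derived from any corpus.
toy_vectors = {
    "pronounced":  [0.9, 0.1, 0.0],
    "evident":     [0.8, 0.2, 0.1],
    "spoken":      [0.3, 0.9, 0.0],
    "described":   [0.2, 0.6, 0.3],
    "unfortunate": [0.0, 0.1, 0.9],
}
best, scores = answer_synonym_question(
    toy_vectors, "pronounced", ["evident", "spoken", "described", "unfortunate"])
print(best)  # evident
```

Scoring four choices costs four inner products over the index dimensions, which is why this kind of test-taking is so cheap once the index exists.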

A method for compact coding drawn from Algorithmic Information Theory can also achieve similar results, demonstrating the wide applicability of context-based analysis for understanding how intertranslatability and language learning depend on the rich contexts of word usage.
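The post does not say which compact-coding method it has in mind. One well-known member of this family is the normalized compression distance (NCD) developed by Paul Vitányi and colleagues, which stands in for the uncomputable Kolmogorov complexity with the output length of a real compressor: NCD(x, y) = (C(xy) − min(C(x), C(y))) / max(C(x), C(y)). The sketch below uses Python’s standard `zlib` module as that compressor; treat it as an illustration of the idea, not as the specific method referred to above. The sample strings are arbitrary.

```python
import zlib

def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (a crude stand-in for Kolmogorov complexity)."""
    return len(zlib.compress(data, 9))

def ncd(x: str, y: str) -> float:
    """Normalized compression distance: small for related texts, near 1 for unrelated ones."""
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)  # the concatenation compresses well only if y reuses x's patterns
    return (cxy - min(cx, cy)) / max(cx, cy)

fox = "the quick brown fox jumps over the lazy dog " * 20
fox_variant = "the quick brown fox leaps over the lazy dog " * 20
ideas = "colorless green ideas sleep furiously tonight " * 20
print(round(ncd(fox, fox_variant), 2), round(ncd(fox, ideas), 2))
```

Applied to word meaning, `x` and `y` would more plausibly be the collected contexts in which two words occur than single sentences, so that words used in similar surroundings compress well together.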

On the Soul-Eyes of Polar Bears

I sometimes reference a computational linguistics factoid that now appears to be lost in the mists of early DoD Tipster program research: Chinese linguists agree on the segmentation of texts into words only about 80% of the time. The references I can now find agree qualitatively that the task is problematic, but the 80% figure is widely smeared out among them. It should be no real surprise, though, because even English, with white-space tokenization, resists easy characterization of words versus phrases: “New York” and “New York City” are almost words in themselves, though white-space tokenization makes them phrases. Phrases lift out through common and distinct usage, however, and become more than the sum of their parts; it would be ridiculously noisy to match a search for “York” against “New York” because no one in the modern world attaches semantic significance to the “York” part of the phrase. The phrase exists as a whole, and the nature of its parts has dissolved into this holism.
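One standard way to watch a phrase like “New York” lift out of running text is to score adjacent word pairs by pointwise mutual information (PMI), a common collocation measure that the post itself does not name. This is a minimal sketch: the six sentences are made up, and the `min_count` cutoff is an assumption added because PMI over-rewards pairs seen only once.

```python
import math
from collections import Counter

def pmi_collocations(sentences, min_count=2):
    """Rank adjacent word pairs by PMI(x, y) = log2(P(x y) / (P(x) P(y))).

    High PMI means the pair occurs far more often than it would if the two
    words were independent. Pairs seen fewer than `min_count` times are dropped.
    """
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    n_uni, n_bi = sum(unigrams.values()), sum(bigrams.values())
    scores = {}
    for (x, y), count in bigrams.items():
        if count >= min_count:
            p_xy = count / n_bi
            p_x, p_y = unigrams[x] / n_uni, unigrams[y] / n_uni
            scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

corpus = [
    "she lives in new york",
    "new york is large",
    "he bought a new car",
    "the car is in the garage",
    "they visited new york",
    "my new car runs fast",
]
for pair, score in pmi_collocations(corpus):
    print(pair, round(score, 2))
```

In this toy corpus “new york” outranks “new car” because “york” never appears without “new,” while “car” also turns up in other company.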

John Searle’s Chinese Room argument came up again today. My son was waxing, as he does, in a discussion about mathematics and order, and suggested a poverty in our considerations of the world as purely and completely natural. He meant “natural” in the sense of “materialism” and “naturalism,” that is, that there are no mystical or magical elements to the world in a metaphysical sense. I argued that there may nonetheless be something that is different and indescribable by simple naturalistic calculi: there may be qualia. That led, in turn, to an attempt to qualify what is unique about human experience, and hence on to Searle’s Chinese Room.

And what happens in the Chinese Room? Well, without any knowledge of Chinese, you are trapped in a room with a large collection of rules for converting Chinese questions into Chinese answers. As slips bearing Chinese questions arrive, you consult the rule book and spit out responses. Searle’s point was that it is silly to argue that the algorithm embodied by the room really understands Chinese, and that the notion of “Strong AI” (artificial intelligence is equivalent to human intelligence insofar as the two are behaviorally equivalent) falls short of the meaning of “strong.” In a way, this is a correlate of the Turing Test, which also posits a thought experiment with remotely located computer and human interlocutors.

The arguments against the Chinese Room range from complaints that there is no other way to establish intelligence to the claim that, given sensory-motor relationships with the objects the symbols represent, the room could be considered intentional. I don’t dispute any of these arguments, however. Instead, I would point out that the initial specification of the gedankenexperiment fails in assuming that the Chinese Room is actually able to produce adequate outputs for the range of possible inputs. Even setting aside the fact that linguists disagree about the nature of Chinese words, every language can be used to produce utterances that have never been uttered before. Chomsky’s famous “colorless green ideas sleep furiously” shows the problem with clarity. It is the infinitude of language and its inherent ambiguity that make the Chinese Room an inexact metaphor. A Chinese questioner could ask, “How do the soul-eyes of polar bears beam into the hearts of coal miners?” and the system would fail like precision German machinery enjambing when fed tainted oil. Yeah, German machinery enjambs just like polar bears beam.

So the argument stands in its opposition to Strong AI given its initial assumptions, but fails once those assumptions are realistically qualified.

NOTE: There is possibly a formal argument embedded here: a Chomsky grammar generating a recursively enumerable language can produce infinitely many sentences, yet an algorithm can be devised to accommodate those productions given Turing completeness. Such an algorithm exists only in principle, however, and it requires a finite symbol alphabet. While the inventory of Chinese characters may be finite, the semantic and pragmatic metadata are not clearly so.
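The point about unbounded productions from finite rules can be made concrete with a toy grammar. This hypothetical four-rule context-free grammar is built around Chomsky’s sentence; it is far weaker than the unrestricted grammars the NOTE mentions, but recursion alone is enough to show the effect. Because the `NP` rule can invoke itself, each extra level of nesting roughly doubles the number of distinct sentences, so a rule book that simply lists question-answer pairs can never be complete, while a generative procedure with finitely many rules covers them all in principle.

```python
# Hypothetical toy grammar: nonterminals map to lists of alternative right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["ideas"], ["ADJ", "NP"]],  # the second alternative is recursive
    "ADJ": [["colorless"], ["green"]],
    "VP": [["sleep", "furiously"]],
}

def expand(symbol, depth):
    """All word sequences derivable from `symbol` with rule applications nested at most `depth` deep."""
    if symbol not in GRAMMAR:  # terminal word
        return {(symbol,)}
    if depth == 0:
        return set()
    results = set()
    for production in GRAMMAR[symbol]:
        partials = {()}
        for sym in production:
            # Extend every partial sentence with every expansion of the next symbol.
            partials = {p + s for p in partials for s in expand(sym, depth - 1)}
        results |= partials
    return results

for depth in range(2, 7):
    print(depth, len(expand("S", depth)))  # counts grow as 1, 3, 7, 15, 31
```

Raising the depth bound keeps adding sentences such as “colorless green green colorless ideas sleep furiously,” none of which a fixed finite table could anticipate.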