I was initially dismissive of this note from Google Research on improving machine translation via deep learning networks by adding a sentence-level network. My goodness, they’ve rediscovered anaphora and co-reference resolution! Next thing they will try is some kind of network-based slot-filler ontology to carry gender metadata. But their goal was to add a framework to their existing recurrent neural network architecture that would support a weak, sentence-level resolution of translational ambiguities while still allowing the TPU/GPU accelerators they have created to function efficiently. It’s a hack, but one that potentially solves yet another corner of the translation problem and might improve translation quality by a few more percent.
But consider the following sentences:
The dog had the ball. It was covered with slobber.
The dog had the ball. It was thinking about lunch while it played.
In these cases, the anaphora gets resolved by semantics, and the resolution seems a largely automatic and subconscious process to us as native speakers. If we had to translate these into a second language, however, we would be able to articulate that there are specific reasons for assigning the “It” to the ball in the first pair of sentences. Well, it might be possible for the dog to be covered with slobber, but we would guess the sentence writer would intentionally avoid that ambiguity. The second pair, where “It” is the dog, could conceivably be ambiguous if, in the broader context, the ball was some intelligent entity controlling the dog. Still, when our guesses are limited to the sentence pairs in isolation, we would assign the obvious interpretations. Moreover, we can resolve giant, honking passage-level ambiguities with ease, even when the author is showing off by not resolving the co-referents until obscenely late in the text.
Taken together, these cases expose the obvious problem with sentence-level “attention” calculations: the context has to be a moving window, and a fairly long one.
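To make the "moving and fairly long" point concrete, here is a minimal sketch of a sliding context window over sentences. It is purely illustrative and is not how the Google Research system works: the function name `moving_context`, the `window` parameter, and the combined three-sentence passage are all assumptions made up for this example.

```python
from collections import deque
from typing import Iterable, Iterator, List, Tuple


def moving_context(
    sentences: Iterable[str], window: int = 3
) -> Iterator[Tuple[List[str], str]]:
    """Yield (previous_sentences, current_sentence) pairs.

    `window` is the hypothetical number of earlier sentences that a
    translator (human or network) is allowed to look back at.
    """
    history = deque(maxlen=window)  # oldest sentence falls off automatically
    for sentence in sentences:
        yield list(history), sentence
        history.append(sentence)


# Hypothetical passage stitched together from the example sentences above.
passage = [
    "The dog had the ball.",
    "It was covered with slobber.",
    "It was thinking about lunch while it played.",
]

# With a one-sentence window, the last "It" can no longer see the dog.
short = list(moving_context(passage, window=1))

# With a two-sentence window, the antecedent is back in view.
longer = list(moving_context(passage, window=2))
```

With `window=1`, the context available for the third sentence is only "It was covered with slobber.", so the sentence that introduces the dog is already gone. Widening the window brings it back, which is the whole argument for a context that moves with the text and reaches back a fair distance.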