Handling Very Long Contexts in Neural Machine Translation: A Survey
« This report examines methods for integrating extended discourse context into machine translation, focusing on neural approaches. Machine translation systems generally translate each sentence independently of its neighbors, which yields systematic errors caused by the lack of discourse context. To address this, various approaches have been proposed to incorporate cross-sentential context, mostly based on the predominant Transformer architecture. More recently, the introduction of large language models (LLMs) has created new opportunities for processing long-range dependencies, inspiring several context-aware machine translation approaches. We first present the challenges of translating long inputs, then review encoder-decoder architectures and LLM-based approaches, with a brief overview of efficient Transformer implementations as common background. (…) »