Project Overview
The continuous growth of digital text in many languages suggests that the potential demand for translation services far exceeds the current volume of translation orders. One major bottleneck in capturing this demand is the cost of diversifying the services that translation companies offer, for instance by adapting to new domains of language use such as legal texts, news, politics, economy, finance, sports, weather reports, appliance manuals, and professional email. Besides differences in jargon, fixed expressions and idiomatic usage, the translations of individual words and multi-word expressions may differ significantly across domains. This proposal aims to provide automatic methods for inducing dedicated translation aids from large translation data, enabling rapid, user-driven, low-cost adaptation of translation services to market demand.
We take inspiration from user studies showing that automatic translation aids increase translator productivity (up to fourfold). In this proposal we aim to make a major leap from static, costly acquisition of automatic translation aids to rapid, user-driven domain-adaptation methods. Our starting point is the idea that when massive translation data is accumulated, continuously replenished, and shared by a sufficiently diverse set of translation companies, a practical solution to the adaptation bottleneck becomes possible. By statistically weighting translation data for relevance to a desired domain, statistical machine translation (SMT) methods can be extended to induce a variety of domain-specific translation aids that fit user demand.
This project will exploit the massive translation data repository housed by the TAUS Data Association (TDA). The project will build an automatic Data-Powered, Example-Driven Translation Adaptor (called DatAptor) that automatically compiles translation memories, translation engines, and hierarchically structured translations optimized for a desired domain, which the user exemplifies by supplying a text sample. To realize the DatAptor methodology, the project concentrates on three major research challenges:
Main Objectives
- How to select and weight translation examples in the TDA repository according to relevance for a user-supplied example text representing a certain domain? The measure of relevance is optimized for translation quality.
- How to automatically train and adapt a translation system (or its components) for optimized performance on similar future texts from the desired domain?
- How to extract hierarchically structured sets of translation equivalents that provide a more extensive and advanced representation than dictionaries and phrase translation tables?
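
To make the first challenge concrete, the following is a minimal sketch of weighting translation examples by relevance to a user-supplied sample. It is an illustration only: the project aims for a relevance measure optimized for translation quality, whereas this sketch substitutes plain cosine similarity over word counts as a stand-in. The function names (`tokenize`, `cosine`, `weight_pairs`) and the toy English-Dutch repository are hypothetical and do not come from the TDA repository or from DatAptor.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercased whitespace tokenization (a real system would use a proper tokenizer)."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def weight_pairs(sample_text, pairs):
    """Score each (source, target) pair against the sample text.

    Returns (weight, source, target) triples sorted by descending weight.
    The weight is an assumed stand-in for a quality-optimized relevance measure.
    """
    sample_vec = Counter(tokenize(sample_text))
    scored = [
        (cosine(Counter(tokenize(src)), sample_vec), src, tgt)
        for src, tgt in pairs
    ]
    return sorted(scored, key=lambda triple: triple[0], reverse=True)


# Hypothetical toy repository of (source, target) translation pairs.
repository = [
    ("the court dismissed the appeal", "het hof verwierp het beroep"),
    ("heavy rain is expected tomorrow", "morgen wordt zware regen verwacht"),
    ("the defendant filed an appeal", "de verdachte ging in beroep"),
]

# Hypothetical user-supplied sample exemplifying a legal domain.
sample = "the appeal was heard by the court"
ranked = weight_pairs(sample, repository)

for weight, src, tgt in ranked:
    print(f"{weight:.3f}\t{src}\t{tgt}")
```

In a setup like this, the resulting weights could serve either to filter the repository down to an in-domain subset or to weight examples when training translation components. Which of the two works better for translation quality is exactly the kind of question the first challenge raises.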