AMTA 2012 Workshop on Monolingual Machine Translation (MONOMT 2012)

Title: Monolingual Machine Translation (MONOMT 2012).

Date: Nov 1, 2012
Location: San Diego, United States
* Co-located with AMTA 2012 (The Tenth Biennial Conference of the Association for Machine Translation in the Americas)

Program (tentative) version 0.2

8:00-8:50 Light Breakfast
8:50 Introduction (5min)
9:00 Invited session (1)
    George Foster (60min)
    Marcello Federico (40min)
10:40-10:50 coffee break
10:50 Invited session (2)
    Evgeny Matusov (45min)
11:35 Long presentation session (1)
    Andy Way (long presentation, long paper, 30min)
    Dennis Nolan Mehay (long presentation, short paper, 25min)
12:30 - 12:45 Poster session
    Jin'ichi Murakami
    Thoudam Doren Singh
    Benjamin Gottesman
    Septina Dian Larasati
    Marianna J Martindale
12:45 - 14:00 Lunch
14:00 Long presentation session (2)
    Lluis Formiga (long presentation, long paper, 30min)
14:30 Invited session (3)
    Philipp Koehn (30min)
    Qun Liu (30min)
15:40-15:50 coffee break
15:50 Invited session (4)
    Taro Watanabe (50min)
16:40 Short presentation session
    Jin'ichi Murakami (short presentation, long paper, 14min)
    Thoudam Doren Singh (short presentation, long paper, 14min)
    Septina Dian Larasati (short presentation, short paper, 7min)
    Benjamin Gottesman (short presentation, short paper, 7min)


Long Presentations
  • Improving English to Spanish Out-of-Domain Translations by Morphology Generalization and Generation
    Lluis Formiga, Adolfo Hernandez, Jose B. Marino, and Enric Monte
  • Monolingual Data Optimisation for Bootstrapping SMT Engines
    Jie Jiang, Andy Way, Nelson Ng, Rejwanul Haque, Mike Dillinger, and Jun Lu
  • Shallow and Deep Paraphrasing for Improved Machine Translation Parameter Optimization
    Dennis Nolan Mehay and Michael White
Short Presentations
  • Two stage Machine Translation System using Pattern-based MT and Phrase-based SMT
    Jin'ichi Murakami, Takuya Nishimura and Masato Tokuhisa
  • Improving Word Alignment by Exploiting Adapted Word Similarity
    Septina Dian Larasati
  • Addressing some Issues of Data Sparsity towards Improving English-Manipuri SMT using Morphological Information
    Thoudam Doren Singh
  • Statistical Machine Translation for Depassivizing German Part-of-speech Sequences
    Benjamin Gottesman
Late Breaking Poster (No Proceedings)
  • Can statistical post-editing with a small parallel corpus save a weak MT engine?
    Marianna J Martindale

Invited Speakers
Marcello Federico (FBK, Italy)
George Foster (National Research Council Canada, Canada)
Philipp Koehn (University of Edinburgh, UK)
Qun Liu (Dublin City University, Ireland & Chinese Academy of Sciences, China)
Evgeny Matusov (SAIC)
Taro Watanabe (NICT, Japan)

Abstract of Invited Talks

George Foster
  • Domain Adaptation without the Domain

    Domain Adaptation usually means adjusting parameters to cope with a mismatch between old (training) and new domains. However, even when there is no new domain, there can be significant variation among and within documents. Can SMT performance benefit from the use of adaptation techniques to capture this variation? In this talk, I will explore this question, focussing on three relatively new aspects. First, the absence of representative "new" data means that adaptation must take place solely on the basis of the current source document. Second, within-domain variation can occur at the sub-document level, requiring that different parts of a document be treated differently. Finally, variation may be characterized by multiple "views", such as topic and genre, which need to be handled simultaneously. I will describe strategies for dealing with these problems, and give preliminary results on the Hansard corpus.
Marcello Federico
  • Improved word-reordering for Phrase-Based SMT

    We discuss word-reordering issues in phrase-based SMT and give an overview of a few approaches we have been investigating to enhance the standard word-reordering constraints used by popular tools like Moses. The presented methods redefine the allowed permutation space for each input string so that long word movements are permitted only for a subset of words or chunks. We show that our best approaches result in more accurate translations at no extra computational cost. Performance improvements were measured with two competitive SMT systems, translating from Arabic to English and from German to English, respectively. (Joint work with Arianna Bisazza, FBK)
Evgeny Matusov
  • Monolingual word alignment for system combination and other MT-related applications

    Combination of the output of multiple machine translation (MT) systems has been a hot research topic in recent years, yielding very promising results. The word-level system combination methods that produce a consensus translation rely on monolingual word alignment between the different translation hypotheses. In this talk, one such method will be described that uses a Hidden Markov Model (HMM) to produce the alignment automatically on a whole corpus translated by multiple MT systems. The alignment procedure is iterative and is especially well suited for non-monotonic alignment and alignment of synonyms. In the talk, the HMM alignment will be compared with the TER-based and other alignments. The advantages and drawbacks of each of these alignment methods will be discussed.

    Another area where monolingual alignment may be necessary is automatic MT evaluation. In the talk, we will describe a variant of the edit distance alignment which can align a hypothesis with multiple reference translations. This method is applied to the evaluation of speech translation with automatic sentence segmentation, where the sentence boundaries in the MT hypothesis do not correspond to the reference sentence boundaries. The method allows us to overcome this problem by aligning a whole document/show at once so that the reference segment boundaries are optimally inserted into the hypothesis. The proposed alignment method has been adopted by many international MT research projects.

    Finally, the talk will briefly consider another application of monolingual alignment: automatic document correction or post-editing. We will discuss whether a monolingual MT system can be used as a post-editing system, and present an alternative document correction approach that makes use of the monolingual alignment to derive candidates for correction rules.
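
    As a rough illustration of the edit-distance alignment idea mentioned in the abstract above, the following sketch aligns one hypothesis with one reference at the word level using plain Levenshtein distance. It is not the speaker's method (which handles multiple references and segment boundaries); the function name align_words and the toy sentences are assumptions made for this example only.

```python
from typing import List, Optional, Tuple

# A pair (hyp_word, ref_word); None marks a word missing on that side.
Pair = Tuple[Optional[str], Optional[str]]


def align_words(hyp: List[str], ref: List[str]) -> Tuple[int, List[Pair]]:
    """Word-level Levenshtein alignment (illustrative helper, not from the talk)."""
    n, m = len(hyp), len(ref)
    # dist[i][j] = edit distance between hyp[:i] and ref[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if hyp[i - 1] == ref[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j - 1] + sub,  # match or substitution
                dist[i - 1][j] + 1,        # extra word in the hypothesis
                dist[i][j - 1] + 1,        # word missing from the hypothesis
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = 0 if hyp[i - 1] == ref[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + sub:
                pairs.append((hyp[i - 1], ref[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            pairs.append((hyp[i - 1], None))
            i -= 1
        else:
            pairs.append((None, ref[j - 1]))
            j -= 1
    pairs.reverse()
    return dist[n][m], pairs


# Toy example (made-up sentences).
hypothesis = "the cat sat on mat".split()
reference = "the cat sat on the mat".split()
distance, alignment = align_words(hypothesis, reference)
print(distance, alignment)
```

    Pairs with None on the hypothesis side mark reference words that the hypothesis lacks; a document-level variant would run the same recurrence over a whole document rather than a single sentence.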
Philipp Koehn
  • Enabling Monolingual Translation

    What kind of assistance can we offer to an information seeker who is trying to decipher a foreign text in a language she is not familiar with? This talk presents how advances in computer-aided translation tools can be used.
Taro Watanabe
  • Tree-based System Combination and Pre-ordering for Machine Translation

    Among the monolingual sub-tasks employed in machine translation, we will focus on the use of tree-based approaches for system combination and pre-ordering. First, the conventional system combination method for machine translation is based on confusion networks constructed by aligning hypotheses with regard to word similarities. We introduce a novel system combination framework in which hypotheses are encoded as a confusion forest, a packed forest representing alternative trees. The forest is generated by taking syntactic consensus among parsed hypotheses: first, MT outputs are parsed; second, a context-free grammar is learned by extracting a set of rules; third, a packed forest is generated starting from the root symbol of the extracted grammar through non-terminal rewriting. The new hypothesis is produced by searching for the best derivation in the forest. We demonstrate that our forest-based approach competes with the confusion-network-based method with a smaller hypothesis space.

    Second, we present a method for learning a discriminative parser for machine translation reordering using only aligned parallel text. This is done by treating the parser’s derivation tree as a latent variable in a model that is trained to maximize reordering accuracy. We demonstrate that efficient large-margin training is possible by showing that two measures of reordering accuracy can be factored over the parse tree. Using this model in the pre-ordering framework results in significant gains in translation accuracy over standard phrase-based SMT and previously proposed unsupervised syntax induction methods.
Qun Liu
  • Maximum Rank Correlation Training for Statistical Machine Translation

    We propose Maximum Ranking Correlation (MRC) as an objective function in discriminative tuning of parameters in a linear model of Statistical Machine Translation (SMT). We try to maximize the ranking correlation between sentence-level BLEU (SBLEU) scores and model scores of the N-best list, while the MERT paradigm focuses on the potential 1-best candidates of the N-best list. After optimizing the MER and MRC objectives at the same time using a multi-objective optimization algorithm, we interpolate them to obtain parameters which outperform both. Experimental results on the WMT French–English data set confirm that our method significantly outperforms MERT on out-of-domain data sets, and performs marginally better than MERT on in-domain data sets, which validates the usefulness of MRC on both domain-specific and general-domain data. This research was originally published in MT Summit XIII and we will give more details and analysis in this talk.
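
    To make the quantity in the abstract above concrete, the sketch below computes a rank correlation between sentence-level quality scores and linear model scores over a small N-best list. The abstract does not say which correlation measure is used, so Kendall's tau is an assumption here, and the feature values, SBLEU scores, and weight vectors are made-up toy numbers.

```python
from itertools import combinations
from typing import Sequence


def model_score(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear model score: dot product of weights and feature values."""
    return sum(w * f for w, f in zip(weights, features))


def kendall_tau(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    total = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / total if total else 0.0


# Toy N-best list: one feature vector and one SBLEU score per hypothesis.
nbest_features = [[-1.0, -2.0], [-1.5, -2.0], [-3.0, -4.0], [-2.0, -2.5]]
sbleu = [0.42, 0.35, 0.10, 0.27]

# Two hypothetical weight vectors to compare.
weights_a = [1.0, 0.5]
weights_b = [-1.0, 0.0]

scores_a = [model_score(weights_a, f) for f in nbest_features]
scores_b = [model_score(weights_b, f) for f in nbest_features]

tau_a = kendall_tau(sbleu, scores_a)
tau_b = kendall_tau(sbleu, scores_b)
print(tau_a, tau_b)
```

    A tuner in this spirit would prefer the weight vector with the higher correlation, since it orders the whole N-best list more consistently with the sentence-level metric rather than only getting the top candidate right.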
Instructions for Presenters
  • Oral Presentation
    There will be a switcher with four RGB inputs for connecting laptops to the projector. Please connect your laptop to one of the inputs during the break before your session and make sure your presentation is visible. The session chair or A/V technician should be able to help if problems arise. If you need a laptop and are unable to borrow one, please contact the organizers.

  • Poster Presentation
    Each poster will have a table with two chairs and a 4-foot long by 3-foot high poster board (big enough to accommodate A0 paper format). The conference will provide double-sided tape, pushpins, clips, etc. for attaching your poster to the board.

Due to the increasing demand for high-quality translation, monolingual Machine Translation (MT) subtasks are encountered frequently: one MT task is decomposed into several subtasks, some of which can be called 'monolingual'. Such monolingual MT subtasks include:
  (1) MT for morphologically rich languages [Bojar, 08], aimed at dealing with the morphological richness of the target language, as is the case with the English-Czech (EN-CZ) language pair. The MT task is split into two subtasks: first, English is ('bilingually') translated into simplified Czech, and then the resulting morphologically normalized Czech is ('monolingually') translated into morphologically rich Czech;
  (2) system combination [Matusov et al., 05], where a source sentence is first translated into the target language by several MT systems, and the resulting translations are then combined to generate the output in the same language;
  (3) statistical post-editing [Dugast et al., 07; Simard et al., 07], where a source sentence is first translated into the target language by a rule-based MT system, and the resulting output is then 'monolingually' translated by an SMT system;
  (4) domain adaptation using transfer learning [Daume III, 07], where the source side written in a 'source' domain (e.g., newswires) is converted into the target side written in a 'target' domain (e.g., patents);
  (5) transliteration between phonemes / alphabets [Knight and Graehl, 98];
  (6) handling of reordering issues (SVO and SOV) [Katz-Brown et al., 11];
  (7) the MERT process [Arun et al., 10];
  (8) translation memory (TM) and MT integration [Ma et al., 11];
  (9) paraphrasing for creating additional training data or for evaluation purposes;
  (10) error identification and voting with independent monolingual crowdsources [Hu et al., 11].

A distinction can be drawn between bilingual MT tools (B-tools) and monolingual MT tools (M-tools) that may be exploited for monolingual MT. Consider, for example, two groups of monolingual subtasks: on the one hand, MT for morphologically rich languages, statistical post-editing, and transliteration; on the other hand, system combination and domain adaptation. The latter group is often approached with M-tools such as monolingual word alignment [Matusov et al., 05; He et al., 08] and the minimization of Bayes risk [Kumar and Byrne, 02] (on the outputs of combined systems). The former group, however, usually employs B-tools such as GIZA++ [Och and Ney, 04] to extract bilingual phrases, followed by MAP decoding on them. The way M-tools and B-tools are used for monolingual MT is an issue of particular interest for this workshop.

This workshop is intended to provide the opportunity to discuss ideas and share opinions on the applicability of M-tools or B-tools for monolingual MT subtasks, and on their respective strengths and weaknesses in specific settings. Furthermore, we wish to provide an opportunity to demonstrate successful use cases of M-tools.

Questions that participants are encouraged to address during the workshop include:
  • ways of applying M-tools to monolingual MT subtasks such as MT for morphologically rich languages and statistical post-editing.
  • investigation of the suitability of B-tools or M-tools for monolingual MT subtasks.
  • performance improvements of monolingual word alignment tools, since these are necessary for specific monolingual subtasks, such as MT for morphologically rich languages and statistical post-editing.

Submission deadline: August 14, 2012 (extended)
Notification to authors: August 31, 2012
Camera ready: September 6, 2012
Workshop: November 1, 2012

Original papers are invited on different aspects of monolingual MT, such as:
  • MT for morphologically rich languages 
  • system combination 
  • statistical post-editing 
  • domain adaptation 
  • MERT process 
  • MT for reordering mismatched language pairs (SVO and SOV, ...) 
  • MT-TM integration (i.e. MT systems whose prior knowledge includes bilingual terminology and TM) 
  • transliteration 
  • MT using textual entailment 
  • MT using confidence estimation 
  • paraphrasing 
  • hybrid MT 
  • ... 
Papers describing the mechanisms of MT tools that may be considered 'monolingual' are also encouraged. Some possible topics are listed below:
  • MBR decoding, consensus decoding 
  • monolingual word alignment (based on TER, METEOR,...) 
  • language models constructed by learning the representation of data 
  • data structure related matters 
  • ranking algorithms  
  • multitask learning (in the context of domain adaptation) 
  • ... 

Authors are invited to submit long papers (up to 10 pages) and short papers (2 - 4 pages). Long papers should describe unpublished, substantial and completed research. Short papers should be position papers, papers describing work in progress, or short, focused contributions. Papers will be accepted in PDF format via the submission system until August 3, 2012 (since extended to August 14, 2012). Submitted papers must follow the style and formatting guidelines available from the AMTA main conference site (see below). As the reviewing will be blind, the papers must not include the authors' names and affiliations. Furthermore, self-references that reveal the author's identity, e.g., "We previously showed (Smith, 1991) ...", must be avoided. Instead, use citations such as "Smith previously showed (Smith, 1991) ...". Papers that do not conform to these requirements will be rejected without review.

Style files:


For those who would like to participate in the workshop or share their views on it, we are organizing a late-breaking poster session. Please submit only if you do not require a visa to enter the US or have already obtained one. Late-breaking submissions will not be included in the Proceedings, but will be presented as posters.
  • 2 page extended abstract using AMTA format (Send a pdf file to
  • Deadline: October 15, 2012.

Organizers
Tsuyoshi Okita (Dublin City University, Ireland)
Artem Sokolov (LIMSI, France)
Taro Watanabe (National Institute of Information and Communications Technology, Japan)

Program Committee
Bogdan Babych (University of Leeds, UK)
Loic Barrault (LIUM, Universite du Maine, France)
Nicola Bertoldi (FBK, Italy)
Ergun Bicici (CNGL, Dublin City University, Ireland)
Ondrej Bojar (Charles University, Czech Republic)
Boxing Chen (NRC Institute for Information Technology, Canada)
Trevor Cohn (University of Sheffield, UK)
Marta Ruiz Costa-jussa (Barcelona Media, Spain)
Josep M. Crego (SYSTRAN, France)
John DeNero (Google, USA)
Jinhua Du (Xi'an University of Technology, China)
Kevin Duh (Nara Institute of Science and Technology, Japan)
Chris Dyer (CMU, USA)
Christian Federmann (DFKI, Germany)
Yvette Graham (Dublin City University, Ireland)
Barry Haddow (University of Edinburgh, UK)
Xiaodong He (Microsoft, USA)
Jagadeesh Jagarlamudi (University of Maryland, USA)
Jie Jiang (Applied Language Solutions, UK)
Philipp Koehn (University of Edinburgh, UK)
Shankar Kumar (Google, USA)
Alon Lavie (CMU, USA)
Yanjun Ma (Baidu, China)
Aurelien Max (LIMSI, University Paris Sud, France)
Maite Melero (Barcelona Media, Spain)
Philip Resnik (University of Maryland, USA)
Stefan Riezler (University of Heidelberg, Germany)
Lucia Specia (University of Sheffield, UK)
Marco Turchi (JRC, Italy)
Antal van den Bosch (Radboud University Nijmegen, Netherlands)
Xianchao Wu (Baidu, Japan)
Dekai Wu (HKUST, Hong Kong)
Francois Yvon (LIMSI, University Paris Sud, France)
(Last modified Oct 20, 2012)