http://www.computing.dcu.ie/~mforcada/ebmt3/
Following two successful previous meetings at
the MT Summits in 2001 and 2005, the 3rd International Workshop on
Example-Based Machine Translation (EBMT) took place at the Centre
for Next Generation Localisation at Dublin City University, on
November 12 and 13, 2009.
The main theme of the Workshop was "Going open-source to revive
example-based machine translation", which arose from a reflection by
the workshop chairs, Andy Way and Mikel Forcada, on the current
success of statistical machine translation (SMT): Is it because SMT
is the best way to do MT, or is it because SMT software is free and
open-source, and therefore easily obtained and open to
collaboration?
In the four years since the 2nd Workshop, held in Phuket, Thailand,
as part of Machine Translation Summit X, EBMT research had seemed to
be languishing in the doldrums, so the response to the call for papers
issued in July 2009 might well have been disappointing. On the contrary,
15 papers were received, of which 11 were accepted after being
reviewed by a Programme Committee involving top EBMT researchers
from around the world. A good part of the papers submitted
addressed the theme and involved free/open-source MT (FOSMT) or
related software, either describing new FOSMT software or
announcing the release of FOSMT software involved in the research
presented.
A full day-and-a-half programme was assembled, starting
with an invited talk by Prof. Sadao Kurohashi (Kyoto University) on
"Fully Syntactic Example-Based Machine Translation", and including
an open discussion on the main theme of the Workshop: "Going
open-source to revive EBMT".
Forty-seven people from ten different countries signed up to attend
the workshop: Belgium, France, the UK, Japan, the Netherlands,
Poland, Spain, Switzerland, the United States, and, of course, the
host country, Ireland. The session was opened by Dr. Stephen
Flinter, Scientific Programme Manager with Science Foundation
Ireland, and Prof. Josef van Genabith, Director of the Centre for
Next Generation Localisation.
After Prof. Kurohashi's keynote address, two sessions took place,
one on hybrid approaches to EBMT and the other on open-source EBMT
packages and tools, with three papers each.
The scientific programme of the first day ended with the open
discussion, which started with two short seeding talks by co-chairs
Andy Way ("Open Research Questions in Example-Based Machine
Translation") and Mikel L. Forcada ("Why free/open-source
EBMT?").
Andy Way started by posing 11 open research questions for EBMT. To
name just a few: the possible advantages of EBMT over SMT for online,
real-time applications, or of tree-to-tree EBMT over SMT; redundancy
in the example base; and the lack of papers devoted to EBMT
recombination.
Mikel L. Forcada gave a quick summary of free/open-source licences,
and explained the advantages of doing research in a
free/open-source setting, in particular as a way to foster
collaboration and guarantee reproducibility of research.
There then followed short addresses by the three panellists, Sadao
Kurohashi, Yves Lepage (Université de Caen) and Ralf Brown
(Carnegie Mellon University). Sadao Kurohashi, who described
himself as "positively disposed towards OS", advocated that EBMT
should move away from the simplicity of SMT and embrace the power
of linguistically-motivated technologies such as parsing. Ralf
Brown explored some of the reasons that prevented researchers from
going open-source (inertia, reluctance to show ugly code,
university policies), insisted on some of the advantages already
mentioned, and encouraged people to "just do it". Yves Lepage
advocated collaboration toward the creation of a set of tools that
would clearly identify EBMT and give it visibility, but with the
clear aim of building one baseline EBMT system. He then went on to
list a number of desiderata: availability of
subsententially-aligned corpora and of open tools to align,
evaluation tools and metrics that measure what the EBMT community
would like to be measured, and increased external visibility and
closeness to translation professionals.
The ensuing discussion revealed a wide consensus on the benefits of
freeing/open-sourcing not only EBMT tools and engines, but also
corpora and associated EBMT sub-sentential 'memories'. Some of the
problems involved (such as the difficulties of obtaining permission
from universities) were put forward. Some participants supported
Lepage's proposal of one strong open-source EBMT system which could
be used as a reference for all EBMT practitioners, but even in the
absence of such a system, all who spoke were in favour of setting up an EBMT
Internet portal where researchers would meet and share software and
corpora. Other issues were also raised, such as real-life
post-editing using EBMT (likely to be more favourably received by
translators/post-editors than SMT), the (in)adequacy of BLEU-like
automatic metrics for EBMT, and focussing on problems where EBMT
clearly wins out over SMT.
Two sessions were held on Friday, one on "Pure" EBMT and the other
on Applications of EBMT, comprising five papers in total. The
conference closed at 12.30 and the attendees either headed for home
or chose to stay in Dublin for the weekend.
The complete proceedings of the conference, including slides for
many of the talks, are available online: http://www.computing.dcu.ie/~mforcada/ebmt3/proceedings.html.
As to the next moves following the workshop, in order to bring
together all of the open-source EBMT initiatives, the organizers of
the workshop will launch a web portal by the end of 2009 (the URL
will be widely announced, including through the workshop page) and
urge everyone to turn their good intentions into real collaboration
before the enthusiasm and the consensus die away. In addition to a
toolkit containing the code of the open-sourced systems, the portal
will feature corpora, demos, and links to documentation and papers.
In the meantime, a list of open-source MT software (not only EBMT)
is being compiled at http://www.computing.dcu.ie/~mforcada/fosmt.html.
There will also be an EBMT tutorial and coding session at the MT
Marathon (http://www.mtmarathon2010.info):
EBMT groups are urged to send their students so that collaboration
starts as soon as possible.
If things start to happen and the field begins to pick up momentum
again, perhaps we won't have to let another four years pass before
the 4th EBMT Workshop. It could be held in two years and co-located
with a major MT event for convenience (for instance, with EAMT 2011
or with MT Summit XIII).
The 3rd Example-Based Machine Translation Workshop was made
possible by the sponsorship of four institutions: the EAMT
(http://www.eamt.org), through a generous award; Dublin City
University, through its conference support programme; the Centre
for Next Generation Localisation (http://www.cngl.ie), as the host
institution; and Science Foundation Ireland (http://www.sfi.ie),
which is supporting Mikel L. Forcada during his sabbatical stay at DCU.