Weakly Supervised PoS Tagging

The code for my work on transferring PoS annotation is available on this page.


Here is our re-implementation of the HyTER metric used in our NAACL'18 paper.

Please cite the following paper if you are using this code:

    title = "Automated Paraphrase Lattice Creation for {H}y{TER} Machine Translation Evaluation",
    author = "Apidianaki, Marianna  and
      Wisniewski, Guillaume  and
      Cocos, Anne  and
      Callison-Burch, Chris",
    booktitle = "Proceedings of the 2018 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)",
    month = jun,
    year = "2018",
    address = "New Orleans, Louisiana",
    publisher = "Association for Computational Linguistics",
    url = "",
    doi = "10.18653/v1/N18-2077",
    pages = "480--485",
    abstract = "We propose a variant of a well-known machine translation (MT) evaluation metric, HyTER (Dreyer and Marcu, 2012), which exploits reference translations enriched with meaning equivalent expressions. The original HyTER metric relied on hand-crafted paraphrase networks which restricted its applicability to new data. We test, for the first time, HyTER with automatically built paraphrase lattices. We show that although the metric obtains good results on small and carefully curated data with both manually and automatically selected substitutes, it achieves medium performance on much larger and noisier datasets, demonstrating the limits of the metric for tuning and evaluation of current MT systems.",
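The core of HyTER is an edit distance between a hypothesis and the closest path of a lattice of meaning-equivalent references. The Python sketch below illustrates only that idea. It computes a word-level Levenshtein distance to the best lattice path by dynamic programming over a DAG, and it deliberately omits the reordering moves that HyTER, like TER, also allows. The lattice format (integer nodes in topological order, one word per edge) and the function name `lattice_edit_distance` are assumptions for illustration, not the interface of the re-implementation above.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical lattice format: integer nodes numbered in topological order,
# node 0 is the start state, the highest-numbered node is the final state,
# and every edge (source, target, word) carries exactly one token.
Edge = Tuple[int, int, str]


def lattice_edit_distance(hypothesis: List[str], edges: List[Edge]) -> int:
    """Minimum word-level Levenshtein distance between `hypothesis` and any
    start-to-final path of the lattice (no reordering moves)."""
    incoming: Dict[int, List[Tuple[int, str]]] = defaultdict(list)
    for source, target, word in edges:
        if source >= target:
            raise ValueError("nodes must be numbered in topological order")
        incoming[target].append((source, word))
    final = max(target for _, target, _ in edges)
    m = len(hypothesis)
    inf = float("inf")

    # cost[v][j]: smallest distance between some path from node 0 to node v
    # and the first j words of the hypothesis.
    cost: Dict[int, List[float]] = {0: [float(j) for j in range(m + 1)]}
    for node in range(1, final + 1):
        row = [inf] * (m + 1)
        for source, word in incoming[node]:
            prev = cost[source]
            for j in range(m + 1):
                # Lattice word left unmatched (deletion).
                row[j] = min(row[j], prev[j] + 1)
                if j > 0:
                    # Match (cost 0) or substitution (cost 1).
                    substitution = 0 if hypothesis[j - 1] == word else 1
                    row[j] = min(row[j], prev[j - 1] + substitution)
        for j in range(1, m + 1):
            # Extra hypothesis word (insertion), once all incoming edges are in.
            row[j] = min(row[j], row[j - 1] + 1)
        cost[node] = row
    if cost[final][m] == inf:
        raise ValueError("final node is unreachable from node 0")
    return int(cost[final][m])


if __name__ == "__main__":
    edges = [(0, 1, "the"), (0, 1, "a"), (1, 2, "cat"), (2, 3, "sat"), (2, 3, "sits")]
    print(lattice_edit_distance("the dog sat".split(), edges))  # prints 1
```

Each lattice node keeps one row of the usual Levenshtein table, so the cost is O((|V| + |E|) · m) for a lattice with |V| nodes and |E| edges and a hypothesis of m words, rather than enumerating the possibly exponential set of paths.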

Low-Latency Diarization System

Here is the code of the low-latency diarization system I am developing in the context of the Odessa project. This is ad hoc, work-in-progress code, and I hope that a more complete version will be available soon.

Dependency Transformations

Here is the code we used in our UDW'17 paper to automatically modify the representation of several syntactic constructions defined by the UD guidelines.

Please cite the following paper if you are using this code:

  author    = {Wisniewski, Guillaume  and  Lacroix, Oph\'{e}lie},
  title     = {A Systematic Comparison of Syntactic Representations of Dependency Parsing},
  booktitle = {Proceedings of the NoDaLiDa 2017 Workshop on Universal Dependencies (UDW 2017)},
  month     = {May},
  year      = {2017},
  address   = {Gothenburg, Sweden},
  publisher = {Association for Computational Linguistics},
  pages     = {146--152},
  url       = {}
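To make the idea of changing a representation concrete, here is a minimal Python sketch of one transformation of this kind: making adpositions, which UD attaches to their noun with the `case` relation, the heads of those nouns. It is a hypothetical illustration, not the transformation set implemented in the code above; the `Token` class, the function name `promote_case_markers` and the `pobj` label given to the demoted noun are all assumptions.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass
class Token:
    id: int      # 1-based position in the sentence
    form: str
    head: int    # id of the governor, 0 for the root
    deprel: str  # UD relation label


def promote_case_markers(sentence: List[Token]) -> List[Token]:
    """Return a copy of `sentence` in which every token attached with `case`
    becomes the head of the noun it was attached to."""
    tokens = [replace(tok) for tok in sentence]  # leave the input untouched
    by_id = {tok.id: tok for tok in tokens}
    for tok in tokens:
        if tok.deprel != "case":
            continue
        noun = by_id[tok.head]
        # The adposition inherits the noun's original attachment...
        tok.head, tok.deprel = noun.head, noun.deprel
        # ...and the noun now depends on the adposition (hypothetical label).
        noun.head, noun.deprel = tok.id, "pobj"
    return tokens


if __name__ == "__main__":
    sentence = [
        Token(1, "The", 2, "det"), Token(2, "cat", 3, "nsubj"),
        Token(3, "sat", 0, "root"), Token(4, "on", 6, "case"),
        Token(5, "the", 6, "det"), Token(6, "mat", 3, "obl"),
    ]
    for tok in promote_case_markers(sentence):
        print(tok.id, tok.form, tok.head, tok.deprel)
```

Each rewrite swaps a noun with one of its dependents, so the output is still a tree, and the noun's other dependents, such as determiners, stay attached to it.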


An implementation of the n-gram posteriors described in [de Gispert et al., 2013]. Please cite the following paper if you are using this code:

  author    = {Wisniewski, Guillaume  and  P\'{e}cheux, Nicolas  and  Allauzen, Alexander  and  Yvon, Fran\c{c}ois},
  title     = {LIMSI Submission for WMT'14 QE Task},
  booktitle = {Proceedings of the Ninth Workshop on Statistical Machine Translation},
  month     = {June},
  year      = {2014},
  address   = {Baltimore, Maryland, USA},
  publisher = {Association for Computational Linguistics},
  pages     = {348--354},
  url       = {}
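De Gispert et al. define the posterior of an n-gram as the total probability of the translation hypotheses that contain it. The Python sketch below approximates that quantity over an N-best list rather than a full lattice. The N-best input format, the function names and the `scale` factor applied to the model scores before normalization are assumptions, not details of the implementation above.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set, Tuple

NGram = Tuple[str, ...]


def distinct_ngrams(tokens: List[str], order: int) -> Set[NGram]:
    """All distinct n-grams of length 1..order in a tokenized hypothesis."""
    return {
        tuple(tokens[i:i + n])
        for n in range(1, order + 1)
        for i in range(len(tokens) - n + 1)
    }


def ngram_posteriors(
    nbest: List[Tuple[List[str], float]], order: int = 4, scale: float = 1.0
) -> Dict[NGram, float]:
    """Posterior of each n-gram: the summed probability of the N-best
    hypotheses containing it. `nbest` holds (tokens, log-score) pairs."""
    best = max(score for _, score in nbest)
    # Scaled softmax over log-scores, shifted by the best score for stability.
    weights = [math.exp(scale * (score - best)) for _, score in nbest]
    total = sum(weights)
    posteriors: Dict[NGram, float] = defaultdict(float)
    for (tokens, _), weight in zip(nbest, weights):
        # A hypothesis counts once per n-gram, however often the n-gram occurs.
        for ngram in distinct_ngrams(tokens, order):
            posteriors[ngram] += weight / total
    return dict(posteriors)


if __name__ == "__main__":
    nbest = [("the cat sat".split(), -2.0), ("a cat sat".split(), -2.5)]
    for ngram, p in sorted(ngram_posteriors(nbest, order=2).items()):
        print(" ".join(ngram), round(p, 3))
```

Every posterior lies between 0 and 1, and an n-gram shared by all hypotheses gets a posterior of 1, which is what makes these quantities usable as confidence scores.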


The Trace corpus on translation errors: this corpus contains almost 7,000 French-to-English and 7,000 English-to-French translations, together with their post-editions by professional translators.