An overview of the European Union's highly multilingual parallel corpora

Authors: Steinberger Ralf*; Ebrahim Mohamed; Poulis Alexandros; Carrasco Benitez Manuel; Schluter Patrick; Przybyszewski Marek; Gilbro Signe
Source: Language Resources and Evaluation, 2014, 48(4): 679-707.
DOI: 10.1007/s10579-014-9277-0

Abstract

Starting in 2006, the European Commission's Joint Research Centre and other European Union organisations have made available a number of large-scale highly-multilingual parallel language resources. In this article, we give a comparative overview of these resources and we explain the specific nature of each of them. This article provides answers to a number of questions, including: What are these linguistic resources? What is the difference between them? Why were they originally created and why was the data released publicly? What can they be used for and what are the limitations of their usability? What are the text types, subject domains and languages covered? How can overlapping document sets be avoided? How do they compare regarding the formatting and the translation alignment? What are their usage conditions? What other types of multilingual linguistic resources does the EU have? This article thus aims to clarify what the similarities and differences between the various resources are and what they can be used for. It will also serve as a reference publication for those resources for which a more detailed description has been lacking so far (EAC-TM, ECDC-TM and DGT-Acquis).

  • Publication date: 2014-12