About Evrokorpus

Evrokorpus consists of parallel bilingual corpora. In 2002, the English-Slovene corpus was composed from translation memories made in the Translation Unit of the Slovenian Government Office for European Affairs (GOEA) by means of the Trados Translator's Workbench software. In 2006, a German-Slovene corpus was made, followed by a French-Slovene corpus in 2007 and Italian-Slovene and Spanish-Slovene in 2008.
In 2008, the corpus was extended by the inclusion of EU Commission data.
In 2010, data from the Trans corpus (compiled by the Department of Translation, Faculty of Arts, University of Ljubljana, Slovenia; mentor: Špela Vintar, Ph.D.) were incorporated into Evrokorpus.
In the same year, the English-Slovene EMEA corpus was also incorporated into Evrokorpus.
In March 2012, the Slovene part of Evrokorpus was lemmatized.

The currently available corpus contains texts in English, French, German, Italian, Slovene and Spanish.
The multilingual corpus contains texts in 22 official languages of the EU.

The target users of Evrokorpus are mostly professional translators. The first internal translation memories were compiled in 1999, while Evrokorpus was first published on the web at the beginning of 2002. Corpus material has been gathered, supplemented and improved in the course of revising translated documents (revision stages: translation, translation check, expert, linguistic and legal revision, finalized or published). Each text segment has a status according to the revision phase completed. Since June 2005, the database has been maintained by the Translation and Interpretation Division of the Secretariat-General of the Government of the Republic of Slovenia.

Evrokorpus contains the following data:

Corpus               Words          Translation units
English-Slovene      82 million     2,030,000
French-Slovene       26 million     570,000
German-Slovene       13 million     340,000
Italian-Slovene      11 million     270,000
Spanish-Slovene      10 million     240,000
Multilingual         98 million     610,000
Maritime (Slovene)   1.6 million    32,000

Terms and concordances (keyword in context) can be searched in the whole corpus, or the volume of output can be limited by specifying constraints: one or more fields, a document identification number and/or a revision phase.
After finding the term, Evrokorpus displays the text segment together with its attributes. Search possibilities are explained in the instructions for use.
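
As an illustration only, a constrained keyword-in-context search over translation units might look like the sketch below. It is not the Evrokorpus implementation: the TranslationUnit record, its attribute names (field, doc_id, phase) and the kwic function are hypothetical, chosen to mirror the constraints described above.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Tuple


@dataclass
class TranslationUnit:
    # Hypothetical record layout; the actual Evrokorpus schema is not given here.
    source: str   # source-language segment
    target: str   # aligned Slovene segment
    field: str    # subject field
    doc_id: str   # document identification number
    phase: str    # revision phase completed


def kwic(
    units: Iterable[TranslationUnit],
    keyword: str,
    width: int = 20,
    field: Optional[str] = None,
    doc_id: Optional[str] = None,
    phase: Optional[str] = None,
) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (left context, match, right context, target segment) per hit."""
    pattern = re.compile(re.escape(keyword), flags=re.IGNORECASE)
    for unit in units:
        # Apply the optional constraints; None means "no restriction".
        if field is not None and unit.field != field:
            continue
        if doc_id is not None and unit.doc_id != doc_id:
            continue
        if phase is not None and unit.phase != phase:
            continue
        for match in pattern.finditer(unit.source):
            left = unit.source[max(0, match.start() - width):match.start()]
            right = unit.source[match.end():match.end() + width]
            yield left, match.group(0), right, unit.target
```

Calling kwic(units, "member", phase="published") would return only hits from segments whose revision phase is "published"; omitting all constraints searches every unit passed in.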

Evrokorpus was last updated in September 2018.

Please send any comments regarding the corpus to the Main Administrative Office of GSV.