Searching the Evrokorpus
The corpus database can be searched in two ways: with the simple search form on the left side of the main page, or with the advanced search, which is linked beneath that form.
To use the simple search, enter the search word(s) in the input field and select the appropriate corpus. The program uses the following search strategy:
- a search in the corpus is made in both languages
- a search is made on a string level (in order to search for words, add a space both in front of and after the search string, e.g., enter " act " (without quotes) to search for the word "act")
- results are ranked on the basis of the quality of translations (translations at the highest revision stage are listed first)
- output is bilingual
- if the search word is found in Evroterm, a link to the Evroterm data is made and both the search word and its translation are coloured blue on the corpus output screen.
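The string-level matching described in the list above can be illustrated with a minimal Python sketch. It is not the program's actual implementation: the function name string_search and the sample sentences are invented for illustration, and the matching is assumed to be case-sensitive, which the description above does not state.

```python
def string_search(units, query):
    """Return the units that contain the query as a raw substring.

    Assumption: matching is case-sensitive and purely character-based.
    """
    return [u for u in units if query in u]


# Invented sample sentences, not taken from the corpus.
units = [
    "This act shall enter into force.",
    "Member States shall take action.",
    "The practice is widespread.",
    "Adopted under this act.",
]

# Without spaces, "act" also matches inside "action" and "practice".
substring_hits = string_search(units, "act")

# Padding with spaces restricts the match to the separate word "act".
word_hits = string_search(units, " act ")

print(len(substring_hits), len(word_hits))
```

In this sketch the padded query finds only the first sentence. The last sentence is missed because "act" is followed there by a full stop rather than a space, so under these assumptions a space-padded query can overlook words that stand next to punctuation.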
If you use the advanced search, the search word(s) can be entered in English/French/German/Italian/Spanish and/or Slovene. Additionally, search results can be limited to:
- one or more fields (to select more than one field, hold down the Control (Ctrl) key while clicking the desired fields with the mouse)
- minimum revision stage
- full or partial ID number (Celex number for EU-related acts, SOP number, treaty number or Official Journal number for Slovene legal acts).
The default output is bilingual: the first part of each hit consists of the header data (field, revision stage and ID number), followed by aligned translation units (usually sentences) in source and target languages. A monolingual output can be selected instead (KWIC, KeyWord In Context; in this case only up to 50 characters to the left and right of the search word are shown), or only the number of hits can be shown.
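The KWIC (KeyWord In Context) output, with up to 50 characters on either side of the search word, can be pictured with the following Python sketch. The function name kwic and the sample text are hypothetical, and the way the program actually cuts the context is not documented here.

```python
def kwic(text, query, width=50):
    """Return (left, keyword, right) for every occurrence of query in text.

    Each context is cut to at most `width` characters, as in the
    monolingual KWIC output described above.
    """
    hits = []
    start = text.find(query)
    while start != -1:
        end = start + len(query)
        left = text[max(0, start - width):start]
        right = text[end:end + width]
        hits.append((left, text[start:end], right))
        # Continue searching after the start of the current match.
        start = text.find(query, start + 1)
    return hits


# Invented sample text, not taken from the corpus.
sample = (
    "Member States shall adopt the measures necessary to combat "
    "illicit trafficking in narcotic drugs and psychotropic substances."
)

for left, keyword, right in kwic(sample, "drugs"):
    print(f"{left:>50} [{keyword}] {right}")
```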
Results are sorted on the basis of the quality of translations (translations at the highest revision stage appear at the top of the list). The ID number, shown on the right side of the header of each hit, identifies the act from which the sentence was taken. It is also a link: if it is clicked, the whole document is shown.
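The advanced-search limits and the ranking by revision stage can be sketched in Python as follows. Everything in the sketch is an assumption made for illustration: the Hit record, its attribute names, the numeric revision stages, the field names and the placeholder IDs are invented, and a partial ID number is assumed to match anywhere inside the full ID.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    field: str           # subject field from the header (hypothetical values)
    revision_stage: int  # assumed numeric: higher means more thoroughly revised
    doc_id: str          # placeholder ID, not a real Celex or SOP number
    source: str
    target: str


def filter_and_rank(hits, fields=None, min_stage=0, id_part=""):
    """Apply the three limits, then list the highest revision stage first."""
    selected = [
        h for h in hits
        if (not fields or h.field in fields)
        and h.revision_stage >= min_stage
        and id_part in h.doc_id
    ]
    return sorted(selected, key=lambda h: h.revision_stage, reverse=True)


# Invented sample hits.
hits = [
    Hit("environment", 2, "DOC-100", "source A", "target A"),
    Hit("environment", 5, "DOC-101", "source B", "target B"),
    Hit("finance", 4, "DOC-200", "source C", "target C"),
    Hit("environment", 1, "DOC-102", "source D", "target D"),
]

ranked = filter_and_rank(hits, fields={"environment"}, min_stage=2, id_part="DOC-10")
print([h.doc_id for h in ranked])
```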
Tips on using the corpus
When making a search, the following wildcards can be used:
- _ . ? (underscore, dot or question mark) can substitute for any single character; for instance, to get all hits containing either organisation or organization with a single query, enter organi_ation (or organi.ation or organi?ation) in the input field.
- % * (percent sign or asterisk) can substitute for any number of characters; for instance, to find hits containing additional words between illicit and drug, write the search query as illicit%drug (or illicit*drug); in addition to the usual word combinations with illicit drug, terms such as illicit trafficking in narcotic drugs are also returned.
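One way to picture the wildcard rules is as a translation into a regular expression. The Python sketch below assumes such a translation purely for illustration; the helper wildcard_to_regex is hypothetical, and how the program itself evaluates wildcards is not known.

```python
import re

SINGLE_CHAR = "_.?"  # each stands for exactly one character
ANY_LENGTH = "%*"    # each stands for any number of characters


def wildcard_to_regex(query):
    """Translate a corpus-style wildcard query into a compiled regex."""
    parts = []
    for ch in query:
        if ch in SINGLE_CHAR:
            parts.append(".")
        elif ch in ANY_LENGTH:
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


spelling = wildcard_to_regex("organi_ation")
print(bool(spelling.search("an international organisation")))
print(bool(spelling.search("an international organization")))

gap = wildcard_to_regex("illicit%drug")
print(bool(gap.search("illicit trafficking in narcotic drugs")))
```

All three checks in the sketch succeed, matching the examples given in the list above.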
If you want to check how many times the search query has been translated in a particular way, you can just click the links provided in the detailed output of Evroterm (e.g., there are five possible translations of the word sustainable into Slovene; however, according to Evrokorpus results, only one of them seems to be widely used). If the search query is not found in the Evroterm database, you can switch to advanced search in Evrokorpus and then enter the search query in one language and possible translations (one by one) in another language. This will give the frequency of use of a particular translation.
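The frequency check described above amounts to counting the aligned units in which the query appears in one language and a candidate translation appears in the other. A minimal Python sketch follows, under stated assumptions: the function translation_counts is hypothetical, the sentence pairs are invented and do not come from Evrokorpus, and the candidates are entered as word stems so that inflected Slovene forms are matched too.

```python
def translation_counts(pairs, source_query, candidates):
    """Count aligned units that pair the source query with each candidate."""
    counts = {candidate: 0 for candidate in candidates}
    for source, target in pairs:
        if source_query in source:
            for candidate in candidates:
                if candidate in target:
                    counts[candidate] += 1
    return counts


# Invented aligned units (English, Slovene), for illustration only.
pairs = [
    ("sustainable development", "trajnostni razvoj"),
    ("sustainable use of resources", "trajnostna raba virov"),
    ("sustainable growth", "vzdržna rast"),
    ("regional development", "regionalni razvoj"),
]

counts = translation_counts(pairs, "sustainable", ["trajnostn", "vzdržn"])
print(counts)
```

In the interface the candidate translations are entered one by one; the sketch simply collects the resulting counts in one pass.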
If you want to see the bilingual aligned version of a particular act, enter the appropriate ID (e.g., Celex) number and put the most frequent English word (e.g., the) into the "Search query in English" field. This should return most of the act; note, however, that the sentences on the corpus output page usually do not appear in the same order as in the original document.
The corpus can sometimes provide an answer on punctuation. If you want to check whether there is a comma in front of the English word "unless", then the program can first count all hits (enter " unless" as a search query - without double quotes; the space in front of the word is important because it eliminates cases in which "unless" appears at the beginning of the sentence) and the program can then count hits with a comma preceding the word (enter ", unless" (without double quotes) in the entry field). The difference between these two numbers indicates the number of cases in which the search word was not preceded by a comma.
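The subtraction described above can be written out as a short Python sketch. The function count_hits and the sample sentences are invented for illustration, and a hit is assumed to be a unit that contains the query at least once, with case-sensitive matching.

```python
def count_hits(units, query):
    """Count the units that contain the query string at least once."""
    return sum(1 for unit in units if query in unit)


# Invented sample sentences, not taken from the corpus.
units = [
    "The rule applies, unless otherwise stated.",
    "The rule applies unless otherwise stated.",
    "Unless agreed in writing, the offer lapses.",
    "No change is made, unless both parties agree.",
]

all_hits = count_hits(units, " unless")     # sentence-initial "Unless" is skipped
with_comma = count_hits(units, ", unless")  # only cases preceded by a comma
without_comma = all_hits - with_comma

print(all_hits, with_comma, without_comma)
```

For these four sentences the sketch reports three hits in total, two of them preceded by a comma, which leaves one case without a comma.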
Please send any comments regarding the corpus to the Main Administrative Office of GSV.