Slovenian National Corpus
The Slovenian National Corpus FidaPLUS is a 621-million-word (token) corpus of the Slovenian language, compiled from selected Slovenian texts of different genres and styles, mainly books and newspapers.[1]
The FidaPLUS database is an upgrade of the older FIDA corpus, which was developed between 1997 and 2000, extended with texts published up to 2006. It is the result of an applied research project of the Faculty of Arts and the Faculty of Social Sciences, both at the University of Ljubljana, and the Department of Knowledge Technologies at the Jožef Stefan Institute.[2]
The corpus is available through the Sketch Engine corpus manager.[3] This version of the FidaPLUS corpus includes word sketches, automatic corpus-derived summaries of a word's grammatical and collocational behaviour.
Year of publication | Number of words | Percent |
---|---|---|
1979–1990 | 262,708 | 0.04% |
1991 | 1,487,895 | 0.24% |
1992 | 2,256,692 | 0.36% |
1993 | 3,208,687 | 0.52% |
1994 | 7,534,689 | 1.21% |
1995 | 7,433,897 | 1.20% |
1996 | 16,913,916 | 2.72% |
1997 | 31,589,250 | 5.09% |
1998 | 43,512,041 | 7.01% |
1999 | 54,711,630 | 8.81% |
2000 | 57,677,534 | 9.29% |
2001 | 74,720,532 | 12.03% |
2002 | 72,802,484 | 11.72% |
2003 | 82,897,097 | 13.35% |
2004 | 67,041,167 | 10.79% |
2005 | 39,086,695 | 6.29% |
2006 | 44,526,825 | 7.17% |
N/A | 13,486,261 | 2.17% |
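
The percentage column follows directly from the word counts: each share is the period's word count divided by the corpus total. The following is a minimal sketch for recomputing those shares, assuming the word counts in the table are the authoritative figures; the names `counts` and `share` are illustrative and do not come from the FidaPLUS project.

```python
# Word counts per publication period, copied from the table above.
# The dictionary layout and all names here are illustrative assumptions.
counts = {
    "1979-1990": 262_708,
    "1991": 1_487_895,
    "1992": 2_256_692,
    "1993": 3_208_687,
    "1994": 7_534_689,
    "1995": 7_433_897,
    "1996": 16_913_916,
    "1997": 31_589_250,
    "1998": 43_512_041,
    "1999": 54_711_630,
    "2000": 57_677_534,
    "2001": 74_720_532,
    "2002": 72_802_484,
    "2003": 82_897_097,
    "2004": 67_041_167,
    "2005": 39_086_695,
    "2006": 44_526_825,
    "N/A": 13_486_261,
}

# Corpus size is the sum over all periods, including undated texts.
total = sum(counts.values())


def share(words: int, total_words: int) -> float:
    """Return the percentage share of the corpus, rounded to two decimals."""
    return round(100 * words / total_words, 2)


shares = {period: share(n, total) for period, n in counts.items()}

print(f"Total: {total:,} words")
for period, pct in shares.items():
    print(f"{period:>9}  {counts[period]:>11,}  {pct:5.2f}%")
```

The counts sum to 621,150,000 words, which agrees with the 621 million figure given in the introduction. Because each share is rounded independently, the rounded percentages need not add up to exactly 100.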
References
External links
- Slovenian National Corpus website FidaPLUS