Corpus: lij_wikipedia_2012

Overview

1 General Corpus Information
1.1 Summary
2 Characters and Character N-Grams
2.1.4 Special Characters
2.1.6 Amount of special characters
2.1.7 Repeat Rate - Characters
2.2.3 Word prefixes
2.2.4 Word suffixes
2.2.5 Most frequent word beginnings
2.2.6 Most frequent word endings
2.2.10 Prefixes of Length 3 (between NN Co-occurrences)
2.2.11 Repetitions
2.2.12 Typical Prefixes and Suffixes
3 Words and Multiwords
3.2.1 The Most Frequent 50 Words
3.2.3.1 Longest Words in Top-1,000 by rank
3.2.3.2 Longest Words in Top-10,000 by rank
3.2.3.3 Longest Words in Top-100,000 by rank
3.3.2.1 Frequency of numbers and special patterns I
3.3.2.2 Frequency of numbers and special patterns II
3.3.2.3 Frequency of numbers and special patterns III
3.3.2.4 Numbers in date format (1980-2029)
3.5.1.1 Words by Length without multiplicity
3.5.1.2 Words by Length with multiplicity
3.5.2 Average word length for different frequency ranges
3.5.6 Longest Words
3.6.1 Zipf's law (Standard version)
3.6.2 Zipf's law for words of fixed lengths
3.6.3 Zipf's law for words with same first letter
3.6.4 Zipf's law for words with same last letter
3.6.5 Zipf's law for numbers
3.7.1 String similarity graph for words
3.7.2 String similar words of similar frequency
3.7.3 Distribution of the string similarity for different rank ranges
3.7.4 Node degree vs. word length
3.7.5 Levenshtein similarity examples
3.8.1 Number of letter-N-grams at word beginnings
3.8.2 Number of letter-N-grams at word endings
3.8.4 Number of words of fixed length in different frequency ranges
3.9.1.1 Most Frequent Abbreviations
3.10.1 Text coverage by top words
3.12.1 Words with Hyphens
3.12.2 Multiwords
3.12.3 (Multi-)Words with dots
3.12.4 Words containing special characters
3.12.5 Palindromes
3.12.6 Words with reverse word
3.12.7 Words containing many different characters
3.12.8 Words both in lower and upper case
3.12.9 Problems with sentence segmentation - Words ending in a stopword
3.12.11 Left neighbors of the full stop
3.12.12 Left neighbors of the full stop with additional internal full stops
3.12.13 Compounds
3.13.1 Average Position of Words by Word Length
3.14.1 Growth of types
3.14.2 Type-Token Ratios
4 Sentences
4.1.1 Shortest sentences
4.1.2 Sentences of fixed length I
4.1.3 Sentences of fixed length II
4.1.4 Sentences of fixed length III
4.1.5 Longest sentences
4.2.1 Length of sentences in characters
5 Co-occurrences
5.1.1 Next neighbor co-occurrence summary
6 Sources
6.1.1 Number of sources by time period
6.2.1 Size of Sources
6.2.2 Size of largest domains
6.2.3 Size of different TLDs
6.4.2.1 Word length for different sources
6.4.2.2 Average logarithmic word rank for different sources
6.4.2.8 Sources with low or high average word length of rare words