Asynchronous (Online) Vocabulary Research Report/Paper
Zipfian Distribution and Corpus Frequency Data – Assessing Language Authenticity in English Textbooks and Strategies for Vocabulary Retention
The Zipfian distribution, applied to linguistics, suggests that a small set of high-frequency words accounts for most of any text: the 100 most common words make up roughly 50% of most written or spoken texts, the 1,000 most common cover 80%, and the 5,000 most common cover 98%. Scholars such as Nation (2013) suggest that language learners should learn words systematically, which implies that the most logical approach is to learn the most frequent words first. Corpora, large databases of authentic written or spoken text, can assist EFL materials writers by providing vocabulary frequency rankings. However, it is unknown whether textbook writers use corpus data, at least in Japan. A simple word-frequency analysis of three government-approved junior high school textbooks was therefore undertaken using the CANCODE corpus. The results showed evidence of frequency-based vocabulary allocation, with some anomalies resulting from regional variation and exam focus.
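The kind of frequency analysis described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the procedure used in the study: the sample sentence, the short ranked word list, and the function names `tokenize`, `coverage`, and `share_in_band` are all hypothetical, and a real analysis would take its frequency ranks from a corpus such as CANCODE.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its word tokens (letters, optional inner apostrophe)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def coverage(tokens, top_n):
    """Fraction of all tokens accounted for by the top_n most frequent word types."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    covered = sum(count for _, count in counts.most_common(top_n))
    return covered / len(tokens)


def share_in_band(textbook_tokens, ranked_words, band_size):
    """Fraction of textbook tokens found among the first band_size words
    of a frequency-ranked word list (most frequent first)."""
    if not textbook_tokens:
        return 0.0
    band = set(ranked_words[:band_size])
    hits = sum(1 for token in textbook_tokens if token in band)
    return hits / len(textbook_tokens)


# Hypothetical stand-in for a textbook passage.
sample = "the cat sat on the mat and the dog sat on the rug"
tokens = tokenize(sample)

# Hypothetical stand-in for a corpus frequency ranking.
ranked_words = ["the", "on", "and", "a"]

print(f"Top-1 coverage: {coverage(tokens, 1):.2f}")
print(f"Top-3 coverage: {coverage(tokens, 3):.2f}")
print(f"Share in top-2 band: {share_in_band(tokens, ranked_words, 2):.2f}")
```

Here `coverage` measures how much of a text its own most frequent words account for, which is the quantity the Zipfian percentages describe. `share_in_band` instead asks how much of a textbook's running text falls within a given band of an external frequency list, which is closer to checking whether a textbook's vocabulary follows corpus rankings.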