APPLICABILITY OF READABILITY FORMULAE TO THE MEASUREMENT OF SENTENCE-LEVEL READABILITY
Readability formulae have been widely used to measure the readability of a text (T-R), but not the readability of a single sentence (S-R). Measuring T-R helps in selecting reading materials and test materials that suit the proficiency of learners of English as a foreign language (EFL), and readability formulae are therefore considered a useful tool in EFL teaching and test development. However, the sentences within a text may differ in readability, and it is unclear whether these readability formulae are valid for measuring S-R. In EFL teaching, a teacher needs to identify S-R in order to find sentences that are likely to be easy or difficult for EFL learners. In test development, a teacher or test developer needs to prevent the S-R of single-sentence test materials from adversely affecting the test results. To accomplish these goals, the validity of readability formulae for measuring S-R must be assessed. The purpose of this paper is to report how well the S-R measured with previously proposed readability formulae correlates with the S-R judged by 90 EFL learners on 80 sentences.
First, two previously proposed readability formulae based on surface linguistic variables such as sentence length and word length (in syllables), namely the Flesch Reading Ease score (FRES) (Flesch 1946) and the Flesch-Kincaid Grade Level (FKGL) (Flesch 1949), were examined for how well the S-R they measure correlates with the S-R judged by the EFL learners. Spearman rank correlation analyses showed that the S-R judged by the EFL learners was only moderately correlated with both the S-R measured with FRES (r=-0.49, p<0.01) and the S-R measured with FKGL (r=0.62, p<0.01). The correlation with FRES is negative because higher FRES values indicate easier text, whereas higher FKGL values indicate harder text. These moderate correlations suggest that readability formulae based solely on surface linguistic variables need further improvement before they can be used to measure S-R for EFL learners.
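Applying FRES and FKGL to a single sentence amounts to treating it as a one-sentence text, so "words per sentence" reduces to the sentence's word count. The sketch below uses the standard published coefficients (FRES = 206.835 - 1.015 × words/sentence - 84.6 × syllables/word; FKGL = 0.39 × words/sentence + 11.8 × syllables/word - 15.59) and computes Spearman's rank correlation as the Pearson correlation of tie-averaged ranks. The vowel-group syllable counter, the tokenizing regex, the helper names (`count_syllables`, `sentence_scores`, `average_ranks`, `spearman`), and the ratings in the usage section are illustrative assumptions, not the study's materials or data.

```python
import re
from math import sqrt


def count_syllables(word):
    """Approximate syllables by counting vowel groups (a rough heuristic;
    dictionary-based counters are more accurate)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a final 'e' as silent ("make"), except in endings like "-le" ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def sentence_scores(sentence):
    """Return (FRES, FKGL) for one sentence, treated as a one-sentence text."""
    words = re.findall(r"[A-Za-z']+", sentence)
    if not words:
        raise ValueError("sentence contains no words")
    words_per_sentence = len(words)  # only one sentence, so WPS = word count
    syllables_per_word = sum(count_syllables(w) for w in words) / len(words)
    fres = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    fkgl = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return fres, fkgl


def average_ranks(values):
    """Rank values 1..n in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    ssx = sum((a - mx) ** 2 for a in rx)
    ssy = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(ssx * ssy)


if __name__ == "__main__":
    sentences = [
        "The cat sat on the mat.",
        "She reads a book every night before bed.",
        "Although the committee had deliberated extensively, "
        "its recommendations remained controversial.",
    ]
    # Hypothetical difficulty ratings (higher = harder), NOT the study's data.
    judged = [1.2, 2.0, 4.5]
    scores = [sentence_scores(s) for s in sentences]
    fres = [f for f, _ in scores]
    fkgl = [g for _, g in scores]
    print("FRES vs judged:", spearman(fres, judged))
    print("FKGL vs judged:", spearman(fkgl, judged))
```

With a difficulty-oriented rating scale like this one, FRES is expected to correlate negatively and FKGL positively, which mirrors the signs of the coefficients reported above.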
Next, a new readability formula was constructed from surface and deep linguistic variables proposed as Coh-Metrix indices (Graesser et al. 2004). Of the 60 linguistic variables, 29 that are applicable within a single sentence were selected. A stepwise multiple regression analysis, with the 29 linguistic variables as independent variables and the S-R judged by the EFL learners as the dependent variable, was carried out to derive an S-R formula. The resulting model showed a strong multiple correlation (R=0.89, p<0.01), with 5 linguistic variables explaining 79.2% of the variance (F(5,74)=55.29, p<0.01). The variable that contributed most to explaining the S-R judged by the EFL learners was the number of words in a sentence (β=0.68, p<0.01), which suggests that this surface linguistic variable primarily accounts for the learners' judgments. The other variables that contributed significantly were log frequency of content words in a sentence (β=-0.31, p<0.01), FKGL (β=0.20, p<0.01), the number of words before the main verb (β=-0.19, p<0.01), and the incidence of negative additive connectives (β=0.14, p<0.01); apart from FKGL, which is itself computed from surface variables, these are deep linguistic variables. In conclusion, we found that a readability formula using both surface and deep linguistic variables is appropriate for measuring the S-R for EFL learners.
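The regression step can be sketched as follows. This is a simplified, forward-only variant written for illustration: the entry and removal criteria and the software used are not stated here, so the F-to-enter threshold of 4.0, the function names (`forward_stepwise`, `fit_standardized`, `zscore`, `solve`), and the absence of a removal step are assumptions. All variables are standardized first so that the fitted coefficients are β weights comparable to those reported above, and the overall test uses F = (R²/k) / ((1 - R²)/(n - k - 1)) for k selected predictors and n sentences.

```python
import random
from math import sqrt


def zscore(col):
    """Standardize a column to mean 0 and sample standard deviation 1."""
    n = len(col)
    mean = sum(col) / n
    sd = sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
    return [(v - mean) / sd for v in col]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= factor * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_standardized(zx, zy, cols):
    """OLS of standardized y on the standardized columns listed in `cols`.
    Returns (betas in the order of `cols`, R^2). No intercept is needed
    because every standardized variable already has mean 0."""
    n = len(zy)
    xtx = [[sum(zx[i][r] * zx[j][r] for r in range(n)) for j in cols] for i in cols]
    xty = [sum(zx[i][r] * zy[r] for r in range(n)) for i in cols]
    betas = solve(xtx, xty)  # normal equations (X'X) b = X'y
    fitted = [sum(b * zx[c][r] for b, c in zip(betas, cols)) for r in range(n)]
    sse = sum((zy[r] - fitted[r]) ** 2 for r in range(n))
    sst = sum(v * v for v in zy)  # equals n - 1 after standardization
    return betas, 1 - sse / sst


def forward_stepwise(columns, y, f_to_enter=4.0):
    """Add the candidate with the largest partial F at each step; stop when
    none reaches f_to_enter. Variables are never removed once entered."""
    n = len(y)
    zx = [zscore(c) for c in columns]
    zy = zscore(y)
    chosen, r2 = [], 0.0
    # Keep n - k - 1 >= 1 so the residual degrees of freedom stay positive.
    while len(chosen) < min(len(columns), n - 2):
        k = len(chosen) + 1  # model size if one more variable enters
        best = None
        for j in range(len(columns)):
            if j in chosen:
                continue
            _, r2_new = fit_standardized(zx, zy, chosen + [j])
            partial_f = (r2_new - r2) / ((1 - r2_new) / (n - k - 1))
            if best is None or partial_f > best[0]:
                best = (partial_f, j, r2_new)
        if best[0] < f_to_enter:
            break
        chosen.append(best[1])
        r2 = best[2]
    if not chosen:
        return [], [], 0.0, 0.0
    betas, r2 = fit_standardized(zx, zy, chosen)
    k = len(chosen)
    f_overall = (r2 / k) / ((1 - r2) / (n - k - 1))
    return chosen, betas, r2, f_overall


if __name__ == "__main__":
    # Synthetic data (NOT the study's): y depends on predictors 0 and 1 only.
    rng = random.Random(0)
    n = 80
    columns = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(6)]
    y = [2 * columns[0][r] - columns[1][r] + rng.gauss(0, 0.5) for r in range(n)]
    chosen, betas, r2, f = forward_stepwise(columns, y)
    k = len(chosen)
    print(f"selected={chosen} betas={[round(b, 2) for b in betas]}")
    print(f"R^2={r2:.3f}  F({k},{n - k - 1})={f:.2f}")
```

With n = 80 sentences and k = 5 selected predictors, the residual degrees of freedom are 80 - 5 - 1 = 74, which matches the reported F(5,74).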