Applying Natural Language Processing and Hierarchical Machine Learning Approaches to Text Difficulty Classification

Cited by: 27
Authors
Balyan, Renu [1]
McCarthy, Kathryn S. [2]
McNamara, Danielle S. [3]
Affiliations
[1] Arizona State Univ, Ira A Fulton Sch Engn, Mesa, AZ 85212 USA
[2] Georgia State Univ, Dept Learning Sci, Atlanta, GA 30303 USA
[3] Arizona State Univ, Dept Psychol, Tempe, AZ 85287 USA
Keywords
Text difficulty; Machine learning; Hierarchical classification; Natural language processing; COH-METRIX; READABILITY; CONCRETENESS; ALGORITHMS; IMAGERY
DOI
10.1007/s40593-020-00201-7
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
For decades, educators have relied on readability metrics that tend to oversimplify dimensions of text difficulty. This study examines the potential of applying advanced artificial intelligence methods to the educational problem of assessing text difficulty. The combination of hierarchical machine learning and natural language processing (NLP) is leveraged to predict the difficulty of practice texts used in a reading comprehension intelligent tutoring system, iSTART. Human raters estimated the text difficulty level of 262 texts across two text sets (Set A and Set B) in the iSTART library. NLP tools were used to identify linguistic features predictive of text difficulty, and these indices were submitted to both flat and hierarchical machine learning algorithms. Results indicated that including NLP indices and machine learning increased accuracy by more than 10% as compared to classic readability metrics (e.g., Flesch-Kincaid Grade Level). Further, hierarchical classification outperformed non-hierarchical (flat) machine learning classification for Set B (72%) and the combined Set A + B (65%), whereas the non-hierarchical approach performed slightly better than the hierarchical approach for Set A (79%). These findings demonstrate the importance of considering deeper features of language related to text difficulty as well as the potential utility of hierarchical machine learning approaches in the development of meaningful text difficulty classification.
Pages: 337-370
Page count: 34