A Method of Readability Assessment for Web Documents Using Text Features and HTML Structures

Yamasaki Takahiro<sup>*</sup>; Tokiwa Kin Ichiroh

doi:10.1002/ecj.11565

Abstract

This paper describes a method of readability assessment for Web documents. Readability is the ease with which text can be read and understood. We hypothesize that readability is determined by how easily a reader can grasp the structure of a text, and that the impression and complexity of the text are significant factors. We extract features of impression and complexity from the plain text and from additional data such as HTML tags. To compare the effect of the extracted features, we estimate readability ranks by machine learning. We conduct fivefold cross validation for each domain and calculate the root mean squared error between the actual rank and the estimated rank. The cross-validation experiments confirm that the performance of our method is high, showing that features of impression and complexity are effective for readability assessment.
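The evaluation protocol named in the abstract (fivefold cross validation scored by root mean squared error between actual and estimated rank) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the contiguous fold split, the pooling of held-out predictions into a single RMSE, and the mean-rank baseline learner are all hypothetical choices, since the abstract does not specify the learner or the features.

```python
import math


def rmse(actual, estimated):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(estimated) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - e) ** 2 for a, e in zip(actual, estimated))
    return math.sqrt(squared / len(actual))


def kfold_indices(n, k=5):
    """Yield (train, test) index lists for k contiguous folds over range(n)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n) if i < start or i >= stop]
        yield train, test
        start = stop


def cross_validated_rmse(features, ranks, fit, k=5):
    """Pool held-out estimates over k folds and return their RMSE.

    `fit(train_x, train_y)` must return a callable that maps one
    feature vector to an estimated rank.
    """
    estimated = [0.0] * len(ranks)
    for train, test in kfold_indices(len(ranks), k):
        model = fit([features[i] for i in train], [ranks[i] for i in train])
        for i in test:
            estimated[i] = model(features[i])
    return rmse(ranks, estimated)


def fit_mean_baseline(train_x, train_y):
    """Placeholder learner (an assumption): always predicts the mean training rank."""
    mean = sum(train_y) / len(train_y)
    return lambda x: mean


# Toy data (made up for illustration): one feature per document, ranks 1 to 5.
features = [[float(i)] for i in range(10)]
ranks = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
score = cross_validated_rmse(features, ranks, fit_mean_baseline)
```

Any regressor can be substituted for `fit_mean_baseline` as long as it follows the same `fit(train_x, train_y)` convention; running the function once per domain would mirror the per-domain evaluation the abstract describes.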

Publication date: 2014-10

