Abstract

Progress in optical character recognition, which underlies most applications of document processing, has been driven mainly by technological advances in microprocessors and optical sensor arrays. Software development based on algorithmic innovations appears to be reaching the point of diminishing returns. Research results, dispersed among a dozen venues, tend to lag behind commercial methodology. Some early mainline applications, such as reading typescript, patents and law books, have already become obsolete. Check, postal address, and form processing are on their way out. Open source software may open up niche applications that do not generate enough revenue for commercial developers, including the poorly funded transcription of historical documents (especially genealogical records). Smartphone cameras and wearable technologies are engendering new image-based applications, but there is little evidence of widespread adoption. As document contents are integrated into a web-based continuum of data, they are likely to lose even the meager individuality of discrete sheets of paper. The persistent need to create, preserve and communicate information is giving rise to entirely new genres of digital documents, with a concomitant need for new approaches to document understanding.

  • Publication date: 2016-8-1