Analysis and understanding of multi-class invoices

Cited by: 29
Authors
F. Cesarini
E. Francesconi
M. Gori
G. Soda
Affiliations
[1] DSI, Università di Firenze, 50139 Firenze, Via S. Marta
[2] DII, Università di Siena, 53100 Siena, Via Roma
Source
Document Analysis and Recognition | 2003 / Vol. 6 / No. 2
Keywords
Class Knowledge (CK); Class-Dependent Domain Knowledge (CDDK); Class-Independent Domain Knowledge (CIDK); Layout structure; Logical object; Logical structure; Physical object
DOI
10.1007/s10032-002-0084-6
Abstract
In this paper a system for processing documents that can be grouped into classes is illustrated. We have considered invoices as a case-study. The system is divided into three phases: document analysis, classification, and understanding. We illustrate the analysis and understanding phases. The system is based on knowledge constructed by means of a learning procedure. The experimental results demonstrate the reliability of our document analysis and understanding procedures. They also present evidence that it is possible to use a small learning set of invoices to obtain reliable knowledge for the understanding phase. © Springer-Verlag 2003.
Pages: 102-114
Page count: 12