Harvest - a System for Creating Structured Rate Filing Data from Filing PDFs

Times cited: 0
Authors
Tekin, Ender [1]
You, Qian [2]
Conathan, Devin M. [1]
Fung, Glenn M. [1]
Kneubuehl, Thomas S. [1]
Affiliations
[1] Amer Family Mutual Insurance Co SI, Madison, WI 53783 USA
[2] Coupang Corp, Seoul, South Korea
Source
THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE | 2022
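Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract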
We present a machine-learning-guided process that can efficiently extract factor tables from unstructured rate filing documents. Our approach combines multiple deep-learning-based models that work in tandem to create structured representations of tabular data present in unstructured documents such as PDF files. This process combines CNNs to detect tables, language-based models to extract table metadata, and conventional computer vision techniques to improve the accuracy of tabular data on the machine-learning side. The extracted tabular data is validated through an intuitive user interface. This process, which we call Harvest, significantly reduces the time needed to extract tabular information from PDF files, enabling analysis of such data at a speed and scale that was previously unattainable.
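The abstract does not say which conventional computer vision techniques Harvest uses. As one illustrative possibility only, the sketch below shows a vertical projection profile, a classic way to find column boundaries inside an already-detected table region. The function name `column_spans`, the `min_gap` parameter, and the binarized input format (rows of 0/1 pixels, 1 meaning ink) are all assumptions made for this example and are not taken from the paper.

```python
from typing import List, Tuple


def column_spans(binary_rows: List[List[int]], min_gap: int = 1) -> List[Tuple[int, int]]:
    """Split a binarized table crop into column spans.

    Hypothetical helper, not from the paper. Each span is a half-open
    (start, end) range of pixel columns that contain ink. Spans are
    separated wherever at least `min_gap` consecutive columns are blank.
    """
    if not binary_rows:
        return []
    width = len(binary_rows[0])

    # Vertical projection profile: number of ink pixels in each pixel column.
    profile = [sum(row[x] for row in binary_rows) for x in range(width)]

    spans: List[Tuple[int, int]] = []
    start = None  # first pixel column of the span being built
    gap = 0       # length of the current run of blank columns

    for x, ink in enumerate(profile):
        if ink > 0:
            if start is None:
                start = x
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                # The blank run is wide enough: close the span before it.
                spans.append((start, x - gap + 1))
                start = None
                gap = 0

    if start is not None:
        # Close the last span, excluding any trailing blank columns.
        spans.append((start, width - gap))
    return spans


# Tiny 2x6 "image": two ink blobs separated by two blank pixel columns.
image = [
    [1, 1, 0, 0, 1, 1],
    [1, 0, 0, 0, 0, 1],
]
spans_narrow = column_spans(image, min_gap=2)
spans_wide = column_spans(image, min_gap=3)
```

With `min_gap=2` the two blobs are reported as separate columns; with `min_gap=3` the gap is too narrow to count as a separator, so they merge into one span. In a real pipeline a threshold like this would need tuning against scan resolution and font size.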
Pages: 12414-12422
Number of pages: 9