The System for Efficient Indexing and Search in the Large Archives of Scanned Historical Documents

Cited by: 1
Authors
Bulin, Martin [1]
Svec, Jan [1]
Ircing, Pavel [1]
Affiliation
[1] Univ West Bohemia, Dept Cybernet, Plzen, Czech Republic
Source
ADVANCES IN INFORMATION RETRIEVAL, ECIR 2023, PT III | 2023 / Vol. 13982
Keywords
Indexing; GUI design; Scanned documents
DOI
10.1007/978-3-031-28241-6_15
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The paper introduces software capable of indexing and searching large archives of scanned historical documents. The system's capabilities are demonstrated on a collection of documents from the archives of the post-Soviet security services. The backend of the system was designed with a focus on flexibility (it is already being used for other, related tasks) and on scalability to larger volumes of data. The graphical user interface was designed in consultation with historians interested in using the archived documents and was developed over several iterations, gradually incorporating changes prompted both by the users' requests and by our improving understanding of the nature of the processed data.
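The abstract does not describe the index structure the system actually uses. As a generic illustration only, and not the authors' implementation, here is a minimal inverted index over OCR output. It assumes each scanned page has already been converted to text by some OCR step and is located by a `(document_id, page_number)` pair; the names `PageIndex`, `tokenize`, and the sample documents are hypothetical.

```python
import re
from collections import defaultdict

# Unicode-aware word pattern: in Python 3, \w also matches Cyrillic letters,
# so non-Latin scripts are tokenized too (assuming the archive contains them).
TOKEN_RE = re.compile(r"\w+")


def tokenize(text: str) -> list[str]:
    """Split OCR text into lowercase word tokens (hypothetical helper)."""
    return [token.lower() for token in TOKEN_RE.findall(text)]


class PageIndex:
    """Hypothetical minimal inverted index: term -> set of (document_id, page_number)."""

    def __init__(self) -> None:
        self._postings: dict[str, set[tuple[str, int]]] = defaultdict(set)

    def add_page(self, doc_id: str, page: int, ocr_text: str) -> None:
        """Record every token of one page's OCR text under that page's location."""
        for term in tokenize(ocr_text):
            self._postings[term].add((doc_id, page))

    def search(self, query: str) -> list[tuple[str, int]]:
        """Return the sorted pages that contain every query term (boolean AND)."""
        terms = tokenize(query)
        if not terms:
            return []
        # Copy the first posting set so the in-place intersection below
        # does not modify the index; .get() avoids creating empty entries.
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return sorted(result)


if __name__ == "__main__":
    index = PageIndex()
    index.add_page("doc-001", 1, "Report on the meeting in Vilnius")
    index.add_page("doc-001", 2, "Meeting minutes, continued")
    index.add_page("doc-002", 1, "Vilnius district report")
    print(index.search("vilnius report"))  # [('doc-001', 1), ('doc-002', 1)]
```

Intersecting posting sets gives plain boolean AND search at page granularity. A system working with noisy OCR of historical scans would likely also need fuzzy matching and result ranking, which this sketch deliberately omits.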
Pages: 206-210
Number of pages: 5