A general segmentation scheme for DjVu document compression

Cited by: 0
Authors
Haffner, P [1]
Bottou, L [1]
LeCun, Y [1]
Vincent, L [1]
Affiliation
[1] AT&T Labs Res, Middletown, NJ 07748 USA
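Source
MATHEMATICAL MORPHOLOGY, PROCEEDINGS | 2002
Keywords
DOI
Not available
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification codes
0808; 0809;
Abstract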
We describe the "DjVu" (Deja Vu) technology: an efficient document image compression methodology, a file format, and a delivery platform that together enable instant access to high-quality documents from essentially any platform, over any connection. Originally developed for scanned color documents, it was recently extended to electronic documents, making DjVu a universal document interchange format. With DjVu, a color magazine page scanned at 300 dpi typically occupies between 40 KB and 80 KB, i.e. approximately 5 to 10 times less than JPEG for a similar level of readability (the typical compression ratio is 500:1). Converting electronic documents to DjVu also offers substantial advantages, as described in the paper. The technology relies on classifying each pixel as either foreground (text, drawings) or background (pictures, paper texture and color), thereby producing a segmentation into layers that are compressed separately. The novel contribution of this paper is a unified, rigorous approach to the segmentation of scanned or electronic documents, based on the Minimum Description Length (MDL) principle. The foreground layer is compressed using a pattern matching technique that takes advantage of the similarities between character shapes. A progressive, wavelet-based compression technique, combined with a masking algorithm, is then used to compress the background image at lower resolution while minimizing the number of bits spent on pixels that are covered by foreground pixels anyway. Encoders, decoders, and real-time, memory-efficient plug-ins for various web browsers are available for all the major platforms.
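The size figures quoted in the abstract can be sanity-checked with a little arithmetic. The sketch below is only illustrative and rests on assumptions that are not in the abstract: a US Letter page (8.5 x 11 inches), 24-bit RGB color (3 bytes per pixel), and 1 KB taken as 1000 bytes. The function names are hypothetical.

```python
def raw_size_bytes(width_in, height_in, dpi, bytes_per_pixel=3):
    """Uncompressed size of a scanned page (assumed 24-bit RGB by default)."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


def compression_ratio(raw_bytes, compressed_bytes):
    """Ratio of uncompressed size to compressed size."""
    return raw_bytes / compressed_bytes


# Assumed page geometry: US Letter scanned at 300 dpi.
raw = raw_size_bytes(8.5, 11, 300)

# File sizes quoted in the abstract, with 1 KB assumed to be 1000 bytes.
ratio_at_40kb = compression_ratio(raw, 40_000)
ratio_at_80kb = compression_ratio(raw, 80_000)

print(f"raw page size: {raw / 1e6:.1f} MB")
print(f"ratio at 40 KB: {ratio_at_40kb:.0f}:1")
print(f"ratio at 80 KB: {ratio_at_80kb:.0f}:1")
```

Under these assumptions the raw page is about 25 MB, and the 40 KB to 80 KB range corresponds to ratios of roughly 630:1 down to 315:1, which brackets the 500:1 figure given as typical. A different page size or bit depth would shift these numbers.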
Pages: 17-36
Page count: 20