The problem of cheating in handwritten academic essays has grown more significant in recent years. One form of cheating is resubmitting the same paper, either photographed under different conditions (for example, from another angle, under different lighting, or at lower quality) or modified by automatic augmentation. Existing near-duplicate detection methods are not designed for large collections of handwritten documents, which significantly limits their practical use. We present a machine learning-based method for detecting near-duplicate handwritten text images in large collections of potential sources. The approach has three stages: converting each image into a vector representation, searching the collection for candidates, and selecting the source of the duplicate among those candidates. The method achieves a recall-at-1 (recall@1) of 80% and 59%, with false positive rates of 4.8% and 5.5%, on the Synthetic and Real datasets, respectively. Search latency is 5.5 seconds per query on a collection of 10,000 images. These results indicate that the method is robust enough for tasks that require checking large collections of handwritten documents for cheating.
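The three-stage pipeline described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes each image has already been converted into an embedding vector by some encoder, so stage 1 is not modeled. It uses brute-force cosine similarity as a stand-in for whatever candidate index the method actually uses, and it reduces source selection to a thresholded re-scoring step. The names `search_candidates`, `select_source`, `detect`, and `evaluate`, as well as the default `k` and `threshold` values, are hypothetical. The `evaluate` helper also assumes one plausible reading of the metrics: recall@1 is computed over queries that have a true source, and the false positive rate is computed over queries that have none.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit L2 norm (zero vectors stay zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0] * len(v)
    return [x / norm for x in v]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embedding vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def search_candidates(query: Vector, index: Dict[str, Vector],
                      k: int) -> List[Tuple[str, float]]:
    """Stage 2 (assumed): brute-force top-k search by cosine similarity.

    A system over large collections would likely use an approximate
    nearest-neighbour index instead of scanning every embedding.
    """
    scored = [(doc_id, cosine(query, emb)) for doc_id, emb in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


def select_source(query: Vector, candidates: List[Tuple[str, float]],
                  index: Dict[str, Vector], threshold: float,
                  verify: Callable[[Vector, Vector], float] = cosine) -> Optional[str]:
    """Stage 3 (assumed): re-score the candidates with `verify` and accept
    the best one only if its score clears the threshold; otherwise report
    that no source was found."""
    best_id, best_score = None, -math.inf
    for doc_id, _ in candidates:
        score = verify(query, index[doc_id])
        if score > best_score:
            best_id, best_score = doc_id, score
    if best_id is not None and best_score >= threshold:
        return best_id
    return None


def detect(query: Vector, index: Dict[str, Vector],
           k: int = 5, threshold: float = 0.9) -> Optional[str]:
    """Run stages 2 and 3 for one already-embedded query image."""
    candidates = search_candidates(query, index, k)
    return select_source(query, candidates, index, threshold)


def evaluate(results: Dict[str, Optional[str]],
             ground_truth: Dict[str, Optional[str]]) -> Tuple[float, float]:
    """Return (recall@1, false positive rate) under the assumed definitions."""
    dup = [q for q, src in ground_truth.items() if src is not None]
    non_dup = [q for q, src in ground_truth.items() if src is None]
    recall_at_1 = sum(results[q] == ground_truth[q] for q in dup) / len(dup) if dup else 0.0
    fpr = sum(results[q] is not None for q in non_dup) / len(non_dup) if non_dup else 0.0
    return recall_at_1, fpr


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings" standing in for encoder outputs.
    index = {"a": [1.0, 0.0, 0.0], "b": [0.0, 1.0, 0.0], "c": [0.0, 0.0, 1.0]}
    queries = {"q1": [0.95, 0.05, 0.0], "q2": [0.5, 0.5, 0.5]}
    results = {q: detect(v, index) for q, v in queries.items()}
    print(results)
    print(evaluate(results, {"q1": "a", "q2": None}))
```

In this sketch the stage 3 threshold controls the trade-off between recall@1 and the false positive rate. Raising it rejects more weak matches, which lowers false positives at the cost of missing some true duplicates.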
Authors and affiliations:
Thaiyalnayaki, S.: Dhanalakshmi Srinivasan Coll Engn & Technol, ECR, Madras 603104, Tamil Nadu, India
Sasikala, J.: Annamalai Univ, Chidambaram 608002, India
Ponraj, R.: Dhanalakshmi Srinivasan Coll Engn & Technol, ECR, Madras 603104, Tamil Nadu, India