A Fused Forensic Text Comparison System Using Lexical Features, Word and Character N-grams: A Likelihood Ratio-based Analysis in Predatory Chatlog Messages

Cited by: 0
Authors
Ishihara, Shunichi [1]
Affiliations
[1] Australian Natl Univ, Dept Linguist, Canberra, ACT, Australia
Source
2014 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI) | 2014
Keywords
forensic text comparison; likelihood ratio; logistic-regression fusion; log likelihood ratio cost; 95% credible intervals; Tippett plot; N-grams; lexical features; PROBABILISTIC EVALUATION; HANDWRITING EVIDENCE; AUTHORSHIP ANALYSIS; IDENTIFICATION;
DOI
Not available
Chinese Library Classification
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
This study investigates the degree to which the performance of a likelihood ratio (LR)-based forensic text comparison (FTC) system improves when logistic-regression fusion is applied to LRs estimated separately by three different procedures: lexical features, word-based N-grams and character-based N-grams. The data are predatory chatlog messages, and each group of messages is modelled with 500 words. The performance of the FTC system is assessed in terms of its validity (= accuracy) and reliability (= precision) using the log-likelihood-ratio cost (Cllr) and 95% credible intervals (CI), respectively. It is demonstrated that 1) of the three procedures, the lexical features procedure performed best in terms of Cllr; and that 2) the fused system outperformed all three single procedures. The Cllr value of the fused system is better than that of the lexical features procedure by 0.14. It is also reported that the validity and reliability of a system are negatively correlated: the fused system, which yielded the best result in terms of Cllr, has the worst CI value.
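The validity metric named in the abstract, Cllr, can be illustrated with a minimal sketch. It assumes the standard definition from Brümmer and du Preez (2006); the function name `cllr`, its parameter names and the sample LR values are hypothetical, chosen for illustration, and are not taken from the paper.

```python
import numpy as np


def cllr(same_author_lrs, different_author_lrs):
    """Log-likelihood-ratio cost, assuming the standard definition.

    Both arguments are sequences of likelihood ratios (not log LRs):
    one from comparisons known to be same-author, the other from
    comparisons known to be different-author.
    """
    ss = np.asarray(same_author_lrs, dtype=float)
    ds = np.asarray(different_author_lrs, dtype=float)
    # Same-author comparisons are penalised as their LRs fall towards 0.
    penalty_ss = np.mean(np.log2(1.0 + 1.0 / ss))
    # Different-author comparisons are penalised as their LRs grow.
    penalty_ds = np.mean(np.log2(1.0 + ds))
    return 0.5 * (penalty_ss + penalty_ds)


# Illustrative values only (hypothetical, not results from the study).
uninformative = cllr([1.0, 1.0], [1.0, 1.0])            # every LR equals 1
well_separated = cllr([100.0, 1000.0], [0.01, 0.001])   # strong, correct LRs
```

Under this definition a system that always returns LR = 1 scores exactly 1, and lower values indicate better validity, which is the sense in which the fused system is reported as "better" by 0.14.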
Pages: 2762-2768
Page count: 7