Protecting Privacy in the Archives: Supervised Machine Learning and Born-Digital Records

Cited by: 0
Authors
Hutchinson, Tim [1]
Affiliations
[1] Univ Saskatchewan Lib, Univ Arch & Special Collect, Saskatoon, SK, Canada
Source
2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) | 2018
Keywords
natural language processing; NLP; personal information; PII; digital archives; supervised machine learning; probabilistic classification; Naive Bayes classifier;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper documents the iterations attempted in developing training sets for supervised machine learning, with the aim of identifying documents that relate to human resources and contain personal information. Overall, the results show promise, although we have so far been unable to propose a more systematic approach to developing training sets. They suggest that supervised machine learning could be a viable approach for a "triage" method of reviewing collections for restrictions.
Pages: 2696-2701
Page count: 6