DATING MEDIEVAL ENGLISH CHARTERS

Times cited: 6
Authors
Tilahun, Gelila [1]
Feuerverger, Andrey [1]
Gervers, Michael [2]
Affiliations
[1] Univ Toronto, Dept Stat, Toronto, ON M5S 3G3, Canada
[2] Univ Toronto, Dept Hist, Toronto, ON M5S 3G3, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
Bandwidth selection; cross-validation; medieval charters; DEEDS data set; generalized linear models; kernel smoothing; local log-likelihood; maximum prevalence method; nearest neighbor methods (kNN); quantile regression; text mining; distance
DOI
10.1214/12-AOAS566
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline codes
020208; 070103; 0714
Abstract
Deeds, or charters, dealing with property rights provide a continuous documentary record that historians can use to study the evolution of social, economic and political change. This study is concerned with charters (written in Latin) dating from the tenth through the early fourteenth centuries in England. Of these, at least one million were left undated, largely because of administrative changes introduced by William the Conqueror in 1066. Correctly dating such charters is of vital importance to the study of English medieval history. This paper develops computer-automated statistical methods for dating such document collections, with the goals of reducing the considerable effort required to date them manually and of improving the accuracy of the assigned dates. The proposed methods draw on features such as the variation over time in word and phrase usage, and on measures of distance between documents. The extensive and dated Documents of Early England Data Set (DEEDS), maintained at the University of Toronto, was used for this purpose.
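To make the idea of dating a document by its distance to dated documents concrete, here is a minimal sketch. It is not the authors' procedure (the keywords point to kernel smoothing, local log-likelihood and a maximum prevalence method, which are not reproduced here). It assumes a plain k-nearest-neighbor rule: each document is reduced to its set of word "shingles" (contiguous word k-grams), documents are compared by Jaccard distance, and an undated document receives the median date of its k closest dated neighbors. The function names `shingles`, `jaccard_distance` and `knn_date` are hypothetical, and the four "charters" with their years are invented toy strings, not DEEDS data.

```python
from statistics import median


def shingles(text, k=2):
    """Return the set of k-word shingles (contiguous word k-grams) in text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b|; two empty sets are treated as maximally distant."""
    union = a | b
    if not union:
        return 1.0
    return 1.0 - len(a & b) / len(union)


def knn_date(undated_text, dated_corpus, k=3, shingle_size=2):
    """Hypothetical kNN dater: median year of the k nearest dated documents."""
    target = shingles(undated_text, shingle_size)
    # Pair each dated document's distance to the target with its year, nearest first.
    scored = sorted(
        (jaccard_distance(target, shingles(text, shingle_size)), year)
        for text, year in dated_corpus
    )
    nearest_years = [year for _, year in scored[:k]]
    return median(nearest_years)


# Invented toy corpus of (text, year) pairs; the years are illustrative only.
corpus = [
    ("sciant presentes et futuri quod ego dedi", 1250),
    ("sciant presentes et futuri quod ego concessi", 1260),
    ("notum sit omnibus quod ego dedi", 1150),
    ("notum sit omnibus fidelibus quod ego concessi", 1140),
]
undated = "sciant presentes et futuri quod ego dedi et concessi"

print(knn_date(undated, corpus, k=3))  # → 1250
```

The median, rather than the mean, keeps a single badly matched neighbor from pulling the estimate far off; with an even k, `statistics.median` averages the two middle years.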
Pages: 1615-1640
Number of pages: 26