Determining the Number of Samples Required to Estimate Entropy in Natural Sequences

Cited by: 7
Authors
Back, Andrew D. [1 ]
Angus, Daniel [1 ]
Wiles, Janet [1 ]
Affiliation
[1] Univ Queensland, Sch ITEE, Brisbane, Qld 4072, Australia
Keywords
Shannon entropy; information theory; natural sequences; computational linguistics; LAW; DIVERSITY; LANGUAGE;
DOI
10.1109/TIT.2019.2898412
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Calculating the Shannon entropy for symbolic sequences has been widely considered in many fields. For descriptive statistical problems such as estimating the N-gram entropy of English language text, a common approach is to use as much data as possible to obtain progressively more accurate estimates. However, in some instances, only short sequences may be available. This gives rise to the question of how many samples are needed to compute entropy. In this paper, we examine this problem and propose a method for estimating the number of samples required to compute Shannon entropy for a set of ranked symbolic "natural" events. The result is developed using a modified Zipf-Mandelbrot law and the Dvoretzky-Kiefer-Wolfowitz inequality, and we propose an approximation which yields an estimate for the minimum number of samples required to obtain an estimate of entropy with a given confidence level and degree of accuracy.
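The abstract's central quantitative step can be illustrated with a short sketch. For an empirical cumulative rank distribution required to be within accuracy epsilon of the true distribution with confidence 1 - alpha, the Dvoretzky-Kiefer-Wolfowitz inequality P(sup_x |F_n(x) - F(x)| > epsilon) <= 2 exp(-2 n epsilon^2) implies n >= ln(2/alpha) / (2 epsilon^2). The Python sketch below combines this standard bound with the entropy of a plain (unmodified) Zipf-Mandelbrot law; the function names, the parameter values s and q, and the use of the unmodified law are illustrative assumptions and do not reproduce the authors' modified law or their refined approximation.

import math

def dkw_min_samples(epsilon, confidence):
    # Minimum n such that, by the Dvoretzky-Kiefer-Wolfowitz inequality,
    #   P( sup_x |F_n(x) - F(x)| > epsilon ) <= 2 * exp(-2 * n * epsilon**2),
    # the empirical CDF of the ranked symbols is within epsilon of the true
    # CDF with probability at least `confidence`.
    alpha = 1.0 - confidence
    return math.ceil(math.log(2.0 / alpha) / (2.0 * epsilon ** 2))

def zipf_mandelbrot_entropy(num_symbols, s=1.0, q=2.7):
    # Shannon entropy (in bits) of a truncated, unmodified Zipf-Mandelbrot law
    # p(k) proportional to 1 / (k + q)**s over ranks k = 1..num_symbols.
    # The values of s and q are illustrative only.
    weights = [(k + q) ** (-s) for k in range(1, num_symbols + 1)]
    total = sum(weights)
    return -sum((w / total) * math.log2(w / total) for w in weights)

if __name__ == "__main__":
    # Example: estimate the rank CDF to within epsilon = 0.01 at 95% confidence.
    print(dkw_min_samples(epsilon=0.01, confidence=0.95))          # 18445 samples
    print(round(zipf_mandelbrot_entropy(num_symbols=10_000), 3))   # entropy in bits

Note that the DKW bound controls the accuracy of the empirical rank distribution, not of the entropy itself; carrying that guarantee through to an entropy estimate is the contribution of the paper, which this sketch does not attempt.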
Pages: 4345-4352
Number of pages: 8
Related Papers
50 records in total
  • [1] Determining the Number of Measurements and Bootstrap Samples Required to Estimate of Long-Term Noise Indicators
    Stepien, Bartlomiej
    ARCHIVES OF ACOUSTICS, 2020, 45 (04) : 613 - 623
  • [2] Determining the number of measurements required to estimate crop residue cover by different methods
    Laamrani, A.
    Joosse, P.
    Feisthauer, N.
    JOURNAL OF SOIL AND WATER CONSERVATION, 2017, 72 (05) : 471 - 479
  • [3] PREDICTING THE REQUIRED NUMBER OF TRAINING SAMPLES
    KALAYEH, HM
    LANDGREBE, DA
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1983, 5 (06) : 664 - 667
  • [4] An estimate of the number of samples to convergence for critic algorithms
    Hrycej, T
    IJCNN 2000: PROCEEDINGS OF THE IEEE-INNS-ENNS INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOL III, 2000, : 227 - 232
  • [5] An estimate method of the minimum entropy of natural languages
    Ren, FJ
    Mitsuyoshi, S
    Yen, K
    Zong, CQ
    Zhu, HB
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, PROCEEDINGS, 2003, 2588 : 382 - 392
  • [6] About the Entropy of a Natural Number and a Type of the Entropy of an Ideal
    Minculete, Nicusor
    Savin, Diana
    ENTROPY, 2023, 25 (04)
  • [7] Number of samples required for estimating herbaceous biomass
    Tsutsumi, Michio
    Itano, Shiro
    Shiyomi, Masae
    RANGELAND ECOLOGY & MANAGEMENT, 2007, 60 (04) : 447 - 452
  • [8] Determining the Complexity of FH/SS Sequences by Fuzzy Entropy
    Chen, Xiaojun
    Li, Zan
    Si, Jiangbo
    Hao, Benjian
    Bai, Baoming
    2011 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC), 2011,
  • [9] On determining the number of outliers in exponential and Pareto samples
    Jeevanand, ES
    Nair, NU
    STATISTICAL PAPERS, 1998, 39 (03) : 277 - 290