Big data-assisted urban governance: A comprehensive system for business documents classification of the government hotline
Cited by: 1
Authors:
Zhang, Zicheng [1,2,3,6]
Li, Anguo [4]
Wang, Li [5]
Cao, Wei [6]
Yang, Jianlin [2,3]
Affiliations:
[1] Nanjing Univ Posts & Telecommun, Sch Modern Posts, Nanjing 210003, Peoples R China
[2] Nanjing Univ, Sch Informat Management, Nanjing 210023, Peoples R China
[3] Knowledge Serv, Jiangsu Key Lab Data Engn, Nanjing 210023, Peoples R China
[4] Beihang Univ, Sino French Engineer Sch, Beijing, Peoples R China
[5] Nanjing Univ, Sch Business, Nanjing 210093, Peoples R China
[6] Nanjing Huiningjie Informat Technol Co Ltd, Nanjing 210023, Peoples R China
Keywords:
Government hotline; Text classification; New words; TF-IDF; Information entropy; Nested balanced binary tree
DOI:
10.1016/j.engappai.2024.107997
Chinese Library Classification (CLC) number:
TP [Automation technology, computer technology]
Discipline classification code:
0812
Abstract:
The government service platform, exemplified by the government hotline, has to handle extensive volumes of business documents that contain rich and timely public opinion information and citizens' demands. However, manual processing cannot keep up with large-scale text data, which adversely affects operating costs and the quality of government services. To address these challenges, this study proposes a comprehensive system for business document classification of the government hotline (BDCGHS) in China. BDCGHS fuses information entropy with term frequency-inverse document frequency (TF-IDF) weights to mine new words from the hotline's business documents and stores them in a new word repository. These new words improve Chinese word segmentation and text representation for text classification. We introduce a novel data structure, the nested balanced binary tree, to expedite new word mining; it runs almost five times as fast as a Trie tree. Comparative experiments on the THUNews and government hotline datasets show that the proposed BDCGHS algorithm outperforms the compared text classification algorithms by 3%. Compared with the latest bidirectional encoder representations from transformers (BERT) model, BDCGHS improves the accuracy of order dispatch based on business documents by almost 3%. The system has also run stably in two Chinese cities for over a year, yielding favorable results.
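The idea of scoring new-word candidates by combining information entropy with a TF-IDF weight can be illustrated with a minimal Python sketch. The abstract does not give the actual fusion formula, so everything here is an assumption for illustration only: the function names (`entropy`, `score_candidates`), the use of left/right neighbor (branching) entropy, the smoothed IDF, and the product `tf * idf * min(left entropy, right entropy)` are hypothetical and are not taken from the paper. A plain dictionary stands in for the paper's nested balanced binary tree.

```python
import math
from collections import Counter, defaultdict


def entropy(counter):
    """Shannon entropy (natural log) of a frequency counter; 0.0 if empty."""
    total = sum(counter.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counter.values())


def score_candidates(docs, max_len=4):
    """Score every character n-gram (2..max_len) as a new-word candidate.

    Assumed scoring, not the paper's formula:
        score = tf * idf * min(left_entropy, right_entropy)
    """
    tf = Counter()               # total occurrences of each n-gram
    df = Counter()               # number of documents containing each n-gram
    left = defaultdict(Counter)  # characters seen immediately before the n-gram
    right = defaultdict(Counter) # characters seen immediately after the n-gram

    for doc in docs:
        seen = set()
        for n in range(2, max_len + 1):
            for i in range(len(doc) - n + 1):
                gram = doc[i:i + n]
                tf[gram] += 1
                seen.add(gram)
                if i > 0:
                    left[gram][doc[i - 1]] += 1
                if i + n < len(doc):
                    right[gram][doc[i + n]] += 1
        df.update(seen)

    n_docs = len(docs)
    scores = {}
    for gram, freq in tf.items():
        # Smoothed IDF, chosen here only to keep the weight positive.
        idf = math.log((1 + n_docs) / (1 + df[gram])) + 1.0
        # A candidate with varied neighbors on both sides is more word-like.
        boundary = min(entropy(left[gram]), entropy(right[gram]))
        scores[gram] = freq * idf * boundary
    return scores


if __name__ == "__main__":
    toy_docs = ["xaby", "zabw", "qabr"]
    ranked = sorted(score_candidates(toy_docs).items(), key=lambda kv: -kv[1])
    print(ranked[:3])
```

In this toy corpus only "ab" has varied characters on both sides, so it is the only candidate with a non-zero score. In a real pipeline the top-ranked candidates would be reviewed and added to a word-segmentation dictionary; thresholds and the exact weighting would need tuning against the data.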