Applying Machine Learning for Automatic Product Categorization

Cited by: 1
Authors
Roberson, Andrea [1]
Affiliation
[1] US Census Bur, 4600 Silver Hill Rd, Washington, DC 20233 USA
Keywords
Text analytics; artificial intelligence; data collection
DOI
10.2478/JOS-2021-0017
Chinese Library Classification (CLC) number
O1 [Mathematics]; C [Social Sciences, General]
Subject classification codes
03; 0303; 0701; 070101
Abstract
Every five years, the U.S. Census Bureau conducts the Economic Census, the official count of U.S. businesses and the most extensive collection of data related to business activity. Businesses, policymakers, governments, and communities use Economic Census data for economic development, business decisions, and strategic planning. The Economic Census provides key inputs for economic measures such as the Gross Domestic Product and the Producer Price Index. The Economic Census requires businesses to fill out a lengthy questionnaire, including an extended section about the goods and services provided by the business. To address the challenges of high respondent burden and low survey response rates, we devised a strategy to automatically classify goods and services based on product information provided by the business. We asked several businesses to provide a spreadsheet containing Universal Product Codes and associated text descriptions for the products they sell. We then used natural language processing to classify the products according to the North American Product Classification System. This novel strategy classified text with very high accuracy; our best algorithms exceeded 90%.
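To make the classification step in the abstract concrete, here is a minimal sketch of assigning a category label to a product text description. It assumes a bag-of-words multinomial naive Bayes model, which the abstract does not name; the paper's actual algorithms may differ. The class `NaiveBayesClassifier`, the helper `tokenize`, the toy descriptions, and the labels `DAIRY` and `APPAREL` are all hypothetical placeholders, not real North American Product Classification System codes or data from the study.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a product description and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial naive Bayes over word counts (illustrative, not the paper's model)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                        # Laplace smoothing strength
        self.class_counts = Counter()             # training descriptions per label
        self.token_counts = defaultdict(Counter)  # token frequencies per label
        self.vocab = set()                        # all tokens seen in training

    def fit(self, descriptions, labels):
        for text, label in zip(descriptions, labels):
            tokens = tokenize(text)
            self.class_counts[label] += 1
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        # Tokens never seen in training carry no information, so drop them.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            denom = sum(self.token_counts[label].values()) + self.alpha * len(self.vocab)
            for t in tokens:
                score += math.log((self.token_counts[label][t] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy training rows: (text description, placeholder category label).
train = [
    ("organic whole milk 1 gal", "DAIRY"),
    ("cheddar cheese block 8 oz", "DAIRY"),
    ("mens cotton t shirt large", "APPAREL"),
    ("womens denim jeans size 8", "APPAREL"),
]
model = NaiveBayesClassifier().fit([d for d, _ in train], [c for _, c in train])
prediction = model.predict("low fat milk 2 gal")
```

In a realistic setting the training rows would come from the spreadsheets of Universal Product Codes and descriptions mentioned in the abstract, and accuracy would be measured on held-out products rather than on a handful of examples.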
Pages: 395-410
Page count: 16