Skill requirements in job advertisements: A comparison of skill-categorization methods based on wage regressions

Cited by: 19
Authors
Ao, Ziqiao [1]
Horvath, Gergely [2]
Sheng, Chunyuan [3]
Song, Yifan [4]
Sun, Yutong [4]
Affiliations
[1] Northwestern Univ, McCormick Sch Engn, 633 Clark St, Evanston, IL 60208 USA
[2] Duke Kunshan Univ, Div Social Sci, 8 Duke Ave, Kunshan 215316, Jiangsu, Peoples R China
[3] Grad Inst Geneva, Chem Eugene Rigot 2, CH-1202 Geneva, Switzerland
[4] Duke Kunshan Univ, Div Nat Sci, 8 Duke Ave, Kunshan 215316, Jiangsu, Peoples R China
Keywords
Text analytics; Topic modeling; Skill extraction; Job advertisements; Wage regressions; LDA
DOI
10.1016/j.ipm.2022.103185
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this paper, we compare different methods for extracting skill demand from the text of job descriptions. We propose the fraction of wage variation explained by the extracted skills as a novel performance metric for comparing methods. Using this metric, we compare the word-counting method with three different dictionaries against three unsupervised topic-modeling techniques: LDA, PLSA, and BERTopic. We apply these methods to a U.K. job board dataset of 1,158,926 job advertisements from 35 industries collected in 2018. We find that each of the dictionary-based methods explains about 20% of the wage variation across jobs. The topic-modeling techniques perform better: PLSA explains 36.5% of the wage variation and BERTopic 32.6%. The best-performing method is LDA, which explains 48.3% of the wage variation. Its disadvantage, however, is that the extracted skills are difficult to interpret.
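The performance metric described in the abstract, the fraction of wage variation explained by extracted skills, corresponds to the R² of a wage regression on skill variables. The following is a minimal sketch of that idea for the word-counting approach, not the authors' implementation. The dictionary, the job ads, the wages, and the use of log wages are all hypothetical assumptions made for illustration.

```python
import numpy as np

# Hypothetical toy dictionary: skill name -> keywords (not one of the paper's dictionaries).
SKILL_DICTIONARY = {
    "programming": ["python", "java", "sql"],
    "communication": ["communication", "presentation"],
    "management": ["manage", "leadership"],
}


def skill_indicators(text, dictionary):
    """Return a 0/1 list: 1 if any keyword of the skill occurs in the ad text."""
    lowered = text.lower()
    return [int(any(kw in lowered for kw in kws)) for kws in dictionary.values()]


def r_squared(X, y):
    """Fraction of the variance of y explained by an OLS fit on X plus an intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


# Made-up job ads with made-up annual wages, purely for illustration.
ads = [
    ("Python developer with SQL experience", 45000),
    ("Sales assistant, strong communication skills", 22000),
    ("Team lead: manage engineers, Python a plus", 60000),
    ("Warehouse operative", 20000),
    ("Project manager with leadership and presentation skills", 50000),
    ("Java engineer", 42000),
]

X = [skill_indicators(text, SKILL_DICTIONARY) for text, _ in ads]
log_wage = np.log([wage for _, wage in ads])  # log wage is an assumption here
score = r_squared(X, log_wage)
print(f"Share of wage variation explained: {score:.3f}")
```

For a topic model, the same `r_squared` function could be applied with each ad's topic shares in place of the 0/1 skill indicators, which would put the methods on a common scale for comparison.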
Pages: 16