On Using Machine Learning to Identify Knowledge in API Reference Documentation

Cited by: 17
Authors
Fucci, Davide [1]
Mollaalizadehbahnemiri, Alireza [1]
Maalej, Walid [1]
Affiliations
[1] Univ Hamburg, Hamburg, Germany
Source
ESEC/FSE'2019: PROCEEDINGS OF THE 2019 27TH ACM JOINT MEETING ON EUROPEAN SOFTWARE ENGINEERING CONFERENCE AND SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING | 2019
Keywords
API documentation; information needs; machine learning
DOI
10.1145/3338906.3338943
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Using API reference documentation like JavaDoc is an integral part of software development. Previous research introduced a grounded taxonomy that organizes API documentation knowledge in 12 types, including knowledge about the Functionality, Structure, and Quality of an API. We study how well modern text classification approaches can automatically identify documentation containing specific knowledge types. We compared conventional machine learning (k-NN and SVM) with deep learning approaches trained on manually annotated Java and .NET API documentation (n = 5,574). When classifying the knowledge types individually (i.e., multiple binary classifiers), the best AUPRC was up to 87%. The deep learning and SVM classifiers seem complementary. For four knowledge types (Concept, Control, Pattern, and Non-Information), SVM clearly outperforms deep learning, which, on the other hand, is more accurate for identifying the remaining types. When considering multiple knowledge types at once (i.e., multi-label classification), deep learning outperforms naive baselines and traditional machine learning, achieving a MacroAUC up to 79%. We also compared classifiers using embeddings pre-trained on generic text corpora and Stack Overflow but did not observe significant improvements. Finally, to assess the generalizability of the classifiers, we re-tested them on a different, unseen Python documentation dataset. Classifiers for Functionality, Concept, Purpose, Pattern, and Directive seem to generalize from Java and .NET to Python documentation. We discuss our results and how they inform the development of tools for supporting developers sharing and accessing API knowledge.
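The abstract reports AUPRC for one binary classifier per knowledge type. As a rough illustration only, the sketch below computes a step-wise area under the precision-recall curve (average precision) for each type from ranked classifier scores. The record does not say how the authors computed AUPRC, so the estimator, the function names `average_precision` and `per_type_auprc`, and the toy labels and scores are all assumptions made for this example, not the paper's method or data.

```python
from typing import Dict, List, Sequence


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise AUPRC estimate for one binary label (assumed estimator).

    y_true holds 0/1 annotations, scores holds classifier confidences.
    Tied scores keep their input order, so results with ties depend on it.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("at least one positive example is required")
    # Rank documents from most to least confident.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


def per_type_auprc(
    labels: Dict[str, List[int]], scores: Dict[str, List[float]]
) -> Dict[str, float]:
    """Evaluate one binary classifier per knowledge type."""
    return {name: average_precision(labels[name], scores[name]) for name in labels}


# Hypothetical toy annotations and scores for two knowledge types.
toy_labels = {"Functionality": [1, 0, 1, 0], "Directive": [1, 1, 0]}
toy_scores = {"Functionality": [0.9, 0.8, 0.7, 0.1], "Directive": [0.9, 0.8, 0.1]}
results = per_type_auprc(toy_labels, toy_scores)
```

Evaluating each type separately mirrors the "multiple binary classifiers" setting described above; a multi-label evaluation such as MacroAUC would need a different metric and is not covered by this sketch.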
Pages: 109-119
Page count: 11