Bug assignment, the task of routing each reported bug to a developer who can fix it, is a critical part of software engineering and relies heavily on bug reports. With advances in Natural Language Processing (NLP), researchers have explored how well the textual features of those reports support automatic assignment. However, noise in the textual data limits the accuracy of such approaches.
Zexuan Li and his research team studied how textual and nominal features affect bug assignment approaches. They used TextCNN, a deep-learning-based NLP technique, to measure how well textual features perform. Surprisingly, even with this more advanced technique, textual features did not outperform nominal features.
The team then set out to identify which features most influence bug assignment approaches. Taking a statistical perspective, they found that nominal features, which reflect developers' preferences, achieve competitive results without relying solely on textual data. To determine the importance of each feature, they applied the wrapper method with a bidirectional strategy, in which candidate features are both added to and removed from the selected set during the search.
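The wrapper method with a bidirectional strategy can be illustrated with a minimal sketch. This is not the authors' implementation: the lookup classifier, the feature names (`product`, `component`, `priority`), and the toy bug reports are all assumptions made for illustration. The idea it shows is that each candidate feature subset is scored by actually running a classifier on it, with a forward step that adds the most helpful feature and a backward step that drops features that no longer contribute.

```python
from collections import Counter, defaultdict


def lookup_accuracy(train, valid, features):
    """Wrapper score: accuracy of a majority-vote lookup on the chosen features."""
    if not features:
        return 0.0
    votes = defaultdict(Counter)
    for report, developer in train:
        votes[tuple(report[f] for f in features)][developer] += 1
    # Unseen feature combinations fall back to the most frequent developer.
    fallback = Counter(dev for _, dev in train).most_common(1)[0][0]
    hits = 0
    for report, developer in valid:
        key = tuple(report[f] for f in features)
        guess = votes[key].most_common(1)[0][0] if key in votes else fallback
        hits += guess == developer
    return hits / len(valid)


def bidirectional_select(train, valid, candidates, score=lookup_accuracy):
    """Alternate forward additions and backward removals until nothing changes."""
    selected, best = [], 0.0
    changed = True
    while changed:
        changed = False
        # Forward step: add the unused feature with the largest strict gain.
        trials = [(score(train, valid, selected + [f]), f)
                  for f in candidates if f not in selected]
        if trials:
            top, feature = max(trials)
            if top > best:
                selected, best, changed = selected + [feature], top, True
        # Backward step: drop a feature if the score does not get worse.
        for f in list(selected):
            if len(selected) > 1:
                rest = [g for g in selected if g != f]
                s = score(train, valid, rest)
                if s >= best:
                    selected, best, changed = rest, s, True
    return selected, best


# Hypothetical toy data: (nominal features of a bug report, assigned developer).
train = [
    ({"product": "ui", "component": "menu", "priority": "P1"}, "alice"),
    ({"product": "ui", "component": "menu", "priority": "P2"}, "alice"),
    ({"product": "ui", "component": "render", "priority": "P1"}, "bob"),
    ({"product": "ui", "component": "render", "priority": "P3"}, "bob"),
    ({"product": "core", "component": "net", "priority": "P2"}, "carol"),
    ({"product": "core", "component": "net", "priority": "P3"}, "carol"),
]
valid = [
    ({"product": "ui", "component": "menu", "priority": "P3"}, "alice"),
    ({"product": "ui", "component": "render", "priority": "P2"}, "bob"),
    ({"product": "core", "component": "net", "priority": "P1"}, "carol"),
]

selected, best = bidirectional_select(
    train, valid, ["product", "component", "priority"]
)
print(selected, best)
```

On this toy data the search keeps only `component`, because that single feature already separates the three developers and adding the others brings no further gain. In practice the scoring function would wrap whichever classifier is under study, evaluated on held-out data.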
The study set out to answer three key questions about bug assignment. First, how effective are textual features when processed with deep-learning-based NLP techniques such as TextCNN? Second, which features are influential for bug assignment, and why? Here the authors highlight that nominal features reduce the search scope of classifiers. Third, to what extent can the selected influential features improve bug assignment?
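One way to picture how a nominal feature reduces a classifier's search scope is to count how many developers remain plausible once the feature's value is known. The sketch below is an illustration under assumed data, not the study's analysis: the `history` records, the feature names, and the helper `candidate_developers` are hypothetical.

```python
from collections import defaultdict


def candidate_developers(history, feature):
    """Map each value of a nominal feature to the developers seen with it."""
    scope = defaultdict(set)
    for report, developer in history:
        scope[report[feature]].add(developer)
    return dict(scope)


# Hypothetical fix history: (nominal features of a bug report, developer).
history = [
    ({"product": "ui", "component": "menu"}, "alice"),
    ({"product": "ui", "component": "menu"}, "bob"),
    ({"product": "ui", "component": "render"}, "bob"),
    ({"product": "core", "component": "net"}, "carol"),
    ({"product": "core", "component": "net"}, "dave"),
    ({"product": "core", "component": "io"}, "erin"),
]

all_developers = {developer for _, developer in history}
by_component = candidate_developers(history, "component")

print(len(all_developers), "developers overall")
for value, developers in sorted(by_component.items()):
    print(value, "->", sorted(developers))
```

Without the feature, a classifier must choose among every developer in the project; once the component is known, the plausible set in this toy history shrinks to one or two names.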
Although the more advanced NLP technique brought only limited gains, the selected key features achieved 11-25% accuracy under popular classifiers such as Decision Tree and SVM. The authors suggest that future work incorporate source files to build a knowledge graph connecting influential features with descriptive words. Such a graph could yield better embeddings of nominal features and further improve bug assignment accuracy in software engineering projects.