Developer Identity Linkage and Behavior Mining Across GitHub and StackOverflow

Authors: Xiong, Yunxiang; Meng, Zhangyuan; Shen, Beijun*; Yin, Wei
Source: International Journal of Software Engineering and Knowledge Engineering, 2017, 27(9-10): 1409-1425.
DOI: 10.1142/S0218194017400034

Abstract

Software developers are increasingly active on GitHub and StackOverflow, creating a large amount of valuable data in the two communities. Researchers mine this data to understand developer behaviors, but previous work has mainly focused on a single community. In this paper, we propose a novel approach to developer identity linkage and behavior mining across GitHub and StackOverflow. The approach links accounts from the two communities using a CART decision tree that leverages features drawn from usernames, user behaviors and writing styles. It then explores cross-site developer behaviors through T-graph analysis, LDA-based topic clustering and cross-site tagging. We conducted several experiments to evaluate this approach. The results show that the precision and F-score of our identity linkage method are higher than those of previous methods in software communities. In particular, we discovered that (1) active issue committers are also active question askers; (2) for most developers, the topics of their content in GitHub are similar to those of their questions and answers in StackOverflow; (3) developers' concerns in StackOverflow shift over time along with the GitHub projects they are currently participating in; (4) developers' concerns in GitHub are more relevant to their answers than to their questions and comments in StackOverflow.
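To make the identity linkage step more concrete, the following is a minimal sketch, not the authors' implementation. It computes a single username-similarity feature for candidate account pairs and then searches for the threshold that minimizes weighted Gini impurity, which is the split criterion a CART tree applies at each node. The account names, the labels, and the helper names (username_similarity, gini, best_threshold) are hypothetical; the paper's behavior and writing-style features are not modeled here.

```python
from difflib import SequenceMatcher


def username_similarity(a: str, b: str) -> float:
    """Case-insensitive character-level similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_threshold(values, labels):
    """Find the split `value <= t` with the lowest weighted Gini impurity.

    Returns (threshold, weighted_impurity). Candidate thresholds are the
    midpoints between consecutive distinct feature values.
    """
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no split point between equal values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for v, y in pairs if v <= t]
        right = [y for v, y in pairs if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical labelled pairs: (GitHub name, StackOverflow name, same person?)
candidate_pairs = [
    ("octocat", "octocat", 1),
    ("jsmith", "j_smith", 1),
    ("AliceDev", "alicedev", 1),
    ("bob42", "carol_k", 0),
    ("dev_ops", "xyzzy", 0),
    ("mlfan", "rustacean", 0),
]

sims = [username_similarity(g, s) for g, s, _ in candidate_pairs]
labels = [y for _, _, y in candidate_pairs]
threshold, impurity = best_threshold(sims, labels)
print(f"split at similarity <= {threshold:.3f}, weighted Gini = {impurity:.3f}")
```

A full CART tree would repeat this search over every feature and recurse into each resulting partition; with a real feature table, a library implementation such as scikit-learn's DecisionTreeClassifier would be the more practical choice.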