Linguistic documentation of software history
Document Type
Conference Proceeding
Publication Date
7-13-2020
Abstract
Open Source Software (OSS) projects start with an initial vocabulary, often determined by the first generation of developers. This vocabulary, embedded in code identifier names and internal code comments, goes through multiple rounds of change, influenced by the interrelated patterns of human (e.g., developers joining and departing) and system (e.g., maintenance activities) interactions. Capturing the dynamics of this change is crucial for understanding and synthesizing code changes over time. However, existing code evolution analysis tools, available in modern version control systems such as GitHub and SourceForge, often overlook the linguistic aspects of code evolution. To bridge this gap, in this paper, we propose to study code evolution in OSS projects through the lens of developers' language, also known as code lexicon. Our analysis is conducted using 32 OSS projects sampled from a broad range of application domains. Our results show that different maintenance activities impact code lexicon differently. These insights lay out a preliminary foundation for modeling the linguistic history of OSS projects. In the long run, this foundation will be utilized to provide support for basic program comprehension tasks and help researchers gain new insights into the complex interplay between linguistic change and various system and human aspects of OSS development.
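The abstract treats the code lexicon as the vocabulary carried by identifier names and code comments. As a rough illustration only, and not the authors' tooling, the Python sketch below makes two assumptions: the input is Python source, and lexical change between two versions can be summarized by added terms, removed terms, and Jaccard similarity. The helpers `split_identifier`, `extract_lexicon`, and `lexicon_change`, the camelCase/snake_case splitting rule, and the Jaccard measure are all hypothetical choices made for this example. The sketch relies only on the standard `tokenize`, `token`, `keyword`, and `re` modules.

```python
import io
import keyword
import re
import token
import tokenize
from collections import Counter

# Hypothetical splitting rule: acronym runs, capitalized/lowercase words, bare capitals, digits.
_WORD = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")


def split_identifier(name: str) -> list[str]:
    """Split a snake_case or camelCase identifier into lowercase terms."""
    terms = []
    for part in name.split("_"):
        terms.extend(t.lower() for t in _WORD.findall(part))
    return terms


def extract_lexicon(source: str) -> Counter:
    """Count terms found in non-keyword identifiers and comments of Python source."""
    counts = Counter()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == token.NAME and not keyword.iskeyword(tok.string):
            counts.update(split_identifier(tok.string))
        elif tok.type == token.COMMENT:
            for word in re.findall(r"[A-Za-z0-9_]+", tok.string):
                counts.update(split_identifier(word))
    return counts


def lexicon_change(old: Counter, new: Counter) -> dict:
    """Summarize vocabulary change between two versions (assumed metric: Jaccard)."""
    old_terms, new_terms = set(old), set(new)
    union = old_terms | new_terms
    jaccard = len(old_terms & new_terms) / len(union) if union else 1.0
    return {
        "added": sorted(new_terms - old_terms),
        "removed": sorted(old_terms - new_terms),
        "jaccard": jaccard,
    }


# Hypothetical before/after versions of one function, e.g., after a rename refactoring.
v1 = "def getUserName(user_id):\n    # look up the user record\n    return db_lookup(user_id)\n"
v2 = "def fetch_account_name(account_id):\n    # query the account store\n    return store.query(account_id)\n"

change = lexicon_change(extract_lexicon(v1), extract_lexicon(v2))
print(change["added"])    # ['account', 'fetch', 'query', 'store']
print(change["removed"])  # ['db', 'get', 'look', 'lookup', 'record', 'up', 'user']
print(round(change["jaccard"], 3))  # 0.214
```

A set-based measure ignores how often each term occurs. A frequency-aware comparison, such as cosine similarity over the `Counter` values, would be an equally plausible alternative when repeated terms matter.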
Publication Source (Journal or Book title)
IEEE International Conference on Program Comprehension
First Page
386
Last Page
390
Recommended Citation
Tushev, M., & Mahmoud, A. (2020). Linguistic documentation of software history. IEEE International Conference on Program Comprehension, 386-390. https://doi.org/10.1145/3387904.3389288