Linguistic documentation of software history

Document Type

Conference Proceeding

Publication Date

7-13-2020

Abstract

Open Source Software (OSS) projects start with an initial vocabulary, often determined by the first generation of developers. This vocabulary, embedded in code identifier names and internal code comments, goes through multiple rounds of change, influenced by the interrelated patterns of human (e.g., developers joining and departing) and system (e.g., maintenance activities) interactions. Capturing the dynamics of this change is crucial for understanding and synthesizing code changes over time. However, existing code evolution analysis tools, available in modern version control systems such as GitHub and SourceForge, often overlook the linguistic aspects of code evolution. To bridge this gap, in this paper, we propose to study code evolution in OSS projects through the lens of developers' language, also known as code lexicon. Our analysis is conducted using 32 OSS projects sampled from a broad range of application domains. Our results show that different maintenance activities impact code lexicon differently. These insights lay out a preliminary foundation for modeling the linguistic history of OSS projects. In the long run, this foundation will be utilized to provide support for basic program comprehension tasks and help researchers gain new insights into the complex interplay between linguistic change and various system and human aspects of OSS development.

Publication Source (Journal or Book title)

IEEE International Conference on Program Comprehension

First Page

386

Last Page

390

