Commit c5b7a191 authored by Gaurav Kumar
# About
This repository contains the code underlying the master's thesis "Applying API Categories to the Abstractions Using APIs" (2021), written by Katharina Gorjatschev in the Software Languages Team at the Computer Science Department of the University of Koblenz and Landau.
This README was created to help participants of the MSR 2021/22 course understand the code structure more easily.
# Input data
The input is the `repositories_with_dependencies.csv` file. It lists Java repositories from GitHub that use Apache Maven as their build automation tool, together with the dependencies declared in their POM files; these dependency lists are used to determine the similarity between the repositories.
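As an illustration of how dependency lists can be turned into a similarity score, here is a minimal Python sketch using Jaccard similarity (shared dependencies divided by all dependencies of two repositories). This is a swapped-in example, not necessarily the measure used in the thesis, and the CSV row layout (repository name followed by one `groupId:artifactId` entry per column) is a hypothetical assumption rather than the real format of `repositories_with_dependencies.csv`.

```python
import csv
import io


def load_dependencies(csv_text: str) -> dict[str, set[str]]:
    """Parse rows of the HYPOTHETICAL form: repository,dep1,dep2,..."""
    repo_deps: dict[str, set[str]] = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:  # skip blank lines
            continue
        repo, *artifacts = row
        repo_deps[repo] = set(artifacts)
    return repo_deps


def jaccard(deps_a: set[str], deps_b: set[str]) -> float:
    """Return |A & B| / |A | B|, or 0.0 if both repositories have no dependencies."""
    union = deps_a | deps_b
    if not union:
        return 0.0
    return len(deps_a & deps_b) / len(union)


sample = (
    "example/repo-a,junit:junit,com.google.guava:guava,org.slf4j:slf4j-api\n"
    "example/repo-b,junit:junit,com.google.guava:guava\n"
)
repos = load_dependencies(sample)
print(jaccard(repos["example/repo-a"], repos["example/repo-b"]))  # prints 0.6666666666666666
```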
# Code structure
The code consists of Java and Python code. Java is used for data collection and parsing; Python is used for analysis and visualization (because of PySpark and Plotly). All Java code is executed from the `` file. There are four Python files: three of them are independent of each other and each has its own main function for executing the code, while the fourth file, ``, is only a utils file.
### 1. Collection of repositories
Collects repositories from GitHub.
* Where: `` (line 51)
* Involved files: ``, ``
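One common way to collect Java repositories is the GitHub search API. The sketch below only builds a request URL for the `/search/repositories` endpoint with the `language:java` qualifier; whether the Java code uses this endpoint or this query is an assumption, and `build_search_url` is a hypothetical helper. GitHub documents a cap of 1,000 results per search query, so a real collector would likely need to split its query (for example by creation date).

```python
from urllib.parse import urlencode

# Public GitHub REST endpoint for repository search.
GITHUB_SEARCH_URL = "https://api.github.com/search/repositories"


def build_search_url(query: str, page: int = 1, per_page: int = 100) -> str:
    """Build a search URL; per_page is capped at 100 by the GitHub API."""
    params = {"q": query, "page": page, "per_page": per_page}
    return f"{GITHUB_SEARCH_URL}?{urlencode(params)}"


print(build_search_url("language:java"))
# prints https://api.github.com/search/repositories?q=language%3Ajava&page=1&per_page=100
```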
### 2. Collection and analysis of the dependencies of the collected repositories
This step determines which repositories and dependencies are worth examining further (collection is performed in Java, analysis in Python).
* Where: `` (lines 56 and 61), `` (lines 125-130)
* Involved files: ``, ``, ``, ``, ``, ``
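The analysis part of this step decides which dependencies are worth looking into. A minimal plain-Python sketch of one plausible criterion, counting in how many repositories each dependency occurs and keeping those above a threshold, is shown below. The repository itself uses PySpark for its analysis; the threshold `min_repos` and both function names are hypothetical.

```python
from collections import Counter


def count_dependency_usage(repo_deps: dict[str, set[str]]) -> Counter:
    """Count in how many repositories each dependency is declared."""
    counts: Counter = Counter()
    for deps in repo_deps.values():
        counts.update(deps)  # a set, so each repository adds at most 1 per dependency
    return counts


def frequent_dependencies(repo_deps: dict[str, set[str]], min_repos: int) -> set[str]:
    """Keep dependencies used by at least `min_repos` repositories (hypothetical threshold)."""
    counts = count_dependency_usage(repo_deps)
    return {dep for dep, n in counts.items() if n >= min_repos}


repo_deps = {
    "example/a": {"junit:junit", "com.google.guava:guava"},
    "example/b": {"junit:junit"},
    "example/c": {"junit:junit", "org.slf4j:slf4j-api"},
}
print(frequent_dependencies(repo_deps, min_repos=2))  # prints {'junit:junit'}
```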
### 3. Selection of repositories
Selects, from the collected repositories, those that will actually be parsed and analysed.
* Where: `` (line 66)
* Involved files: ``, ``
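Since the later analysis works on dependency pairs, one plausible selection rule is to keep only repositories that declare both dependencies of a pair. The sketch below implements that rule; it is a hypothetical criterion for illustration, and the actual selection logic lives in the Java code referenced above.

```python
def select_repositories(repo_deps: dict[str, set[str]], pair: tuple[str, str]) -> list[str]:
    """Return, sorted, the repositories whose dependencies contain both members of `pair`.

    HYPOTHETICAL criterion for illustration only.
    """
    first, second = pair
    return sorted(
        repo for repo, deps in repo_deps.items()
        if first in deps and second in deps
    )


repo_deps = {
    "example/a": {"junit:junit", "org.mockito:mockito-core"},
    "example/b": {"junit:junit"},
    "example/c": {"org.mockito:mockito-core", "junit:junit", "com.google.guava:guava"},
}
print(select_repositories(repo_deps, ("junit:junit", "org.mockito:mockito-core")))
# prints ['example/a', 'example/c']
```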
### 4. Parsing of repositories
Downloads the Java files, collects the dependencies from the POM files together with their MCR categories and MCR tags, downloads the dependencies, parses the Java files, resolves the class usages in them, and allocates dependencies to the API usages found.
* Where: `` (line 70f)
* Involved files: ``, ``, ``, ``, ``, ``, ``
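Collecting dependencies from a POM file can be illustrated with Python's `xml.etree.ElementTree`. This is a hypothetical sketch, not the repository's Java implementation: it reads only the direct `<dependencies>` section of a POM that uses the standard Maven 4.0.0 namespace, and it does not resolve `${...}` property placeholders, parent POMs, or `<dependencyManagement>` entries.

```python
import xml.etree.ElementTree as ET

# Standard namespace of Maven POM files (modelVersion 4.0.0).
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def parse_pom_dependencies(pom_xml: str) -> list[str]:
    """Return direct dependencies as 'groupId:artifactId[:version]' strings."""
    root = ET.fromstring(pom_xml)
    result = []
    # Only <project>/<dependencies>/<dependency>; <dependencyManagement> is ignored.
    for dep in root.findall("m:dependencies/m:dependency", POM_NS):
        group = dep.findtext("m:groupId", default="", namespaces=POM_NS)
        artifact = dep.findtext("m:artifactId", default="", namespaces=POM_NS)
        version = dep.findtext("m:version", default="", namespaces=POM_NS)
        result.append(f"{group}:{artifact}:{version}" if version else f"{group}:{artifact}")
    return result


SAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.13.2</version>
    </dependency>
    <dependency>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
    </dependency>
  </dependencies>
</project>"""

print(parse_pom_dependencies(SAMPLE_POM))
# prints ['junit:junit:4.13.2', 'com.google.guava:guava']
```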
### 5. Analysis
Analyses repositories by selecting all API usages and summarising them in package, class, and method abstractions. Afterwards, analyses those repositories with respect to a pair of dependencies (dependency pair) by counting the API usages of each dependency and sampling abstractions for manual analysis and classification.
* Where: `` (lines 222-224)
* Involved files: ``, ``
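The two analysis ideas above (summarising API usages into package, class, and method abstractions, and counting the usages of each dependency in a pair) are sketched below in plain Python. The sketch assumes each API usage is a fully qualified method name of the form `package.Class.method` paired with the dependency it was allocated to; nested classes and overloaded methods are not distinguished, and the function names are hypothetical. The repository performs this analysis with PySpark.

```python
from collections import Counter


def summarize_abstractions(api_usages):
    """Group fully qualified method names (e.g. 'org.junit.Assert.assertEquals')
    into sets of package, class, and method abstractions."""
    packages, classes, methods = set(), set(), set()
    for fqn in api_usages:
        class_name, _, _ = fqn.rpartition(".")   # 'org.junit.Assert'
        package, _, _ = class_name.rpartition(".")  # 'org.junit'
        packages.add(package)
        classes.add(class_name)
        methods.add(fqn)
    return {"package": packages, "class": classes, "method": methods}


def count_pair_usages(usages_with_dependency, pair):
    """Count API usages per dependency, restricted to the two dependencies in `pair`."""
    counts = Counter(dep for _, dep in usages_with_dependency if dep in pair)
    return {dep: counts[dep] for dep in pair}


usages = [
    ("org.junit.Assert.assertEquals", "junit:junit"),
    ("org.junit.Assert.assertTrue", "junit:junit"),
    ("com.google.common.collect.Lists.newArrayList", "com.google.guava:guava"),
]
abstractions = summarize_abstractions(fqn for fqn, _ in usages)
print(sorted(abstractions["class"]))
# prints ['com.google.common.collect.Lists', 'org.junit.Assert']
print(count_pair_usages(usages, ("junit:junit", "com.google.guava:guava")))
# prints {'junit:junit': 2, 'com.google.guava:guava': 1}
```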
### 6. Visualization
Characterizes the abstractions of the repositories and visualizes the repositories.
* Where: `` (lines 156-165)
* Involved files: ``, ``
# Software
* Java 11 (Maven project)
* Python 3.9.6 (plotly==5.1.0, pyspark==3.1.2)
* PyCharm CE was used to run the Python files.
# Notes
You need to create a personal access token in your GitHub account and then replace the placeholder `USERNAME_AND_TOKEN` in `` with your GitHub username and token.
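For orientation, the sketch below shows how a `username:token` pair can authenticate GitHub API requests via HTTP Basic authentication, using only Python's standard library. It assumes (based only on the placeholder's name) that `USERNAME_AND_TOKEN` holds a string of the form `username:token`; how the Java code actually uses the value may differ, and `github_request` is a hypothetical helper. The request is only built, not sent.

```python
import base64
import urllib.request


def github_request(url: str, username_and_token: str) -> urllib.request.Request:
    """Build (but do not send) a GitHub API request with HTTP Basic authentication.

    `username_and_token` is ASSUMED to have the form "username:token".
    """
    credentials = base64.b64encode(username_and_token.encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Basic {credentials}",
            "Accept": "application/vnd.github+json",
        },
    )


# Placeholder credentials; urllib.request.urlopen(request) would send the request.
request = github_request("https://api.github.com/user", "your-username:your-token")
```

Keep the real token out of version control, for example by reading it from an environment variable instead of hard-coding it.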
# Steps of execution
* In the /src/main folder, make the bash script executable using **chmod +x**.
* Then run the bash script with **./** to start the project.