# Names of team/students
Team Victor
* Charu Gupta (220202389)
* Priyanka Mandya Shivashankar (220201250)
* Gaurav Kumar (220200656)

# Baseline study
## Aspect of the reproduction project
This project addresses the research question: To what extent can API categories abstract away combined API usage in method abstractions? We measure the effect of the relative strength of the combined API usage.

## Input data: 
The input data was obtained by characterizing the abstractions of the repository based on characterization type and dependence relationship:
https://github.com/gorjatschev/applying-apis/blob/main/output/characterization/characterization_Novetta_CLAVIN_method_mcrCategories.csv

### Modified Input csv: 
We used a modified input CSV file in order to obtain probability values other than 1 from the dataset provided by the master thesis:
https://gitlab.uni-koblenz.de/ggaurav/applying-apis/-/blob/main/Data/Input/modified_characterization_Novetta_CLAVIN_method_api.csv

## Output data:
From Thesis:
https://github.com/gorjatschev/applying-apis/blob/main/output/visualization/visualization_Novetta_CLAVIN_method_mcrCategories.pdf

### After measuring the effect through count and percentage:
With Original Input csv:
https://gitlab.uni-koblenz.de/ggaurav/applying-apis/-/blob/main/Data/Output/result.csv
https://gitlab.uni-koblenz.de/ggaurav/applying-apis/-/blob/main/Data/Output/output_logger

### After Modified Input csv:
https://gitlab.uni-koblenz.de/ggaurav/applying-apis/-/blob/main/Data/output_images/modified_input_csv.jpg

# Findings of Research Question:
## Process Delta:
Our project takes the input data and applies measures such as count and percentage to find the dominant mcr categories, instead of visualizing them through graphs as in the original thesis. It also uses probability to estimate how likely a certain API is to be used given the mcr category.

## Output Delta:
We use metrics to measure the dominant categories, which yields the same dominant mcr categories as the master thesis, except that we have eliminated the mcr categories with null values. We have also slightly modified the dataset to show the difference in the probability of a certain API being used given the mcr category.


# Implementation of modification

## Hardware requirements:
* Windows 10 or later
* macOS

## Software requirements
* Python 3.0 or later
* PyCharm CE
* Jupyter Notebook

## Steps of Execution
* Inside the /Process folder, mark the script as executable: **chmod +x bash.sh**
* Then run the bash file to start the project: **./bash.sh**
* Alternatively, run it with Python from inside the /Process folder: **python3 repositories_visualizer.py**

# Validation:
Validation can be done against the original input and output files from the thesis by comparing the resulting mcr categories. To find the dominant mcr categories for another dataset, place that dataset in this [file](https://gitlab.uni-koblenz.de/ggaurav/applying-apis/-/blob/main/Data/Input/characterization_Novetta_CLAVIN_method_api.csv).

For example, we used a modified input [file](https://gitlab.uni-koblenz.de/ggaurav/applying-apis/-/blob/main/Data/Input/modified_characterization_Novetta_CLAVIN_method_api.csv) in order to obtain probabilities other than 1.
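
The comparison described above could be automated along the following lines. This is only a sketch and not part of the project: the function name `same_dominant_categories`, the `top_n` parameter and all percentage values are made up for illustration, and the real reference values would have to be taken from the thesis output.

```python
def same_dominant_categories(reference, computed, top_n=2):
    """Check whether two results agree on the top_n mcr categories.

    Both arguments map an mcr category name to its percentage.
    """
    def top(result):
        # Rank category names by percentage, highest first.
        ranked = sorted(result, key=result.get, reverse=True)
        return ranked[:top_n]

    return top(reference) == top(computed)


# Illustrative values only; they are not taken from the thesis or the project.
reference = {"Testing Frameworks": 60.0, "Logging Frameworks": 25.0, "JSON Libraries": 15.0}
computed = {"Testing Frameworks": 58.0, "Logging Frameworks": 30.0, "JSON Libraries": 12.0}

matches = same_dominant_categories(reference, computed)
print("Dominant categories agree:", matches)
```

Comparing only the ranking, rather than the exact percentages, is a deliberate choice in this sketch, since excluding null values changes the percentages but should not change which categories dominate.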

# Data:
We used the output data that the thesis visualizes as a graph as the input of our process, and produced the count of each mcr category and the percentage of its dominance over the others. The result is that the most dominant category is 'Testing Frameworks' with 59.63%, calculated after excluding null values. The data was then used to compare the probabilities of API usage with respect to the mcr category.

# Process:
In the Process folder, repositories_visualizer.py contains two methods, calculate_dominant_mcrcategories() and api_probability_in_mcrcategories(): https://gitlab.uni-koblenz.de/ggaurav/applying-apis/-/blob/main/Process/repositories_visualizer.py

calculate_dominant_mcrcategories(): This method calculates the count and percentage of each mcr category in order to find the dominant ones.
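
As a rough illustration of the count and percentage measure, the sketch below counts categories after dropping null values. It is not the code from repositories_visualizer.py: the function name `dominant_category_percentages`, the column name `mcrCategories` and the sample rows are assumptions made for this example.

```python
from collections import Counter

# Hypothetical sample rows; the real input is the characterization CSV.
# "mcrCategories" is an assumed column name, and None marks a null value.
rows = [
    {"method": "m1", "mcrCategories": "Testing Frameworks"},
    {"method": "m2", "mcrCategories": "Testing Frameworks"},
    {"method": "m3", "mcrCategories": "Logging Frameworks"},
    {"method": "m4", "mcrCategories": None},
    {"method": "m5", "mcrCategories": "Testing Frameworks"},
]


def dominant_category_percentages(rows, column="mcrCategories"):
    """Return (category, count, percentage) tuples, most frequent first."""
    # Drop null or empty categories before counting.
    values = [row[column] for row in rows if row.get(column)]
    counts = Counter(values)
    total = sum(counts.values())
    return [
        (category, count, round(100 * count / total, 2))
        for category, count in counts.most_common()
    ]


result = dominant_category_percentages(rows)
for category, count, percentage in result:
    print(f"{category}: {count} ({percentage}%)")
```

In this sketch the percentages are computed over the non-null rows only, so they sum to 100 even when some methods have no category.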

api_probability_in_mcrcategories(): This method calculates the probability of an API being used given a particular mcr category.
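
One way to read this probability is as a relative frequency: among all usages that fall into a category, the share that belongs to a given API. The sketch below follows that reading. It is an assumption about the approach, not the project's implementation, and the function name `api_probability_given_category` and the sample pairs are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical (mcr category, API) pairs, one per observed API usage.
usages = [
    ("Testing Frameworks", "junit"),
    ("Testing Frameworks", "junit"),
    ("Testing Frameworks", "mockito"),
    ("Logging Frameworks", "slf4j"),
]


def api_probability_given_category(usages):
    """Estimate P(api | category) as a relative frequency."""
    per_category = defaultdict(Counter)
    for category, api in usages:
        per_category[category][api] += 1

    probabilities = {}
    for category, api_counts in per_category.items():
        total = sum(api_counts.values())
        probabilities[category] = {
            api: count / total for api, count in api_counts.items()
        }
    return probabilities


probabilities = api_probability_given_category(usages)
for category, per_api in probabilities.items():
    for api, probability in per_api.items():
        print(f"P({api} | {category}) = {probability:.2f}")
```

Under this reading, a category that occurs with a single API always yields a probability of 1, and only categories that occur with several APIs give other values.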

In addition, we have reused two other files as they are in the original master thesis.