Commit 87753340 authored by Gaurav Kumar

Merge branch 'main' of gitlab.uni-koblenz.de:ggaurav/applying-apis into main

parents bde473e4 b2766c4f
......@@ -10,23 +10,23 @@ This project answers the research question - To what extent can API categories a
## Input data:
The input data is obtained by characterizing the abstractions of the repository based on characterization type and dependence relationship.
https://github.com/gorjatschev/applying-apis/blob/main/output/characterization/characterization_Novetta_CLAVIN_method_mcrCategories.csv
Here is the input file [characterization_Novetta_CLAVIN_method_api.csv](https://gitlab.uni-koblenz.de/ggaurav/applying-apis/-/blob/main/Process/file/characterization/characterization_Novetta_CLAVIN_method_api.csv).
### Modified Input csv:
We use a modified input CSV file, based on the dataset provided by the master thesis, in order to obtain probability values other than 1.
https://gitlab.uni-koblenz.de/ggaurav/applying-apis/-/blob/main/Data/Input/modified_characterization_Novetta_CLAVIN_method_api.csv
[modified_characterization_Novetta_CLAVIN_method_api.csv](https://gitlab.uni-koblenz.de/ggaurav/applying-apis/-/blob/main/Data/Input/modified_characterization_Novetta_CLAVIN_method_api.csv)
## Output data:
From Thesis:
https://github.com/gorjatschev/applying-apis/blob/main/output/visualization/visualization_Novetta_CLAVIN_method_mcrCategories.pdf
[visualization_Novetta_CLAVIN_method_mcrCategories.pdf](https://github.com/gorjatschev/applying-apis/blob/main/output/visualization/visualization_Novetta_CLAVIN_method_mcrCategories.pdf)
### After measuring the effect through count and percentage:
With Original Input csv:
https://gitlab.uni-koblenz.de/ggaurav/applying-apis/-/blob/main/Data/Output/result.csv
https://gitlab.uni-koblenz.de/ggaurav/applying-apis/-/blob/main/Data/Output/output_logger
[result.csv](https://gitlab.uni-koblenz.de/ggaurav/applying-apis/-/blob/main/Data/Output/result.csv)
[output_logger](https://gitlab.uni-koblenz.de/ggaurav/applying-apis/-/blob/main/Data/Output/output_logger)
### After Modified Input csv:
https://gitlab.uni-koblenz.de/ggaurav/applying-apis/-/blob/main/Data/output_images/modified_input_csv.jpg
[modified_input_csv.jpg](https://gitlab.uni-koblenz.de/ggaurav/applying-apis/-/blob/main/Data/output_images/modified_input_csv.jpg)
# Findings of Research Question:
## Process Delta:
......@@ -43,25 +43,26 @@ All versions of windows 10/10+ operating system
macOS
## Software requirements
Python 3.0 and above
Python 3.9.6 (plotly==5.1.0, pyspark==3.1.2)
PyCharm CE
Jupyter Notebook
## Steps of Execution
* Inside the /Process folder, mark the script as executable: **chmod +x bash.sh**
* Then run the bash file to run the project: **./bash.sh**
* Alternatively, run it with Python inside the Process folder: **python3 repositories_visualizer.py**
# Validation:
Validation can be done by comparing the resulting mcr categories against the original input and output files from the thesis. To find the dominant mcr categories for another dataset, place that dataset in this [file](https://gitlab.uni-koblenz.de/ggaurav/applying-apis/-/blob/main/Data/Input/characterization_Novetta_CLAVIN_method_api.csv).
Validation can be done by comparing the resulting mcr categories against the original input and output files from the thesis. To find the dominant mcr categories for another dataset, place that dataset in this file [characterization_Novetta_CLAVIN_method_api.csv](https://gitlab.uni-koblenz.de/ggaurav/applying-apis/-/blob/main/Data/Input/characterization_Novetta_CLAVIN_method_api.csv).
For example, we used a modified input [file](https://gitlab.uni-koblenz.de/ggaurav/applying-apis/-/blob/main/Data/Input/modified_characterization_Novetta_CLAVIN_method_api.csv) in order to obtain probabilities other than 1.
For example, we used the modified input file [modified_characterization_Novetta_CLAVIN_method_api.csv](https://gitlab.uni-koblenz.de/ggaurav/applying-apis/-/blob/main/Data/Input/modified_characterization_Novetta_CLAVIN_method_api.csv) in order to obtain probabilities other than 1.
# Data:
We took the output data that the thesis visualizes as a graph and produced the count of each mcr category and the percentage by which the categories dominate each other; these serve as the input data of our process. The results show that the most dominant category is ‘Testing Frameworks’ with 59.63%, calculated excluding null values. Further data was then used to compare the probabilities of API usage with respect to mcr category.
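The percentage calculation described above can be sketched as follows. This is a minimal illustration in plain Python, not the code from repositories_visualizer.py: the function name `dominant_category_percentages`, the treatment of empty strings as null, and the sample values are all assumptions made for the example.

```python
from collections import Counter


def dominant_category_percentages(categories):
    """Return {category: percentage}, most frequent first, ignoring nulls.

    Assumption: both None and the empty string count as null values.
    """
    non_null = [c for c in categories if c is not None and c != ""]
    counts = Counter(non_null)
    total = sum(counts.values())
    if total == 0:
        return {}
    # most_common() yields categories ordered by descending count
    return {cat: round(100.0 * n / total, 2) for cat, n in counts.most_common()}


# Hypothetical sample, not taken from the thesis dataset
sample = [
    "Testing Frameworks",
    "Testing Frameworks",
    None,
    "Logging Frameworks",
    "",
    "Testing Frameworks",
]
percentages = dominant_category_percentages(sample)
dominant = next(iter(percentages))  # first key is the most frequent category
print(dominant, percentages[dominant])
```

Excluding the null entries before dividing is what makes the percentages sum to 100 over the categorized usages only.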
# Process:
In the Process folder, inside repositories_visualizer.py, we have two methods, calculate_dominant_mcrcategories() and api_probability_in_mcrcategories(). In addition, we have two other files that we reuse as in the original master thesis - https://gitlab.uni-koblenz.de/ggaurav/applying-apis/-/blob/main/Process/repositories_visualizer.py
In the Process folder, inside repositories_visualizer.py, we have two methods, calculate_dominant_mcrcategories() and api_probability_in_mcrcategories() - [repositories_visualizer.py](https://gitlab.uni-koblenz.de/ggaurav/applying-apis/-/blob/main/Process/repositories_visualizer.py)
* calculate_dominant_mcrcategories(): This method calculates the percentage of the dominant mcr category.
* api_probability_in_mcrcategories(): This method calculates the probability of an API having a particular mcr category.
calculate_dominant_mcrcategories(): This method calculates the percentage of the dominant mcr category.
api_probability_in_mcrcategories(): This method calculates the probability of an API having a particular mcr category.
In addition, we have two other files that we reuse as in the original master thesis.
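To illustrate what "the probability of an API having a particular mcr category" could mean, here is a small sketch of one plausible reading: the share of an API's usages that fall under each category. It is written in plain Python and is not the project's implementation; the function name `api_category_probabilities`, the row layout, and the sample API names are hypothetical.

```python
from collections import Counter, defaultdict


def api_category_probabilities(rows):
    """Estimate P(category | api) from (api, category) pairs.

    Assumption: rows with a missing API or category are skipped.
    """
    pair_counts = Counter()
    api_counts = Counter()
    for api, category in rows:
        if not api or not category:
            continue
        pair_counts[(api, category)] += 1
        api_counts[api] += 1

    probabilities = defaultdict(dict)
    for (api, category), n in pair_counts.items():
        probabilities[api][category] = n / api_counts[api]
    return dict(probabilities)


# Hypothetical rows, not taken from the thesis dataset
rows = [
    ("junit:junit", "Testing Frameworks"),
    ("junit:junit", "Testing Frameworks"),
    ("org.slf4j:slf4j-api", "Logging Frameworks"),
    ("org.slf4j:slf4j-api", "Testing Frameworks"),
    ("org.slf4j:slf4j-api", None),
]
probs = api_category_probabilities(rows)
print(probs)
```

Under this reading, an API that only ever appears with one category gets probability 1 for it, which would explain why a modified input is needed to observe other values.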