Commit ae3a4f90 authored by Gaurav Kumar

Update README.md

parent d84dc8d2
# About
...to be filled
# Names of team/students
Team Victor
* Charu Gupta (220202389)
* Priyanka Mandya Shivashankar (220201250)
* Gaurav Kumar (220200656)
# Input data
Here we use the characterization_Novetta_CLAVIN_method_api.csv file. It is derived from Java repositories on GitHub that use Apache Maven as their build automation tool. The dependency lists in their POM files are used to determine the dominant mcr category of each repository.
# Baseline study
## Aspect of the reproduction project
This project addresses the research question: To what extent can API categories abstract away combined API usage in method abstractions? We measure the effect of the relative strength of the combined API usage.
## Input data:
Data obtained by characterizing the abstractions of the repository based on characterization type and dependence relationship.
https://github.com/gorjatschev/applying-apis/blob/main/output/characterization/characterization_Novetta_CLAVIN_method_mcrCategories.csv
# Software
Python 3.9.6 (plotly==5.1.0, pyspark==3.1.2)
To run the Python files, we used PyCharm CE.
## Modified Input csv:
We used a modified input csv file in order to obtain probability values other than 1 from the dataset provided by the master thesis.
https://gitlab.uni-koblenz.de/ggaurav/applying-apis/-/blob/main/Data/Input/modified_characterization_Novetta_CLAVIN_method_api.csv
# Steps of Execution
## Output data:
From Thesis:
https://github.com/gorjatschev/applying-apis/blob/main/output/visualization/visualization_Novetta_CLAVIN_method_mcrCategories.pdf
## After measuring the effect through count and percentage:
With Original Input csv:
https://gitlab.uni-koblenz.de/ggaurav/applying-apis/-/blob/main/Data/Output/result.csv
https://gitlab.uni-koblenz.de/ggaurav/applying-apis/-/blob/main/Data/Output/output_logger
## After Modified Input csv:
https://gitlab.uni-koblenz.de/ggaurav/applying-apis/-/blob/main/Data/output_images/modified_input_csv.jpg
# Findings of Research Question:
## Process Delta:
Our project takes the input data and applies measures such as count and percentage to find the dominant mcr categories, instead of the graph-based visualization used in the original thesis. We also use probability to determine how likely a certain api is to be used given the mcr category.
## Output Delta:
We use metrics to measure the dominant categories, which yields the same dominant mcr categories as the master thesis, except that we have eliminated the mcr categories with null values. We have also slightly modified the dataset to show how the probability of a certain api being used, given the mcr category, can differ.
# Implementation of modification with subsections as follows:
## Hardware requirements:
Windows 10 or later
macOS
## Software requirements
Python 3.0 and above
PyCharm CE
Jupyter Notebook
## Steps of Execution
* Inside the /src/main folder, make the script executable: **chmod +x bash.sh**
* Then run the script from the same folder to run the project: **./bash.sh**
# Validation:
Validation can be done against the original input and output files from the thesis by comparing the resulting mcr categories. To find the dominant mcr categories for a different dataset, place that dataset in this [file](https://gitlab.uni-koblenz.de/ggaurav/applying-apis/-/blob/main/Data/Input/characterization_Novetta_CLAVIN_method_api.csv).
For example, we used a modified input [file](https://gitlab.uni-koblenz.de/ggaurav/applying-apis/-/blob/main/Data/Input/modified_characterization_Novetta_CLAVIN_method_api.csv) in order to obtain probabilities other than 1.
# Data:
We took the output data that the thesis visualizes as a graph and used it as the input to our process, producing the count of each mcr category and the percentage by which each category dominates the others. The result is that the most dominant category is ‘Testing Frameworks’ with a share of 59.63%, calculated after excluding null values. The data was then used further to compare the probabilities of API usage with respect to mcr category.
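
The count and percentage measure described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the project's PySpark code. The column name `mcrCategories`, the function name `dominant_category_shares`, and the toy rows are assumptions made for the example.

```python
from collections import Counter


def dominant_category_shares(rows, column="mcrCategories"):
    """Return (category, count, percentage) tuples, most frequent first.

    Rows whose category is missing, empty, or the literal string "null"
    are excluded before the percentages are computed.
    The column name is a hypothetical placeholder.
    """
    values = [row.get(column) for row in rows]
    present = [v for v in values if v not in (None, "", "null")]
    counts = Counter(present)
    total = sum(counts.values())
    if total == 0:
        return []
    return [
        (category, n, round(100.0 * n / total, 2))
        for category, n in counts.most_common()
    ]


# Toy data for illustration only; these are not values from the thesis dataset.
sample_rows = [
    {"mcrCategories": "Testing Frameworks"},
    {"mcrCategories": "Testing Frameworks"},
    {"mcrCategories": "Testing Frameworks"},
    {"mcrCategories": "Logging Frameworks"},
    {"mcrCategories": None},  # excluded from the total
]

shares = dominant_category_shares(sample_rows)
for category, count, percentage in shares:
    print(f"{category}: {count} ({percentage}%)")
```

Because null rows are dropped before the total is taken, the percentages always sum to 100 over the remaining categories.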
# Process:
In the Process folder, repositories_visualizer.py (https://gitlab.uni-koblenz.de/ggaurav/applying-apis/-/blob/main/Process/repositories_visualizer.py) contains two methods, calculate_dominant_mcrcategories() and api_probability_in_mcrcategories(). In addition, we reuse two other files as they are in the original master thesis.
* calculate_dominant_mcrcategories(): This method calculates the percentage of each dominant mcr category.
* api_probability_in_mcrcategories(): This method calculates the probability of an api being used given a particular mcr category.
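
One plausible reading of the probability measure is a conditional relative frequency: the number of rows with a given api and mcr category, divided by the number of rows with that mcr category. The sketch below follows that reading with the standard library only. It is not the code in repositories_visualizer.py, and the column names `api` and `mcrCategory`, the function name, and the toy rows are hypothetical.

```python
from collections import Counter


def api_probability_given_category(rows, api_column="api",
                                   category_column="mcrCategory"):
    """Estimate P(api | category) as a relative frequency.

    Returns a dict mapping (api, category) to a probability.
    Rows with a missing api or category are skipped.
    Column names are hypothetical placeholders.
    """
    pair_counts = Counter()
    category_counts = Counter()
    for row in rows:
        api = row.get(api_column)
        category = row.get(category_column)
        if not api or not category:
            continue
        pair_counts[(api, category)] += 1
        category_counts[category] += 1
    return {
        (api, category): n / category_counts[category]
        for (api, category), n in pair_counts.items()
    }


# Toy data for illustration only; these are not values from the project dataset.
toy_rows = [
    {"api": "junit", "mcrCategory": "Testing Frameworks"},
    {"api": "junit", "mcrCategory": "Testing Frameworks"},
    {"api": "junit", "mcrCategory": "Testing Frameworks"},
    {"api": "mockito", "mcrCategory": "Testing Frameworks"},
    {"api": "slf4j", "mcrCategory": "Logging Frameworks"},
]

probabilities = api_probability_given_category(toy_rows)
for (api, category), p in probabilities.items():
    print(f"P({api} | {category}) = {p}")
```

Under this reading, a category that is only ever paired with a single api yields a probability of exactly 1, which would explain why a modified input file is needed to see other values.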