Commit 35d163de authored by Denis Oldenburg

added more histograms

parent 4b958bb5
%% Cell type:code id: tags:
``` python
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
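import scipy as sc
import scipy.stats  # makes sc.stats available; a bare "import scipy" does not always load the submodule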
sns.set()
```
%% Cell type:code id: tags:
``` python
df=pd.read_json("h-index.json")
df=df[df.h_index.notnull()]
df.head()
```
%% Output
citations citations_last_5_year \
11 18431.0 5524.0
14 5838.0 4603.0
15 41827.0 20595.0
16 1897.0 1246.0
17 87.0 87.0
conference_fields \
11 [software-programming, network-communication, ...
14 [machine-learning]
15 [machine-learning]
16 [hardware-electronics, signal-processing]
17 [hardware-electronics, signal-processing]
conference_name date gender \
11 AAMAS 2018 : International Conference on Auton... 2018-06-18 male
14 AAAI 2018 : AAAI Conference on Artificial Inte... 2018-06-18 female
15 AAAI 2018 : AAAI Conference on Artificial Inte... 2018-06-18 male
16 ACC 2018 : American Control Conference 2018-06-18 male
17 ACC 2018 : American Control Conference 2018-06-18 male
google_scholar_profile h_index \
11 https://scholar.google.com/citations?user=cXkm... 64.0
14 https://scholar.google.com/citations?user=pouy... 34.0
15 https://scholar.google.com/citations?user=0uTu... 88.0
16 https://scholar.google.com/citations?user=z1ru... 22.0
17 https://scholar.google.com/citations?user=iRff... 4.0
h_index_lat_5_year name \
11 40.0 Craig Boutilier
14 31.0 Percy Liang
15 67.0 Zoubin Ghahramani
16 18.0 Ketan Savla
17 4.0 Noah N. Emery
organization
11 Google
14 Stanford University
15 University of Cambridge / Uber
16 University of Southern California
17 Harvard Medical School & Massachusetts Institu...
%% Cell type:code id: tags:
``` python
df.shape
```
%% Output
(113, 11)
%% Cell type:code id: tags:
``` python
# Split the h-index columns by gender ("h_index_lat_5_year" is the column name as spelled in the data)
by_gender=df.groupby('gender')
women=by_gender.get_group("female")['h_index']
women_f=by_gender.get_group("female")['h_index_lat_5_year']
men=by_gender.get_group("male")['h_index']
men_f=by_gender.get_group("male")['h_index_lat_5_year']
```
%% Cell type:code id: tags:
``` python
plt.hist(women, bins=5)
plt.show()
np.std(women)
```
%% Output
16.62255009310043
%% Cell type:code id: tags:
``` python
plt.hist(men, bins=5)
plt.show()
np.std(men)
```
%% Output
25.400634469840938
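%% Cell type:markdown id: tags:
A note on the two standard deviations above: `np.std` defaults to the population formula (`ddof=0`), whereas pandas' `Series.std()` defaults to the sample formula (`ddof=1`), so the two give different numbers on the same data. The sketch below illustrates the gap using only the standard library; the values are made up for illustration and are not taken from `h-index.json`.
%% Cell type:code id: tags:
```python
import statistics

# Hypothetical h-index values, for illustration only
values = [2, 4, 4, 4, 5, 5, 7, 9]

population_std = statistics.pstdev(values)  # divides by n, like np.std(values)
sample_std = statistics.stdev(values)       # divides by n - 1, like pd.Series(values).std()

print(population_std, sample_std)
```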
%% Cell type:code id: tags:
``` python
sns.distplot(women, hist=False, rug=True, color="red", label="Women");
sns.distplot(men, hist=False, rug=True, color="blue", label="Men");
plt.show()
```
%% Output
%% Cell type:code id: tags:
``` python
np.mean(women)
```
%% Output
39.80769230769231
%% Cell type:code id: tags:
``` python
np.mean(men)
```
%% Output
45.48275862068966
%% Cell type:markdown id: tags:
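Among the invited speakers, women are relatively more concentrated than men at h-indices between about 10 and 60, which suggests that women in this range are more likely to be invited than men with a similar reputation.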
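One way to put a number on this reading is to compare, for each group, the fraction of invited speakers whose h-index falls in the range. The sketch below assumes inclusive bounds; `share_in_range` is a hypothetical helper and the sample values are invented for illustration. It only describes the composition of the invited speakers; estimating how likely someone is to be invited would also need data on researchers who were not invited.
%% Cell type:code id: tags:
```python
def share_in_range(values, low, high):
    """Return the fraction of values with low <= value <= high."""
    values = list(values)
    if not values:
        raise ValueError("values must not be empty")
    inside = sum(1 for v in values if low <= v <= high)
    return inside / len(values)

# Invented h-index samples, for illustration only
women_sample = [12, 18, 34, 41, 55, 72]
men_sample = [4, 22, 33, 57, 64, 88, 95, 110]

print(share_in_range(women_sample, 10, 60))
print(share_in_range(men_sample, 10, 60))
```
%% Cell type:markdown id: tags:
On the notebook's own Series, `women.between(10, 60).mean()` and `men.between(10, 60).mean()` should give the same quantity.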
%% Cell type:code id: tags:
``` python
sns.distplot(women, hist=False, rug=True, color="red", label="Woman Overall", axlabel="h_index");
sns.distplot(women_f, hist=False, rug=True, color="blue", label="Last 5 Years", axlabel="h_index");
plt.show()
sc.stats.entropy(women, women_f)
```
%% Output
0.014306345064714572
%% Cell type:code id: tags:
``` python
sns.distplot(men, hist=False, rug=True, color="red", label="Men Overall", axlabel="h_index");
sns.distplot(men_f, hist=False, rug=True, color="blue", label="Last 5 Years", axlabel="h_index");
plt.show()
sc.stats.entropy(men, men_f)
```
%% Output
0.019537544060561202
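%% Cell type:markdown id: tags:
`scipy.stats.entropy(pk, qk)` normalises both inputs so that they sum to 1 and returns the Kullback-Leibler divergence. Passing the raw h-index columns therefore treats each speaker's value as a probability mass rather than comparing how the values are distributed. An alternative is to bin both samples on shared edges first. The sketch below does this in plain Python; the bin edges and the add-one smoothing (used to avoid empty bins in the denominator) are assumptions, and the sample values are just the five rows shown by `df.head()` above, not the full data.
%% Cell type:code id: tags:
```python
import math

def histogram_probs(values, edges, pseudo_count=1.0):
    """Bin values on shared edges and return smoothed bin probabilities.

    Values outside [edges[0], edges[-1]] are ignored.
    """
    counts = [pseudo_count] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            is_last = i == len(edges) - 2
            # Bins are half-open [lo, hi), except the last one, which includes hi
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[i + 1]):
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts]

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

edges = [0, 20, 40, 60, 80, 100]     # assumed bin edges
overall = [64, 34, 88, 22, 4]        # h_index, rows from df.head()
last_5_years = [40, 31, 67, 18, 4]   # h_index_lat_5_year, same rows

p = histogram_probs(overall, edges)
q = histogram_probs(last_5_years, edges)
print(kl_divergence(p, q))
```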
%% Cell type:code id: tags:
``` python
dg=df[df['conference_name'].str.contains('2017')]
dh=df[df['conference_name'].str.contains('2018')]
dg.head()
```
%% Output
citations citations_last_5_year \
34 4632.0 2618.0
36 13.0 13.0
37 9898.0 6571.0
38 16341.0 4239.0
82 1239.0 398.0
conference_fields \
34 [computational-theory]
36 [computational-theory]
37 [computational-theory]
38 [network-communication, signal-processing]
82 [machine-learning, computer-vision]
conference_name date gender \
34 ALT 2017 : Algorithmic Learning Theory (ALT) 2018-06-18 male
36 ALT 2017 : Algorithmic Learning Theory (ALT) 2018-06-18 male
37 ALT 2017 : Algorithmic Learning Theory (ALT) 2018-06-18 male
38 APCC 2017 : Asia-Pacific Conference on Communi... 2018-06-18 male
82 CIARP 2017 : Iberoamerican Congress on Pattern... 2018-06-18 female
google_scholar_profile h_index \
34 https://scholar.google.com/citations?user=gRxB... 33.0
36 https://scholar.google.com/citations?user=TIAJ... 2.0
37 https://scholar.google.com/citations?user=GkYI... 46.0
38 https://scholar.google.com/citations?user=f7PI... 57.0
82 https://scholar.google.com/citations?user=s5r4... 18.0
h_index_lat_5_year name organization
34 28.0 Adam Tauman Kalai Microsoft Research
36 2.0 Alexander Rakhlin University of Pennsylvania
37 35.0 Masashi Sugiyama The University of Tokyo
38 27.0 Fumiyuki Adachi Tohoku University
82 10.0 Ingela Nyström Uppsala University
%% Cell type:code id: tags:
``` python
# dh holds the 2018 conferences, dg the 2017 ones
women_s=dh.groupby('gender').get_group("female")['h_index']  # women, 2018
women_h=dg.groupby('gender').get_group("female")['h_index']  # women, 2017
men_s=dh.groupby('gender').get_group("male")['h_index']  # men, 2018
men_h=dg.groupby('gender').get_group("male")['h_index']  # men, 2017
plt.hist([women_s, men_s], label=["Women 2018", "Men 2018"])
plt.legend()
plt.show()
# Share of women among the 2018 speakers
women_s.shape[0] / (women_s.shape[0] + men_s.shape[0])
```
%% Output
0.2261904761904762
%% Cell type:code id: tags:
``` python
plt.hist([women_h, men_h], label=["Women 2017", "Men 2017"])
plt.legend()
plt.show()
# Share of women among the 2017 speakers (women_h and men_h come from dg)
women_h.shape[0] / (women_h.shape[0] + men_h.shape[0])
```
%% Output
0.2727272727272727
%% Cell type:markdown id: tags:
The share of female speakers decreased from about 27% in 2017 (`dg`) to about 23% in 2018 (`dh`).
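The share per year can also be computed in one pass by extracting the year from `conference_name`, which avoids keeping track of which filtered frame belongs to which year. A minimal sketch, assuming each name contains a four-digit year; the rows and the helper `female_share_by_year` are hypothetical.
%% Cell type:code id: tags:
```python
import re
from collections import defaultdict

def female_share_by_year(rows):
    """Map each year found in conference_name to the fraction of female speakers."""
    totals = defaultdict(int)
    females = defaultdict(int)
    for row in rows:
        match = re.search(r"\b(19|20)\d{2}\b", row["conference_name"])
        if match is None:
            continue  # skip names without a recognisable year
        year = int(match.group(0))
        totals[year] += 1
        if row["gender"] == "female":
            females[year] += 1
    return {year: females[year] / totals[year] for year in sorted(totals)}

# Hypothetical rows, for illustration only
rows = [
    {"conference_name": "CONF-A 2017 : Example Conference", "gender": "female"},
    {"conference_name": "CONF-A 2017 : Example Conference", "gender": "male"},
    {"conference_name": "CONF-B 2017 : Another Example", "gender": "male"},
    {"conference_name": "CONF-B 2017 : Another Example", "gender": "male"},
    {"conference_name": "CONF-A 2018 : Example Conference", "gender": "female"},
    {"conference_name": "CONF-A 2018 : Example Conference", "gender": "male"},
    {"conference_name": "Workshop without a year", "gender": "female"},
]

shares = female_share_by_year(rows)
print(shares)
```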
%% Cell type:code id: tags:
``` python
sns.distplot(women_s, hist=False, rug=True, color="red", label="Women 2018");  # women_s comes from dh (2018)
sns.distplot(women_h, hist=False, rug=True, color="blue", label="Women 2017");  # women_h comes from dg (2017)
plt.show()
```
%% Output
%% Cell type:code id: tags:
``` python
sns.distplot(men_s, hist=False, rug=True, color="red", label="Men 2018");  # men_s comes from dh (2018)
sns.distplot(men_h, hist=False, rug=True, color="blue", label="Men 2017");  # men_h comes from dg (2017)
plt.show()
```
%% Output
%% Cell type:code id: tags:
``` python
```