Do white and Black applicants face racial discrimination when looking for a job?
import pandas as pd
import numpy as np
from scipy import stats

data = pd.read_stata('us_job_market_discrimination.dta')
data.head()
blacks = data[data.race == 'b']
whites = data[data.race == 'w']
Summary of the callback data for Black applicants:
blacks.call.describe()
count    2435.000000
mean        0.064476
std         0.245649
min         0.000000
25%         0.000000
50%         0.000000
75%         0.000000
max         1.000000
Name: call, dtype: float64
Summary of the callback data for white applicants:
whites.call.describe()
count    2435.000000
mean        0.096509
std         0.295346
min         0.000000
25%         0.000000
50%         0.000000
75%         0.000000
max         1.000000
Name: call, dtype: float64
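The two summaries can be compared directly. The sketch below recovers the callback counts from the `describe()` output (mean times count) and computes the gap between the two rates. The variable names are illustrative and do not come from the original notebook.

```python
# Counts recovered from the describe() output above: mean * count per group.
n_per_group = 2435
black_callbacks = round(0.064476 * n_per_group)  # 157
white_callbacks = round(0.096509 * n_per_group)  # 235

black_rate = black_callbacks / n_per_group
white_rate = white_callbacks / n_per_group

rate_gap = white_rate - black_rate    # absolute difference in callback rate
rate_ratio = white_rate / black_rate  # relative difference in callback rate

print(f"Black callback rate: {black_rate:.4f}")
print(f"White callback rate: {white_rate:.4f}")
print(f"Absolute gap: {rate_gap:.4f}")
print(f"Ratio (white / Black): {rate_ratio:.2f}")
```

Whether a gap of this size could plausibly arise by chance is the question the chi-square test addresses.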
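Chi-square test
White applicants who received a callback
White applicants who were rejected
Black applicants who received a callback
Black applicants who were rejected
Hypothesis test
H0: race has no significant effect on the outcome of a job application
H1: race has an effect on the outcome of a job application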
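blacks_called = len(blacks[blacks['call'] == 1])      # Black applicants who received a callback
blacks_not_called = len(blacks[blacks['call'] == 0])  # Black applicants who were rejected
whites_called = len(whites[whites['call'] == 1])      # white applicants who received a callback
whites_not_called = len(whites[whites['call'] == 0])  # white applicants who were rejected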
observed = pd.DataFrame({
    'blacks': {'called': blacks_called, 'not_called': blacks_not_called},
    'whites': {'called': whites_called, 'not_called': whites_not_called},
})
observed
num_called_back = blacks_called + whites_called         # total number of callbacks
num_not_called = blacks_not_called + whites_not_called  # total number of rejections
print(num_called_back)
print(num_not_called)
392
4478
rate_of_callbacks = num_called_back / (num_not_called + num_called_back)
rate_of_callbacks
0.08049281314168377
expected_called = len(data) * rate_of_callbacks
expected_not_called = len(data) * (1 - rate_of_callbacks)
print(expected_called)
print(expected_not_called)
391.99999999999994
4478.0
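To make the test statistic concrete, here is a minimal sketch that computes it by hand as the sum of (observed - expected)² / expected over the four cells. The observed counts are reconstructed from the `describe()` output (157 and 235 callbacks out of 2435 applications per group). The dictionary layout is an illustrative choice, not the notebook's own structure.

```python
# Observed counts, reconstructed from the describe() output (assumption:
# 157 and 235 callbacks out of 2435 applications in each group).
group_size = 2435
observed = {
    ("black", "called"): 157,
    ("black", "not_called"): group_size - 157,
    ("white", "called"): 235,
    ("white", "not_called"): group_size - 235,
}

total = sum(observed.values())
called_total = sum(n for (_, outcome), n in observed.items() if outcome == "called")
callback_rate = called_total / total  # pooled callback rate under H0

# Under H0 both groups share the pooled rate, so each group expects the same counts.
expected = {
    (race, outcome): group_size * (callback_rate if outcome == "called" else 1 - callback_rate)
    for (race, outcome) in observed
}

# Pearson chi-square statistic: sum over cells of (O - E)^2 / E.
chi_square = sum((observed[c] - expected[c]) ** 2 / expected[c] for c in observed)
print(round(chi_square, 3))
```

Each group is expected to receive half of all callbacks and half of all rejections, which is why the expected totals are divided by two before being compared with the observed counts.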
import scipy.stats as stats

# Observed counts
observed_frequencies = [blacks_not_called, whites_not_called, whites_called, blacks_called]
# Expected counts under H0 (each group gets half of each total)
expected_frequencies = [expected_not_called/2, expected_not_called/2, expected_called/2, expected_called/2]
# Chi-square test
stats.chisquare(f_obs = observed_frequencies, f_exp = expected_frequencies)
Power_divergenceResult(statistic=16.879050414270221, pvalue=0.00074839594410972638)
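The p-value is below 0.05, so we reject H0, the hypothesis that race has no significant effect on the outcome of a job application.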
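One caveat on the p-value: with four cells and the default `ddof=0`, `stats.chisquare` uses 3 degrees of freedom, whereas a test of independence on a 2x2 table conventionally uses 1. The sketch below recomputes the p-value for both cases from the reported statistic, using closed-form tail probabilities from the standard library. The helper names `chi2_sf_df1` and `chi2_sf_df3` are hypothetical and introduced only for this illustration.

```python
import math

def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))

def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    y = x / 2
    return math.erfc(math.sqrt(y)) + 2 * math.sqrt(y / math.pi) * math.exp(-y)

statistic = 16.879050414270221  # statistic reported by stats.chisquare

p_df3 = chi2_sf_df3(statistic)  # matches the four-cell goodness-of-fit call
p_df1 = chi2_sf_df1(statistic)  # 2x2 test of independence

print(f"p-value with 3 degrees of freedom: {p_df3:.3g}")
print(f"p-value with 1 degree of freedom:  {p_df1:.3g}")
```

Both p-values are far below 0.05, so the conclusion does not change. With SciPy, passing `ddof=2` to `stats.chisquare`, or calling `stats.chi2_contingency([[157, 2278], [235, 2200]], correction=False)`, should give the 1-degree-of-freedom result directly.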