Creating a binary response variable by binning with cut

Date: 2023-06-03

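```python
import pandas as pd

# Load the ACS New York sample data (path on the original author's machine)
d = pd.read_csv('D:/pandas活用/pandas_for_everyone-master/data/acs_ny.csv')
print(d.columns)   # list the column names
print('@' * 66)    # separator line between the two outputs
print(d.head())    # first five rows
```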

```
Index(['Acres', 'FamilyIncome', 'FamilyType', 'NumBedrooms', 'NumChildren',
       'NumPeople', 'NumRooms', 'NumUnits', 'NumVehicles', 'NumWorkers',
       'OwnRent', 'YearBuilt', 'HouseCosts', 'ElectricBill', 'FoodStamp',
       'HeatingFuel', 'Insurance', 'Language'],
      dtype='object')
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
  Acres  FamilyIncome   FamilyType  NumBedrooms  NumChildren  NumPeople  \
0  1-10           150      Married            4            1          3
1  1-10           180  Female Head            3            2          4
2  1-10           280  Female Head            4            0          2
3  1-10           330  Female Head            2            1          2
4  1-10           330    Male Head            3            1          2

   NumRooms         NumUnits  NumVehicles  NumWorkers   OwnRent    YearBuilt  \
0         9  Single detached            1           0  Mortgage    1950-1959
1         6  Single detached            2           0    Rented  Before 1939
2         8  Single detached            3           1  Mortgage    2000-2004
3         4  Single detached            1           0    Rented    1950-1959
4         5  Single attached            1           0  Mortgage  Before 1939

   HouseCosts  ElectricBill FoodStamp HeatingFuel  Insurance        Language
0        1800            90        No         Gas       2500         English
1         850            90        No         Oil          0         English
2        2600           260        No         Oil       6600  Other European
3        1800           140        No         Oil          0         English
4         860           150        No         Gas        660         Spanish
```

The following code bins the FamilyIncome column into a binary response variable:

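```python
# Bin the FamilyIncome column: incomes from 0 to 150000 become 0, and incomes
# above 150000 up to the maximum income become 1 (15w = 15万, i.e. 150,000).
# labels takes a list of values; strings can also be used as labels.
# Note: pd.cut intervals are right-closed by default, so the bins are
# (0, 150000] and (150000, max]; an income of 0 or below would become NaN.
d['income_15w'] = pd.cut(d['FamilyIncome'],
                         [0, 150000, d['FamilyIncome'].max()],
                         labels=[0, 1])
# info() prints its summary itself and returns None,
# which is why "None" appears after the summary in the output
print(d.info())
print('@' * 66)
print(d['income_15w'].value_counts())  # size of each class
```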

```
RangeIndex: 22745 entries, 0 to 22744
Data columns (total 19 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   Acres         22745 non-null  object
 1   FamilyIncome  22745 non-null  int64
 2   FamilyType    22745 non-null  object
 3   NumBedrooms   22745 non-null  int64
 4   NumChildren   22745 non-null  int64
 5   NumPeople     22745 non-null  int64
 6   NumRooms      22745 non-null  int64
 7   NumUnits      22745 non-null  object
 8   NumVehicles   22745 non-null  int64
 9   NumWorkers    22745 non-null  int64
 10  OwnRent       22745 non-null  object
 11  YearBuilt     22745 non-null  object
 12  HouseCosts    22745 non-null  int64
 13  ElectricBill  22745 non-null  int64
 14  FoodStamp     22745 non-null  object
 15  HeatingFuel   22745 non-null  object
 16  Insurance     22745 non-null  int64
 17  Language      22745 non-null  object
 18  income_15w    22745 non-null  category
dtypes: category(1), int64(10), object(8)
memory usage: 3.1+ MB
None
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
0    18294
1     4451
Name: income_15w, dtype: int64
```
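To make the bin edges used with FamilyIncome concrete, here is a minimal sketch on a handful of hypothetical income values (illustrative only, not rows from acs_ny.csv). It relies on pandas' default `right=True`, under which `pd.cut` builds the right-closed intervals (0, 150000] and (150000, max]; an income of exactly 0 therefore falls outside every bin and becomes NaN unless `include_lowest=True` is passed. The sketch also shows string labels and, for comparison only, a plain `>` test cast to int, which is a different technique that produces the same 0/1 flag on these values.

```python
import pandas as pd

# Hypothetical incomes placed on and around the bin edges
# (illustrative values only, NOT rows from acs_ny.csv).
income = pd.Series([0, 150, 150000, 150001, 1000000], name="FamilyIncome")
upper = income.max()  # stands in for d['FamilyIncome'].max()
edges = [0, 150000, upper]

# Same call shape as the article: right-closed bins (0, 150000] and (150000, max].
# 0 is not inside (0, 150000], so it becomes NaN here.
binned = pd.cut(income, edges, labels=[0, 1])

# include_lowest=True also closes the first bin on the left, so 0 gets label 0.
binned_incl = pd.cut(income, edges, labels=[0, 1], include_lowest=True)

# String labels instead of 0/1.
binned_str = pd.cut(income, edges, labels=["<=150k", ">150k"], include_lowest=True)

# Different technique, for comparison only: a boolean test cast to int.
flag = (income > 150000).astype(int)

print(pd.DataFrame({
    "income": income,
    "cut_default": binned,
    "cut_include_lowest": binned_incl,
    "cut_str_labels": binned_str,
    "comparison_flag": flag,
}))

# Share of each class in the binary response.
print(binned_incl.value_counts(normalize=True))
```

In the article's own output, income_15w has 22745 non-null values, matching the row count, so no income in acs_ny.csv fell outside the two bins.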
