
pandas study notes

Date: 2023-05-19

Contents

1. What problem does pandas solve?

Getting to know the DataFrame through an example

About columns

2. Reading and writing tabular data

Reading data

Writing data

3. Selecting subsets of a table

4. Creating plots in pandas

A single series, plotted with plt

Multiple series, object-oriented (O-O) style

5. Creating new columns and renaming columns

Renaming columns

6. Calculating summary statistics

Aggregating statistics

Aggregating statistics grouped by category

Count number of records by category

7. Sorting

Sort table rows by the values in a column

8. Combining data from multiple tables

Concatenating tables along rows or columns

Joining two tables with merge


1. What problem does pandas solve?

What kind of data does pandas handle?
When working with tabular data, such as data stored in spreadsheets or databases, pandas is the right tool for you. pandas will help you to explore, clean, and process your data.

In pandas, a data table is called a DataFrame.

Getting to know the DataFrame through an example

import pandas as pd

df = pd.DataFrame(
    {
        "name": [
            "Braund, Mr. Owen Harris",
            "Allen, Mr. William Henry",
            "Bonnell, Miss. Elizabeth",
        ],
        "age": [22, 35, 58],
        "sex": ["male", "female", "male"],
    }
)
print(df)
print(df.describe())  # summarizes the numeric columns only

 

A DataFrame is a 2-dimensional data structure that can store data of different types (including characters, integers, floating point values, categorical data and more) in columns.
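To make the "different types in columns" point concrete, here is a minimal sketch that prints the dtype pandas infers for each column. The "fare" column and its values are hypothetical, added only so that a floating point column is present.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Braund, Mr. Owen Harris", "Allen, Mr. William Henry", "Bonnell, Miss. Elizabeth"],
        "age": [22, 35, 58],
        "fare": [7.25, 8.05, 26.55],  # hypothetical values, for illustration only
    }
)

# Every column carries its own dtype (text, integer, float, ...).
print(df.dtypes)
# shape is (number of rows, number of columns).
print(df.shape)
```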

About columns

Each column in a DataFrame is a Series.

import pandas as pd

df = pd.DataFrame(
    {
        "name": [
            "Braund, Mr. Owen Harris",
            "Allen, Mr. William Henry",
            "Bonnell, Miss. Elizabeth",
        ],
        "age": [22, 35, 58],
        "sex": ["male", "female", "male"],
    }
)
print(df["age"])  # selecting one column returns a Series

The result is a plain Series.
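As a small sketch of the difference between a column and a table: single brackets give back a Series, while double brackets give back a one-column DataFrame. The short names "A", "B", "C" are placeholders, not data from the notes.

```python
import pandas as pd

df = pd.DataFrame({"name": ["A", "B", "C"], "age": [22, 35, 58]})  # placeholder names

ages_series = df["age"]    # single brackets: a Series (1-dimensional)
ages_frame = df[["age"]]   # double brackets: a DataFrame with one column

print(type(ages_series).__name__, ages_series.shape)
print(type(ages_frame).__name__, ages_frame.shape)
print(ages_series.max())   # a Series has its own methods, such as max()
```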

2. Reading and writing tabular data

Reading data

import pandas as pd

ti_data = pd.read_excel("titanic.xlsx")  # read the Excel file
print(ti_data)
# print the data type of each column
print(ti_data.dtypes)
print(ti_data.head(3))  # first three rows only
print(ti_data.tail(2))  # last two rows

Writing data

import pandas as pd

df = pd.DataFrame(
    {
        "name": [
            "Braund, Mr. Owen Harris",
            "Allen, Mr. William Henry",
            "Bonnell, Miss. Elizabeth",
        ],
        "age": [22, 35, 58],
        "sex": ["male", "female", "male"],
    }
)
# Write the data. By default the row index is written as the first column too.
df.to_excel("df.xlsx")
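One detail of the write step is easy to miss: the row index is saved as an extra column unless index=False is passed. The sketch below shows the effect with an in-memory CSV round trip rather than an Excel file, purely so that it runs without any files on disk; to_excel accepts the same index parameter. The short names are placeholders.

```python
import io

import pandas as pd

df = pd.DataFrame({"name": ["A", "B", "C"], "age": [22, 35, 58]})  # placeholder names

# Default: the row index is written as an extra, unnamed first column.
with_index = pd.read_csv(io.StringIO(df.to_csv()))
# index=False: only the data columns are written.
without_index = pd.read_csv(io.StringIO(df.to_csv(index=False)))

print(list(with_index.columns))
print(list(without_index.columns))
```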

3. Selecting subsets of a table

The original table:

import pandas as pd

t_data = pd.read_excel("df.xlsx")
print(t_data)

age = t_data[["age"]]  # select specific columns
print(age)

age30 = t_data[t_data["age"] > 30]  # filter rows by a condition on a value
print(age30)

# Selecting rows and columns together
print("by label")
sex_age = t_data.loc[t_data["age"] > 30, "age"]
print(sex_age)

print("by position")
row_col = t_data.iloc[1:2, 1:3]
print(row_col)
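A few further selection patterns, sketched on an in-memory table with placeholder names so that no Excel file is needed: combining conditions, selecting several columns with loc, and matching a list of values with isin.

```python
import pandas as pd

t_data = pd.DataFrame(
    {
        "name": ["A", "B", "C"],  # placeholder names
        "age": [22, 35, 58],
        "sex": ["male", "female", "male"],
    }
)

# Combine conditions with & (and) or | (or); each condition needs parentheses.
older_men = t_data[(t_data["age"] > 30) & (t_data["sex"] == "male")]

# loc is label-based: a row condition plus a list of column names.
names_over_30 = t_data.loc[t_data["age"] > 30, ["name", "age"]]

# iloc is position-based, and the end of each slice is excluded.
first_two = t_data.iloc[0:2, 0:2]

# isin keeps the rows whose value appears in the given list.
selected = t_data[t_data["age"].isin([22, 58])]

print(older_men)
print(names_over_30)
print(first_two)
print(selected)
```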

4. Creating plots in pandas

A single series, plotted with plt

import pandas as pd
import matplotlib.pyplot as plt

t_data = pd.read_excel("df.xlsx")
t_data["ID"] = [1, 2, 3]  # add an ID column with the values 1, 2, 3
print(t_data)

ax = t_data["age"].plot()  # Series.plot() returns a matplotlib Axes
ax.set_title("age")
plt.show()

Multiple series, object-oriented (O-O) style

import pandas as pd
import matplotlib.pyplot as plt

t_data = pd.read_excel("df.xlsx")
t_data["ID"] = [1, 2, 3]  # add an ID column with the values 1, 2, 3
print(t_data)

fig, ax = plt.subplots(figsize=(12, 4))
t_data.plot(ax=ax)  # draw every numeric column on the same Axes
ax.set_title("age and ID")
plt.show()

 

5. Creating new columns and renaming columns

import pandas as pd

t_data = pd.read_excel("df.xlsx")
print(t_data)

print("Table after modification")
t_data["ID"] = [1, 2, 3]  # add an ID column with the values 1, 2, 3
t_data["age's cubic"] = t_data["age"] ** 3  # element-wise cube of the age column
print(t_data)

 

Renaming columns

import pandas as pd

t_data = pd.read_excel("df.xlsx")
print(t_data)

print("Table after modification")
# Rename the column "age" to "年龄" (Chinese for "age").
t_data = t_data.rename(columns={"age": "年龄"})
print(t_data)
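Both operations in this section can also be written without modifying the original table. The sketch below uses assign for a derived column and shows that rename accepts either a mapping or a function. The column names "age_cubed" and "age_years" are illustrative choices, and the short names are placeholders.

```python
import pandas as pd

t_data = pd.DataFrame({"name": ["A", "B", "C"], "age": [22, 35, 58]})  # placeholder names

# assign returns a new DataFrame; t_data itself is left unchanged.
with_cube = t_data.assign(age_cubed=t_data["age"] ** 3)

# rename accepts a mapping from old names to new names ...
renamed = with_cube.rename(columns={"age": "age_years"})
# ... or a function that is applied to every column name.
upper = with_cube.rename(columns=str.upper)

print(with_cube)
print(list(renamed.columns))
print(list(upper.columns))
```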

 

6. Calculating summary statistics

Aggregating statistics

import pandas as pd

t_data = pd.read_excel("df.xlsx")
print(t_data)

print("mean of age: ", t_data["age"].mean())
print(t_data.describe())

 

Aggregating statistics grouped by category

import pandas as pd

t_data = pd.read_excel("df.xlsx")
t_data["ID"] = [1, 2, 3]  # add an ID column with the values 1, 2, 3
t_data["age's cubic"] = t_data["age"] ** 3
print(t_data)

# Group by name and average the numeric columns.
# group = t_data.groupby("name").mean()
group = t_data[["age", "ID", "name"]].groupby("name").mean()
print(group)
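Grouping by name puts each row in its own group, because every name is unique. As a sketch of grouping where the category repeats, the example below groups the same data by "sex" instead; the short names are placeholders.

```python
import pandas as pd

t_data = pd.DataFrame(
    {
        "name": ["A", "B", "C"],  # placeholder names
        "age": [22, 35, 58],
        "sex": ["male", "female", "male"],
    }
)

# Split by "sex", pick the "age" column, then average within each group.
mean_age_by_sex = t_data.groupby("sex")["age"].mean()

# agg computes several statistics per group in one call.
summary = t_data.groupby("sex")["age"].agg(["min", "max", "mean"])

print(mean_age_by_sex)
print(summary)
```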

 

 Count number of records by category

import pandas as pd

t_data = pd.read_excel("df.xlsx")
t_data["ID"] = [1, 2, 3]  # add an ID column with the values 1, 2, 3
t_data["age's cubic"] = t_data["age"] ** 3
print(t_data)

print("first way")
print(t_data["age"].value_counts())

print("second way")
print(t_data.groupby("age")["age"].count())  # the groupby approach from the previous section
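The two ways of counting differ in how they order the result and in how they treat missing values. The sketch below adds a hypothetical fourth row with a missing "sex" value to make that visible.

```python
import pandas as pd

t_data = pd.DataFrame(
    {
        "sex": ["male", "female", "male", None],  # the last row is hypothetical
        "age": [22, 35, 58, 40],
    }
)

# value_counts sorts by count (largest first) and skips missing values by default.
counts = t_data["sex"].value_counts()
# dropna=False also counts the missing values.
counts_with_nan = t_data["sex"].value_counts(dropna=False)
# groupby sorts by the group key and also skips missing keys by default.
sizes = t_data.groupby("sex")["age"].count()

print(counts)
print(counts_with_nan)
print(sizes)
```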

 

7. Sorting

Sort table rows by the values in a column

import pandas as pd

t_data = pd.read_excel("df.xlsx")
print(t_data)

# sort by age, ascending
print(t_data.sort_values(by="age"))
# sort by age, descending
print(t_data.sort_values(by="age", ascending=False))
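sort_values also accepts a list of columns, with a separate direction for each. A minimal sketch with placeholder names follows; note that the rows keep their original index labels unless the index is reset.

```python
import pandas as pd

t_data = pd.DataFrame(
    {
        "name": ["A", "B", "C"],  # placeholder names
        "age": [22, 35, 58],
        "sex": ["male", "female", "male"],
    }
)

# Sort by sex (ascending) first, then by age (descending) within each sex.
by_sex_then_age = t_data.sort_values(by=["sex", "age"], ascending=[True, False])

# The original index labels travel with the rows; reset_index renumbers them.
renumbered = by_sex_then_age.reset_index(drop=True)

print(by_sex_then_age)
print(renumbered)
```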

8. Combining data from multiple tables

Concatenating tables along rows or columns

import pandas as pd

data1 = pd.DataFrame(
    {
        "name": [
            "Braund, Mr. Owen Harris",
            "Allen, Mr. William Henry",
            "Bonnell, Miss. Elizabeth",
        ]
    }
)
print(data1)

data2 = pd.DataFrame(
    {
        "age": [22, 35, 58],
        "sex": ["male", "female", "male"],
    }
)
print(data2)

# Concatenate: axis=1 places the tables side by side (adds columns);
# axis=0 would stack them on top of each other (adds rows).
data = pd.concat([data1, data2], axis=1)
print(data)
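To contrast the two axes, the sketch below stacks two tables with the same columns (axis=0) and places two tables with the same rows side by side (axis=1). The small tables and their placeholder names are made up for illustration.

```python
import pandas as pd

data1 = pd.DataFrame({"name": ["A", "B"], "age": [22, 35]})  # placeholder names
data2 = pd.DataFrame({"name": ["C"], "age": [58]})
extra = pd.DataFrame({"sex": ["male", "female"]})

# axis=0 appends rows; ignore_index=True renumbers the combined index.
stacked = pd.concat([data1, data2], axis=0, ignore_index=True)

# axis=1 appends columns, aligning rows on the index.
side_by_side = pd.concat([data1, extra], axis=1)

print(stacked)
print(side_by_side)
```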

 

 

Joining two tables with merge

import pandas as pd

data1 = pd.DataFrame(
    {
        "name": [
            "Braund, Mr. Owen Harris",
            "Allen, Mr. William Henry",
            "Bonnell, Miss. Elizabeth",
        ],
        "age": [22, 35, 58],
    }
)
print(data1)

data2 = pd.DataFrame(
    {
        "age": [22, 35, 58],
        "sex": ["male", "female", "male"],
    }
)
print(data2)

# Merge on the shared "age" column, keeping every row of data1 (left join).
data = pd.merge(data1, data2, how="left", on="age")
print(data)

 

merge can also join on several columns at once by passing a list of column names to on.
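A minimal sketch of a multi-column merge follows. The "cabin" column, its values, and the unmatched third row of the right-hand table are hypothetical, added only to show what a left join does when no match exists.

```python
import pandas as pd

left = pd.DataFrame(
    {
        "name": ["A", "B", "C"],  # placeholder names
        "age": [22, 35, 58],
        "sex": ["male", "female", "male"],
    }
)
right = pd.DataFrame(
    {
        "age": [22, 35, 40],
        "sex": ["male", "female", "male"],
        "cabin": ["C1", "C2", "C3"],  # hypothetical column
    }
)

# Rows match only when both "age" and "sex" are equal.
# how="left" keeps every row of left; unmatched rows get a missing value.
merged = pd.merge(left, right, how="left", on=["age", "sex"])
print(merged)
```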
