How to compute statistics on Categorical columns with Pandas: worked examples
2023-05-26
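1. How to compute statistics on Categorical columns with Pandas
Scenario: computing statistics on Categorical columns. Categorical is a special pandas data type that represents each value as an integer code pointing into a fixed set of categories.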
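To make the integer-code representation concrete, here is a minimal sketch. The sample values are hypothetical and only stand in for a column such as Contract; they are not read from the dataset used below.

```python
import pandas as pd

# Hypothetical sample data, not taken from the Telco dataset
contract = pd.Series(
    ["Month-to-month", "One year", "Month-to-month", "Two year"],
    dtype="category",
)

# The distinct labels are stored once (sorted lexically by default)
categories = list(contract.cat.categories)

# Each row holds one small integer that indexes into the categories
codes = contract.cat.codes.tolist()

print(categories)
print(codes)
```

Because every row stores only a small integer, a column with few distinct values usually takes less memory as a category than as plain strings.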
1.1 Key topics
- File reading and writing
- Basic syntax
- Pandas
- read_csv
Hands-on:
1.2 Create a Python file
import pandas as pd

# Read the CSV file
df = pd.read_csv("Telco-Customer-Churn.csv")

# TotalCharges contains blank strings (' '): turn them into NaN,
# then fill the missing values with the column median
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
df["TotalCharges"] = df["TotalCharges"].fillna(df["TotalCharges"].median())

# Make sure the three numeric columns are float
number_columns = ['tenure', 'MonthlyCharges', 'TotalCharges']
for column in number_columns:
    df[column] = df[column].astype(float)

# Convert every remaining column to the Categorical type
for column in set(df.columns) - set(number_columns):
    df[column] = pd.Categorical(df[column])

# df.info() prints its report and returns None, so print() also emits "None"
print(df.info())
print(df.describe(include=["category"]))

1.3 Output
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 21 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 customerID 7043 non-null category
1 gender 7043 non-null category
2 SeniorCitizen 7043 non-null category
3 Partner 7043 non-null category
4 Dependents 7043 non-null category
5 tenure 7043 non-null float64
6 PhoneService 7043 non-null category
7 MultipleLines 7043 non-null category
8 InternetService 7043 non-null category
9 OnlineSecurity 7043 non-null category
10 OnlineBackup 7043 non-null category
11 DeviceProtection 7043 non-null category
12 TechSupport 7043 non-null category
13 StreamingTV 7043 non-null category
14 StreamingMovies 7043 non-null category
15 Contract 7043 non-null category
16 PaperlessBilling 7043 non-null category
17 PaymentMethod 7043 non-null category
18 MonthlyCharges 7043 non-null float64
19 TotalCharges 7043 non-null float64
20 Churn 7043 non-null category
dtypes: category(18), float64(3)
memory usage: 611.1 KB
None
        customerID gender SeniorCitizen Partner  ...        Contract PaperlessBilling     PaymentMethod Churn
count         7043   7043          7043    7043  ...            7043             7043              7043  7043
unique        7043      2             2       2  ...               3                2                 4     2
top     0002-ORFBO   Male             0      No  ...  Month-to-month              Yes  Electronic check    No
freq             1   3555          5901    3641  ...            3875             4171              2365  5174

[4 rows x 18 columns]
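The four rows above (count, unique, top, freq) can also be obtained for a single categorical column, and value_counts() gives the full frequency table behind top and freq. The sketch below uses a small hypothetical Series in place of the Churn column, so its numbers are illustrative only.

```python
import pandas as pd

# Hypothetical sample standing in for the Churn column
churn = pd.Series(["No", "Yes", "No", "No", "Yes"], dtype="category")

# count, unique, top and freq for one categorical column
summary = churn.describe()

# Frequency of every category, most frequent first
counts = churn.value_counts()

print(summary)
print(counts)
```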
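2. How to find the row with the lowest closing price in stock data with Pandas
Scenario: finding the row with the lowest closing price in stock data
2.1 Key topics
- File reading and writing
- Basic syntax
- Pandas
- read_csv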
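2.2 Create a Python file
"""
The data is in CSV format.
1. Load it into a DataFrame
2. Find the index of the lowest closing price
3. Use that index to look up the data row
4. Print the resulting row
"""
import pandas as pd

df = pd.read_csv("./00700.HK.csv")

# Note: this listing does not perform steps 2 and 3 of the docstring;
# it adds Year and Month columns and prints yearly statistics instead
df["Date"] = pd.to_datetime(df["Date"])
df["Year"] = df["Date"].dt.year
df["Month"] = df["Date"].dt.month
print(df)

# Mean closing price per year, followed by summary statistics
print(df.groupby("Year")["Close"].mean())
print(df.describe())

2.3 Output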
           Date     Open     High      Low    Close     Volume  Year  Month
0    2021-09-30  456.000  464.600  453.800  461.400   17335451  2021      9
1    2021-09-29  461.600  465.000  450.200  465.000   18250450  2021      9
2    2021-09-28  467.000  476.200  464.600  469.800   20947276  2021      9
3    2021-09-27  459.000  473.000  455.200  464.600   17966998  2021      9
4    2021-09-24  461.400  473.400  456.200  460.200   16656914  2021      9
...         ...      ...      ...      ...      ...        ...   ...    ...
4262 2004-06-23    4.050    4.450    4.025    4.425   55016000  2004      6
4263 2004-06-21    4.125    4.125    3.950    4.000   22817000  2004      6
4264 2004-06-18    4.200    4.250    3.950    4.025   36598000  2004      6
4265 2004-06-17    4.150    4.375    4.125    4.225   83801500  2004      6
4266 2004-06-16    4.375    4.625    4.075    4.150  439775000  2004      6

[4267 rows x 8 columns]
Year
2004 4.338686
2005 6.568927
2006 15.865951
2007 37.882724
2008 54.818367
2009 96.369679
2010 157.299598
2011 189.737398
2012 228.987045
2013 337.136066
2014 271.291498
2015 144.824291
2016 176.562041
2017 291.066667
2018 372.678862
2019 346.225203
2020 479.141129
2021 586.649189
Name: Close, dtype: float64
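The lookup named in this section's title (find the index of the lowest closing price, then fetch that row) can be sketched with idxmin() and loc. The three rows below are hypothetical sample data in the same column layout, so the result says nothing about the real file.

```python
import pandas as pd

# Hypothetical rows in the same column layout as the stock CSV
df = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2021-09-30", "2021-09-29", "2004-06-21"]),
        "Close": [461.4, 465.0, 4.0],
    }
)

# Index label of the smallest Close value (first one if there are ties)
min_idx = df["Close"].idxmin()

# The full row stored at that label
min_row = df.loc[min_idx]

print(min_idx)
print(min_row)
```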
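3. How to add year and month columns to stock data with Pandas
Scenario: adding year and month columns to stock data
3.1 Key topics
- File reading and writing
- Basic syntax
- Pandas
- The pandas Series object
- DataFrame
Hands-on: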
3.2 Create a Python file
"""
Add year and month columns to the stock data
"""
import pandas as pd

df = pd.read_csv("./00100.csv")
print(df)

# to_datetime converts the Date column to a datetime type
df["Date"] = pd.to_datetime(df["Date"])
df["Year"] = df["Date"].dt.year
df["Month"] = df["Date"].dt.month
print(df)

3.3 Output
            Date     Open     High      Low    Close     Volume
0     2021-09-30  456.000  464.600  453.800  461.400   17335451
1     2021-09-29  461.600  465.000  450.200  465.000   18250450
2     2021-09-28  467.000  476.200  464.600  469.800   20947276
3     2021-09-27  459.000  473.000  455.200  464.600   17966998
4     2021-09-24  461.400  473.400  456.200  460.200   16656914
...          ...      ...      ...      ...      ...        ...
4262  2004-06-23    4.050    4.450    4.025    4.425   55016000
4263  2004-06-21    4.125    4.125    3.950    4.000   22817000
4264  2004-06-18    4.200    4.250    3.950    4.025   36598000
4265  2004-06-17    4.150    4.375    4.125    4.225   83801500
4266  2004-06-16    4.375    4.625    4.075    4.150  439775000

[4267 rows x 6 columns]
           Date     Open     High      Low    Close     Volume  Year  Month
0    2021-09-30  456.000  464.600  453.800  461.400   17335451  2021      9
1    2021-09-29  461.600  465.000  450.200  465.000   18250450  2021      9
2    2021-09-28  467.000  476.200  464.600  469.800   20947276  2021      9
3    2021-09-27  459.000  473.000  455.200  464.600   17966998  2021      9
4    2021-09-24  461.400  473.400  456.200  460.200   16656914  2021      9
...         ...      ...      ...      ...      ...        ...   ...    ...
4262 2004-06-23    4.050    4.450    4.025    4.425   55016000  2004      6
4263 2004-06-21    4.125    4.125    3.950    4.000   22817000  2004      6
4264 2004-06-18    4.200    4.250    3.950    4.025   36598000  2004      6
4265 2004-06-17    4.150    4.375    4.125    4.225   83801500  2004      6
4266 2004-06-16    4.375    4.625    4.075    4.150  439775000  2004      6

[4267 rows x 8 columns]
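As a possible variation, the date conversion can happen while the file is read, by passing parse_dates to read_csv. The sketch below assumes a tiny hypothetical CSV held in memory (csv_text is made up for illustration) rather than the file used above.

```python
import io

import pandas as pd

# Hypothetical two-row CSV in the same layout as the stock file
csv_text = "Date,Close\n2021-09-30,461.4\n2004-06-16,4.15\n"

# parse_dates converts the column during loading,
# so no separate pd.to_datetime call is needed
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])
df["Year"] = df["Date"].dt.year
df["Month"] = df["Date"].dt.month

print(df.dtypes)
print(df)
```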
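4. How to get table information and basic statistics with Pandas
Scenario: getting table information and basic statistics
4.1 Key topics
- File reading and writing
- Basic syntax
- Pandas
- The pandas Series object
- numpy
Hands-on: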
4.2 Create a Python file
import pandas as pd
import numpy as np

df = pd.DataFrame(
    data={
        "norm": np.random.normal(loc=0, scale=1, size=1000),
        "uniform": np.random.uniform(low=0, high=1, size=1000),
        "binomial": np.random.binomial(n=1, p=0.2, size=1000)},
    index=pd.date_range(start='2021-01-01', periods=1000))

# df.info(): number of rows and columns, dtypes and other basic information
# df.describe(): per-column mean, minimum, maximum, median and other statistics
print(df.info())
print()
print(df.describe())

4.3 Output
D
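Since the script above draws random numbers without a seed, its output differs on every run. The following sketch fixes the seed with NumPy's default_rng and also asks describe() for custom percentiles; the seed value and the chosen percentiles are arbitrary assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Fixed seed (arbitrary choice) so that reruns give the same numbers
rng = np.random.default_rng(0)
df = pd.DataFrame({"uniform": rng.uniform(low=0, high=1, size=1000)})

# Request the 10th and 90th percentiles in addition to the usual statistics
stats = df.describe(percentiles=[0.1, 0.9])

print(stats)
```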
