Python Data Visualization and Analysis: matplotlib
Contents
1. NumPy overview
2. Scatter plots
3. Line plots
4. Bar charts
5. Histograms
6. Pie charts
7. Box plots
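1. NumPy overview:
NumPy stores and processes large matrices. Its core data object is the ndarray, which can be created with:
    np.array()
    np.arange()
    np.loadtxt('filename', delimiter=',', skiprows=1, usecols=(1, 4, 6), unpack=False)
Setting unpack=True returns each selected column as a separate array, so the columns can be assigned to separate variables.
Statistics such as min, max, median, mean, variance, and sort can usually be called in two ways, as a function or as an array method:
    np.min(data)
    data.min()
Note that the variance function is named np.var() (method data.var()), and the median is only available as the function np.median(data).
Sorting:
    np.sort(data)  # returns a new sorted array; the original is unchanged
    data.sort()    # sorts the original array in place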
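The calls above can be tried together in a minimal sketch. The CSV text, the column names, and the variable names `height` and `weight` are made up for illustration; an in-memory `io.StringIO` stands in for a real file.

```python
import io

import numpy as np

# Hypothetical CSV content standing in for a real file (not from the original notes).
csv_text = "id,height,weight\n0,170.0,60.0\n1,180.0,75.0\n2,165.0,55.0\n"

# skiprows=1 skips the header line; unpack=True returns one array per selected column.
height, weight = np.loadtxt(
    io.StringIO(csv_text), delimiter=",", skiprows=1, usecols=(1, 2), unpack=True
)

# The same statistic, called as a function and as a method.
min_by_function = np.min(height)
min_by_method = height.min()

# Variance is np.var / .var(); median exists only as a function.
variance = np.var(height)
median = np.median(height)

data = np.array([3, 1, 2])
sorted_copy = np.sort(data)   # new array; data is still [3, 1, 2] at this point
data_before = data.copy()
data.sort()                   # sorts data itself in place
```

After this runs, `data_before` still holds the original order while `data` and `sorted_copy` are both sorted, which is the difference between the two sort calls.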
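2. Scatter plots:
A scatter plot is used to observe the correlation between two variables, and is drawn with plt.scatter(). Four common parameters:
    color: c
    point size: s (the marker area)
    shape: marker
    transparency: alpha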
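As a rough sketch of these four parameters in use, the example below plots two correlated variables. The data is synthetic and the names `height` and `weight` are assumptions for illustration only; the Agg backend is selected so the code also runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line in an interactive session
import matplotlib.pyplot as plt
import numpy as np

# Synthetic, positively correlated data (made up for illustration).
rng = np.random.default_rng(0)
height = rng.normal(170, 8, 200)
weight = 0.9 * height - 90 + rng.normal(0, 5, 200)

# c: color, s: marker area, marker: shape, alpha: transparency
points = plt.scatter(height, weight, c="r", s=40, marker="o", alpha=0.5)
plt.xlabel("height")
plt.ylabel("weight")

# A numeric check of what the plot suggests visually.
corr = np.corrcoef(height, weight)[0, 1]
```

In an interactive session, calling plt.show() afterwards displays the figure; a cloud of points rising from lower left to upper right corresponds to a positive `corr`.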
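3. Line plots:
np.linspace(-10, 10, 100) divides the interval between two numbers into evenly spaced points (here 100 points from -10 to 10). The result can be passed directly to plt.plot().
plt.plot_date() automatically converts the x values to dates, but by default it draws markers only, not a connected line; pass '-' as the format argument to get a line. The line shape is controlled by linestyle. Recent Matplotlib releases deprecate plot_date() in favor of calling plt.plot() with date values directly.
Date tick formatting uses the dates module:
    import matplotlib.dates as mdates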
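A minimal sketch of both kinds of line plot follows. The date series is made up for illustration, and plain plot() with date values is used here in place of plot_date(), which is a substitution rather than the call named in the notes.

```python
import datetime as dt

import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line in an interactive session
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np

# 100 evenly spaced points from -10 to 10, both endpoints included.
x = np.linspace(-10, 10, 100)
y = x ** 2

fig, (ax1, ax2) = plt.subplots(2, 1)
(curve,) = ax1.plot(x, y, linestyle="--", color="g")

# Hypothetical daily values, made up for illustration.
days = [dt.date(2020, 1, 1) + dt.timedelta(days=i) for i in range(10)]
values = np.sqrt(np.arange(10))

# "-" requests a connected line; marker="o" keeps the individual points visible.
(series,) = ax2.plot(days, values, "-", marker="o")
ax2.xaxis.set_major_formatter(mdates.DateFormatter("%m-%d"))
fig.autofmt_xdate()  # rotate the date labels so they do not overlap
```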
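4. Bar charts:
A basic bar chart is drawn with plt.bar(left=index, height=y); newer Matplotlib versions name the first parameter x instead of left. Adjustable parameters include color and width.
Horizontal bar chart: add orientation='horizontal', then set left to 0 and pass bottom=index.
Error note (recorded while getting the horizontal chart to run):
    plt.bar(x=0, bottom=x, width=y, color='red', height=0.5, orientation='horizontal')
    plt.barh(y=x, width=y)
Stacked:
    plt.bar(index, sales_BJ, bar_width, color='b')
    plt.bar(index, sales_SH, bar_width, color='r', bottom=sales_BJ)
Side by side:
    plt.bar(index, sales_BJ, bar_width, color='b')
    plt.bar(index + bar_width, sales_SH, bar_width, color='r')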
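The stacked, side-by-side, and horizontal variants can be compared in one figure. This is a sketch under assumptions: the sales numbers are invented, and the three-panel layout is only a convenience for showing the variants next to each other.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line in an interactive session
import matplotlib.pyplot as plt
import numpy as np

index = np.arange(4)
sales_BJ = np.array([52, 55, 63, 53])  # made-up values
sales_SH = np.array([44, 66, 55, 41])  # made-up values
bar_width = 0.3

fig, (ax_stack, ax_side, ax_h) = plt.subplots(1, 3, figsize=(12, 3))

# Stacked: the second series starts where the first one ends (bottom=sales_BJ).
ax_stack.bar(index, sales_BJ, bar_width, color="b")
stacked = ax_stack.bar(index, sales_SH, bar_width, color="r", bottom=sales_BJ)

# Side by side: the second series is shifted right by one bar width.
ax_side.bar(index, sales_BJ, bar_width, color="b")
side = ax_side.bar(index + bar_width, sales_SH, bar_width, color="r")

# Horizontal: barh takes the positions as y and the values as width.
horizontal = ax_h.barh(y=index, width=sales_BJ, height=0.5, color="red")
```

The only difference between the first two panels is bottom=sales_BJ versus index + bar_width, which makes the choice between stacking and grouping a one-argument change.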
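5. Histograms:
plt.hist(x, bins=10, color=..., normed=True): bins is the number of groups; normed=True normalizes the histogram so that the bar areas sum to 1, and normed=False plots the raw counts. Newer Matplotlib versions removed normed; use density=True instead.
plt.hist2d(x, y, bins=40): a histogram of two variables.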
    import numpy as np
    import matplotlib.pyplot as plt

    # 1D histogram of 2000 normally distributed samples (mean 10, std 3)
    mu = 10
    sigma = 3
    x = mu + sigma * np.random.randn(2000)
    plt.hist(x, bins=10, density=True)   # normalized; density replaces the removed normed
    plt.show()
    plt.hist(x, bins=50, density=False)  # raw counts
    plt.show()

    # 2D histogram of two variables
    x = np.random.randn(2000) + 1
    y = np.random.randn(2000) + 5
    plt.hist2d(x, y, bins=40)
    plt.show()
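To see what the normalization option actually changes, the bin values can be computed without plotting. The sketch below uses np.histogram, which takes the same bins and density arguments; the seeded generator is an assumption added only to make the run repeatable.

```python
import numpy as np

rng = np.random.default_rng(0)
x = 10 + 3 * rng.standard_normal(2000)

# Raw counts per bin, and the bin edges (11 edges for 10 bins).
counts, edges = np.histogram(x, bins=10)

# Normalized heights: count / (total * bin width).
density, _ = np.histogram(x, bins=10, density=True)
widths = np.diff(edges)

total = counts.sum()              # every sample falls into exactly one bin
area = (density * widths).sum()   # normalized bars cover a total area of 1
```

So the normalized heights are densities rather than plain relative frequencies: they depend on the bin width, and it is the area of the bars, not their heights, that sums to 1.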
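6. Pie charts:
    import matplotlib.pyplot as plt

    labels = 'A', 'B', 'C', 'D'
    fracs = [15, 30, 45, 10]
    explode = [0, 0.05, 0.08, 0]   # offset of each wedge from the center
    plt.axes(aspect=1)             # equal aspect ratio so the pie is drawn as a circle
    plt.pie(x=fracs, labels=labels, autopct='%.0f%%', explode=explode, shadow=True)
    plt.show()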
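7. Box plots:
Box plots are not very common but are quite useful. A box plot shows the upper whisker, upper quartile, median, lower quartile, lower whisker, and outliers.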
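    import numpy as np
    import matplotlib.pyplot as plt

    # Single box: sym sets the outlier marker, whis the whisker reach in units of the IQR
    np.random.seed(100)
    data = np.random.normal(size=1000, loc=0, scale=1)
    plt.boxplot(data, sym='o', whis=1.5)
    plt.show()

    # One box per column; Matplotlib 3.9 and later prefer tick_labels= over labels=
    np.random.seed(100)
    data = np.random.normal(size=(1000, 4), loc=0, scale=1)
    labels = ['A', 'B', 'C', 'D']
    plt.boxplot(data, labels=labels)
    plt.show()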
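The elements of a box plot listed in this section can also be computed by hand, which makes the role of whis clearer. This is a sketch assuming the common convention that whiskers reach to the most extreme data points within whis times the interquartile range of the box; the variable names are chosen for illustration.

```python
import numpy as np

np.random.seed(100)
data = np.random.normal(size=1000, loc=0, scale=1)

# Lower quartile, median, upper quartile.
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1

# Points farther than whis * IQR from the box count as outliers.
whis = 1.5
low_limit = q1 - whis * iqr
high_limit = q3 + whis * iqr

# Whiskers end at the most extreme data points still inside the limits.
lower_whisker = data[data >= low_limit].min()
upper_whisker = data[data <= high_limit].max()
outliers = data[(data < low_limit) | (data > high_limit)]
```

Raising whis widens the limits, so fewer points are drawn as outliers and the whiskers grow longer.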