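diff --git a/Figure_3.png b/Figure_3.png
new file mode 100644
index 00000000..6c91c381
Binary files /dev/null and b/Figure_3.png differ
diff --git "a/docs/views/data/2021-11-16-python\345\237\272\347\241\200\347\237\245\350\257\206.md" "b/docs/views/data/2021-11-16-python\345\237\272\347\241\200\347\237\245\350\257\206.md"
new file mode 100644
index 00000000..d66586ea
--- /dev/null
+++ "b/docs/views/data/2021-11-16-python\345\237\272\347\241\200\347\237\245\350\257\206.md"
@@ -0,0 +1,116 @@
+---
+layout: post
+title: Basic Python syntax and a few built-in functions
+date: 2021-11-12
+author: yuping0924
+categories:
+  - 数据分析部
+tags:
+  - python
+---
+
+    Example:
+    temp = input("Guess which number I am thinking of: ")
+
+Here temp is the variable name, = performs the assignment, and input is the input function. The parentheses () are required, and a string passed as the prompt must be enclosed in quotes "". The function for printing is print().
+
+    Example:
+    print("You guessed it!")
+    It is called in much the same way as input.
+
+#### Variable assignment and strings:
+
+    Example: a = 'A'+'B'
+    Output: AB
+
+    Example: 'Let\'s go'
+    Output: Let's go
+
+
+    Example: a = r'C:\now'
+    Output: C:\now
+    Alternatively, escape the backslash itself with a second backslash \ (with neither fix, \n would be read as a newline)
+
+    Example: a = 'C:\\now'
+    Output: C:\now
+
+To get a string that spans several lines, use a long (triple-quoted) string: put three consecutive single or double quotes at the start and at the end of the string.
+
+    Example: """The SWUFE motto is:
+    Govern the world and benefit the people,
+    strive tirelessly."""
+    Output: The SWUFE motto is:
+    Govern the world and benefit the people,
+    strive tirelessly.
+
+
+    if condition :
+        statements run when the condition holds
+    else :
+        statements run when the condition does not hold
+Consecutive branches can be written as:
+
+    if condition1 :
+    elif condition2 :
+    elif condition3 :
+    ......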
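As a concrete illustration of the if / elif / else chain sketched above, here is a minimal example that classifies a guess. The function name `check_guess` and the target number 8 are assumptions made up for this sketch; they do not come from the original post.

```python
def check_guess(guess, target=8):
    """Compare a guess with the target using an if / elif / else chain."""
    if guess == target:
        return "correct"
    elif guess < target:
        return "too small"
    else:
        return "too big"


# Try one value for each of the three branches.
for value in (8, 3, 10):
    print(value, check_guess(value))
```

Only the first branch whose condition is true runs; the else branch is the fallback when none of the conditions hold.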
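+    Conditional expression: x if condition else y
+    It evaluates to x if the condition is true and to y otherwise.
+
+
+    Format of a while loop
+    while condition :
+
+Format of a for loop
+
+    for target (variable) in expression (string, tuple, list, dict, etc.): loop body
+
+
+    Example: a=5.99
+    b=int(a)
+    Output: b=5 (int() truncates toward zero rather than rounding)
+
+
+    type(a)
+    Output: <class 'float'>
+
+    isinstance(a,str)
+    Output: False (a is a float, not a str; isinstance(a,float) gives True)
+
+
+    random.randint() returns a random integer (import random first)
+
+    Example: a = random.randint(1,10) returns a random integer from 1 to 10, both ends included
+
+    range([start,] stop [,step])
+
+
+    ###step = 1 is the default value
+
+
+    Example 1: range(5)
+    Output: iterating over it yields the integers 0, 1, 2, 3, 4 in order
+
+    range(5) is equivalent to range(0,5); the left bound is included, the right bound is not
+
+    Example 2: for i in range(1,10,2):
+        print(i)
+    Output: 1 3 5 7 9
+
+    The final 2 is the step: each value is 2 larger than the previous one
+
+    list(): converts an iterable to a list, e.g. list(range(5)) gives [0, 1, 2, 3, 4]
+    len(): returns the number of items in an object, such as the characters in a string
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git "a/docs/views/data/matplotlib\347\224\273\345\233\276\344\270\212.md" "b/docs/views/data/matplotlib\347\224\273\345\233\276\344\270\212.md"
new file mode 100644
index 00000000..c7a8a573
--- /dev/null
+++ "b/docs/views/data/matplotlib\347\224\273\345\233\276\344\270\212.md"
@@ -0,0 +1,161 @@
+---
+layout: post
+title: Plotting with matplotlib (part 1)
+date: 2021-11-26
+author: yuping0924
+categories:
+  - 数据分析部
+tags:
+  - matplotlib
+---
+
+#### Importing the libraries
+
+import matplotlib.pyplot as plt
+import numpy as np
+
+#### Plotting a function
+
+plt.plot(x,y)
+plt.show() [displays the figure; in a script it must be called at the end]
+Note: plt.plot(x,y,color='red',linewidth=1.0,linestyle="--")
+color sets the line color, linewidth sets the line width, and linestyle sets the line style
+
+#### What plt.figure() is for
+
+plt.figure(num=1,figsize=(8,5), dpi=None, facecolor=None, edgecolor=None, frameon=True)
+
+num is the figure number, figsize is the figure's (width, height) in inches, dpi is the resolution (it falls back to rcParams["figure.dpi"], which is 100 in Matplotlib 2.0 and later), facecolor is the background color, edgecolor is the border color, and frameon controls whether the figure frame is drawn
+
+##### 1. Creating several independent windows
+
+Call plt.figure() before each plot
+Example:
+plt.figure()
+plt.plot(x,y1)
+plt.figure()
+plt.plot(x,y2)
+plt.show()
+##### 2. Putting two curves in one window
+
+Plot twice under a single figure
+Example:
+plt.figure()
+plt.plot(x,y1)
+plt.plot(x,y2)
+#### Axis settings
+
+##### 1. Setting the value range
+
+plt.xlim((a,b)) sets the range of the X axis
+plt.ylim((a,b)) sets the range of the Y axis
+##### 2. Setting the axis labels
+
+plt.xlabel('X axis')
+plt.ylabel('Y axis')
+##### 3. Replacing the tick values
+
+1. Example:
+a=np.linspace(-1,2,5)
+print(a)
+plt.xticks(a)
+plt.yticks([-2,-1.8,-1,1.22,3],
+['really bad','bad','normal','good','really good'])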
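To make explicit what the example above hands to plt.xticks and plt.yticks, here is a small pure-Python sketch. The helper `linspace` is a hypothetical stand-in that mimics np.linspace for this one case (it is not NumPy's implementation), and nothing is actually plotted.

```python
def linspace(start, stop, num):
    """Return num evenly spaced values from start to stop, both included."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


# Tick positions that plt.xticks(a) would receive.
x_ticks = linspace(-1, 2, 5)
print(x_ticks)

# Each y position is paired with exactly one label, in order.
y_positions = [-2, -1.8, -1, 1.22, 3]
y_labels = ['really bad', 'bad', 'normal', 'good', 'really good']
tick_map = dict(zip(y_positions, y_labels))
print(tick_map)
```

The two lists passed to plt.yticks must have the same length, because every position needs its own label.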
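+Here -2 on the Y axis is replaced by really bad, -1.8 by bad, and so on
+
+
+2. How plt.xticks works (plt.yticks is analogous):
+plt.xticks(locs,[labels],**kwargs)
+locs: the tick positions, for example range(1,13,1)
+[labels]: optional text labels; they can be anything, but must correspond one-to-one with locs
+**kwargs: text properties that control how the labels look
+To hide the ticks, pass an empty sequence: plt.xticks(()) or plt.xticks([]); calling plt.xticks(x) alone places ticks at x and keeps the default labels
+
+##### 4. Moving the axes
+
+a = plt.gca() gca stands for 'get current axes'; it returns the current Axes object and is needed for the steps below
+1. Changing the color of a single spine
+a.spines['right'].set_color('none')
+spines['right'] selects the right-hand border line, and 'none' makes it invisible
+
+
+2. Choosing which side carries the X or Y ticks
+a.xaxis.set_ticks_position('bottom')
+a.yaxis.set_ticks_position('left')
+This puts the X-axis ticks on the bottom spine and the Y-axis ticks on the left spine
+
+
+3. Changing the position
+a.spines['bottom'].set_position(('data',0))
+a.spines['left'].set_position(('data',0))
+This moves the bottom spine to y=0 and the left spine to x=0
+Note: 'data' means the position is given in data coordinates. 'axes' can be used instead, in which case the number is a fraction of the axes: 0 is the lower or left edge, and 0.1 means 10% of the way across
+
+#### Adding annotations
+
+##### 1. Highlighting a single point
+
+Example:
+x0=1
+y0=2*x0+1
+plt.scatter(x0,y0,s=50,color='blue')
+plt.plot([x0,x0],[y0,0],'k--',linewidth=2.5)
+scatter draws points: s is the marker size and color is the color
+plot draws a line: [x0,x0],[y0,0] is the vertical segment at x=1 from (1,y0) down to (1,0); 'k--' is shorthand in which k means black and '--' means a dashed line
+
+##### 2. Adding text
+
+Use plt.text():
+plt.text(x,y,s,fontdict=None,**kwargs)
+x,y: where the text is placed
+s: the text itself
+fontdict: a dict of properties that set the format of s
+Other parameters:
+fontsize: font size; defaults to rcParams["font.size"], which is 10 unless changed
+fontweight: font weight, given as a number from 0 to 1000 or as a standard name such as 'normal' or 'bold'
+fontstyle: font style, one of ['normal'|'italic'|'oblique']
+backgroundcolor: background color of the text
+
+Example:
+plt.text(-3.7,3,r'$This\ is\ the\ text\ \mu\ \sigma_1\ \alpha_2$',fontdict={'size':16,'color':'red'})
+plt.show()
+The string here is math text between $ signs, where plain spaces are ignored and special symbols are written as commands: put the escape character \ before each space and each symbol name, and to add a subscript put an underscore _ between the character and the subscript
+
+#### Full code
+```python
+import matplotlib.pyplot as plt
+import numpy as np
+
+x = np.linspace(-3,3,50)
+y = 2*x+1
+
+plt.figure(num=1,figsize=(8,5))
+plt.plot(x,y)
+
+a = plt.gca()
+a.spines['right'].set_color('none')
+a.spines['top'].set_color('none')
+a.xaxis.set_ticks_position('bottom')
+a.yaxis.set_ticks_position('left')
+a.spines['bottom'].set_position(('data',0))
+a.spines['left'].set_position(('data',0))
+
+
+x0=1
+y0=2*x0+1
+plt.scatter(x0,y0,s=50,color='blue')
+plt.plot([x0,x0],[y0,0],'k--',linewidth=2.5)
+
+
+plt.text(-3.7,4,r'$This\ is\ \mu\ \sigma_i\ \alpha_t$',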
+    fontsize=16,fontweight=10,fontstyle='italic')
+
+plt.text(-3.7,3,r'$This\ is\ \mu\ \sigma_i\ \alpha_t$',
+    fontsize=16,backgroundcolor='yellow')
+plt.text(-3.7,2,r'$This\ is\ \mu\ \sigma_i\ \alpha_t$',
+    fontdict={'size':16,'color':'red'})
+
+
+plt.show()
+```
+![Figure_3.png](https://i.loli.net/2021/11/28/89BIubinorveALO.png)
\ No newline at end of file
diff --git "a/docs/views/data/matplotlib\347\224\273\345\233\276\344\270\213.md" "b/docs/views/data/matplotlib\347\224\273\345\233\276\344\270\213.md"
new file mode 100644
index 00000000..9ef93df6
--- /dev/null
+++ "b/docs/views/data/matplotlib\347\224\273\345\233\276\344\270\213.md"
@@ -0,0 +1,327 @@
+---
+layout: post
+title: Plotting with matplotlib (part 2)
+date: 2021-12-3
+author: yupinng0924
+categories:
+  - 数据分析部
+tags:
+  - matplotlib 画图
+---
+#### Tick visibility
+Here tick means the tick-label numbers and the background box behind them. Their visibility is changed with each label's set_bbox() method. Note that in newer versions of Matplotlib the zorder must also be set in the plot call
+
+Example:
+```
+import matplotlib.pyplot as plt
+import numpy as np
+
+x=np.linspace(-3,3,50)
+y=0.2*x
+
+plt.figure()
+plt.plot(x,y,zorder=1,linewidth=10)
+plt.ylim(-2,2)
+
+ax=plt.gca()
+ax.spines['right'].set_color('none')
+ax.spines['top'].set_color('none')
+ax.xaxis.set_ticks_position('bottom')
+ax.yaxis.set_ticks_position('left')
+ax.spines['bottom'].set_position(('data',0))
+ax.spines['left'].set_position(('data',0))
+
+for label in ax.get_xticklabels() + ax.get_yticklabels() :
+    label.set_fontsize(12)
+    label.set_bbox(dict(facecolor='white',edgecolor='none',alpha=0.7))
+
+plt.show()
+```
+![能见度.png](https://i.loli.net/2021/12/03/V1PptnbUmilIZHq.png)
+dict() creates a dictionary; facecolor sets the background color of the box, edgecolor sets its border color, and alpha sets the opacity, from 0 (fully transparent) to 1 (fully opaque)
+
+About zorder: it determines the drawing order. By default patches, lines and text have zorder 1, 2 and 3 respectively. Any individual plot() call can set its own zorder. Artists with a smaller zorder are drawn first, so the larger the zorder, the closer to the top the item appears; giving the thick line zorder=1 therefore draws it beneath the tick labels
+#### Scatter plot
+Draw it directly with scatter(), which accepts settings such as s (marker size), alpha (opacity) and c (color)
+Example:
+```
+import matplotlib.pyplot as plt
+import numpy as np
+
+n=1024
+x = np.random.normal(0,1,n)
+y = np.random.normal(0,1,n)
+
+##x=np.linspace(-3,3)
+##y=2*x+1
+plt.scatter(x,y,s=75,alpha=0.5)
+plt.xlim((-2,2))
+plt.ylim((-2,2))
+plt.xticks(())
+plt.yticks(())
+
+plt.show()
+```
+![散点图_1.png](https://i.loli.net/2021/12/03/BZ6bocud8N2931g.png)
+![散点图_2.png](https://i.loli.net/2021/12/03/F6ToGHejpYMWqhK.png)
+#### Bar chart
+Draw it with bar(), which accepts settings such as facecolor (fill color of the bars) and edgecolor (border color); to flip the direction of the bars, put a minus sign in front of y
+Example:
+```
+import matplotlib.pyplot as plt
+import numpy as np
+
+n=12
+X=np.arange(n)
+y1=(1-X/float(n))*np.random.uniform(0.5,1.0,n)
+y2=(1-X/float(n))*np.random.uniform(0.5,1.0,n)
+
+plt.bar(X,y1,facecolor='green',edgecolor='white')
+plt.bar(X,-y2)
+
+for x,y in zip(X,y1):
+    plt.text(x,y+0.05,'%.2f'%y,ha='center',va='bottom')
+
+for x,y in zip(X,y2):
+    plt.text(x,-y-0.05,'%.2f'%y,ha='center',va='top')
+
+
+plt.xlim(-3,n)
+plt.ylim(-1.25,1.25)
+plt.xticks(())
+plt.yticks(())
+
+plt.show()
+```
+Note: zip() pairs up the elements of several iterables so that the loop can unpack them together; its arguments can be lists, tuples, dicts, sets, strings, or range objects. ha is the horizontal alignment and va is the vertical alignment
+![柱状图.png](https://i.loli.net/2021/12/03/75LfPIYysJ8XHtC.png)
+#### Contour plot
+First draw the filled background with contourf(); its arguments are (X, Y, Z (the height), number of levels (at least 2), opacity alpha, colormap cmap)
+
+The coloring is controlled by cmap, which can be set in three ways:
+
+
+ + cmap=plt.get_cmap('Reds')
+ + cmap='Reds'
+ + cmap=plt.cm.binary
+
+Then draw the contour lines with contour(); its arguments are (X, Y, Z, number of levels, colors, linewidths)
+Example:
+```
+import matplotlib.pyplot as plt
+import numpy as np
+
+def f(x,y):
+    return (1-x/2+x**5+y**3)*np.exp(-x**2-y**2)
+
+n=256
+x=np.linspace(-3,3,n)
+y=np.linspace(-3,3,n)
+X,Y=np.meshgrid(x,y)
+
+plt.contourf(X,Y,f(X,Y),8,alpha=0.75,cmap=plt.cm.hot)
+
+C=plt.contour(X,Y,f(X,Y),8,colors='black',linewidths=0.5)
+
+plt.clabel(C,inline=1,fontsize=10)
+
+plt.xticks(())
+plt.yticks(())
+plt.show()
+```
+Note: inline=1 (True) places each value inside its contour line, breaking the line underneath the label
+![等高线图.png](https://i.loli.net/2021/12/03/aJn6kpl4mdLKX71.png)
+#### Displaying an image
+Use imshow(); its arguments are (data, colormap cmap, origin='upper'/'lower' for where the first row is placed)
+Use colorbar() to add the color scale; its arguments can include (shrink for the length ratio, location for the position)
+Example:
+```
+import matplotlib.pyplot as plt
+import numpy as np
+
+a=np.array([0.31,0.38,0.42,
+            0.49,0.51,0.43,
+            0.62,0.55,0.72]).reshape(3,3)
+
+plt.imshow(a,interpolation='none',cmap='bone',origin='lower')
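+plt.colorbar(shrink=0.7)
+
+
+plt.xticks(())
+plt.yticks(())
+plt.show()
+```
+![生成图片.png](https://i.loli.net/2021/12/03/ahOLrHbzxdFSgDJ.png)
+#### 3D data
+First import the extra toolkit: from mpl_toolkits.mplot3d import Axes3D (needed on older Matplotlib to register the 3d projection)
+Example:
+
+```
+import matplotlib.pyplot as plt
+import numpy as np
+from mpl_toolkits.mplot3d import Axes3D
+
+##Create a window
+fig=plt.figure()
+##Create 3D axes; add_subplot is used because recent Matplotlib no longer attaches Axes3D(fig) to the figure automatically
+ax=fig.add_subplot(projection='3d')
+X=np.arange(-4,4,0.25)
+Y=np.arange(-4,4,0.25)
+##Put x and y on a grid
+X,Y =np.meshgrid(X,Y)
+R=np.sqrt(X**2+Y**2)
+Z=np.sin(R)
+
+ax.plot_surface(X,Y,Z,rstride=1,cstride=1,cmap=plt.get_cmap('rainbow'))
+ax.contourf(X,Y,Z,zdir='z',offset=-2,cmap='rainbow')
+ax.set_zlim(-2,2)
+plt.show()
+```
+Note: rstride is the row stride and cstride is the column stride of the surface grid; zdir='z' projects the contours along the z axis (a view from above), and offset is the position on that axis where the projection is drawn (here the plane z=-2)
+![3D数据.png](https://i.loli.net/2021/12/03/SwdHAzRGqnmku9U.png)
+#### Putting several plots in one window
++ Method 1: use subplot(). Each call splits the figure into an equal grid, and its arguments are (number of rows, number of columns, index)
+```
+import matplotlib.pyplot as plt
+
+plt.figure()
+plt.subplot(2,2,1)
+plt.plot([0,1],[0,1])
+
+plt.subplot(2,2,2)
+plt.plot([0,1],[0,2])
+
+plt.subplot(2,2,3)
+plt.plot([0,2],[0,4])
+
+plt.subplot(2,2,4)
+plt.plot([0,5],[0,2])
+
+
+plt.figure()
+plt.subplot(2,1,1)
+plt.plot([0,1],[0,1])
+
+plt.subplot(2,3,4)
+plt.plot([0,1],[0,2])
+
+plt.subplot(2,3,5)
+plt.plot([0,2],[0,4])
+
+plt.subplot(2,3,6)
+plt.plot([0,5],[0,2])
+
+plt.show()
+```
+![Figure_1.png](https://i.loli.net/2021/12/03/9jvAETq1FRXCytg.png)
++ Method 2:
+Use subplot2grid(), which creates an Axes object (a plot area) at a specific place on the canvas. Cells can span different numbers of rows and columns, which gives an unequal split, and each plot is shown at the size of its area. Its arguments are (grid shape such as (3,3), starting cell such as (0,0), rowspan for the number of rows spanned, colspan for the number of columns spanned)
++ Method 3:
+First import the module: import matplotlib.gridspec as gridspec
+Then create the grid with gridspec.GridSpec(), whose arguments are (number of rows, number of columns)
+Finally place each Axes by passing a slice of the grid to subplot()
++ Method 4:
+Use plt.subplots(), which returns a tuple containing the Figure and the Axes object(s), so fig,ax=plt.subplots() unpacks the tuple into the two variables fig and ax,
+Example: f,((ax11,ax12),(ax21,ax22))=plt.subplots(nrows,ncols,sharex=True to share the x axis,sharey=True to share the y axis)
+Example:
+```
+import matplotlib.pyplot as plt
+import matplotlib.gridspec as gridspec
+
+plt.figure()
+ax1 = plt.subplot2grid((3,3),(0,0),colspan=3,rowspan=1)
+ax1.plot([1,2],[1,2])
+ax1.set_title('ax1')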
+ax2 = plt.subplot2grid((3,3),(1,0),colspan=2,rowspan=2)
+ax2.plot([2,3],[3,5])
+ax2.set_title('ax2')
+ax3 = plt.subplot2grid((3,3),(1,2),colspan=1,rowspan=2)
+ax3.plot([-2,1],[-1,3])
+ax3.set_title('ax3')
+
+plt.figure()
+gs = gridspec.GridSpec(3,3)
+ax1 = plt.subplot(gs[0,:])
+ax1.set_title('ax1_1')
+ax2 = plt.subplot(gs[1,:2])
+ax2.set_title('ax2_2')
+ax3 = plt.subplot(gs[1:,2])
+ax3.set_title('ax3_3')
+ax4 = plt.subplot(gs[2,0])
+ax4.set_title('ax4_4')
+ax5 = plt.subplot(gs[2,1])
+ax5.set_title('ax5_5')
+
+
+# plt.subplots() creates its own figure, so no separate plt.figure() call is needed here
+f,((ax11,ax12),(ax21,ax22)) = plt.subplots(2,2,sharex=True,sharey=True)
+ax11.scatter([1,2],[1,2])
+ax12.scatter([-1,-2],[1,2])
+
+plt.show()
+```
+![Figure_2.png](https://i.loli.net/2021/12/03/WdzXplc19o8YiEy.png)
+![Figure_3.png](https://i.loli.net/2021/12/03/EMa53v9Zm1kYCry.png)
+![Figure_4.png](https://i.loli.net/2021/12/03/ycpPGjWSCnidQAf.png)
+#### Nesting a plot inside a plot
+The key is to fix the position of each Axes:
+left,bottom,width,height=0.1,0.1,0.8,0.8 (fractions of the figure width and height; change them to move or resize the Axes)
+ax1 (the Axes name)=fig.add_axes([left,bottom,width,height])
+or: plt.axes([left,bottom,width,height])
+Example:
+```
+import matplotlib.pyplot as plt
+
+fig = plt.figure()
+x = [1,2,3,4,5,6,7]
+y = [1,3,4,2,5,8,6]
+
+left,bottom,width,height = 0.1,0.1,0.8,0.8
+ax1 = fig.add_axes([left,bottom,width,height])
+ax1.plot(x,y,'red')
+ax1.set_xlabel('x')
+ax1.set_ylabel('y')
+ax1.set_title('title')
+
+
+left,bottom,width,height = 0.2,0.6,0.25,0.25
+ax2 = fig.add_axes([left,bottom,width,height])
+ax2.plot(y[::-1],x,'blue')
+ax2.set_xlabel('x')
+ax2.set_ylabel('y')
+ax2.set_title('inside_1')
+
+x1= [1,2,3,4,5,6,7]
+y1= [2,5,3,4,7,1,4]
+plt.axes([0.6,0.2,0.25,0.25])
+plt.plot(y1,x1,'green')
+plt.xlabel('x')
+plt.ylabel('y')
+plt.title('inside_2')
+
+plt.show()
+```
+![图中图.png](https://i.loli.net/2021/12/03/nCb8vjiNt6pc7Dk.png)
+#### Primary and secondary axes
+The key is twinx(). For example, ax2=ax1.twinx() creates a second Axes ax2 that shares the x axis of ax1 and has its own independent y axis on the right-hand side
+Example:
+```
+import matplotlib.pyplot as plt
+import numpy as np
+
+x = np.arange(0,10,0.1)
+y1 = 0.05*x**2
+y2 = -1*x
+
+fig,ax1 = plt.subplots()
+ax2 = ax1.twinx()
+ax1.plot(x,y1,'g-')
+ax2.plot(x,y2,'b--')
+
+ax1.set_xlabel('X')
+ax1.set_ylabel('Y1',color='green')
+ax2.set_ylabel('Y2',color='blue')
+
+plt.show()
+```
+![主次坐标轴.png](https://i.loli.net/2021/12/03/mX4asEJjRNycOYL.png)
+
diff --git "a/docs/views/data/python turtle\346\227\266\351\222\237.md" "b/docs/views/data/python turtle\346\227\266\351\222\237.md"
new file mode 100644
index 00000000..0c225449
--- /dev/null
+++ "b/docs/views/data/python turtle\346\227\266\351\222\237.md"
@@ -0,0 +1,78 @@
+---
+layout: post
+title: Drawing a clock with python turtle
+date: 2021-11-20
+author: yuping0924
+categories:
+  - 数据分析部
+tags:
+  - 时钟程序
+  - python turtle
+---
+
+    #Import the three modules turtle, datetime and time
+    import turtle as t
+    import datetime as dt
+    import time
+    #Set the window size
+    t.setup(600,600)
+    #Turn off the drawing animation; 0 acts as False, so the screen is refreshed only by t.update()
+    t.tracer(0)
+    t.speed(0)
+    t.hideturtle()
+    #Define the function that draws the clock
+    def draw_clock(h,m,s):
+        t.clear()#Erase the previous frame so that old hands do not pile up on redraw
+        t.penup()
+        t.pensize(3)
+        t.goto(0,-210)#Move the pen to the bottom so the circle is centered on the origin
+        t.setheading(0)#Face east; the counterclockwise circle of radius 210 is then centered on (0,0)
+        t.pendown()
+        t.circle(210)
+
+        #Draw the tick marks
+        t.left(90)
+        t.penup()
+        t.goto(0,0)#Put the pen back at the center
+        #Draw the twelve tick marks in a loop
+        for __ in range(12):
+            t.penup()
+            t.forward(190)
+            t.pendown()
+            t.forward(20)
+            t.penup()
+            t.goto(0,0)
+            t.right(30)
+        #Draw the hour hand (minutes are ignored, so it jumps once per hour)
+        t.pensize(5)
+        t.right(h/12*360)
+        t.pendown()
+        t.forward(70)
+
+        #Draw the minute hand and the second hand
+        t.penup()
+        t.goto(0,0)
+        t.setheading(90)
+        t.pendown()
+        t.right(m/60*360)
+        t.forward(110)
+
+        t.penup()
+        t.goto(0,0)
+        t.setheading(90)
+        t.pendown()
+        t.right(s/60*360)
+        t.forward(170)
+
+        #Move the pen to where the date is written
+        t.penup()
+        t.goto(0,-70)
+        date_text="{}-{:02d}-{:02d}".format(now.year,now.month,now.day)#now is the global set in the loop below
+        t.write(date_text,align="center",font=("Arial",20,"normal"))
+    #Update the hands forever in a loop
+    while True:
+        t.update()#Refresh the screen so the hidden drawing becomes visible
+        time.sleep(1)#Wait one second between frames to reduce CPU usage
+        #Read the current time
+        now=dt.datetime.now()
+        draw_clock(now.hour,now.minute,now.second)
diff --git "a/docs/views/data/\346\211\213\345\206\231\346\225\260\345\255\227.ipynb" "b/docs/views/data/\346\211\213\345\206\231\346\225\260\345\255\227.ipynb"
new file mode
100644 index 00000000..82947406 --- /dev/null +++ "b/docs/views/data/\346\211\213\345\206\231\346\225\260\345\255\227.ipynb" @@ -0,0 +1,568 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 157, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy\n", + "import matplotlib.pyplot as plt\n", + "import scipy.special" + ] + }, + { + "cell_type": "code", + "execution_count": 158, + "metadata": {}, + "outputs": [], + "source": [ + "data_file = open(\"E:\\OneDrive\\文档\\数据集\\手写数字MNIST\\mnist_train_100.txt\",\"r\")\n", + "data_list = data_file.readlines()\n", + "data_file.close()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "在对文件数据进行操作时不能先关闭文件\n", + "调用方法时要记得加括号" + ] + }, + { + "cell_type": "code", + "execution_count": 159, + "metadata": {}, + "outputs": [], + "source": [ + "# len(data_list)\n", + "# print(data_list)" + ] + }, + { + "cell_type": "code", + "execution_count": 160, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "d:\\Anaconda3\\envs\\pytorch38\\lib\\site-packages\\ipykernel_launcher.py:7: MatplotlibDeprecationWarning: Case-insensitive properties were deprecated in 3.3 and support will be removed two minor releases later\n", + " import sys\n" + ] + }, + { + "data": { + "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAPsAAAD4CAYAAAAq5pAIAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/d3fzzAAAACXBIWXMAAAsTAAALEwEAmpwYAAANf0lEQVR4nO3dfYxU1RnH8V5A/hBR2BoWgliEEKwSuzaIjZIqIatANLi+NG5iYwMB/2ATTBpSQv8Q02BIBZoSjdk1otBY1EQNSIxABKGNCXFFUFiKUEN1ccPW4MqLLxSY/g49a7br3jPLzJ25d+b5fpIn584883Ic+e29M3dnT5TL5X4EoPoNSHsCAMqDsANGEHbACMIOGEHYASMGlfPJoijio3+gxHSGLUp8z67wzlAdVB1WLS7msQCUVlToeXaFe6CGj1X1qnbVe6pGPV5b4D7s2YEK3LNPUR3WA3+iOqPtl1Szi3g8ACVUTNhHqz7rcbndX9d7bz5f1eqqiOcCkOIHdH0dKvzgMF17/RYNrjiMByp0z+725GN6XL5K9Xlx0wGQxbC7D+Qm6PD8GtVgbT+o2pjMtABk5jBeh+dnFfImbW5WuU/m1+i6/YnNDEA2Tr0V9GScegMq85dqAFQOwg4YQdgBIwg7YARhB4wg7IARhB0wgrADRhB2wAjCDhhB2AEjCDtgBGEHjCDsgBGEHTCCsANGEHbACMIOGEHYASMIO2AEYQeMIOyAEYQdMIKwA0YQdsAIwg4YQdgBIwg7YARhB4woeMlmVIaBA91q2vGuuOKKkj5/U5Nb1btvl156afC+EydODPYXLFgQ7K9YsSK219jYGLzvt99+G+wvX7482H/88ceD/YoLexRFRzScVJ1Tnc3lcpMTmRWATO7ZpynkXyTwOABKiPfsgBHFhj2n2qLD+fdV8/u6gbte1eqqyOcCkOJh/K06hP9cQR6h7a0a/6HLO3veQJdbNLT44LsfDgAqbc/ugu7HTg2vq6YkMSkAGQq79tJDVEO7tzXcodqX1MQAZOcwvlb1uoLe/Th/1R7+rURmVWWuvvrqYH/w4MHB/i233BLsT506NbY3bNiw4H3vu+++YD9N7e3twf7q1auD/YaGhtjeyZPujHG8vXv3Bvs7duwI9qsq7Ar2Jxp+luBcAJQQp94AIwg7YARhB4wg7IARhB0wItKn6uV7sir9Dbq6urpgf9u2bal+zTSrzp8/H+zPmTMn2D916lTBz93R0RHsf/nll8H+wYMHC37uUlOmL5wP7409O2AEYQeMIOyAEYQdMGJA2hMAUB6EHTCCsANGcJ49ATU1NcH+rl27gv1x48YlMY2SyDf3rq6uYH/atGmxvTNnzgTva/X3D4rFeXbAOA7jASMIO2AEYQeMIOyAEYQdMIKwA0awZHMCjh8/HuwvWrQo2L/rrruC/Q8++KCoP6kcsmfPnmC/vr4+2D99+nSwf/3118f2Fi5cGLwvksWeHTCCsANGEHbACMIOGEHYASMIO2AEYQeM4PvsGXD55ZcH+/mWF25ubo7tzZ07N3jfhx56KNhfv359sI8q+j57FEVrVJ2qfT2uq1FtVR3y4/AkJwsgncP4F1Qzel23WPW2foJMcKO/DKCSw65A79TQ+/dBZ6vW+m033pPwvABk5Hfja/VD4MJiWW7UYfyIuBuqN1+DKwDV/EUY/TBo0dBSzQs7AtV86u2YgjvKbfixM7kpAchS2DeqHvbbbtyQzHQApHYYrz23O9F6u+pKbbdrfEy1XPWKLruTuJ+qHijVBC04ceJEUff/6quvCr7vvHnzgv2XX365qDXWUUFh13vuxpjW9ITnAqCE+HVZwAjCDhhB2AEjCDtgBGEHjOArrlVgyJAhsb033ngjeN/bbrst2J85c2awv2XLlmAf5ceSzYBxHMYDRhB2wAjCDhhB2AEjCDtgBGEHjOA8e5UbP358sL979+5gv6urK9jfvn17sN/a2hrbe/rpp/OdLw72kfCfkgZQHQg
7YARhB4wg7IARhB0wgrADRhB2wAjOsxvX0NAQ7D///PPB/tChQwt+7iVLlgT769atC/Y7Oi6sQIZeOM8OGMdhPGAEYQeMIOyAEYQdMIKwA0YQdsAIzrMjaNKkScH+qlWrgv3p0wtf7Le5uTnYX7ZsWbB/9OjRgp/b5Hn2KIrWqDpV+3pct1R1VLXH16wkJwsgncP4F1Qz+rj+T/oJUufrzYTnBaDcYVeQd2o4nvDzAqigD+iadPj+oT/MHx53I/Xmq1pdFfFcAFIK+zMq95cM61Tu2wgrA0cGLarJrgp8LgBphV3BPaY6pzqvi8+qpiQwFwBZC7sOyUf1uOi+I/n9J/UAKvQ8u4K9XsPtqitVx1SP+cvuEN7d+YjqET1O3i8X67H4Q+BVZtiwYcH+3XffXfB35fXvJdjftm1bsF9fXx/sWzvPPqgfd2zs4+rnip4RgLLi12UBIwg7YARhB4wg7IARhB0wgq+4IjXfffddsD9oUPhk0dmzZ4P9O++8M7b3zjvvBO9byfhT0oBxHMYDRhB2wAjCDhhB2AEjCDtgBGEHjMj7rTfYdsMNNwT7999/f7B/0003FXwePZ+2trZgf+dO9+cT0Y09O2AEYQeMIOyAEYQdMIKwA0YQdsAIwg4YwXn2Kjdx4sRgv6mpKdi/9957g/2RI0de9Jz669y5c8F+R0f4r5efP+/WMEE39uyAEYQdMIKwA0YQdsAIwg4YQdgBIwg7YATn2StAvnPZjY19LbTbv/PoY8eOLWhOSWhtbQ32ly1bFuxv3LgxyelUvbx79iiKxqi2qw6o9qsW+utrVFtVh/w4vPTTBVDKw3i37MZvc7ncTzX+QrVAwb5O42LV27p+ghv9ZQCVGnaFuUO122+f1HBANVo1W7XW38yN95RqkgDK/J5de3T3Bu9G1S5VrftB4K53o3ojYu4zX4MrAJUQdoX2Mg2vqh5VuE/ocr/up9u2aGjxj5ErZJIAynTqTSG9xAf9RYX3NX/1MV0/yvfd2Fn8dACktmeP/rcLf051QEFf1aPlzns8rFruxw0lmWEVqK2tDfavu8593hnvqaeeCvavvfbai55TUnbtcu/o4j355JOxvQ0bwv9k+Ipq+Q/jb1X9WvWRcr/HX7fEh/wVXTdX46eqB5KdGoCyhl17879riHuDPj3JyQAoHX5dFjCCsANGEHbACMIOGEHYASP4ims/1dTUxPaam5uD962rqwv2x40b199pJO7dd98N9leuXBnsb968Odj/5ptvLnpOKA327IARhB0wgrADRhB2wAjCDhhB2AEjCDtghJnz7DfffHOwv2jRomB/ypQpsb3Ro92f5EvP119/HdtbvXp18L5PPPFEsH/69OmC5oTsYc8OGEHYASMIO2AEYQeMIOyAEYQdMIKwA0aYOc/e0NBQVL8YbW1twf6mTZuC/bNn3dqahX3nvKurK3hf2MGeHTCCsANGEHbACMIOGEHYASMIO2AEYQeMiHK5XPgGUTRGwzrVSNV5VYvu82ddv1Tb81T/9jddouvfzPNY4ScDUDTlMCo07KM0jNLtdmt7qLbfV92j+pXqlK5f0d9JEHYgvbD3Z332Dg0dfvukAntAm+n+aRYApX3PrqCP1XCjape/qknXfahaoxoec5/5qlZXFz07AInJexj//Q2j6DINO1TLdJ/XdLlW21+o3AP8wR/qz8nzGLxnB7L6nv3CjaLoEg3u2xqbdftVMXv8TepNyvM4hB1IKex5D+MVUHfH51QHegbdf3DXzX1lbF+xkwRQOv35NH6qhr+pPvKn3pwlqkaVW4vYPcAR1SP+w7zQY7FnB7J8GJ8Uwg5k+DAeQHUg7IARhB0wgrADRhB2wAjCDhhB2AEjCDtgBGEHjCDsgBGEHTCCsANGEHbACMIOGFHuJZvdn7H6V4/LV/rrsiirc8vqvBzmlv7r9pO4Rlm/z/6DJ4+iVj3/5NQmEJDVuWV1Xg5zy/brxmE8YARhB4wYkPLzt6T8/CFZnVt
W5+Uwtwy/bqm+ZwdgZ88OoEwIO2DEgJRONcxQHVQdVi1OYw5xNJ8jqo9Ue9Jen86vodep+n4BDm3XqLaqDvlxeIbmtlR11L92rmalNLcxqu1uEVLVftXCLLx2gXmV5XUr+3t2/YcM1PCxql7VrnpP1ah5tJV1IoGwa5is+XyRgbn8UsMp1brupbV03R81HNfl5f4H5XBt/y4jc1t6sct4l2huccuM/ybN1y7J5c8rZc8+RXVY/2GfqM5o+yXV7BTmkXl6fXZqON7ravdarfXba/0/lqzMLRPcykQuUH77pIYDfpnxVF+7wLzKIo2wu/+4z3pcbs/Yeu/uUGeLfvK+75abTnsyfajtXmbLjyNSnk9veZfxLqdey4zXZuW1K2T580oMe19L02Tp/N+t+ofwc40zVQv84Sr65xnVeL8GoAvTyjRfOL/M+KuqR/X/9ESac8kzr7K8bmmE3e3Jx/S4fJXq8xTm0Se9+BfmorFTw+v+bUeWHOteQdePbp6ZoNfsmOqcyi0A+myar51fZtwF6kXN57WsvHZ9zatcr1saYXcfyE3Qf/Q1qsHaflC1MYV5/IDmM8R/cHJhW8MdGVyK2r1WD/ttN25IcS7/JyvLeMctM572a5f68ufu0/hyl8zyn8j/U/X7NOYQM69xqr2+9qc9N1nvD+v+44+I5qp+rHpbdciPNRma21/80t4f+mCNSmlubpnxnJ/HHl+z0n7tAvMqy+vGr8sCRvAbdIARhB0wgrADRhB2wAjCDhhB2AEjCDtgxH8Bn9Zm7UoXXzIAAAAASUVORK5CYII=", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "from configparser import Interpolation\n", + "from numpy import imag\n", + "\n", + "\n", + "all_values = data_list[0].split(',')\n", + "imag_array = numpy.asfarray(all_values[1:]).reshape((28,28))\n", + "plt.imshow(imag_array,cmap=\"gray\",Interpolation='none')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "plt.imshow()后面加上plt.show()才能显示,plt.imshow()仅仅是生成图像" + ] + }, + { + "cell_type": "code", + "execution_count": 161, + "metadata": {}, + "outputs": [], + "source": [ + "class neuralNetwork:\n", + " # 初始化\n", + " def __init__(self,inputnodes,hiddennodes,outputnodes,learningrate):\n", + " # 设置输入层,隐藏层,输出层的节点数\n", + " self.inodes = inputnodes\n", + " self.hnodes = hiddennodes\n", + " self.onodes = outputnodes\n", + "\n", + " # 定义链接的权重矩阵,此处是根据正态分布初始化\n", + " self.wih = numpy.random.normal(0.0,pow(self.hnodes,-0.5),(self.hnodes,self.inodes))\n", + " self.who = numpy.random.normal(0.0,pow(self.hnodes,-0.5),(self.onodes,self.hnodes))\n", + "\n", + " # 设置学习率\n", + " self.lr = learningrate\n", + "\n", + " # 定义激活函数\n", + " self.activation_function = lambda x:scipy.special.expit(x) \n", + " pass\n", + "\n", + " # 训练\n", + " def train(self,input_list,output_list):\n", + " # 将列表转为数组\n", + " inputs = numpy.array(input_list,ndmin=2).T\n", + " targets = numpy.array(output_list,ndmin=2).T\n", + "\n", + " # 计算隐藏层的输入\n", + " hidden_inputs = numpy.dot(self.wih,inputs)\n", + " # 将结果输入激活函数作为下一层的输入\n", + " hidden_outputs =self.activation_function(hidden_inputs)\n", + " # 计算输出层的输入\n", + " final_inputs = numpy.dot(self.who,hidden_outputs)\n", + " # 将结果输入激活函数作为最后输出\n", + " final_outputs = self.activation_function(final_inputs)\n", + "\n", + " # 计算误差\n", + " output_errors = targets-final_outputs\n", + " hidden_errors = numpy.dot(self.who.T,output_errors)\n", + "\n", + " # 更新权重\n", + " self.who += 
self.lr*numpy.dot((output_errors*final_outputs*(1.0-final_outputs)),numpy.transpose(hidden_outputs))\n", + " self.wih += self.lr*numpy.dot((hidden_errors*hidden_outputs*(1.0-hidden_outputs)),numpy.transpose(inputs))\n", + "\n", + " def query(self,input_list):\n", + " # 将列表转为数组\n", + " inputs = numpy.array(input_list,ndmin=2).T\n", + "\n", + " # 计算隐藏层的输入\n", + " hidden_inputs = numpy.dot(self.wih,inputs)\n", + " # 将结果输入激活函数作为下一层的输入\n", + " hidden_outputs =self.activation_function(hidden_inputs)\n", + " # 计算输出层的输入\n", + " final_inputs = numpy.dot(self.who,hidden_outputs)\n", + " # 将结果输入激活函数作为最后输出\n", + " final_outputs = self.activation_function(final_inputs)\n", + "\n", + " return final_outputs\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "训练模型" + ] + }, + { + "cell_type": "code", + "execution_count": 162, + "metadata": {}, + "outputs": [], + "source": [ + "input_nodes = 784\n", + "hidden_nodes = 200\n", + "output_nodes = 10\n", + "\n", + "learning_rate = 0.1\n", + "epoch = 50\n", + "\n", + "n = neuralNetwork(input_nodes,hidden_nodes,output_nodes,learning_rate)" + ] + }, + { + "cell_type": "code", + "execution_count": 163, + "metadata": {}, + "outputs": [], + "source": [ + "training_data_file = open(\"E:\\OneDrive\\文档\\数据集\\手写数字MNIST\\mnist_train_100.txt\",\"r\")\n", + "training_data_list = training_data_file.readlines()\n", + "training_data_file.close()" + ] + }, + { + "cell_type": "code", + "execution_count": 164, + "metadata": {}, + "outputs": [], + "source": [ + "for i in range(epoch):\n", + " for record in training_data_list:\n", + " all_values = record.split(\",\")\n", + " inputs = (numpy.asfarray(all_values[1:])/255.0*0.99)+0.1\n", + " targets = numpy.zeros(output_nodes)+0.01\n", + " targets[int(all_values[0])] = 0.99\n", + " outputs = n.train(inputs,targets)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "测试模型" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [] + }, + { + 
"cell_type": "code", + "execution_count": 165, + "metadata": {}, + "outputs": [], + "source": [ + "test_data_file = open(\"E:\\OneDrive\\文档\\数据集\\手写数字MNIST\\mnist_test_10.txt\",\"r\")\n", + "test_data_list = test_data_file.readlines()\n", + "test_data_file.close()" + ] + }, + { + "cell_type": "code", + "execution_count": 166, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "7\n" + ] + } + ], + "source": [ + "# 获取测试集的第一个值\n", + "all_values2 = test_data_list[0].split(\",\")\n", + "print(all_values2[0])" + ] + }, + { + "cell_type": "code", + "execution_count": 167, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAPsAAAD4CAYAAAAq5pAIAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/d3fzzAAAACXBIWXMAAAsTAAALEwEAmpwYAAAMfUlEQVR4nO3dW4gUVx7HcVtjQGIeZrztrBk1Kz4YFvGOoC6KJLj6oEGzxIfgQnDyoJJgkBX3QR+DbBLWF2GCErO4BsF4QcRVNChBDRmDt8mgjuLqxMsoI0QF0TG9v2OOMjt2nRm7uquq5//9wJ9TXWeq+9j2b6qmq6tPLp/P9wLQ8/VOewAAkkHYASMIO2AEYQeMIOyAES8l+WC5XI63/oEy0xm2XMn37ArvbNU5VbNqVZz7AlBeuWLPsyvcfdScV72palH9oFqk+/spsA17dqAC9+yTVc2640uqh1r+WjUvxv0BKKM4YR+qutrhdotf13lvXqdqcBXjsQCk+AZdoUOF5w7TtdevV+OKw3igQvfsbk9e2+H2a6pr8YYDIIthd2/IjdLh+euql7X8rmp3aYYFIDOH8To8b1fIl2nxPyr3zvwmrWss2cgAZOPUW1EPxqk3oDI/VAOgchB2wAjCDhhB2AEjCDtgBGEHjCDsgBGEHTCCsANGEHbACMIOGEHYASMIO2AEYQeMIOyAEYQdMIKwA0YQdsAIwg4YQdgBIwg7YARhB4wg7IARhB0wgrADRhB2wAjCDhhB2AEjCDtgBGEHjCh6fnY/BfNlNXdVj1Xt+Xx+YklGBSBbYfdmKuS3S3A/AMqIw3jAiLhhz6v263D+hKqu0A+49aoGVzEfC0AMOR2CF79xLvd7bX9N7WDdPKBarttHAj9f/IMB6BZlMFfyPbsLum9b1exQTY5zfwDKp+iway/9iurVp8tq3lKdLdXAAGTn3fghqh0K+tP7+bf28PtKMioA2fqb/YUfjL/Zgcr8mx1A5SDsgBGEHTCCsANGEHbAiFJcCGPCwoULI/uWLFkS3PbatSefPYr04MGDYP+WLVuC/Tdu3Ijsa25uDm4LO9izA0YQdsAIwg4YQdgBI3qnPQAAySDsgBGEHTCCq9666dKlS5F9I0aMKMl/RrHu3nVf8FtYY2NjgiPJlpaWlsi+devWBbdtaKjcb1HjqjfAOA7jASMIO2AEYQeMIOyAEYQdMIKwA0ZwPXs3ha5ZHzNmTHDbpqamYP/o0aOD/ePHjw/2z5gxI7JvypQpwW2vXr0a7K+trQ32x9He3h7sv3XrVrC/pqam6Me+cuVKjz3PHoU9O2AEYQeM
IOyAEYQdMIKwA0YQdsAIwg4YwfXsPUBVVVVk39ixY4PbnjhxItg/adKkosbUHV19X/758+djfX6huro6sm/p0qXBbTds2BDs75HXs+dyuU2qVtXZDuuqVQdUF3wb/WoDUDGH8V+qZndat0p1UL9BRrnW3wZQyWFXoI+oaeu0ep5qs1927fwSjwtARj4bP0S/BK67BdfqMH5w1A+qr06NKwA9+UIY/TKoV1Pvg58v9+MBKO2pt5sK7pNLjnzbWuT9AMh42HerFvtl1+4qzXAApHaeXXvurWrcBdMDVTdVa1Q7VdtUw1TuwuB3dD+d38QrdF8cxqPbFixYEOzfts29BKOdPfvsbPFzZs6c+dy6jtraunw5V9x59pe6seGiiK5ZsUYEIFF8XBYwgrADRhB2wAjCDhhB2AEjuMQVqRk8OPJT1k+cOXMm1vYLFy6M7Nu+fXtw20rGlM2AcRzGA0YQdsAIwg4YQdgBIwg7YARhB4xgymakpquvcx40aFCw/86dO8H+c+fOvfCYejL27IARhB0wgrADRhB2wAjCDhhB2AEjCDtgBNezo6ymTp0a2Xfo0KHgtn379g32z5jhvuE82pEjbppCe/LFTtkMoGcg7IARhB0wgrADRhB2wAjCDhhB2AEjuJ4dZTVnzpyiz6MfPHgw2H/s2LGixmRV727Mqb5J1ap6Ntm1lteqflad9BX9PwqgYg7jv1TNLrD+c31SZ6yvvSUeF4Ckw64gu88ctpX4cQFU0Bt0y3T4ftof5ldF/ZD66lQNrmI8FoCUwr5BNVI1VnVd9WngyKBeNdFVkY8FIK2wK7g3VY9Vv+rmF6rJJRgLgKyFXYfkNR1uvq169k49gAo9z65gb1XjLhweqOUWtWvcbS27Q/i86rLqg7KOEpnVr1+/YP/s2YVO5Pzm4cOHwW3XrHEvtWiPHj0K9uMFw65D9UUFVm/sajsA2cLHZQEjCDtgBGEHjCDsgBGEHTCCS1wRy8qVK4P948aNi+zbt29fcNujR48WNSYUxp4dMIKwA0YQdsAIwg4YQdgBIwg7YARhB4xgymYEzZ07N9i/c+fOYP/9+/eLuvzVOX78eLAfhTFlM2Ach/GAEYQdMIKwA0YQdsAIwg4YQdgBI7ie3bgBAwYE+9evXx/s79OnT7B/797oOT85j54s9uyAEYQdMIKwA0YQdsAIwg4YQdgBIwg7YATXs/dwXZ0H7+pc94QJE4L9Fy9eDPaHrlnvalskfD17LperVX2ralI1qj7066tVB1QXfFtV3NAAZOUwvl31sX5bjFY7RbVUwX5D7SrVQa0f5Vp/G0Clhl1hvq760S/fVdOkGqqap9rsf8y188s1SAAJfzZee/QRatzkXd+rhrhfBG69a9U3OGKbOjWuAFRC2BXa/mq2qz5SuH/R7W5tp5+tV1Pv7yNfzCABJHTqTSHt64O+ReH9xq++qfU1vt+1rfGHAyC1PXvut134RlWTgv5Zh67dqsWqT3y7qywjRCwjR46MdWqtKytWrAj2c3qtsg7jp6reU51R7k/6dat9yLdp3ftqr6jeKc8QASQSdu3Nv1MT9Qf6rFIMAkD58XFZwAjCDhhB2AEjCDtgBGEHjOCrpHuA4cOHR/bt378/1n2vXLky2L9nz55Y94/ksGcHjCDsgBGEHTCCsANGEHbACMIOGEHYASM4z94D1NVFf+vXsGHDYt334cOHu7oqMtb9Izns2QEjCDtgBGEHjCDsgBGEHTCCsANGEHbACM6zV4Bp06YF+5cvX57QSFDJ2LMDRhB2wAjCDhhB2AEjCDtgBGEHjCDsgBHdmZ+9Vs1Xqt+pflXV5/P5f2r9Wi0vUd3yP7pa6/eWbaSGTZ8+Pdjfv3//ou+7q/nT7927V/R9o/I+VNOu+lhB/lEBf1XLJ9Qe8H2fa/0/yjc8AEnOz35dzXW/fFdBb9Li0FINAEAG/2ZX0EeoGaf63q9apnWnVZtUVRHb1KkaXMUcK4Akwq6wuj8Mt6s+0h7+
F7UbVCNVY/2e/9NC2+ln3d/4E13FGCeAJMKuoPf1Qd+i0H7j1qm9qXqscm/afaGaHHMsANIMu4KeU7NR1aRgf9ZhfU2HH3tbdbb0wwOQ5LvxU1Xvqc4o4Cf9utWqRbrtDuHddwlfVn1QqkGhdE6dOhXsnzVrVrC/ra2N/w5D78Z/p8bt3TvjnDpQQfgEHWAEYQeMIOyAEYQdMIKwA0YQdsCIXJJT7uq8PPP7AmWmTBc6Vc6eHbCCw3jACMIOGEHYASMIO2AEYQeMIOyAEUlP2Xxb9d8Otwf6dVmU1bFldVwOY0v/eRueiQ/VPPfguVxDVr+bLqtjy+q4HMaW7eeNw3jACMIOGNE75cevT/nxQ7I6tqyOy2FsGX7eUv2bHYCdPTuAhBB2wIjeKZ1qmK06p2pWrUpjDFE0nsuqJ9+Rn/b8dH4OvVbVswk4tFztZtFVXfBtVYbGtlb1s3/uXM1JaWy1qm/dJKSqRtWHWXjuAuNK5HlL/G92/UP6qDmvelPVovpBtUjj+CnRgQTCrsbNTXc7A2P5kxo3QfpXGs8f/bp1atp0+xP/i7JKy3/LyNjWunVpT+PtZyuq6TjNeK9evear/prmcxcY11+SeN7S2LO7OeGa9Q+7pHqo5a9V81IYR+bp+TmipvOULO652uyXN/sXS1bGlglumnEXKL98V02Tn2Y81ecuMK5EpBF294+72uF2S8bme3eHOvv1m/eEm2467cEUMMS9aNyCbwenPJ7OupzGO0mdphkfkpXnrpjpzysx7IW+HytL5/+m6oUwXu2fVUv94Sq6p1vTeCelwDTjmVDs9OeVGHa3J6/tcPs11bUUxlGQnvwnY1HbqmZHBqeivvl0Bl3funFmQpam8S40zXgWnrs0pz9PI+zuDblR+ke/rnpZy++qdqcwjudoPK/4N06eLKt5K4NTUbvnarFfdu2uFMfyf7IyjXfUNONpP3epT3/u3o1PumSOf0f+ourvaYwhYlx/ULk5jl01pj022eoP6x75I6L3VQNUB1UXfFudobH9S3VGddoHqyalsU1zL20/DjfN+En/mkv1uQuMK5HnjY/LAkbwCTrACMIOGEHYASMIO2AEYQeMIOyAEYQdMOJ/rLwRQCB8ViQAAAAASUVORK5CYII=", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "image_array = numpy.asfarray(all_values2[1:]).reshape((28,28))\n", + "plt.imshow(image_array,cmap=\"gray\",interpolation='none')\n", + "plt.show()" + ] + }, + { + "cell_type": "code", + "execution_count": 168, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[0.01224118],\n", + " [0.00445431],\n", + " [0.03148557],\n", + " [0.01233577],\n", + " [0.00438021],\n", + " [0.03198868],\n", + " [0.00109165],\n", + " [0.96160137],\n", + " [0.00850337],\n", + " [0.01706819]])" + ] + }, + "execution_count": 168, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "n.query((numpy.asfarray(all_values2[1:])/255.0*0.99)+0.1)" + ] + }, + { + "cell_type": "code", + "execution_count": 169, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "7 correct_label\n", + "7 answer\n", + "2 correct_label\n", + "2 answer\n", + "1 correct_label\n", + "1 answer\n", + "0 correct_label\n", + "0 answer\n", + "4 correct_label\n", + "4 answer\n", + "1 correct_label\n", + "1 answer\n", + "4 correct_label\n", + "9 answer\n", + "9 correct_label\n", + "4 answer\n", + "5 correct_label\n", + "1 answer\n", + "9 correct_label\n", + "9 answer\n", + "0 correct_label\n", + "0 answer\n", + "6 correct_label\n", + "9 answer\n", + "9 correct_label\n", + "7 answer\n", + "0 correct_label\n", + "0 answer\n", + "1 correct_label\n", + "1 answer\n", + "5 correct_label\n", + "3 answer\n", + "9 correct_label\n", + "7 answer\n", + "7 correct_label\n", + "7 answer\n", + "3 correct_label\n", + "3 answer\n", + "4 correct_label\n", + "4 answer\n", + "9 correct_label\n", + "9 answer\n", + "6 correct_label\n", + "6 answer\n", + "6 correct_label\n", + "1 answer\n", + "5 correct_label\n", + "5 answer\n", + "4 correct_label\n", + "4 answer\n", + "0 correct_label\n", + "0 answer\n", + "7 correct_label\n", + "7 answer\n", + "4 correct_label\n", + "4 
answer\n", + "0 correct_label\n", + "0 answer\n", + "1 correct_label\n", + "1 answer\n", + "3 correct_label\n", + "3 answer\n", + "1 correct_label\n", + "1 answer\n", + "3 correct_label\n", + "3 answer\n", + "4 correct_label\n", + "4 answer\n", + "7 correct_label\n", + "8 answer\n", + "2 correct_label\n", + "2 answer\n", + "7 correct_label\n", + "7 answer\n", + "1 correct_label\n", + "1 answer\n", + "2 correct_label\n", + "1 answer\n", + "1 correct_label\n", + "1 answer\n", + "1 correct_label\n", + "1 answer\n", + "7 correct_label\n", + "7 answer\n", + "4 correct_label\n", + "4 answer\n", + "2 correct_label\n", + "2 answer\n", + "3 correct_label\n", + "3 answer\n", + "5 correct_label\n", + "3 answer\n", + "1 correct_label\n", + "9 answer\n", + "2 correct_label\n", + "2 answer\n", + "4 correct_label\n", + "4 answer\n", + "4 correct_label\n", + "4 answer\n", + "6 correct_label\n", + "6 answer\n", + "3 correct_label\n", + "3 answer\n", + "5 correct_label\n", + "4 answer\n", + "5 correct_label\n", + "3 answer\n", + "6 correct_label\n", + "4 answer\n", + "0 correct_label\n", + "0 answer\n", + "4 correct_label\n", + "4 answer\n", + "1 correct_label\n", + "1 answer\n", + "9 correct_label\n", + "4 answer\n", + "5 correct_label\n", + "5 answer\n", + "7 correct_label\n", + "7 answer\n", + "8 correct_label\n", + "8 answer\n", + "9 correct_label\n", + "9 answer\n", + "3 correct_label\n", + "2 answer\n", + "7 correct_label\n", + "4 answer\n", + "4 correct_label\n", + "9 answer\n", + "6 correct_label\n", + "1 answer\n", + "4 correct_label\n", + "4 answer\n", + "3 correct_label\n", + "3 answer\n", + "0 correct_label\n", + "0 answer\n", + "7 correct_label\n", + "7 answer\n", + "0 correct_label\n", + "0 answer\n", + "2 correct_label\n", + "2 answer\n", + "9 correct_label\n", + "8 answer\n", + "1 correct_label\n", + "1 answer\n", + "7 correct_label\n", + "7 answer\n", + "3 correct_label\n", + "3 answer\n", + "2 correct_label\n", + "7 answer\n", + "9 correct_label\n", + "9 answer\n", 
+ "7 correct_label\n", + "7 answer\n", + "7 correct_label\n", + "7 answer\n", + "6 correct_label\n", + "6 answer\n", + "2 correct_label\n", + "2 answer\n", + "7 correct_label\n", + "7 answer\n", + "8 correct_label\n", + "4 answer\n", + "4 correct_label\n", + "4 answer\n", + "7 correct_label\n", + "7 answer\n", + "3 correct_label\n", + "3 answer\n", + "6 correct_label\n", + "6 answer\n", + "1 correct_label\n", + "1 answer\n", + "3 correct_label\n", + "3 answer\n", + "6 correct_label\n", + "6 answer\n", + "9 correct_label\n", + "9 answer\n", + "3 correct_label\n", + "3 answer\n", + "1 correct_label\n", + "1 answer\n", + "4 correct_label\n", + "4 answer\n", + "1 correct_label\n", + "9 answer\n", + "7 correct_label\n", + "3 answer\n", + "6 correct_label\n", + "6 answer\n", + "9 correct_label\n", + "4 answer\n" + ] + } + ], + "source": [ + "scorecard = []\n", + "\n", + "for code in test_data_list:\n", + " all_values = code.split(\",\")\n", + " correct_label = int(all_values[0])\n", + " print(correct_label,\"correct_label\")\n", + " inputs = (numpy.asfarray(all_values[1:]) / 255.0 * 0.99) + 0.01\n", + " outputs = n.query(inputs)\n", + " label = numpy.argmax(outputs)\n", + " print(label,\"answer\")\n", + " if(correct_label==label):\n", + " scorecard.append(1)\n", + " else:\n", + " scorecard.append(0)" + ] + }, + { + "cell_type": "code", + "execution_count": 170, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0]\n", + "performance = 0.74\n" + ] + } + ], + "source": [ + "print(scorecard)\n", + "scorecard_array = numpy.asarray(scorecard) \n", + "print (\"performance = \", scorecard_array.sum() / scorecard_array.size)" + ] + } + ], 
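+ "metadata": { + "kernelspec": { + "display_name": "Python 3.6.15 ('pytorch38')", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.15" + }, + "orig_nbformat": 4, + "vscode": { + "interpreter": { + "hash": "dfb7ab611ee6793a2bf93701295d770513e1e2edb32d16599c78c4ba42795f3f" + } + } + }, + "nbformat": 4, + "nbformat_minor": 2 +}
diff --git "a/docs/views/data/\346\225\260\346\215\256\346\214\226\346\216\230\345\237\272\347\241\200\347\237\245\350\257\206.md" "b/docs/views/data/\346\225\260\346\215\256\346\214\226\346\216\230\345\237\272\347\241\200\347\237\245\350\257\206.md"
new file mode 100644
index 00000000..dc59cadc
--- /dev/null
+++ "b/docs/views/data/\346\225\260\346\215\256\346\214\226\346\216\230\345\237\272\347\241\200\347\237\245\350\257\206.md"
@@ -0,0 +1,113 @@
+---
+layout: post
+title: Data Mining
+date: 2019-05-05
+author: yuping
+categories:
+  - Data Analysis Department
+tags:
+  - Data Mining Basics
+---
+### Definition of data mining:
+Data mining is the process of extracting implicit, previously unknown, yet potentially useful information and knowledge from large volumes of incomplete, noisy, fuzzy, and random data.
+### The data mining process:
+Data: unorganized numbers, words, sounds, images, and so on; a record of some situation.
+Information: relationships among data, built on top of the data.
+Knowledge: conclusions that can be derived from information.
+
+Data ---(selection)---> target data ---(preprocessing)---> preprocessed data ---(transformation)---> transformed data ---(data mining)---> patterns ---(interpretation/evaluation)---> knowledge
+### Main topics in data mining:
+#### Association rule mining:
+Concept: find associations between items A and B, for example the often-cited Walmart case in which beer and diaper sales always rose together on weekends.
+Includes: the Apriori algorithm and the FP-growth algorithm
+#### Label classification / numeric prediction:
+Concept: learn the characteristics of each class from a large training set, then use them to classify new input data (or to output a numeric value).
+Includes: Bayesian classifiers, decision trees, k-nearest neighbors, logistic regression, support vector machines, neural networks, ensemble methods
+
+It usually takes two steps. First, model learning: collect a large number of training samples and build a classification model with a classification algorithm. Second, classification testing (prediction testing): feed in new test data and infer the result with the model.
+#### Clustering:
+Concept: use a similarity measure to partition the data directly into classes (clusters), so that objects within a cluster are highly similar and objects in different clusters differ substantially.
+Includes: k-means, hierarchical clustering, density-based clustering
+
+Difference between classification and clustering:
+
+Classification: the criteria are given first, and the data are then classified according to them.
+Clustering: find what a group of objects has in common, then group new objects by the commonalities found.
+#### Data preprocessing:
+Includes data cleaning, data integration, data reduction, and data normalization
+### Data types and statistics:
+
+1. Data object: represents an entity (for example, a customer or a patient); also called a sample, instance, example, data point, object, or tuple.
+2. Data set: made up of data objects.
+3. Attribute: a data field that represents a characteristic or feature of a data object.
+4. In a database, rows correspond to data objects and columns correspond to attributes.
+#### Attribute types
+1. Nominal data: the states can be enumerated, for example color, occupation, ID number, postal code, or marital status.
+This type includes the special case of binary data (only two states), which is either symmetric binary (the two states are about equally common) or asymmetric binary (one state is far more common than the other).
+
+2. Ordinal data: the values have a meaningful order, but the gap between successive ranks is unknown. For example, sizes graded as large, medium, and small, where the difference between two grades is not known.
+
+3. Interval data: measured on a scale of equal-sized units; the values are ordered, but there is no true zero point, so multiplication (ratios) is not meaningful. Examples: temperature (in °C or °F), calendar dates.
+
+4. Ratio-scaled data: the values are ordered and there is a fixed zero point, so multiplication (ratios) is meaningful. Examples: length, weight.
+#### Statistical summaries of data
+Central tendency: mean, median, mode, trimmed mean
+Dispersion: variance and standard deviation, quantiles
+
+### Data visualization
+Box plot: compares the distributions of several attributes and reveals the outliers of each attribute.
+Histogram: shows how a single attribute is distributed across intervals and reveals how well a feature separates the classes.
+Scatter plot: shows the correlation between variables (positive, negative, or none).
+### Data similarity
+Data matrix: N rows for N data objects and P columns for P dimensions.
+Dissimilarity matrix: stores the pairwise dissimilarity of n objects; usually written as a triangular matrix.
+Similarity: measures how alike two objects are; larger means more alike; range [0,1].
+Dissimilarity: measures how different two objects are; smaller means more alike; the minimum is 0 and the range is 0 to positive infinity.
+Proximity: refers to either similarity or dissimilarity.
+#### Proximity measures for nominal attributes
+![](E:/Desktop\学习\数据分析部\数据挖掘\Screenshot_20220409_174242_com.netease.edu.ucmooc.jpg)
+Method: simple matching
+d(i,j) = (p-m)/p
+m is the number of matches (attributes on which i and j take the same state); p is the total number of attributes
+#### Proximity measures for binary attributes
+![](E:/Desktop\学习\数据分析部\数据挖掘\Screenshot_20220409_174204.jpg)
+1. Build a contingency table (with the usual counts: q = attributes where both objects are 1, r = i is 1 and j is 0, s = i is 0 and j is 1, t = both are 0)
+
+![](E:/Desktop\学习\数据分析部\数据挖掘\Screenshot_20220409_174129.jpg)
+
+2. Symmetric binary dissimilarity:
+d(i,j) = (r+s)/(q+r+s+t)
+
+3. Asymmetric binary dissimilarity:
+d(i,j) = (r+s)/(q+r+s)
+
+4. Jaccard similarity coefficient:
+sim(i,j) = q/(q+r+s)
+
+#### Proximity measures for numeric attributes
+![](E:/Desktop\学习\数据分析部\数据挖掘\Screenshot_20220409_174256_com.netease.edu.ucmooc.jpg)
+Minkowski distance
+
+![](E:/Desktop\学习\数据分析部\数据挖掘\Screenshot_20220409_175242_com.netease.edu.ucmooc.jpg)
+
+Manhattan distance
+Euclidean distance
+Supremum distance (the largest difference between the two objects over all attributes)
+![](E:/Desktop\学习\数据分析部\数据挖掘\Screenshot_20220409_175306_com.netease.edu.ucmooc.jpg)
+
+
+### Data preprocessing
+
+#### Data cleaning
+Code tutorial:
+https://scikit-learn.org/stable/modules/impute.html#impute
+##### Handling missing data
+1. Ignore the tuple: when the class label is missing, delete the record; if too many records are deleted, results get worse
+2. Fill in the missing value manually
+3. Fill in automatically: replace missing values with the attribute mean
+##### Handling noisy data
+Detect outliers with a box plot: delete the outliers
+##### Handling inconsistent data
+1. Compute or infer the correct value and replace
+2. Global replacement
+
diff --git a/docs/views/imgs/2021/12/yuping0924/Figure_3.png b/docs/views/imgs/2021/12/yuping0924/Figure_3.png
new file mode 100644
index 00000000..6c91c381
Binary files /dev/null and b/docs/views/imgs/2021/12/yuping0924/Figure_3.png differ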