singularity-lab · MuShang9 · Nov 7, 2021 · Nov 12, 2021
diff --git a/docs/views/data/2021-03-27-如何利用pandas做简单的数据分析.md b/docs/views/data/2021-03-27-如何利用pandas做简单的数据分析.md
@@ -1,6 +1,7 @@
 ---
 layout: post
 title: How to Do Simple Data Analysis with pandas
+
 date: 2021-03-27
 author: LZY
 categories:

diff --git a/docs/views/data/2021-11-08-week1学习内容.md b/docs/views/data/2021-11-08-week1学习内容.md
@@ -0,0 +1,32 @@
+---
+layout: post
+title: Week 1 Study Notes
+date: 2021-11-08
+author: 饶翰宇
+categories:
+    - Data Analysis Department
+tags:
+    - Data Analysis
+    - Python
+---
+
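+## Python
+
+1. pandas
+
+## Algorithms and Data Structures
+
+1. Decision trees
+2. Random forests
+3. Stacks and queues
+
+## Mathematics
+
+1. Distributions of two-dimensional random variables
+
+## Other
+
+1. Learned basic Markdown syntax
+2. Wrote Markdown with Typora
+3. Installed pandoc and set up the basic tools for publishing to the blog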
+
diff --git a/docs/views/data/2021-11-25-MySQL和数据可视化.md b/docs/views/data/2021-11-25-MySQL和数据可视化.md
@@ -0,0 +1,212 @@
+---
+layout: post
+title: MySQL and Data Visualization
+date: 2021-11-25
+author: 饶翰宇
+categories:
+    - Data Analysis Department
+tags:
+    - MySQL
+    - Python
+    - Function Plotting
+---
+
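+## Week 3
+
+### MySQL Advanced, Part 1
+
+1. Queries 🐱‍🐉
+
+   - Sorted queries
+
+     ```mysql
+     -- age from high to low; rows with equal age ordered by id from low to high
+     select * from person order by age desc, id asc;
+     ```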
+
+2. Functions 🐱‍🚀
+
+   - Single-row functions
+
+     1. String functions
+
+        - concat
+
+          ```mysql
+          select concat('这瓜','保熟','吗');
+          ```
+
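+        - length (returns the length in bytes, not in characters)
+
+          ```mysql
+          -- 10 for a utf8mb4 string (3 bytes per Chinese character); 8 under gbk
+          select length('张三a123');
+          ```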
+
+        - substr/substring
+
+          ```mysql
+          SELECT SUBSTR('今天希望你开心',5,3);  -- 3 characters from position 5 (1-based): '你开心'
+          ```
+
+        - upper & lower
+
+          ```mysql
+          select upper('abAb1');
+          select lower('abAb1');
+          ```
+
+        - instr (1-based position of the first occurrence, 0 if not found)
+
+          ```mysql
+          select instr('泊松分布','分布');
+          ```
+
+        - trim (removes leading and trailing spaces, or the given substring)
+
+          ```mysql
+          select trim('  abcd    ');
+          select trim('ab' from 'ab  abcc b');
+          ```
+
+        - lpad & rpad (pad on the left / right to the given length in characters)
+
+          ```mysql
+          SELECT LPAD('哥谭市',8,'*');
+          SELECT RPAD('哥谭市',8,'*');
+          ```
+
+        - replace
+
+          ```mysql
+          SELECT REPLACE('想登上高山欲穷千里目','想','不想');
+          ```
+
+
+
+     2. Math functions
+
+        - round
+
+          ```mysql
+          select round(1.22,1);
+          ```
+
+        - ceil & floor (round up / down to an integer)
+
+          ```mysql
+          select ceil(1.9);
+          select floor(1.9);
+          ```
+
+        - truncate (cut to the given number of decimal places, without rounding)
+
+          ```mysql
+          select truncate(1.231313,3);
+          ```
+
+        - mod
+
+          ```mysql
+          select mod(10,3);
+          ```
+
+     3. Date functions
+
+        - now (current date and time)
+        - curdate (current date)
+        - curtime (current time)
+
+     4. Other functions
+
+     5. Control-flow functions
+
+   - Group (aggregate) functions, used for statistics
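
The string functions above follow MySQL conventions that differ from Python's: positions are 1-based, `LENGTH` counts bytes rather than characters, and `LPAD`/`RPAD` shorten a string that is already longer than the target length. The Python sketch below imitates these rules so the example queries can be checked without a server. The `mysql_*` helpers are hypothetical names, not part of MySQL or any library, and the byte counts assume a `utf8mb4` character set.

```python
def mysql_length(s: str) -> int:
    """LENGTH(): number of bytes, assuming a utf8mb4 character set."""
    return len(s.encode("utf-8"))


def mysql_substr(s: str, pos: int, length: int) -> str:
    """SUBSTR(s, pos, length) for pos >= 1 (MySQL positions are 1-based)."""
    return s[pos - 1:pos - 1 + length]


def mysql_instr(s: str, sub: str) -> int:
    """INSTR(): 1-based position of the first occurrence, 0 if absent."""
    return s.find(sub) + 1


def mysql_lpad(s: str, length: int, pad: str) -> str:
    """LPAD(): left-pad to `length` characters, or truncate if s is longer.

    `pad` is assumed to be non-empty (MySQL returns NULL for an empty pad).
    """
    if len(s) >= length:
        return s[:length]
    fill = (pad * length)[:length - len(s)]
    return fill + s


def mysql_rpad(s: str, length: int, pad: str) -> str:
    """RPAD(): right-pad to `length` characters, or truncate if s is longer."""
    if len(s) >= length:
        return s[:length]
    fill = (pad * length)[:length - len(s)]
    return s + fill


print(mysql_length("张三a123"))            # 10
print(mysql_substr("今天希望你开心", 5, 3))  # 你开心
print(mysql_instr("泊松分布", "分布"))       # 3
print(mysql_lpad("哥谭市", 8, "*"))          # *****哥谭市
print(mysql_rpad("哥谭市", 8, "*"))          # 哥谭市*****
```

Under a `gbk` character set each Chinese character takes 2 bytes, so `LENGTH('张三a123')` would be 8 instead of 10.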
+
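Two of the math functions do not match their Python namesakes. For exact-value arguments such as the literal `1.22`, MySQL's `ROUND` rounds halves away from zero (`ROUND(2.5)` is 3), but Python's built-in `round` rounds halves to even. `MOD` keeps the sign of the dividend (`MOD(-10,3)` is -1), but Python's `%` follows the sign of the divisor. The sketch below reproduces MySQL's behavior with the `decimal` module. The helper names are hypothetical, and decimal literals are passed as strings so that they stay exact.

```python
import math
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP


def mysql_round(x: str, d: int = 0) -> Decimal:
    """ROUND(x, d) for an exact-value literal: ties go away from zero."""
    return Decimal(x).quantize(Decimal(1).scaleb(-d), rounding=ROUND_HALF_UP)


def mysql_truncate(x: str, d: int) -> Decimal:
    """TRUNCATE(x, d): drop digits after d decimal places, rounding toward zero."""
    return Decimal(x).quantize(Decimal(1).scaleb(-d), rounding=ROUND_DOWN)


def mysql_mod(n: int, m: int):
    """MOD(n, m): remainder with the sign of n; MySQL returns NULL when m is 0."""
    if m == 0:
        return None
    r = abs(n) % abs(m)
    return r if n >= 0 else -r


print(mysql_round("1.22", 1))               # 1.2
print(mysql_round("2.5"), round(2.5))       # 3 2
print(mysql_truncate("1.231313", 3))        # 1.231
print(math.ceil(1.9), math.floor(1.9))      # 2 1  (same as CEIL / FLOOR)
print(mysql_mod(10, 3), mysql_mod(-10, 3))  # 1 -1
print(-10 % 3)                              # 2
```

For approximate-value (floating-point) arguments, MySQL's rounding depends on the C library, so the sketch only covers decimal literals like those used in the examples.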
+
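The date functions are listed without examples. In a string context, `NOW()` returns `'YYYY-MM-DD hh:mm:ss'`, `CURDATE()` returns `'YYYY-MM-DD'` and `CURTIME()` returns `'hh:mm:ss'`. The hypothetical helper below formats a single Python `datetime` the same way, as a sketch for comparing formats.

```python
from datetime import datetime


def mysql_style_now(moment: datetime) -> dict:
    """Format one timestamp the way NOW(), CURDATE() and CURTIME() display it."""
    return {
        "NOW()": moment.strftime("%Y-%m-%d %H:%M:%S"),
        "CURDATE()": moment.strftime("%Y-%m-%d"),
        "CURTIME()": moment.strftime("%H:%M:%S"),
    }


# Capture the time once: in MySQL, NOW() is fixed for the whole statement.
print(mysql_style_now(datetime.now()))
print(mysql_style_now(datetime(2021, 11, 25, 14, 30, 5)))
# {'NOW()': '2021-11-25 14:30:05', 'CURDATE()': '2021-11-25', 'CURTIME()': '14:30:05'}
```

The helper reads the clock only once and derives all three strings from that reading. This mirrors how MySQL's `NOW()` returns the time at which the statement began, unlike `SYSDATE()`.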
+
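The sorted query from item 1, a control-flow expression and the group functions can be tried without a MySQL server. The sketch below uses Python's built-in `sqlite3` module with an in-memory SQLite database as a stand-in. The `person` table, its columns and its rows are made up for illustration. `ORDER BY`, `CASE` and `COUNT`/`AVG`/`MAX`/`MIN` with `GROUP BY` should return the same rows in MySQL for these queries. Numeric formatting can differ: MySQL returns `AVG` as a DECIMAL such as `30.0000`.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, dept TEXT)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?, ?)",
    [(1, "Alice", 30, "data"), (2, "Bob", 25, "data"), (3, "Carol", 30, "web"),
     (4, "Dave", 22, "web"), (5, "Eve", 35, "data")],
)

# Sorting: age descending, ties broken by id ascending.
sorted_rows = conn.execute("SELECT id, age FROM person ORDER BY age DESC, id ASC").fetchall()

# Control flow: CASE is available in both MySQL and SQLite.
labels = conn.execute(
    "SELECT name, CASE WHEN age >= 30 THEN 'senior' ELSE 'junior' END "
    "FROM person ORDER BY id"
).fetchall()

# Group (aggregate) functions: one summary row per department.
stats = conn.execute(
    "SELECT dept, COUNT(*), AVG(age), MAX(age), MIN(age) "
    "FROM person GROUP BY dept ORDER BY dept"
).fetchall()

print(sorted_rows)  # [(5, 35), (1, 30), (3, 30), (2, 25), (4, 22)]
print(labels)
print(stats)        # [('data', 3, 30.0, 35, 25), ('web', 2, 26.0, 30, 22)]
conn.close()
```

Alice and Carol are both 30, so the secondary key `id ASC` is what places row 1 before row 3.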
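+### Plotting
+
+#### Plotting the normal distribution :jack_o_lantern:
+
+1. Plotting from random samples :baby_chick:
+
+   - First, use NumPy to generate an array of standard normal random numbers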
+
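+     ```python
+     import numpy as np
+     np.random.seed(0)  # fix the seed so the output below is reproducible
+     # 100,000,000 float64 samples take about 800 MB of memory
+     data = np.random.standard_normal(100000000)
+     data
+     ```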
+
+     ```text
+     array([ 1.76405235,  0.40015721,  0.97873798, ...,  0.32191089,
+             0.25199669, -1.22612391])
+     ```
+
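+   - Then draw a histogram with Matplotlib
+
+     ```python
+     import matplotlib.pyplot as plt
+     # Jupyter/IPython magic: show figures inside the notebook
+     %matplotlib inline
+     plt.hist(data, 1000)  # histogram with 1000 bins
+     ```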
+
+     ![Histogram of the standard normal samples with 1000 bins](https://i.loli.net/2021/11/26/2yPKNiYHb6kuaQR.png)
+
+
+
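+2. Plotting with SymPy :label:
+
+   - ```python
+     from sympy import *
+     from sympy.stats import Normal, density
+     ```
+
+   - ```python
+     x = symbols('x')
+     # standard normal random variable named Y (mean 0, standard deviation 1)
+     Y = Normal('Y', 0, 1)
+     plot(density(Y)(x))  # plot its probability density function
+     ```
+
+   - ```python
+     density(Y)(x)  # the density as a symbolic expression in x
+     ```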
+
+   - ![SymPy output: the standard normal density as a formula in x](https://i.loli.net/2021/11/26/tYrEBCmaT67iFWX.png)
+
+   - ![Plot of the standard normal density drawn by SymPy](https://i.loli.net/2021/11/28/EdlYUr84F1ceCXi.png)
+
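For reference, both methods above approximate the standard normal density. The second form below is equivalent to the first and is how SymPy typically prints `density(...)(x)`:

$$
f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^{2}/2} = \frac{\sqrt{2}\, e^{-x^{2}/2}}{2\sqrt{\pi}}
$$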
+
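The two methods can also be compared numerically rather than by eye. This sketch draws 1,000,000 samples, far fewer than the 100,000,000 above but assumed to be enough for the check. It uses NumPy's newer `default_rng` generator instead of `np.random.seed`, so the individual values differ from the array shown earlier. The histogram is normalised to a density and compared with the formula at each bin centre.

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.standard_normal(1_000_000)

# Histogram normalised so its total area is 1 (what plt.hist(..., density=True) draws).
heights, edges = np.histogram(samples, bins=100, range=(-4, 4), density=True)
centers = (edges[:-1] + edges[1:]) / 2

# Standard normal density f(x) = exp(-x^2 / 2) / sqrt(2 * pi).
pdf = np.exp(-centers**2 / 2) / np.sqrt(2 * np.pi)

max_gap = np.max(np.abs(heights - pdf))
within_one_sigma = np.mean(np.abs(samples) < 1)

print(f"largest histogram/pdf gap: {max_gap:.4f}")
print(f"share of samples in (-1, 1): {within_one_sigma:.4f}")  # close to 0.6827
```

To see the same comparison as a picture, call `plt.hist(samples, bins=100, density=True)` and then `plt.plot(centers, pdf)` to overlay the curve on the histogram.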
+
+#### Plotting other functions
+
+1. SymPy
+
+   - ```python
+     plot(x, x**2)  # y = x and y = x**2 on the same axes; x is the SymPy symbol defined above
+     ```
+
+   - ![SymPy plot of y = x and y = x²](https://i.loli.net/2021/11/26/LMZstOnfU2JHKF9.png)
+
+2. Matplotlib
+
+   - ```python
+     x = np.arange(1, 10, 0.01)  # 1.00, 1.01, ..., 9.99
+     y = np.log10(x)
+     u = np.arange(1, 10, 0.01)
+     w = np.exp(u)
+     ```
+
+   - ```python
+     plt.style.use('ggplot')
+     fig, ax = plt.subplots(1, 2, figsize=(8, 4))  # one row, two panels
+     ax[0].plot(x, y, label='log10', color='r')
+     ax[0].legend(loc='best')
+     ax[1].plot(u, w, label=r'$e^x$', color='b')
+     ax[1].legend(loc='best')
+     ```
+
+   - ![log10(x) and e^x side by side, drawn with Matplotlib](https://i.loli.net/2021/11/26/wZLR4r2SmQ1G8XW.png)
diff --git a/docs/views/data/2021-12-3-朴素贝叶斯算法实现文本分类.md b/docs/views/data/2021-12-3-朴素贝叶斯算法实现文本分类.md
@@ -0,0 +1,119 @@
+---
+layout: post
+title: Text Classification with Naive Bayes
+date: 2021-12-03
+author: 饶翰宇
+categories:
+    - Data Analysis Department
+tags:
+    - Python
+    - Text Classification
+---
+
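+## Text Classification
+
+Real-world text is complex and varied. Text classification and sentiment analysis are important parts of applied machine learning.
+
+Below, we walk through a complete example of classifying text.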
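
The walkthrough relies on scikit-learn's `MultinomialNB` without showing what it computes. For reference, multinomial naive Bayes assigns a comment to the class with the highest log-score. The form below, with Laplace smoothing $\alpha$, follows scikit-learn's documented formulation (default $\alpha = 1$):

$$
\hat{c} = \arg\max_{c}\Big[\log P(c) + \sum_{w \in V} n_{w}\,\log P(w \mid c)\Big],
\qquad
P(w \mid c) = \frac{N_{cw} + \alpha}{N_{c} + \alpha\,\lvert V\rvert}
$$

Here $n_w$ is how often word $w$ occurs in the comment being classified, $N_{cw}$ is how often it occurs across the training comments of class $c$, $N_c = \sum_{w} N_{cw}$, $\lvert V\rvert$ is the vocabulary size, and $P(c)$ is the share of training comments in class $c$.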
+
+- First, load the raw data.
+
+  Here we use a dataset of restaurant reviews.
+
+  ```python
+  import pandas as pd
+  data = pd.read_csv('./restaurant.csv', encoding='gb18030')  # GB18030: a Chinese encoding (superset of GBK)
+  data
+  ```
+
+  ![First rows of the restaurant review DataFrame](https://s2.loli.net/2021/12/04/1cTFRozlOeU2W7I.png)
+
+- Next, label each review: 1 if its `star` rating is above 3, otherwise 0.
+
+  ```python
+  import numpy as np  # also used later to inspect the predictions
+  # star > 3 gives True/False, which astype(int) turns into 1/0
+  data['label'] = (data['star'] > 3).astype(int)
+  data
+  ```
+
+  ![DataFrame with the new label column](https://s2.loli.net/2021/12/04/RlbGJV6ZQmYShsa.png)
+
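+- Then segment each comment into words with jieba and store the space-separated tokens in a new column, `words`.
+
+  ```python
+  import jieba
+  # cut_all=True is jieba's full mode: it outputs every word it can find, so tokens may overlap
+  data['words'] = data['comment'].apply(lambda x: ' '.join(jieba.lcut(x, cut_all=True)))
+  data
+  ```
+
+  ![DataFrame with the segmented words column](https://s2.loli.net/2021/12/04/Z8QMfj7LEolWqrX.png)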
+
+- Split the dataset into a training set (80%) and a test set (20%)
+
+  ```python
+  from sklearn.model_selection import train_test_split
+  x_train,x_test,y_train,y_test = train_test_split(data.words,data.label,test_size=0.2,random_state=42)
+  ```
+
+- Import the bag-of-words feature extractor
+
+  ```python
+  from sklearn.feature_extraction.text import CountVectorizer
+  ```
+
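+- Count word occurrences (each comment becomes a vector of word counts)
+
+  ```python
+  counter = CountVectorizer()
+  x_train = counter.fit_transform(x_train)  # learn the vocabulary from the training set only
+  x_test = counter.transform(x_test)        # reuse that vocabulary; words not seen in training are ignored
+  ```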
+
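+- Inspect the word counts as a table
+
+  ```python
+  amount = x_train.toarray()  # sparse matrix -> dense array, fine for a small dataset
+  name = counter.get_feature_names_out()  # use get_feature_names() on scikit-learn < 1.0
+  result = pd.DataFrame(data=amount, columns=name)
+  result
+  ```
+
+  ![Word-count table for the training set](https://s2.loli.net/2021/12/05/ABOXHVwYGRyFZhM.png)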
+
+- Build and train the model, then predict the test set
+
+  ```python
+  from sklearn.naive_bayes import MultinomialNB
+  estimator = MultinomialNB()  # multinomial naive Bayes; default alpha=1.0 (Laplace smoothing)
+  estimator.fit(x_train, y_train)
+  ```
+
+  ```python
+  y_predict = estimator.predict(x_test)
+  ```
+
+  ![Predicted labels for the test set](https://s2.loli.net/2021/12/05/gsTkdQZNf3vCY6o.png)
+
+- Compute the accuracy on the test set
+
+  ```python
+  estimator.score(x_test, y_test)  # fraction of test reviews labelled correctly
+  ```
+
+  ```text
+  0.8475
+  ```
+
+
+
+- Compare the true test labels with the predictions element by element (True means correct)
+
+  ```python
+  np.array(y_test == y_predict)
+  ```
+
+  ![Element-wise comparison of y_test and y_predict](https://s2.loli.net/2021/12/05/DWs2x9eSQcEFirm.png)
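
To make the `MultinomialNB` step less of a black box, here is a small from-scratch version in plain Python. It computes class log-priors and Laplace-smoothed word log-likelihoods (`alpha=1.0`, matching scikit-learn's default), then picks the class with the highest total log-score. The training reviews and labels are made-up toy data in the same space-separated form as the `words` column; this is a sketch, not a reproduction of the real dataset or of scikit-learn's code. Accuracy is computed the same way `estimator.score` computes it for a classifier: the share of predictions that match the true labels.

```python
import math
from collections import Counter


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit class log-priors and Laplace-smoothed word log-likelihoods."""
    classes = sorted(set(labels))
    vocab = {w for d in docs for w in d.split()}
    log_prior, log_lik = {}, {}
    for c in classes:
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(w for d in class_docs for w in d.split())
        total = sum(counts.values())
        log_lik[c] = {
            w: math.log((counts[w] + alpha) / (total + alpha * len(vocab)))
            for w in vocab
        }
    return classes, log_prior, log_lik


def predict(model, doc):
    """Return the class with the highest log-score; unseen words are skipped."""
    classes, log_prior, log_lik = model
    scores = {
        c: log_prior[c] + sum(log_lik[c][w] for w in doc.split() if w in log_lik[c])
        for c in classes
    }
    return max(scores, key=scores.get)


# Hypothetical toy data: 1 = positive review, 0 = negative review.
train_docs = ["好吃 服务 好", "好吃 环境 不错", "难吃 服务 差", "难吃 太 贵"]
train_labels = [1, 1, 0, 0]
model = train_multinomial_nb(train_docs, train_labels)

test_docs = ["好吃 不错", "难吃 差", "服务 好"]
test_labels = [1, 0, 1]
preds = [predict(model, d) for d in test_docs]
accuracy = sum(p == y for p, y in zip(preds, test_labels)) / len(test_labels)

print(preds)     # [1, 0, 1]
print(accuracy)  # 1.0
```

Summing log-probabilities instead of multiplying probabilities avoids numeric underflow on long comments. Skipping out-of-vocabulary words mirrors how `CountVectorizer.transform` ignores words it did not see during training.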
+
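One caveat about the `CountVectorizer` step: its documented default `token_pattern` is `r"(?u)\b\w\w+\b"`, which only keeps tokens of two or more characters. Because the comments were segmented by jieba, single-character Chinese words such as 好 or 差 are silently dropped. The sketch below applies that regular expression with Python's `re` to a hypothetical segmented sentence, next to a variant that keeps one-character tokens.

```python
import re

# scikit-learn's documented default token_pattern for CountVectorizer.
DEFAULT_PATTERN = r"(?u)\b\w\w+\b"
# A variant that also keeps one-character tokens.
KEEP_SINGLE = r"(?u)\b\w+\b"

segmented = "这 道 菜 很 好吃 服务 也 好"  # hypothetical jieba output, space-joined

print(re.findall(DEFAULT_PATTERN, segmented))  # ['好吃', '服务']
print(re.findall(KEEP_SINGLE, segmented))      # ['这', '道', '菜', '很', '好吃', '服务', '也', '好']
```

Passing `CountVectorizer(token_pattern=r"(?u)\b\w+\b")` keeps these words. Whether that changes the 0.8475 accuracy reported above has not been tested here.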