A record of my learning process in data science, including sentiment analysis and machine learning.
Main differences between Python 2.x and 3.x
- except Exception, e: -> except Exception as e:
- print must be called with parentheses, i.e. print()
- raw_input() -> input()
- decode('gbk') is no longer needed
For more detailed notes, see the python-class repo.
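The differences listed above can be illustrated with a short Python 3 sketch. The function names `parse_int` and `ask_name` and the sample string are hypothetical, chosen only for illustration.

```python
def parse_int(text):
    """Return int(text), or None when text is not a valid integer."""
    try:
        return int(text)
    except ValueError as e:  # Python 2 also accepted: except ValueError, e:
        print("could not parse:", e)  # print is a function in Python 3
        return None


def ask_name():
    """Read a line from the user; Python 2 used raw_input() for this."""
    return input("Your name: ")


# In Python 3 a str is already Unicode text, so a literal like this
# needs no decode('gbk') call before it can be used.
title = "电影"
print(len(title))
```

Raw bytes read from a file or socket still need decoding; only values that are already `str` can skip that step.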
Given results, opponents, and game data from the NBA stats website, I use cross validation and a linear model. After training Logistic Regression on the dataset, we can estimate the probability that one specific team beats another.
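As a rough illustration of how a fitted logistic regression turns game features into a win probability, here is a minimal pure-Python sketch trained by gradient descent. It is not the code from this project: the two features (a rating difference and a home-court flag) and the toy rows are hypothetical, and cross validation is omitted to keep it short.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def win_probability(w, b, x):
    """Probability that the first team wins, given feature vector x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical toy data: [rating difference, home-court flag]; label 1 = first team won.
X = [[2.0, 1], [1.5, 0], [0.5, 1], [-0.5, 0], [-1.0, 1], [-2.0, 0]]
y = [1, 1, 1, 0, 0, 0]

w, b = fit_logistic(X, y)
p = win_probability(w, b, [1.0, 1])
print(round(p, 3))
```

In practice a library implementation such as scikit-learn's `LogisticRegression` would replace the hand-written training loop.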
Working in Jupyter Notebook, I use a few Python libraries to scrape data on now-playing movies from Douban. urllib and BeautifulSoup are the key tools for fetching pages and extracting useful information. In addition, I use re for cleaning the data, jieba for word segmentation, and pandas and numpy for calculating the frequency of each word. Finally, matplotlib and wordcloud produce the graph.
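The cleaning and counting steps can be sketched with the standard library alone. This is a simplified stand-in, not the project's pipeline: it assumes whitespace-separated English text instead of jieba segmentation, uses `collections.Counter` instead of pandas and numpy, and the sample comments and stopword set are made up for illustration.

```python
import re
from collections import Counter


def word_frequencies(comments, stopwords=frozenset()):
    """Count how often each word appears across a list of comment strings."""
    counts = Counter()
    for comment in comments:
        # Replace punctuation with spaces, keeping word characters and whitespace.
        cleaned = re.sub(r"[^\w\s]", " ", comment.lower())
        counts.update(w for w in cleaned.split() if w not in stopwords)
    return counts


# Hypothetical sample comments standing in for scraped reviews.
comments = ["Great movie, great cast!", "The plot was slow."]
freq = word_frequencies(comments, stopwords={"the", "was"})
top = freq.most_common(1)
print(top)
```

A mapping like `freq` is the kind of word-to-count table that a word cloud is typically drawn from.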
Scrapes information from Baidu stock and eastmoney and lists all the potentially rising stocks.
Practice scraping useful information from the enterprise information website Qichacha and compiling it into a spreadsheet.
A few practice exercises for TensorFlow beginners
- Introduction to TensorFlow
Building a linear model and generating a graph.
- MNIST Application (with input_data.py)
- softmax
accuracy 92%
- Regression models
Logistic Regression and Linear Regression
- RNN
- Auto text generation
LSTM and RNN
- CNN (advanced)
accuracy 97%
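The softmax step named in the MNIST item above converts a vector of class scores into probabilities that sum to 1. Below is a minimal pure-Python sketch of that function only, not the TensorFlow model itself; the example scores are arbitrary.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Arbitrary scores for three classes; the largest score gets the highest probability.
probs = softmax([2.0, 1.0, 0.1])
predicted = probs.index(max(probs))
print(predicted)
```

For MNIST the input would hold ten scores, one per digit, and the predicted digit is the index with the highest probability.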
- Twitter Sentiment of U.S. airlines
Discards all neutral words and uses regression. Reports accuracy on the training and validation sets.
- Twitter Prediction of U.S. airlines
Draws a mood count graph that compares the airlines and shows the reasons for negative sentiment toward each. Compares the performance of different classifiers: LogisticRegression, KNeighborsClassifier, SVC, DecisionTreeClassifier, RandomForestClassifier, AdaBoostClassifier, GaussianNB.
- Amazon reviews text generation
TensorFlow, RNN and LSTM
- Data science and machine learning for beginners
- matplotlib, pandas
- sklearn
Supervised: KNN, CV
Unsupervised: KMeans, t-SNE, PCA
- feature ranking
- deep learning
- (linear regression) house price prediction
- (classifiers) sentiment analysis
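The KNN entry in the sklearn item above refers to k-nearest-neighbours classification: a query point takes the majority label of its k closest training points. The sketch below shows the idea in pure Python (it needs Python 3.8+ for `math.dist`); the function name `knn_predict` and the toy points are hypothetical, and scikit-learn's `KNeighborsClassifier` would be the usual choice in practice.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    # Pair each Euclidean distance with its label, then sort by distance.
    dists = sorted((math.dist(x, query), label) for x, label in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters labelled "a" and "b".
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_X, train_y, (1.1, 1.0)))
print(knn_predict(train_X, train_y, (5.0, 5.1)))
```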