Around the world, people are spending an increasing amount of time on their mobile devices for email, social networking, banking and a whole range of other activities. But typing on mobile devices can be a serious pain. SwiftKey, our corporate partner in this capstone, builds a smart keyboard that makes it easier for people to type on their mobile devices. One cornerstone of their smart keyboard is predictive text models. When someone types:
I went to the
the keyboard presents three options for what the next word might be. For example, the three words might be gym, store, restaurant. In this capstone we will work on understanding and building predictive text models like those used by SwiftKey.
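To make the idea concrete, here is a minimal sketch of suggesting three next words from word-pair counts. It is written in Python for illustration only and is not the repository's R code. The toy corpus and the `suggest` function are hypothetical, and a real keyboard model would be trained on far more text.

```python
from collections import Counter, defaultdict

# Toy corpus, invented for illustration only.
corpus = [
    "i went to the gym",
    "i went to the store",
    "i went to the gym",
    "we walked to the restaurant",
    "she drove to the store",
    "he ran to the gym",
]

# Count which words follow each word (a bigram model).
followers = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        followers[prev][nxt] += 1

def suggest(phrase, k=3):
    """Return up to k most frequent words seen after the last word of phrase."""
    words = phrase.lower().split()
    if not words:
        return []
    # .get avoids creating empty entries for unseen words.
    counts = followers.get(words[-1], Counter())
    return [word for word, _ in counts.most_common(k)]

print(suggest("I went to the"))  # ['gym', 'store', 'restaurant']
```

This sketch looks only at the last typed word. Longer contexts usually give better suggestions, at the cost of needing more data.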
This project covers:
- Text Mining and Analysis of Text Data.
- Natural Language Processing.
Work on the project involves:
- Analyzing a large corpus of text documents to discover the structure in the data and how words are put together.
- Cleaning and analyzing text data.
- Building and sampling from a predictive text model.
- Building a predictive text product in the form of a Shiny app.
- Creating an R Presentation for the app.
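The cleaning and analysis steps listed above can be sketched as follows. This is an illustrative Python example under simple assumptions (lowercasing, keeping only letters and apostrophes), not the pre-processing used in this repository. The `clean` and `ngrams` helpers are hypothetical names.

```python
import re
from collections import Counter

def clean(text):
    """Lowercase the text, keep letters and apostrophes, and split into tokens."""
    text = text.lower()
    # Replace every run of other characters (digits, punctuation) with a space.
    text = re.sub(r"[^a-z' ]+", " ", text)
    return text.split()

def ngrams(tokens, n):
    """Return all consecutive n-word tuples in tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = clean("I went to the gym. Then I went to the store!")
bigram_counts = Counter(ngrams(tokens, 2))

# Show the most frequent word pairs in the sample text.
print(bigram_counts.most_common(3))
```

Counting n-grams like this is one way to see how words are put together in a corpus. The same counts can later feed a prediction model.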
This repository contains the code for:
- The pre-processing of the data set as provided by SwiftKey.
- A milestone report that outlines the initial exploration of the datasets.
- The n-gram modeling.
- The word prediction model.
- A Shiny app that takes a phrase (multiple words) as input and predicts the next word.
- A slide deck, created with R Presentations, pitching the algorithm and the app.
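This overview does not say which prediction algorithm the model uses. One common approach, shown here only as an assumption, is a simple back-off lookup: try the longest context of typed words first, then fall back to shorter contexts. The Python sketch below is illustrative and is not the repository's R code. The `build_model` and `predict` names are hypothetical, and no smoothing or scoring is applied.

```python
from collections import Counter, defaultdict

def build_model(sentences, max_n=3):
    """Map each context of up to max_n - 1 words to a Counter of next words."""
    model = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for i in range(1, len(words)):
            for size in range(1, max_n):
                if i - size >= 0:
                    model[tuple(words[i - size:i])][words[i]] += 1
    return model

def predict(model, phrase, max_n=3, k=3):
    """Try the longest available context first, then back off to shorter ones."""
    words = phrase.lower().split()
    for size in range(min(max_n - 1, len(words)), 0, -1):
        candidates = model.get(tuple(words[-size:]))
        if candidates:
            return [word for word, _ in candidates.most_common(k)]
    return []  # no context matched

model = build_model(["i went to the gym", "we went to the store", "go to the gym"])
print(predict(model, "she ran to the"))  # ['gym', 'store']
```

Here the two-word context `to the` is found directly. For an input such as `near the`, the lookup would back off to the one-word context `the`.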
Some natural language processing resources:
- Text mining infrastructure in R
- CRAN Task View: Natural Language Processing
- Videos and Slides from Stanford Natural Language Processing course