Capstone Project for the Johns Hopkins University Data Science Specialization on Coursera.

Overview

Background

Around the world, people are spending an increasing amount of time on their mobile devices for email, social networking, banking and a whole range of other activities. But typing on mobile devices can be a serious pain. SwiftKey, our corporate partner in this capstone, builds a smart keyboard that makes it easier for people to type on their mobile devices. One cornerstone of their smart keyboard is predictive text models. When someone types:

I went to the

the keyboard presents three options for what the next word might be. For example, the three words might be gym, store, restaurant. In this capstone we will work on understanding and building predictive text models like those used by SwiftKey.
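To make the idea of "three options for the next word" concrete, here is a minimal sketch in base R. It is an illustration only, not the model in this repository: the toy corpus is made up, `predict_next()` is a hypothetical helper, and the approach (counting which words follow the last typed word) is the simplest possible bigram lookup.

```r
# Toy corpus; these sentences are invented for illustration only.
corpus <- c(
  "i went to the gym",
  "i went to the store",
  "i went to the gym again",
  "we drove to the restaurant",
  "she walked to the store",
  "he ran to the gym"
)

# Split each sentence into words, then pair every word with its successor.
tokens <- strsplit(corpus, " ", fixed = TRUE)
bigrams <- do.call(rbind, lapply(tokens, function(w) {
  data.frame(prev = head(w, -1), nxt = tail(w, -1), stringsAsFactors = FALSE)
}))

# predict_next() is a hypothetical helper: it returns up to `n` words that
# most often follow the last word of `phrase` in the toy corpus.
predict_next <- function(phrase, n = 3) {
  words <- strsplit(tolower(trimws(phrase)), "\\s+")[[1]]
  last <- words[length(words)]
  followers <- bigrams$nxt[bigrams$prev == last]
  if (length(followers) == 0) return(character(0))
  counts <- sort(table(followers), decreasing = TRUE)
  head(names(counts), n)
}

predict_next("I went to the")
# [1] "gym"        "store"      "restaurant"
```

A real model would look at more than one preceding word and would need a fallback for phrases it has never seen; this sketch simply returns an empty result in that case.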

Project

This project covers:

Text Mining and Analysis of Text Data.
Natural Language Processing.

Work on the project involves:

Analyzing a large corpus of text documents to discover the structure in the data and how words are put together.
Cleaning and analyzing text data.
Building and sampling from a predictive text model.
Building a predictive text product in the form of a Shiny app.
Creating an R Presentation for the app.
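The cleaning and n-gram steps listed above can be sketched in base R as follows. This is a hedged illustration under simple assumptions, not the code in `01_dataprep.R` or `02_modeling.R`: `clean_text()` and `make_ngrams()` are hypothetical helpers, the two input lines are made up, and the cleaning rules (lowercase, keep only letters and apostrophes) are one plausible choice among many.

```r
# clean_text() is a hypothetical helper, not taken from the repository.
clean_text <- function(x) {
  x <- tolower(x)
  x <- gsub("[^a-z' ]", " ", x)   # keep letters, apostrophes and spaces
  x <- gsub("\\s+", " ", x)       # collapse runs of whitespace
  trimws(x)
}

# make_ngrams() is also hypothetical: it slides a window of `n` words
# over one cleaned line and returns the n-grams as strings.
make_ngrams <- function(line, n = 2) {
  words <- strsplit(line, " ", fixed = TRUE)[[1]]
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1)
  vapply(starts, function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

# Made-up input lines standing in for the real corpus.
raw <- c("I went to the GYM!!", "Then I went to the store, twice.")
cleaned <- clean_text(raw)

# Count how often each bigram occurs across all cleaned lines.
bigram_counts <- sort(table(unlist(lapply(cleaned, make_ngrams, n = 2))),
                      decreasing = TRUE)
head(bigram_counts, 3)
```

On a corpus of realistic size, a dedicated text-mining package would likely be faster than these base R loops; the sketch only shows the shape of the pipeline.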

Repository

This repository contains the code for:

The pre-processing of the data set as provided by SwiftKey.
A milestone report that outlines the initial exploration of the datasets.
The n-gram modeling.
The word prediction model.
A Shiny app that takes a phrase (multiple words) as input and predicts the next word.
A slide deck, created with R Presentations, pitching the algorithm and app.

Resources

Some natural language processing resources:

Text mining infrastructure in R
CRAN Task View: Natural Language Processing
Videos and slides from the Stanford Natural Language Processing course

