
NLP using n-gram models based on maximum likelihood estimation (MLE). The repo contains the R presentation, the n-gram modeling code and the Shiny app code.


sarizzuz/nextwordpredictor


Capstone Project for the Johns Hopkins University Data Science Specialization on Coursera.

Overview

Background

Around the world, people are spending an increasing amount of time on their mobile devices for email, social networking, banking and a whole range of other activities. But typing on mobile devices can be a serious pain. SwiftKey, our corporate partner in this capstone, builds a smart keyboard that makes it easier for people to type on their mobile devices. One cornerstone of their smart keyboard is predictive text models. When someone types:

I went to the

the keyboard presents three options for what the next word might be. For example, the three words might be gym, store, restaurant. In this capstone we will work on understanding and building predictive text models like those used by SwiftKey.

Project

This project covers:

  1. Text Mining and Analysis of Text Data.
  2. Natural Language Processing.

Work on the project involves:

  1. Analyzing a large corpus of text documents to discover the structure in the data and how words are put together.
  2. Cleaning and analyzing text data.
  3. Building and sampling from a predictive text model.
  4. Building a predictive text product in the form of a Shiny app.
  5. Creating an R Presentation for the app.
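The cleaning and exploration steps listed above might look roughly like the sketch below. It is written in Python purely for illustration (the repository's own code is R), and the cleaning rule (lowercase, then keep only ASCII letters, apostrophes and whitespace) as well as the names `clean_text`, `ngrams` and `count_ngrams` are hypothetical, not taken from the repo.

```python
import re
from collections import Counter


def clean_text(line: str) -> list[str]:
    """Lowercase a line, drop everything except a-z, apostrophes and whitespace, then split."""
    line = line.lower()
    # Digits, punctuation and non-ASCII characters become spaces (an assumed cleaning rule).
    line = re.sub(r"[^a-z'\s]", " ", line)
    return line.split()


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """Return every contiguous run of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(lines: list[str], n: int) -> Counter:
    """Count n-gram frequencies line by line, so no n-gram spans two lines."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(ngrams(clean_text(line), n))
    return counts
```

Calling `count_ngrams(lines, 2).most_common(10)` on a sample of the corpus lists its ten most frequent bigrams, which is the kind of frequency table an initial exploration of how words are put together would inspect.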

Repository

This repository contains the code for:

  1. The pre-processing of the data set as provided by SwiftKey.
  2. A milestone report that outlines the initial exploration of the datasets.
  3. The n-gram modeling.
  4. The word prediction model.
  5. A Shiny app that takes a phrase (multiple words) as input and predicts the next word.
  6. A slide deck, created with R Presentations, pitching the algorithm and app.
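Under maximum likelihood estimation, the probability of a word w following a history h is its relative frequency, C(h w) / C(h). A next-word predictor built on such estimates could be sketched as below, again in Python for illustration only. The trigram maximum order, the hypothetical names `build_model`, `mle` and `predict`, and the rule "use the longest context seen in training" are all assumptions: this README does not say which backoff or smoothing the repository uses, and this sketch is neither Katz backoff nor Stupid Backoff and applies no discounting.

```python
from collections import Counter, defaultdict


def build_model(sentences: list[list[str]], max_n: int = 3) -> dict[tuple[str, ...], Counter]:
    """Map every context of length 0..max_n-1 to a Counter of the words that follow it."""
    model: dict[tuple[str, ...], Counter] = defaultdict(Counter)
    for tokens in sentences:
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                context = tuple(tokens[i:i + n - 1])  # empty tuple for unigrams
                model[context][tokens[i + n - 1]] += 1
    return model


def mle(model: dict[tuple[str, ...], Counter], context: tuple[str, ...], word: str) -> float:
    """MLE estimate: count(context, word) divided by the count of context followed by any word."""
    followers = model.get(context)
    if not followers:
        return 0.0
    return followers[word] / sum(followers.values())


def predict(model: dict[tuple[str, ...], Counter], phrase: list[str], max_n: int = 3, k: int = 3) -> list[str]:
    """Return up to k next words from the longest context (at most max_n-1 words) seen in training."""
    for n in range(min(max_n - 1, len(phrase)), -1, -1):
        context = tuple(phrase[len(phrase) - n:])  # last n words; () when n == 0
        followers = model.get(context)
        if followers:
            # Ranking by raw count gives the same order as ranking by MLE probability,
            # because every candidate shares the same denominator.
            return [word for word, _ in followers.most_common(k)]
    return []
```

For example, after training on the tokenized sentences "i went to the gym", "then i went to the store" and "we went to the store again", `predict(model, ["i", "went", "to", "the"])` ranks "store" ahead of "gym", because "store" follows "to the" twice and "gym" only once. A phrase whose words never appeared falls all the way back to the most frequent unigrams.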

Resources

Some natural language processing resources:

  • Text mining infrastructure in R
  • CRAN Task View: Natural Language Processing
  • Videos and Slides from Stanford Natural Language Processing course
