Chatbot en Español

Introduction

A conversational agent in Spanish, built with deep learning and trained on a dataset of movie subtitles. If you want to go directly to the transformer version, you can do it here.

Installation

Clone the repository and install:

pip install .

Alternatively, you can install the package from pip, which does not download the exploratory notebooks:

pip install spanish_chatbot

For a quickstart:

from spanish_chatbot import TransformerChatbot
chatbot = TransformerChatbot(load_quant=True, use_cuda=False)  # load the pre-trained model
chatbot.evaluateOneInput('Hola')                               # one input, one output
chatbot.evaluateCycle()                                        # cycle of inputs and outputs

Model description

Seq2seq. For a detailed explanation in Spanish, see this blog post. Features:
- Luong attention
- Output embedding with weight tying
Transformer. Features:
- Weight tying
- Beam search
- Quantization: PyTorch dynamic quantization. Model size is reduced to 41% of the original, with a 2x inference speedup. Supported backends:
  - x86 CPUs with AVX2 support or higher (without AVX2 some operations have inefficient implementations)
  - ARM CPUs (typically found in mobile/embedded devices)
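
To make the beam search feature above more concrete, here is a minimal, framework-free sketch. It is not the repository's implementation: the `beam_search` function, the `toy_step` scorer and the toy probabilities are all hypothetical, and length normalization is left out for brevity. In the real model, the next-token probabilities would come from the transformer decoder.

```python
import math
from typing import Callable, Dict, List, Tuple

Hypothesis = Tuple[float, List[str]]  # (cumulative log-probability, tokens)


def beam_search(
    step_fn: Callable[[List[str]], Dict[str, float]],
    start_token: str,
    end_token: str,
    beam_width: int = 3,
    max_len: int = 10,
) -> List[str]:
    """Return the best sequence found; step_fn maps a prefix to next-token probabilities."""
    beams: List[Hypothesis] = [(0.0, [start_token])]
    finished: List[Hypothesis] = []

    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates: List[Hypothesis] = []
        for score, seq in beams:
            for token, prob in step_fn(seq).items():
                if prob > 0.0:
                    candidates.append((score + math.log(prob), seq + [token]))
        if not candidates:
            break

        # Keep only the beam_width best expansions.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, seq in candidates[:beam_width]:
            if seq[-1] == end_token:
                finished.append((score, seq))
            else:
                beams.append((score, seq))
        if not beams:
            break

    # Fall back to unfinished hypotheses if none reached end_token.
    pool = finished or beams
    return max(pool, key=lambda c: c[0])[1]


# Hypothetical toy "model": fixed next-token probabilities per prefix.
TOY_MODEL = {
    ("<s>",): {"hola": 0.6, "buenos": 0.4},
    ("<s>", "hola"): {"</s>": 0.4, "amigo": 0.6},
    ("<s>", "hola", "amigo"): {"</s>": 1.0},
    ("<s>", "buenos"): {"dias": 1.0},
    ("<s>", "buenos", "dias"): {"</s>": 1.0},
}


def toy_step(seq: List[str]) -> Dict[str, float]:
    return TOY_MODEL.get(tuple(seq), {"</s>": 1.0})


greedy = beam_search(toy_step, "<s>", "</s>", beam_width=1)
wider = beam_search(toy_step, "<s>", "</s>", beam_width=2)
print(greedy)  # ['<s>', 'hola', 'amigo', '</s>'] (probability 0.36)
print(wider)   # ['<s>', 'buenos', 'dias', '</s>'] (probability 0.40)
```

With a beam width of 1 the search is greedy and commits to the locally best first token, while a width of 2 keeps the second option alive long enough to find the more probable full sentence.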

Instructions

For training:
1. Download the dataset from here (2 GB) and put it in /data
2. Generate the data with python pre_processing.py. Arguments:
  - --lines: number of lines from the original dataset to be processed. Default: 500_00
  - --max_len: maximum length of a sentence. Default: 40
  - --min_count: minimum count for a word to be kept in the vocabulary. Default: 10
3. Run the training notebook to train and evaluate the model

For a detailed explanation of the processing see the notebook.
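
As a rough illustration of what the --max_len and --min_count arguments control, the sketch below filters question/answer pairs by length and by word frequency. It is not the code in pre_processing.py: the `filter_pairs` function is hypothetical, and it assumes whitespace tokenization, inclusive thresholds, and that any pair containing a rare word is dropped entirely. The notebook remains the reference for the actual behavior.

```python
from collections import Counter
from typing import List, Set, Tuple

Pair = Tuple[str, str]  # (input sentence, reply sentence)


def filter_pairs(
    pairs: List[Pair], max_len: int = 40, min_count: int = 10
) -> Tuple[List[Pair], Set[str]]:
    """Drop pairs that are too long or that contain rare words."""
    # 1. Length filter: both sentences must have at most max_len tokens.
    short = [
        (q, a)
        for q, a in pairs
        if len(q.split()) <= max_len and len(a.split()) <= max_len
    ]

    # 2. Count word occurrences over the remaining sentences.
    counts = Counter(w for q, a in short for s in (q, a) for w in s.split())

    # 3. Vocabulary: words seen at least min_count times.
    vocab = {w for w, c in counts.items() if c >= min_count}

    # 4. Keep only pairs whose words are all in the vocabulary.
    kept = [
        (q, a)
        for q, a in short
        if all(w in vocab for s in (q, a) for w in s.split())
    ]
    return kept, vocab


sample = [
    ("hola amigo", "hola"),
    ("hola", "adios amigo"),
    ("hola que tal estas hoy", "bien"),
]
kept, vocab = filter_pairs(sample, max_len=3, min_count=2)
print(kept)           # [('hola amigo', 'hola')]
print(sorted(vocab))  # ['amigo', 'hola']
```

Raising min_count shrinks the vocabulary and, under these assumptions, also removes more training pairs, so the two arguments trade dataset size against vocabulary size.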

For evaluation:
1. Download the parameters for the seq2seq model, the full transformer model, or the quantized transformer model, and uncompress them in ./data.
2. Run the evaluation notebook.

Credits

PyTorch tutorial, for the base of the model. Link
OpenSubtitles and their collection of movie subtitle datasets in every language.
