GPT-1

Generative Pre-trained Transformer 1 (GPT-1)

Architecture 🏗️

The GPT-1 architecture is a twelve-layer decoder-only transformer with twelve masked self-attention heads per layer, each with 64-dimensional states (768 dimensions in total). The model was trained with the Adam optimizer rather than plain stochastic gradient descent; the learning rate was increased linearly from zero over the first 2,000 updates to a peak of 2.5×10⁻⁴, then annealed to 0 on a cosine schedule.
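
The numbers in the paragraph above can be collected into a short sketch. This is an illustration only, not the code in this repository: the names `GPT1Config` and `learning_rate` are hypothetical, the warmup is assumed to be linear, and `total_steps` is a placeholder because the total number of updates is not stated here.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class GPT1Config:
    """Hyperparameters quoted in the Architecture section (names are hypothetical)."""
    n_layers: int = 12           # decoder blocks
    n_heads: int = 12            # masked self-attention heads per block
    head_dim: int = 64           # state size per head
    peak_lr: float = 2.5e-4      # maximum learning rate
    warmup_steps: int = 2000     # updates spent ramping up from zero
    total_steps: int = 100_000   # ASSUMPTION: placeholder, not taken from the source

    @property
    def d_model(self) -> int:
        # Concatenating all heads gives the model width: 12 * 64 = 768.
        return self.n_heads * self.head_dim


def learning_rate(step: int, cfg: GPT1Config = GPT1Config()) -> float:
    """Linear warmup from 0 to peak_lr, then cosine annealing back to 0."""
    if step < cfg.warmup_steps:
        return cfg.peak_lr * step / cfg.warmup_steps
    decay_steps = max(1, cfg.total_steps - cfg.warmup_steps)
    # Fraction of the decay phase completed, clamped so late steps stay at 0.
    progress = min(1.0, (step - cfg.warmup_steps) / decay_steps)
    return 0.5 * cfg.peak_lr * (1.0 + math.cos(math.pi * progress))
```

Under these assumptions the rate is half the peak midway through warmup, reaches the peak at update 2,000, and falls to zero at `total_steps`. In a real training loop the same function could be passed to the optimizer's scheduler hook instead of being called by hand.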

Training 📚

GPT-1 was trained on the BooksCorpus dataset, which contains over 7,000 unique unpublished books, amounting to nearly 800 million words. The corpus spans a variety of genres and contains long stretches of contiguous text, which lets the model learn to condition on long-range context.

Capabilities 🚀

GPT-1's pre-trained representations can be adapted to a range of natural language processing tasks, such as:

Language modeling
Machine translation
Text summarization
Question answering

The model's performance was particularly notable in tasks requiring contextual understanding and the generation of coherent, contextually relevant text.

Implementation 🛠️

This repository provides an implementation of GPT-1 and supporting resources. The model can be adapted to specific NLP tasks and datasets.

Additional documentation is available in MODEL.md, DATASET.md, and PROGRAM_FLOW.md.

Getting Started ⚡

See SETUP.md for environment setup and training instructions. Once dependencies are installed you can pretrain the model with:

python train.py -c conf/pretrain.yml

After training you can generate text with:

python generate.py -l 100

Prerequisites 🖥️

Software dependencies are listed in requirements.txt. See SETUP.md for the remaining software and hardware requirements.

Installation 📦

See SETUP.md for a step-by-step guide to installing and configuring GPT-1 on your system.

Contributing 🤝

We welcome contributions from the community. Please refer to CONTRIBUTING.md for guidelines on how to contribute.

Versioning 🗂️

For the versions available, see the tags on this repository.

Authors and Acknowledgements 🙏

[Your Name] - Mind-Interfaces/GPT-1/
Akshat Pandey - PyTorch implementation of GPT-1
Yu Guo - A simple reproduction of the GPT-1 architecture
Sosuke Kobayashi - Homemade BookCorpus
Acknowledgements to anyone whose resources were used

License 📄

This project is licensed under the MIT License. See the LICENSE file for details.
