- TextBlob now depends on NLTK 3. The vendorized version of NLTK has been removed.
- Fix bug that raised a SyntaxError when translating text with non-ascii characters on Python 3.
- Fix bug that showed "double-escaped" unicode characters in translator output (issue #56). Thanks Evan Dempsey.
- Backwards-incompatible: Completely remove import text.blob. You should import textblob instead.
- Backwards-incompatible: Completely remove PerceptronTagger. Install textblob-aptagger instead.
- Backwards-incompatible: Rename TextBlobException to TextBlobError and MissingCorpusException to MissingCorpusError.
- Backwards-incompatible: Format classes are passed a file object rather than a file path.
- Backwards-incompatible: If training a classifier with data from a file, you must pass a file object (rather than a file path).
- Updated English sentiment corpus.
- Add feature_extractor parameter to NaiveBayesAnalyzer.
- Add textblob.formats.get_registry() and textblob.formats.register(), which allow users to register custom data source formats.
- Change BaseClassifier.detect from a staticmethod to a classmethod.
- Improved docs.
- Tested on Python 3.4.
- Fix display (__repr__) of WordList slices on Python 3.
- Add download_corpora module. Corpora must now be downloaded using python -m textblob.download_corpora.
- Sentiment analyzers return namedtuples, e.g. Sentiment(polarity=0.12, subjectivity=0.34).
- Memory usage improvements to NaiveBayesAnalyzer and basic_extractor (the default feature extractor for the classifiers module).
- Add textblob.tokenizers.sent_tokenize and textblob.tokenizers.word_tokenize convenience functions.
- Add textblob.classifiers.MaxEntClassifier.
- Improved NLTKTagger.
- Fix bug in spelling correction that stripped some punctuation (Issue #48).
- Various improvements to spelling correction: preserve whitespace characters (Issue #12) and handle contractions and punctuation between words. Thanks @davidnk.
- Make TextBlob.words more memory-efficient.
- Translator now sends POST instead of GET requests. This allows larger bodies of text to be translated (Issue #49).
- Update pattern tagger for better accuracy.
- Fix bug that caused a ValueError upon sentence tokenization. This removes modifications made to the NLTK sentence tokenizer.
- Add Word.lemmatize() method that allows passing in a part-of-speech argument.
- Word.lemma takes the part of speech into account for Word objects that have their pos attribute set. Thanks @RomanYankovsky.
- Backwards-incompatible: Renamed package to textblob. This avoids clashes with other namespaces called text. TextBlob should now be imported with from textblob import TextBlob.
- Update pattern resources for improved parser accuracy.
- Update NLTK.
- Allow Translator to connect to proxy server.
- PerceptronTagger completely deprecated. Install the textblob-aptagger extension instead.
- Bugfix updates.
- Fix bug in feature extraction for NaiveBayesClassifier.
- basic_extractor is now case-sensitive, e.g. contains(I) != contains(i).
- Fix repr output when a TextBlob contains non-ascii characters.
- Fix part-of-speech tagging with PatternTagger on Windows.
- Suppress warning about not having scikit-learn installed.
- WordNet integration. Word objects have synsets and definitions properties. The text.wordnet module allows you to create Synset and Lemma objects directly.
- Move all English-specific code to its own module, text.en.
- Basic extensions framework in place. TextBlob has been refactored to make it easier to develop extensions.
- Add text.classifiers.PositiveNaiveBayesClassifier.
- Update NLTK.
- NLTKTagger now works on Python 3.
- Fix __str__ behavior. print(blob) should now print non-ascii text correctly in both Python 2 and 3.
- Backwards-incompatible: All abstract base classes have been moved to the text.base module.
- Backwards-incompatible: PerceptronTagger will now be maintained as an extension, textblob-aptagger. Instantiating a text.taggers.PerceptronTagger() will raise a DeprecationWarning.
- Word tokenization fix: Words that stem from a contraction will still have an apostrophe, e.g. "Let's" => ["Let", "'s"].
- Fix bug with comparing blobs to strings.
- Add text.taggers.PerceptronTagger, a fast and accurate POS tagger. Thanks @syllog1sm.
- Note for Python 3 users: You may need to update your corpora, since NLTK master has reorganized its corpus system. Just run curl https://raw.github.com/sloria/TextBlob/master/download_corpora.py | python again.
- Add download_corpora_lite.py script for getting the minimum corpora requirements for TextBlob's basic features.
- Fix bug that resulted in a UnicodeEncodeError when tagging text with non-ascii characters.
- Add DecisionTreeClassifier.
- Add labels() and train() methods to classifiers.
- Classifiers can be trained and tested on CSV, JSON, or TSV data.
- Add basic WordNet lemmatization via the Word.lemma property.
- WordList.pluralize() and WordList.singularize() methods return WordList objects.
- Add Naive Bayes classification. New text.classifiers module, TextBlob.classify(), and Sentence.classify() methods.
- Add parsing functionality via the TextBlob.parse() method. The text.parsers module currently has one implementation (PatternParser).
- Add spelling correction. This includes the TextBlob.correct() and Word.spellcheck() methods.
- Update NLTK.
- Backwards-incompatible: clean_html has been deprecated, just as it has in NLTK. Use Beautiful Soup's soup.get_text() method for HTML-cleaning instead.
- Slight API change to language translation: if from_lang isn't specified, TextBlob attempts to detect the language.
- Add itokenize() method to tokenizers that returns a generator instead of a list of tokens.
- Unicode fixes: This fixes a bug that sometimes raised a UnicodeEncodeError upon creating or accessing sentences for TextBlobs with non-ascii characters.
- Update NLTK.
- Important patch update for NLTK users: Fix bug with importing TextBlob if local NLTK is installed.
- Fix bug with computing start and end indices of sentences.
- Fix bug that disallowed display of non-ascii characters in the Python REPL.
- Backwards-incompatible: Restore blob.json property for backwards compatibility with textblob<=0.3.10. Add a to_json() method that takes the same arguments as json.dumps.
- Add WordList.append and WordList.extend methods that append Word objects.
- Language translation and detection API!
- Add text.sentiments module. Contains the PatternAnalyzer (default implementation) as well as a NaiveBayesAnalyzer.
- Part-of-speech tags can be accessed via TextBlob.tags or TextBlob.pos_tags.
- Add polarity and subjectivity helper properties.
- New text.tokenizers module with WordTokenizer and SentenceTokenizer. Tokenizer instances (from either textblob itself or NLTK) can be passed to TextBlob's constructor. Tokens are accessed through the new tokens property.
- New Blobber class for creating TextBlobs that share the same tagger, tokenizer, and np_extractor.
- Add ngrams method.
- Backwards-incompatible: TextBlob.json() is now a method, not a property. This allows you to pass arguments (the same that you would pass to json.dumps()).
- New home for documentation: https://textblob.readthedocs.org/
- Add parameter for cleaning HTML markup from text.
- Minor improvement to word tokenization.
- Updated NLTK.
- Fix bug with adding blobs to bytestrings.
- Bundled NLTK no longer overrides local installation.
- Fix sentiment analysis of text with non-ascii characters.
- Updated NLTK.
- ConllExtractor is now Python 3-compatible.
- Improved sentiment analysis.
- Blobs are equal (with ==) to their string counterparts.
- Added instructions to install textblob without nltk bundled.
- Drop official support for Python 3.1 and 3.2.
- Importing TextBlob is now much faster. This is because the noun phrase parsers are trained only on the first call to noun_phrases (instead of training them every time you import TextBlob).
- Add text.taggers module, which allows users to choose which POS tagger implementation to use. Currently supports PatternTagger and NLTKTagger (NLTKTagger only works with Python 2).
- NPExtractor and Tagger objects can be passed to TextBlob's constructor.
- Fix bug with POS-tagger not tagging one-letter words.
- Rename text/np_extractor.py -> text/np_extractors.py
- Add run_tests.py script.
- Every word in a Blob or Sentence is a Word instance, which has methods for inflection, e.g. word.pluralize() and word.singularize().
- Updated the np_extractor module. It now has a new implementation, ConllExtractor, that uses the Conll2000 chunking corpus. Only works on Py2.
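The custom data source format entry above (textblob.formats.get_registry() and textblob.formats.register(), with format classes receiving a file object rather than a file path) can be illustrated with a minimal, standard-library-only sketch. This is not TextBlob's source and does not import TextBlob: register and get_registry are stand-ins named after the functions in that entry, and BaseFormat, JSONFormat, PipeFormat, and detect_format are hypothetical names chosen for illustration. The sketch assumes formats are tried in registration order and that each format's detect receives a string sample of the data.

```python
import io
import json
from collections import OrderedDict

# Hypothetical stand-in for a format registry; not TextBlob's implementation.
_registry = OrderedDict()


def register(name, format_class):
    """Add a format class to the registry under `name`."""
    _registry[name] = format_class


def get_registry():
    """Return the registry mapping names to format classes."""
    return _registry


class BaseFormat:
    """Hypothetical base class: a format wraps an already-open file object."""

    def __init__(self, fp):
        self.fp = fp  # a file object, not a file path

    def to_iterable(self):
        """Return (text, label) pairs read from self.fp."""
        raise NotImplementedError

    @classmethod
    def detect(cls, sample):
        """Return True if the string `sample` looks like this format."""
        raise NotImplementedError


class JSONFormat(BaseFormat):
    """A JSON array of {"text": ..., "label": ...} objects."""

    def to_iterable(self):
        return [(d["text"], d["label"]) for d in json.load(self.fp)]

    @classmethod
    def detect(cls, sample):
        try:
            json.loads(sample)
            return True
        except ValueError:  # json.JSONDecodeError subclasses ValueError
            return False


class PipeFormat(BaseFormat):
    """A hypothetical custom format: one "text|label" pair per line."""

    def to_iterable(self):
        return [tuple(line.rstrip("\n").split("|", 1))
                for line in self.fp if line.strip()]

    @classmethod
    def detect(cls, sample):
        lines = [line for line in sample.splitlines() if line.strip()]
        return bool(lines) and all(line.count("|") == 1 for line in lines)


def detect_format(sample):
    """Return the first registered format class whose detect() accepts `sample`."""
    for format_class in get_registry().values():
        if format_class.detect(sample):
            return format_class
    return None


register("json", JSONFormat)
register("pipe", PipeFormat)  # a user-supplied custom format

data = "I love this sandwich|pos\nThis is my worst enemy|neg\n"
format_class = detect_format(data)
rows = format_class(io.StringIO(data)).to_iterable()
print(format_class.__name__, rows)
# PipeFormat [('I love this sandwich', 'pos'), ('This is my worst enemy', 'neg')]
```

In this sketch detect is a classmethod so it can be called on a format class before any file is opened, and to_iterable reads from whatever file-like object it is given, so an in-memory io.StringIO works as well as an open file.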