All notable changes to this project will be documented in this file.
- Add GitHub Action that builds and pushes to the PyPI index
- Adds nlwiki models with sample of probable D, C, and B-class articles for review
- Allow setting custom classes and weights when extracting scores
- Added `non-external-id` statement count as a signal
- Add tests to ensure item parts are being counted correctly
- Add check for image and commons media
- Add retraining model documentation
- Add `is_astronomical_object` feature for wikidatawiki
- Add `is_scholarlyarticle` feature to wikidatawiki
- Add test instructions
- Add some basic installation instructions
- Add new ukwiki model
- Added `words_to_watch` to ptwiki `feature_lists`
- Add `weighted_sum` utility
- Rebuilds enwiki model with revscoring 2.11.1
- Builds new model for nlwiki using new features and manual labels
- Remove impactless property suggester feature
- Builds new wikidata model
- Remove number of sitelinks signal from wikibase item quality model
- Reduce the size of wikidata model and simplify its logic
- Move tests to outside of the production code
- Rebuilds ptwiki models with revscoring-2.8.2
- Rebuilds all models with revscoring-2.8.2
- Increase revscoring version requirement
- Update Makefile to remove revisions older than 2014
- Rebuild enwiki model with new image counts
- Rebuilds ptwiki models with more observations
- Fix `extract_scores` utility
- Fix fatal error when creating the model info
- Fix module names import type
- Convert page id to string explicitly
- Fix extraction when there are multiple reverts
- Match articles to talk pages using the API
- Detect labels in old ptwiki templates
- Fix typo in `user_agent`
- Fix misleading dataset filenames
- Update `extract_labelings` doc
- Fix doc for ptwiki extractor
- Feature list for ptwiki
- Bumped revscoring to v2.5.1
- Old code examples (`examples/test_model.py` and `examples/train_model.py`)
- Bumped revscoring to v2.4.x
- Added `content_type` param to setup.py
- Minor formatting edits in README
- Added features for English Wikipedia's short-format notes.
- Release Criteria document
- svwiki feature lists
- Added ability to do a fast filtering pass before parsing wikitext.
- Added svwiki extractor.
- Added Wikibase item features.
- Added `util` utility helpers.
- Added `fetch_labels` utilities.
- Added trwiki extractor.
- Added `words_to_watch` count to enwiki feature lists.
- Added new features to wikidatawiki - (@glorianY)
- Added basic extraction pattern for item quality model.
- Added Persian Wikipedia features.
- Added glwiki feature lists.
- Adds `item_completes` to wikidatawiki.
- Rename wikiclass to articlequality.
- Bumped revscoring to v2.3.4
- Updated `fetch_text` for new `rvslots` API param.
- Remove target files when commands error out.
- Replaced filenames with automatic Make variables.
- Update classification examples to revscoring 2.x
- Started using TravisCI for automated builds.
- Use PyTest for testing now.
- Rename pagelevel prediction classes in frwikisource.
- Rename `wp10` -> `articlequality`.
- Change wikidatawiki models to use GradientBoosting.
- Fixed bug in `fetch_item_info`.
- Update about.py in wikiclass folder to the right github link.
- Resolved mwxml/mwtypes version conflict.
- Fixed "who" templates in enwiki features.
- Fixed trwiki extractor so that it works for 'baslagıç'.
- Added feature lists for ruwiki.
- Added `extract_scores` utility.
- Implemented modular `about.py` pattern for pkg info.
- Bumped revscoring to v1.3.0
- Add HTML comment filtering to Russian extractor
- Added testcase to ruwiki extractor.
- Switched RF for GradientBoosting models in Makefile.
- Cleaned up `extract_from_text` utility.
- Wrong variable name in frwiki extractor.
- Fixed division with modifiers in `wikipedia.article`.
- Added Russian assessment extractor. - @nettrom
- Flexibility for revscoring version requirement.
- Typo in French extractor. - @nettrom
- Added basic counts for cn templates and dict_words/word to frwiki feature list.
- Added tuning reports to Makefile.
- Bumped revscoring requirement to v1.1.0.
- Updated feature extractor for revscoring 1.x
- Updates enwiki and frwiki `feature_lists` for revscoring 1.x
- Using `mwreverts`, `mwxml`, `mwapi` libraries instead of `mwlib`.
- Bumped revscoring requirement to 0.7.10 and fixed issues this causes.
- Updated requirement for mwtypes >= 0.2.0
- Adds new `templates_that_match` meta feature.
- Added `not_an_article` filter.
- Added `who`, `citation_needed` and `main_article` templates to enwiki.
- Bumped revscoring requirement to 0.7.2
- Switched text extraction to be API-based.
- Added verbose option to `extract_features`.
- Parallelization for `extract_features`.
- Minor divide-by-zero errors in enwiki and frwiki features.
- Template list error for frwiki. - @gpaumier
- Remove empty sections from CHANGELOG, they occupy too much space and create too much noise in the file. People will have to assume that the missing sections were intentionally left out because they contained no notable changes.
- Cleanup to feature sets for enwiki and frwiki.
- Spaces to tabs in Makefile
- Pass `page_labeling` to `extract_text` as arg.
- Fixed issue with generator requirements in setup.
- README format changed from `.rst` to `.md`.
- Update functions documentation.
- Minor updates to Makefile and `extract_text` for running on stat3
- Basic API.
- Added tests for all features and datasources.
- Added frwiki extractor
- Added `extract_text` utility.
- Restructured wikiclass to make use of the revscoring package.
- Completed enwiki extractor.
- Added error handling in case mwparserfromhell fails.
- Switches `extract_labelings` to use mwxml library
- Remove post '/' stuff from titles during normalization.
- Additional documentation.
- Minor issues in `extract_features.py` script.
- Removed duplicated feature definitions (now part of revscoring).
- Added minimal docs setup.
- Added a LICENSE.
- Moved `add_text` util to `scripts/` dir.
- Completed basic docs.
- README errors.
- Handle division-by-zero case for articles with no words.
- First release on PyPI.
- Working RFTextModel
- Added `add_text` util.
- Basic README.