Noventis is a powerful Python library designed to revolutionize your data analysis workflow through intelligent automation. Built with modern data scientists and analysts in mind, Noventis provides cutting-edge tools for automated exploratory data analysis, predictive modeling, and data cleaning, all with minimal code.
- EDA Auto - Automated exploratory data analysis with comprehensive visualizations and statistical insights
- Predictor - Intelligent ML model selection and training with automated hyperparameter tuning
- Data Cleaner - Smart data preprocessing and cleaning with advanced imputation strategies
- Fast & Efficient - Optimized for performance with large datasets
- Rich Visualizations - Beautiful, publication-ready charts and reports
- Highly Customizable - Fine-tune every aspect to match your needs
# Install from PyPI
pip install noventis

# Or install from source
git clone https://github.com/bccfilkom/noventis.git
cd noventis
pip install -e .

# Verify the installation
import noventis
print(noventis.__version__)
noventis.print_info()  # Show detailed installation info

Get started with intelligent data preprocessing and cleaning.
import pandas as pd
from noventis.data_cleaner import AutoCleaner
# Load your data
df = pd.read_csv('your_data.csv')
# Automatic data cleaning
cleaner = AutoCleaner()
df_clean = cleaner.fit_transform(df)
# The cleaned data is ready for analysis!
df_clean.info()  # info() prints its summary directly and returns None

Read the Data Cleaner Guide
Automatically generate comprehensive exploratory data analysis reports.
from noventis.eda_auto import EDAuto
# Create EDA report
eda = EDAuto(df_clean)
# Generate comprehensive analysis
eda.generate_report()
# Show specific analyses
eda.show_distributions()
eda.show_correlations()
eda.show_missing_patterns()

Build and train machine learning models with automated optimization.
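from sklearn.model_selection import train_test_split
from noventis.predictor import PredictorAuto
# Prepare data
X = df_clean.drop('target', axis=1)
y = df_clean['target']
# Hold out a test set so that X_test and y_test are defined
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Automatic model training
predictor = PredictorAuto()
predictor.fit(X_train, y_train, task='classification')
# Make predictions
predictions = predictor.predict(X_test)
# Get model performance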
print(predictor.get_metrics())

import pandas as pd
from sklearn.model_selection import train_test_split
from noventis.data_cleaner import AutoCleaner
from noventis.eda_auto import EDAuto
from noventis.predictor import PredictorAuto
# 1. Load data
df = pd.read_csv('your_data.csv')
# 2. Clean data
cleaner = AutoCleaner()
df_clean = cleaner.fit_transform(df)
# 3. Explore data
eda = EDAuto(df_clean)
eda.generate_report()
# 4. Train model on a held-out split
X = df_clean.drop('target', axis=1)
y = df_clean['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
predictor = PredictorAuto()
predictor.fit(X_train, y_train, task='classification')
# 5. Evaluate on the test split
print(f"Model Accuracy: {predictor.score(X_test, y_test):.2%}")

Intelligent data preprocessing and cleaning with advanced strategies:
- Missing Data Handling - Multiple imputation strategies (mean, median, KNN, iterative)
- Outlier Treatment - Statistical and ML-based detection (IQR, Z-score, Isolation Forest)
- Feature Scaling - Normalization and standardization techniques
- Encoding - Automatic categorical variable encoding (One-Hot, Label, Target)
- Data Type Detection - Intelligent type inference and conversion
- Duplicate Removal - Smart duplicate detection and handling
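To make two of the strategies listed above concrete, here is a minimal plain-Python sketch of median imputation and IQR-based outlier detection. It illustrates the general techniques only and is not Noventis's implementation; the function names `impute_median` and `iqr_outliers` and the sample data are hypothetical.

```python
import statistics


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical column with two missing entries and one extreme value
ages = [23, 25, None, 24, 26, 22, 95, None, 27]
filled = impute_median(ages)
outliers = iqr_outliers(filled)
print(filled)
print(outliers)
```

The median is used here rather than the mean because a single extreme value (95 in this sample) would otherwise pull the imputed value upward.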
Comprehensive exploratory data analysis automation:
- Statistical Summary - Descriptive statistics for all features
- Distribution Analysis - Histograms, KDE plots, and normality tests
- Correlation Analysis - Heatmaps and correlation matrices
- Missing Data Analysis - Visualization and patterns of missing values
- Outlier Detection - Automatic identification of anomalies
- Feature Relationships - Scatter plots and pairwise analysis
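As a rough illustration of what a correlation analysis computes, the sketch below builds a Pearson correlation matrix in plain Python. The helper names `pearson` and `correlation_matrix` and the sample columns are assumptions made for this example; they are not part of the Noventis API.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Map each pair of column names to its Pearson correlation."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical numeric columns
data = {
    "hours": [1, 2, 3, 4, 5],
    "score": [52, 54, 61, 68, 70],
    "errors": [9, 7, 6, 4, 2],
}
matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

A heatmap is simply this matrix rendered with a color scale, so values near 1 or -1 stand out as strongly related feature pairs.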
Automated machine learning with intelligent model selection:
- Auto Model Selection - Automatically selects the best algorithm for your data
- Hyperparameter Tuning - Optimizes model parameters using advanced search algorithms
- Feature Engineering - Creates and selects relevant features automatically
- Cross-Validation - Robust model evaluation with k-fold validation
- Model Explainability - SHAP values and feature importance analysis
- Ensemble Methods - Combines multiple models for better performance
Supported Algorithms:
- Scikit-learn: Random Forest, Gradient Boosting, Logistic Regression, SVM
- XGBoost: Extreme Gradient Boosting
- LightGBM: Light Gradient Boosting Machine
- CatBoost: Categorical Boosting
- And many more...
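The k-fold cross-validation mentioned in the feature list above can be sketched in plain Python as follows. This is an illustration of the general technique under simple assumptions (contiguous, unshuffled folds), not a description of how Noventis evaluates models; `k_fold_indices`, `cross_validate`, and the majority-class baseline are hypothetical names introduced for the example.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop


def cross_validate(xs, ys, fit, score, k=5):
    """Average the score of a model refit on each training fold."""
    scores = []
    for train, test in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(score(model, [xs[i] for i in test], [ys[i] for i in test]))
    return sum(scores) / len(scores)


# Hypothetical baseline: always predict the most common training label
def fit_majority(xs, ys):
    return max(set(ys), key=ys.count)


def accuracy(model, xs, ys):
    return sum(1 for y in ys if y == model) / len(ys)


features = list(range(10))
labels = [0, 0, 0, 1, 0, 0, 1, 0, 0, 0]
mean_accuracy = cross_validate(features, labels, fit_majority, accuracy, k=5)
print(f"Mean accuracy over 5 folds: {mean_accuracy:.2f}")
```

Because every sample appears in exactly one test fold, the averaged score is less sensitive to a lucky or unlucky single split than one train/test evaluation.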
- Python 3.8 or higher
- 4GB RAM minimum (8GB+ recommended for large datasets)
- Windows, macOS, or Linux
Noventis automatically installs these dependencies:
- Data Processing: pandas, numpy, scipy
- Visualization: matplotlib, seaborn
- Machine Learning: scikit-learn, xgboost, lightgbm, catboost
- AutoML: optuna, flaml, shap
- Feature Engineering: category_encoders, statsmodels
See requirements.txt for the complete list.
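If you want to confirm that these dependencies are importable in your environment, a small standard-library check like the one below may help. It is a sketch, not a Noventis utility: the helper name `missing_dependencies` is hypothetical, and the module list is derived from the dependency names above (note that scikit-learn is imported as `sklearn`).

```python
from importlib.util import find_spec

# Import names assumed from the dependency list above
DEPENDENCIES = [
    "pandas", "numpy", "scipy",
    "matplotlib", "seaborn",
    "sklearn", "xgboost", "lightgbm", "catboost",
    "optuna", "flaml", "shap",
    "category_encoders", "statsmodels",
]


def missing_dependencies(modules=DEPENDENCIES):
    """Return the module names that cannot be located in this environment."""
    return [name for name in modules if find_spec(name) is None]


missing = missing_dependencies()
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All listed dependencies are importable.")
```

`find_spec` only locates a top-level module without importing it, so the check is fast and does not trigger heavy imports.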
We welcome contributions from the community! Here's how you can help:
- Report Bugs - Found a bug? Open an issue
- Suggest Features - Have ideas? We'd love to hear them!
- Improve Documentation - Help us make the docs better
- Submit Pull Requests - Fix bugs or add features
# Clone the repository
git clone https://github.com/bccfilkom/noventis.git
cd noventis
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install in development mode
pip install -e ".[dev]"  # Quoted so shells such as zsh do not expand the brackets
# Run tests
pytest tests/
# Run linting
flake8 noventis/
black noventis/

See CONTRIBUTING.md for detailed guidelines.
This project exists thanks to all the people who contribute:
| Contributor | Role |
|---|---|
| Richard | Product Manager |
| Fatoni Murfids | AI Product Manager |
| Ahmad Nafi Mubarok | Lead Data Scientist |
| Orie Abyan Maulana | Lead Data Analyst |
| Grace Wahyuni | Data Analyst |
| Alexander Angelo | Data Scientist |
| Rimba Nevada | Data Scientist |
| Jason Surya Winata | Frontend Engineer |
| Nada Musyaffa Bilhaqi | Product Designer |
A huge thank you to the maintainers of our dependencies:
- pandas, numpy, scikit-learn, and the entire Python scientific computing community
- XGBoost, LightGBM, and CatBoost teams for excellent gradient boosting libraries
- Optuna and FLAML teams for amazing AutoML frameworks
The folder structure of the Noventis project:
.
├── dataset_for_examples/      # Sample datasets for testing
├── docs/                      # Documentation files
├── examples/                  # Example notebooks and scripts
├── noventis/                  # Main library code
│   ├── __pycache__/
│   ├── asset/                 # Asset files (if any)
│   ├── core/                  # Core functionality
│   ├── data_cleaner/          # Data cleaning module
│   │   ├── __init__.py
│   │   ├── auto.py
│   │   ├── data_quality.py
│   │   ├── encoding.py
│   │   ├── imputing.py
│   │   ├── orchestrator.py
│   │   ├── outlier_handling.py
│   │   └── scaling.py
│   ├── eda_auto/              # EDA automation module
│   │   ├── __init__.py
│   │   └── eda_auto.py
│   ├── predictor/             # Prediction module
│   │   ├── __init__.py
│   │   ├── auto.py
│   │   └── manual.py
│   └── __init__.py            # Main package init
├── noventis.egg-info/         # Package metadata
│   ├── dependency_links.txt
│   ├── PKG-INFO
│   ├── SOURCES.txt
│   └── top_level.txt
├── tests/                     # Unit tests
├── .gitignore                 # Git ignore rules
├── LICENSE                    # MIT License
├── MANIFEST.in                # Package manifest
├── pyproject.toml             # Modern Python packaging config
├── README.md                  # This file
├── requirements.txt           # Production dependencies
├── requirements-dev.txt       # Development dependencies
└── setup.py                   # Package setup script

- The noventis/ folder contains the main library code
- The tests/ folder is dedicated to unit testing and integration testing
- setup.py and pyproject.toml are used for packaging and distribution
- requirements.txt lists the external dependencies needed for the project

With this structure, the project is ready for development, testing, and publishing on PyPI or GitHub.
Problem: ModuleNotFoundError: No module named 'noventis'
# Solution: Reinstall the package
pip uninstall noventis
pip install noventis

Problem: Dependency conflicts
# Solution: Create a fresh virtual environment
python -m venv fresh_env
source fresh_env/bin/activate  # On Windows: fresh_env\Scripts\activate
pip install noventis

Problem: Import errors after installation
# Solution: Verify installation
import noventis
print(noventis.__version__)
noventis.print_info()  # Check all dependencies

- Documentation
- GitHub Issues
This project is licensed under the MIT License - see the LICENSE file for details.
Noventis uses several open-source libraries. We are grateful to their maintainers:
- Data Processing: pandas (BSD), numpy (BSD), scipy (BSD)
- Visualization: matplotlib (PSF), seaborn (BSD)
- Machine Learning: scikit-learn (BSD), xgboost (Apache 2.0), lightgbm (MIT), catboost (Apache 2.0)
- AutoML: optuna (MIT), flaml (MIT), shap (MIT)
- Feature Engineering: category_encoders (BSD), statsmodels (BSD)
All dependencies are licensed under permissive open-source licenses (BSD, MIT, Apache 2.0, PSF).
If you use Noventis in your research, please cite:
@software{noventis2025,
author = {Noventis Team},
title = {Noventis: Intelligent Automation for Data Analysis},
year = {2025},
url = {https://github.com/bccfilkom/noventis}
}

Made with ❤️ by the Noventis Team
If you find Noventis useful, please consider giving it a ⭐ on GitHub!
