Keke is an AI-powered, web-based tool that brings advanced data processing, machine learning, and intelligent analysis to Excel workflows. It provides a modern interface for working with Excel, CSV, JSON, and Parquet files, enhanced with artificial intelligence for smarter data insights.
- Natural Language Queries: Ask questions about your data in plain English
- AI Data Analysis: Intelligent insights and pattern recognition
- Smart Data Cleaning: AI-driven recommendations for data quality improvements
- Automated Visualization: AI suggests the best charts and graphs for your data
- Predictive Analytics: Machine learning models for forecasting and classification
- Multi-format Support: Excel (.xlsx, .xls), CSV, JSON, Parquet
- Real-time Collaboration: Multiple users can work on the same dataset simultaneously
- Cloud Integration: Seamless integration with AWS S3, Google Drive, Dropbox
- Batch Processing: Handle multiple files simultaneously
- Advanced Formulas: Excel-like formula engine with AI enhancements
- Scalable Architecture: Kubernetes-ready with auto-scaling
- Comprehensive Monitoring: Prometheus and Grafana integration
- Security: JWT authentication, rate limiting, data encryption
- CI/CD Pipeline: Automated testing and deployment
- Multi-environment Support: Development, staging, and production deployments
- Comprehensive data analysis with statistics and quality metrics
- Pattern detection including outliers and correlations
- Data type analysis and recommendations
- Memory usage optimization insights
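For orientation, here is what these analysis steps involve at the pandas level. This is a conceptual sketch, not Keke's actual implementation (which lives in api/excel_processor.py):

```python
# Conceptual pandas equivalents of the analysis features above;
# Keke's own implementation may differ.
import pandas as pd

df = pd.DataFrame({"Sales": [100, 120, 115, 900], "Profit": [10, 14, 12, 80]})

stats = df.describe()                          # summary statistics
q1, q3 = df["Sales"].quantile([0.25, 0.75])    # IQR-based outlier detection
iqr = q3 - q1
outliers = df[(df["Sales"] < q1 - 1.5 * iqr) | (df["Sales"] > q3 + 1.5 * iqr)]
correlations = df.corr(numeric_only=True)      # pairwise correlations
memory = df.memory_usage(deep=True)            # per-column memory footprint
```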
- Remove duplicates and empty rows
- Handle missing values with multiple strategies
- Data type conversion and validation
- Column renaming and restructuring
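These operations map naturally onto pandas; the sketch below is conceptual only (the JSON operation specs further down this README show the actual API):

```python
# Conceptual pandas equivalents of the cleaning operations above.
import pandas as pd

df = pd.DataFrame({"Month": ["Jan", "Jan", "Feb", None],
                   "Sales": [100, 100, None, 250]})

df = df.drop_duplicates()                             # remove duplicates
df = df.dropna(how="all")                             # drop empty rows
df["Sales"] = df["Sales"].fillna(df["Sales"].mean())  # fill missing with mean
df["Sales"] = pd.to_numeric(df["Sales"])              # type conversion/validation
df = df.rename(columns={"Month": "month"})            # column renaming
```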
- Create professional charts (Bar, Line, Pie)
- Customizable chart styling and formatting
- Export charts as Excel files
- Multiple chart types support
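One plausible way to produce Excel-embedded charts like these is openpyxl; whether Keke uses it internally is an assumption, but the sketch below shows the kind of output described above (note the numeric `style` value, which also appears in the chart configuration later in this README):

```python
# Hypothetical sketch: build a bar chart and save it inside an .xlsx file
# with openpyxl. Keke's actual chart backend is not specified here.
from openpyxl import Workbook
from openpyxl.chart import BarChart, Reference

wb = Workbook()
ws = wb.active
ws.append(["Month", "Sales"])
for row in [["Jan", 100], ["Feb", 250], ["Mar", 180]]:
    ws.append(row)

chart = BarChart()
chart.title = "Sales Data"
chart.style = 10                                   # built-in chart style
data = Reference(ws, min_col=2, min_row=1, max_row=4)
cats = Reference(ws, min_col=1, min_row=2, max_row=4)
chart.add_data(data, titles_from_data=True)
chart.set_categories(cats)
ws.add_chart(chart, "D2")
wb.save("sales_chart.xlsx")
```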
- Export data in multiple formats: CSV, JSON, Excel, Parquet
- Batch processing for multiple files
- Custom export configurations
- High-performance data serialization
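The export formats correspond directly to pandas writers; a minimal sketch (output file names are illustrative, and how Keke serializes internally is not specified here):

```python
# Export one DataFrame in each supported format; .xlsx requires openpyxl
# and .parquet requires pyarrow (or fastparquet).
import pandas as pd

df = pd.DataFrame({"Month": ["Jan", "Feb"], "Sales": [100, 250]})
df.to_csv("out.csv", index=False)
df.to_json("out.json", orient="records")
df.to_excel("out.xlsx", index=False)
df.to_parquet("out.parquet")
```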
- Excel-like formula evaluation
- Support for common functions (SUM, AVERAGE, COUNT)
- Arithmetic operations between cells
- Custom formula application
- Process multiple files simultaneously
- Parallel processing for efficiency
- Progress tracking and error handling
- Bulk operations support
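A minimal sketch of this pattern with Python's concurrent.futures; the file names and the per-file handler are placeholders:

```python
# Parallel per-file processing with progress tracking and error handling.
from concurrent.futures import ThreadPoolExecutor, as_completed
import pandas as pd

def process_file(path: str) -> int:
    df = pd.read_csv(path)
    return len(df)                     # stand-in for real per-file work

files = ["q1.csv", "q2.csv", "q3.csv"]
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = {pool.submit(process_file, f): f for f in files}
    for future in as_completed(futures):
        name = futures[future]
        try:
            print(f"{name}: {future.result()} rows")   # progress tracking
        except Exception as exc:                        # per-file error handling
            print(f"{name} failed: {exc}")
```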
- Node.js 18+
- Python 3.9+
- Docker (for containerized deployment)
- Kubernetes (for production deployment)
- npm or yarn
- Clone the repository

  ```bash
  git clone https://github.com/keke-team/keke-excel-tool.git
  cd keke-excel-tool
  ```

- Install dependencies

  ```bash
  # Install all dependencies automatically
  python3 run.py install

  # Or install manually
  npm install
  pip install -r requirements.txt
  ```

- Configure environment

  ```bash
  # Copy environment template
  cp env.example .env

  # Edit .env with your configuration
  nano .env
  ```

- Start Keke

  ```bash
  # Start with all services (databases, monitoring)
  python3 run.py start --mode full

  # Or start application only
  python3 run.py start --mode app-only
  ```
```bash
# Build Docker image
docker build -t keke-excel-tool .

# Run with Docker Compose
docker-compose up -d

# Check status
docker-compose ps
```

```bash
# Deploy to development
./scripts/deploy.sh deploy development

# Deploy to production
./scripts/deploy.sh deploy production

# Check deployment status
./scripts/deploy.sh status

# Rollback if needed
./scripts/deploy.sh rollback
```
- Install Node.js dependencies

  ```bash
  npm install
  ```

- Install Python dependencies

  ```bash
  pip install -r requirements.txt
  ```

- Create necessary directories

  ```bash
  mkdir -p uploads logs data temp models cache
  ```

- Set up environment variables

  ```bash
  export OPENAI_API_KEY="your_openai_api_key"
  export ANTHROPIC_API_KEY="your_anthropic_api_key"
  export DATABASE_URL="postgresql://user:pass@localhost:5432/keke"
  ```
```bash
# Using the comprehensive runner (recommended)
python3 run.py start

# Development mode with auto-reload
npm run dev

# Production mode
npm start

# With specific configuration
python3 run.py start --mode with-db --verbose
```

The server will start on http://localhost:3000.
```bash
# Ask questions about your data
curl -X POST http://localhost:3000/api/ai/query/session123/Sheet1 \
  -H "Content-Type: application/json" \
  -d '{"query": "What insights can you provide about this sales data?"}'

# Get AI-powered cleaning suggestions
curl http://localhost:3000/api/ai/cleaning-suggestions/session123/Sheet1

# Get visualization recommendations
curl http://localhost:3000/api/ai/visualization-suggestions/session123/Sheet1
```

- Open your browser and navigate to http://localhost:3000
- Upload your Excel, CSV, or JSON file using drag-and-drop or the file picker
- Use the available tools to analyze, clean, and process your data
- Export results in your preferred format
Upload a file:

```
POST /api/excel/upload
Content-Type: multipart/form-data

file: [your file]
```

Analyze a sheet:

```
GET /api/excel/{sessionId}/analyze/{sheetName}
```

Clean data:

```
POST /api/excel/{sessionId}/clean/{sheetName}
Content-Type: application/json

{
  "operations": [
    {
      "type": "remove_duplicates"
    },
    {
      "type": "remove_nulls",
      "params": {
        "strategy": "drop_rows"
      }
    }
  ]
}
```

Create a chart:

```
POST /api/excel/{sessionId}/chart/{sheetName}
Content-Type: application/json

{
  "chart_config": {
    "type": "bar",
    "title": "Sales Data",
    "x_column": "Month",
    "y_columns": ["Sales", "Profit"]
  }
}
```

Export data:

```
GET /api/excel/{sessionId}/export/{sheetName}?format=csv
```

Apply formulas:

```
POST /api/excel/{sessionId}/formulas/{sheetName}
Content-Type: application/json

{
  "formulas": {
    "Total": "=SUM(A:A)",
    "Average": "=AVERAGE(B:B)"
  }
}
```

Run a prediction:

```
POST /api/excel/{sessionId}/predict/{sheetName}
Content-Type: application/json

{
  "target_column": "Sales",
  "feature_columns": ["Price", "Marketing", "Season"],
  "model_type": "auto"
}
```

Ask the AI assistant:

```
POST /api/ai/query/{sessionId}/{sheetName}
Content-Type: application/json

{
  "query": "What are the main trends in this data?",
  "context": {}
}
```

Supported file formats:

- Excel: .xlsx, .xls
- CSV: .csv (with automatic delimiter detection)
- JSON: .json (arrays of objects or single objects)
- Parquet: .parquet (for high-performance data processing)
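To tie the endpoints above together, here is a minimal Python client sketch. The `sessionId` field in the upload response is an assumption; check the actual response shape:

```python
# Hypothetical end-to-end client for the endpoints documented above.
import requests

BASE = "http://localhost:3000"

# Upload a workbook; the "sessionId" response field is assumed.
with open("sales.xlsx", "rb") as f:
    upload = requests.post(f"{BASE}/api/excel/upload", files={"file": f})
session_id = upload.json()["sessionId"]

# Analyze a sheet.
analysis = requests.get(f"{BASE}/api/excel/{session_id}/analyze/Sheet1").json()

# Clean the sheet, then export it as CSV.
requests.post(
    f"{BASE}/api/excel/{session_id}/clean/Sheet1",
    json={"operations": [{"type": "remove_duplicates"}]},
)
csv_bytes = requests.get(
    f"{BASE}/api/excel/{session_id}/export/Sheet1", params={"format": "csv"}
).content
```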
Remove duplicates:

```
{
  "type": "remove_duplicates"
}
```

Handle missing values:

```
{
  "type": "remove_nulls",
  "params": {
    "strategy": "drop_rows" | "drop_columns" | "fill",
    "threshold": 0.5,
    "method": "forward" | "backward" | "mean" | "median"
  }
}
```

Convert data types:

```
{
  "type": "convert_types",
  "params": {
    "conversions": {
      "column_name": "numeric" | "datetime" | "string"
    }
  }
}
```

Rename columns:

```
{
  "type": "rename_columns",
  "params": {
    "mapping": {
      "old_name": "new_name"
    }
  }
}
```

Filter rows:

```
{
  "type": "filter_rows",
  "params": {
    "condition": "column_name > 100"
  }
}
```

Supported chart types:

- `bar`: Bar chart
- `line`: Line chart
- `pie`: Pie chart

Chart configuration:

```
{
  "type": "bar",
  "title": "Chart Title",
  "x_column": "Category",
  "y_columns": ["Value1", "Value2"],
  "style": 10
}
```

Supported formula functions:

- `SUM(range)`: Sum of values in range
- `AVERAGE(range)`: Average of values in range
- `COUNT(range)`: Count of non-empty cells in range

Cell references:

- `A1`, `B2`: Individual cell references
- `A:A`: Entire column reference
- `1:1`: Entire row reference

Arithmetic operations:

- `A1+B1`: Addition
- `A1-B1`: Subtraction
- `A1*B1`: Multiplication
- `A1/B1`: Division
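To make the reference semantics concrete, here is a toy illustration of how column references and cell arithmetic can map onto a DataFrame. This is not Keke's formula engine, just the documented semantics in miniature:

```python
# Toy illustration of the formula semantics above (not Keke's engine).
import pandas as pd

def col(df: pd.DataFrame, letter: str) -> pd.Series:
    """Map a column letter (A, B, ...) to a DataFrame column by position."""
    return df.iloc[:, ord(letter.upper()) - ord("A")]

df = pd.DataFrame({"Sales": [100, 200, 300], "Profit": [10, 25, 40]})

total = col(df, "A").sum()                 # =SUM(A:A)      -> 600
average = col(df, "B").mean()              # =AVERAGE(B:B)  -> 25.0
cell_sum = df.iloc[0, 0] + df.iloc[0, 1]   # =A1+B1         -> 110
```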
- Auto Model Selection: Automatically chooses the best model for your data
- Supported Models: Linear Regression, Random Forest, XGBoost, LightGBM
- Cross-validation: Comprehensive model evaluation
- Feature Importance: Analysis of which features matter most
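As a sketch of how "auto" model selection can work, cross-validate a few candidates and keep the best scorer. This is conceptual only; Keke's api/ml_processor.py may choose differently, and XGBoost/LightGBM are omitted here to keep the example scikit-learn-only:

```python
# Minimal "auto model selection" via cross-validation (conceptual sketch).
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=200, n_features=5, noise=10, random_state=0)

candidates = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Score each candidate with 5-fold cross-validation and keep the best.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    for name, model in candidates.items()
}
best = max(scores, key=scores.get)
print(f"best model: {best} (mean R^2 = {scores[best]:.3f})")
```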
- K-Means Clustering: Group similar data points
- Hierarchical Clustering: Tree-based clustering
- DBSCAN: Density-based clustering
- Auto Parameter Tuning: Automatic selection of optimal parameters
- Statistical Validation: Comprehensive data quality checks
- Custom Rules: Define your own validation rules
- Anomaly Detection: Identify outliers and unusual patterns
- Data Profiling: Detailed analysis of data characteristics
- File Size: Recommended maximum 50MB per file
- Memory Usage: Large files are processed in chunks (see the sketch after this list)
- Concurrent Requests: Rate limited to 100 requests per 15 minutes
- Batch Processing: Up to 10 files simultaneously
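The chunked-processing behavior can be pictured with pandas' `chunksize`; the file and column names here are illustrative:

```python
# Process a large CSV in bounded memory, one chunk at a time.
import pandas as pd

total = 0.0
for chunk in pd.read_csv("large_file.csv", chunksize=100_000):
    total += chunk["Sales"].sum()      # each chunk is a regular DataFrame
print(total)
```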
The API provides comprehensive error handling with:
- Detailed error messages
- HTTP status codes
- Validation error details
- File processing error recovery
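From a client's perspective, consuming these errors looks like the sketch below. The exact JSON error shape (an `error` field) is an assumption, not something this README specifies:

```python
# Hedged example of consuming API errors; the JSON error shape is assumed.
import requests

resp = requests.get("http://localhost:3000/api/excel/unknown-session/analyze/Sheet1")
if not resp.ok:
    print("HTTP status:", resp.status_code)
    try:
        print("message:", resp.json().get("error"))
    except ValueError:                 # body was not JSON
        print(resp.text)
```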
```bash
# Run all tests
python3 run.py test

# Run Node.js tests
npm test

# Run Python tests
pytest
```

```bash
# Lint and format the code
npm run lint
npm run format
```

```
keke/
├── api/                    # API endpoints and server code
│   ├── server.js           # Main Express server
│   ├── api_routes.js       # API route definitions
│   ├── excel_processor.py  # Core Excel processing logic
│   ├── assistant.py        # AI assistant functionality
│   ├── ml_processor.py     # Machine learning processing
│   ├── security.py         # Security utilities
│   ├── cloud_services.py   # Cloud storage integration
│   └── public/             # Static web interface
├── k8s/                    # Kubernetes deployment manifests
├── monitoring/             # Monitoring configuration
├── scripts/                # Deployment and utility scripts
├── tests/                  # Test suites
├── run.py                  # Main startup script
├── docker-compose.yml      # Docker Compose configuration
├── Dockerfile              # Docker image definition
├── requirements.txt        # Python dependencies
├── package.json            # Node.js dependencies
└── env.example             # Environment configuration template
```
Core settings:

- `KEKE_ENV`: Environment (development, staging, production)
- `HOST`: Server host (default: localhost)
- `PORT`: Server port (default: 3000)
- `DEBUG`: Enable debug mode

AI:

- `AI_ENABLED`: Enable AI features
- `OPENAI_API_KEY`: OpenAI API key
- `ANTHROPIC_API_KEY`: Anthropic Claude API key

Databases:

- `DATABASE_URL`: Primary database connection string
- `REDIS_URL`: Redis connection string
- `MONGODB_URL`: MongoDB connection string
- `POSTGRES_URL`: PostgreSQL connection string

Cloud storage:

- `AWS_ACCESS_KEY_ID`: AWS access key
- `AWS_SECRET_ACCESS_KEY`: AWS secret key
- `AWS_REGION`: AWS region
- `AZURE_STORAGE_CONNECTION_STRING`: Azure storage connection
- `GOOGLE_APPLICATION_CREDENTIALS`: Google Cloud credentials

Security:

- `JWT_SECRET`: JWT signing secret
- `ENCRYPTION_KEY`: Data encryption key
- `ALLOWED_ORIGINS`: CORS allowed origins
- `RATE_LIMIT`: API rate limit
- `RATE_WINDOW`: Rate limit window

Performance:

- `MAX_FILE_SIZE`: Maximum file size in bytes
- `WORKER_PROCESSES`: Number of worker processes
- `MAX_MEMORY_USAGE`: Memory limit
- `CACHE_TTL`: Cache time-to-live

Feature flags:

- `FEATURE_AI_ASSISTANT`: Enable AI assistant
- `FEATURE_MACHINE_LEARNING`: Enable ML features
- `FEATURE_CLOUD_STORAGE`: Enable cloud storage
- `FEATURE_COLLABORATION`: Enable real-time collaboration
- `FEATURE_ADVANCED_ANALYTICS`: Enable advanced analytics
- JWT-based authentication
- Role-based access control
- API rate limiting
- CORS configuration
- Encryption at rest and in transit
- Secure secret management
- Input validation and sanitization
- SQL injection prevention
- Non-root containers
- Network policies
- Pod security policies
- Regular security scanning
- Application performance metrics
- Custom business metrics
- Kubernetes cluster metrics
- Database performance metrics
- Application health monitoring
- Performance analytics
- Error tracking and alerting
- Resource utilization
- Structured JSON logging
- Centralized log aggregation
- Error tracking and debugging
- Performance monitoring
- Kubernetes HPA for automatic scaling
- Load balancer configuration
- Multi-replica deployments
- Database connection pooling
- Caching strategies (Redis)
- CDN integration
- Resource optimization
- Database indexing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- v1.0.0: Initial release with core Excel processing features
- v1.1.0: Added chart generation and advanced data cleaning
- v1.2.0: Enhanced formula engine and batch processing
- v1.3.0: AI assistant and machine learning integration