Project Kandro is a decentralized platform for sharing and monetizing datasets. It features a system for evaluating data quality, enabling users to upload, view, and purchase datasets. The platform also includes a discussion forum for community interaction. The backend handles data quality checks and file uploads, while the frontend provides the user interface for interacting with datasets and smart contracts on a blockchain.
The project is organized into three main directories:

- `backend/`: Contains the Node.js server with Express.js for API endpoints, Python scripts for data quality analysis, and a machine learning model for quality scoring.
- `frontend/`: Contains the React application built with Vite, interacting with a Solidity smart contract for dataset management on the blockchain.
- `data/`: Contains dataset metadata.
Frontend:

- React: JavaScript library for building user interfaces.
- Vite: Fast build tool and development server.
- Tailwind CSS: Utility-first CSS framework.
- Ethers.js/Web3.js: Libraries for interacting with Ethereum smart contracts.
- Solidity: Language for writing smart contracts.
- React Router: For navigation within the React application.
- Axios: For making HTTP requests to the backend.
Backend:

- Node.js: JavaScript runtime environment.
- Express.js: Web application framework for Node.js.
- Python: For data quality analysis scripts.
- Pandas, NumPy: For data manipulation.
- XGBoost, scikit-learn: For the dataset quality prediction model.
- Joblib: For saving/loading the trained model.
- pyclamd, tika, pyod: For additional data validation and analysis.
- Multer: Middleware for handling `multipart/form-data` (file uploads).
- Pinata SDK / web3.storage: For decentralized file storage (likely IPFS).
- ClamAV.js: For virus scanning uploads.
Smart contracts:

- Solidity: Used to write the `DatasetStorage` contract.
- OpenZeppelin Contracts: For utility libraries like `Strings.sol`.
Prerequisites:

- Node.js and npm (or yarn)
- Python and pip
- A blockchain development environment (e.g., Hardhat, Ganache) if running smart contracts locally.
- Access to an Ethereum-compatible blockchain and a wallet (e.g., MetaMask) for frontend interaction.
Backend setup:

- Navigate to the `backend` directory: `cd backend`
- Install Node.js dependencies:

  ```
  npm install
  ```

- Install Python dependencies:

  ```
  pip install -r requirements.txt
  ```

- Create a `.env` file in the `backend` directory and configure the necessary environment variables (e.g., `PYTHON_PATH`, Pinata API keys). Example:

  ```
  PYTHON_PATH=python  # or the path to your Python executable
  PINATA_API_KEY=your_pinata_api_key
  PINATA_SECRET_API_KEY=your_pinata_secret_api_key
  ```
- If you haven't trained the quality model yet, run the training script:

  ```
  python train_model.py
  ```

  This will generate `dataset_quality_model_xgb.pkl`.
Frontend setup:

- Navigate to the `frontend` directory: `cd frontend`
- Install Node.js dependencies:

  ```
  npm install
  ```

- Deploy the `DatasetStorage.sol` smart contract to your chosen blockchain network, then update the contract address and ABI in the frontend code (likely in `frontend/src/Context/DatasetStorageABI.jsx` or a similar configuration file).
- Ensure MetaMask or another wallet is configured for the network where the contract is deployed.
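The address/ABI update mentioned in the setup steps might look roughly like the following configuration module. Everything here is an illustrative assumption — the file name, the ABI entries, and the function signatures are not confirmed by the source:

```javascript
// Hypothetical contract config (the repo's actual file is likely
// frontend/src/Context/DatasetStorageABI.jsx). Replace with real values
// after deploying DatasetStorage.sol.
const CONTRACT_ADDRESS = "0x0000000000000000000000000000000000000000"; // your deployed address

// Ethers.js accepts a human-readable ABI; only the functions the UI
// calls need to be listed. These signatures are made-up examples.
const CONTRACT_ABI = [
  "function addDataset(string name, string ipfsHash, uint256 price)",
  "function getDataset(uint256 id) view returns (string, string, uint256, address)",
];

// With Ethers.js v6 the contract would then be instantiated roughly as:
//   const provider = new ethers.BrowserProvider(window.ethereum);
//   const contract = new ethers.Contract(CONTRACT_ADDRESS, CONTRACT_ABI, await provider.getSigner());
//
// In the Vite app this module would export the two constants:
//   export { CONTRACT_ADDRESS, CONTRACT_ABI };
```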
Running the backend:

- Navigate to the `backend` directory.
- Start the backend server (defaults to port 9000):

  ```
  npm start
  ```

  This uses `nodemon` to automatically restart the server on file changes.
Running the frontend:

- Navigate to the `frontend` directory.
- Start the Vite development server (defaults to port 3000):

  ```
  npm run dev
  ```

  or

  ```
  npm start
  ```

- Open your browser and go to `http://localhost:3000`.
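With both servers running, the frontend sends dataset uploads to the backend over HTTP (the README mentions Axios for this). As a rough sketch only — the endpoint path, form field name, and response shape below are assumptions, not confirmed by the code:

```javascript
// Hypothetical upload helper. The real app reportedly uses Axios; plain
// fetch is shown here to keep the sketch dependency-free. The "/upload"
// path and "dataset" field name are assumptions.
async function checkDatasetQuality(file, baseUrl = "http://localhost:9000") {
  const form = new FormData();
  form.append("dataset", file); // the CSV file selected by the user

  const res = await fetch(`${baseUrl}/upload`, { method: "POST", body: form });
  if (!res.ok) throw new Error(`Upload failed: ${res.status}`);
  return res.json(); // e.g. a quality score plus per-check metrics
}
```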
Features:

- Decentralized Dataset Storage: Datasets are intended to be stored on decentralized systems like IPFS via Pinata.
- Smart Contract Interaction: Manages dataset metadata and ownership on the blockchain.
- Data Quality Check: The Python backend analyzes uploaded CSV files for quality metrics, including:
  - Missing values
  - Duplicate rows
  - Data type consistency
  - Outlier detection
  - Malicious/fake data indicators
- Dataset Marketplace: Users can (presumably) list, browse, and acquire datasets.
- User Authentication: Wallet connection (e.g., MetaMask) for interacting with the dApp.
- Discussion Forum: A section for users to discuss topics.
- File Upload: Users can upload CSV datasets.
- Responsive UI: Built with React and Tailwind CSS.
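The quality checks listed above run in the backend's Python scripts (Pandas/NumPy/XGBoost). Purely as an illustration of what two of the simpler metrics involve, here is a stdlib-only JavaScript sketch of missing-value and duplicate-row counting over CSV text:

```javascript
// Illustration only — the real checks live in the Python backend.
// This computes two of the simpler metrics on naive comma-split CSV
// (no quoted-field handling).
function basicQualityMetrics(csvText) {
  const rows = csvText.trim().split("\n").map((line) => line.split(","));
  const [header, ...data] = rows;

  // Missing values: count empty cells across all data rows.
  let missing = 0;
  for (const row of data) {
    for (const cell of row) if (cell.trim() === "") missing++;
  }

  // Duplicate rows: count rows whose full content repeats an earlier row.
  const seen = new Set();
  let duplicates = 0;
  for (const row of data) {
    const key = row.join("\u0000");
    if (seen.has(key)) duplicates++;
    else seen.add(key);
  }

  return { columns: header.length, rows: data.length, missing, duplicates };
}

const sample = "a,b\n1,2\n1,2\n3,\n";
console.log(basicQualityMetrics(sample));
// → { columns: 2, rows: 3, missing: 1, duplicates: 1 }
```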
To lint the frontend code:
- Navigate to the `frontend` directory.
- Run the lint command:

  ```
  npm run lint
  ```
To build the frontend for production:

- Navigate to the `frontend` directory.
- Run the build command:

  ```
  npm run build
  ```

  This will create a `dist` folder with the production-ready static assets.