A Django REST Framework API for managing and accessing data lake resources with role-based access control and comprehensive logging.
- Role-based Access Control: Secure access to data lake resources based on user permissions
- Data Lake Integration: Direct access to CSV, JSON, and Parquet files in organized data lake structure
- Advanced Data Filtering: Filter transaction data by country, status, amount, rating, and more
- Pagination Support: Efficient handling of large datasets with configurable page sizes
- Field Projection: Select specific columns to reduce data transfer and improve performance
- JWT Authentication: Secure token-based authentication using Simple JWT
- API Documentation: Auto-generated OpenAPI/Swagger documentation with drf-spectacular
- Access Logging: Comprehensive logging of all API access with middleware
- Resource Management: Admin interface for managing resources and access rules
- Data Preview: Quick preview functionality for data lake files
njango_drf/
├── api_project/ # Django project
│   ├── api_project/ # Main project settings
│   │   ├── settings.py # Django configuration
│   │   ├── urls.py # Main URL routing
│   │   └── wsgi.py # WSGI configuration
│   ├── core/ # Core application
│   │   ├── models.py # Data models (Customer, Resource, AccessRule, AccessLog)
│   │   ├── views.py # API views and endpoints
│   │   ├── serializers.py # DRF serializers
│   │   ├── permissions.py # Custom permission classes
│   │   ├── middleware.py # Access logging middleware
│   │   ├── admin.py # Django admin configuration
│   │   └── urls.py # App URL routing
│   └── manage.py # Django management script
├── data_lake/ # Data lake directory
│   ├── transactions_flat/ # Transaction data (organized by date)
│   ├── AMOUNT_5MIN_PER_TYPE/ # Aggregated transaction amounts
│   ├── STATUS_PAR_TRANSACTION/ # Transaction status data
│   └── ... # Other data lake resources
├── requirements.txt # Python dependencies
├── .gitignore # Git ignore rules
└── README.md # This file
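The tree lists core/middleware.py as the access-logging middleware, but its code is not shown here. As a rough sketch of the standard Django middleware pattern it would follow, here is a stand-in that writes to Python's `logging` module rather than to the AccessLog model. The class name `AccessLogMiddleware` and the logger name `access` are hypothetical. The user is read after the view runs because DRF authenticates JWT requests during view dispatch, not in middleware.

```python
import logging
import time

# Hypothetical logger name; the real project stores entries in the AccessLog model.
logger = logging.getLogger("access")


class AccessLogMiddleware:
    """Django-style middleware that logs method, path, user, status and duration."""

    def __init__(self, get_response):
        # Django calls this once at startup with the next handler in the chain.
        self.get_response = get_response

    def __call__(self, request):
        start = time.perf_counter()
        response = self.get_response(request)  # run the view (and DRF auth)
        elapsed_ms = (time.perf_counter() - start) * 1000
        # Read the user only now, after authentication has happened.
        user = getattr(request, "user", None)
        username = getattr(user, "username", None) or "anonymous"
        logger.info(
            "%s %s user=%s status=%s %.1fms",
            request.method,
            request.path,
            username,
            response.status_code,
            elapsed_ms,
        )
        return response
```

To try it, a class like this would be added to the `MIDDLEWARE` list in settings.py under its dotted path (for example the hypothetical `"core.middleware.AccessLogMiddleware"`).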
Before you begin, ensure you have the following installed:
- Python 3.8+
- PostgreSQL 12+
- Git
git clone <your-repository-url>
cd njango_drf
# Create virtual environment
python -m venv venv
# Activate virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate
# Install Python dependencies
pip install -r requirements.txt
Install PostgreSQL:
- Windows: Download the installer from the official PostgreSQL website
- macOS:
brew install postgresql
- Ubuntu/Debian:
sudo apt-get install postgresql postgresql-contrib
-- Connect to PostgreSQL as superuser
psql -U postgres
-- Create database
CREATE DATABASE api_project;
-- Create user (optional, you can use existing user)
CREATE USER api_user WITH PASSWORD 'your_password';
-- Grant privileges
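GRANT ALL PRIVILEGES ON DATABASE api_project TO api_user;
-- PostgreSQL 15+ no longer lets every user create tables in the public schema;
-- making api_user the database owner lets migrations create tables
ALTER DATABASE api_project OWNER TO api_user;
Edit api_project/api_project/settings.py: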
DATABASES = {
"default": {
"ENGINE": "django.db.backends.postgresql",
"NAME": "api_project",
"USER": "your_username", # Change this
"PASSWORD": "your_password", # Change this
"HOST": "localhost",
"PORT": "5432", # Default PostgreSQL port
}
}
cd api_project
python manage.py migrate
python manage.py createsuperuser
# Load sample data lake resources
python manage.py shell
In the Django shell:
from core.models import Resource
# Add sample resources (adjust paths to your data_lake directory)
resources = [
Resource(name="transactions_flat", path="../data_lake/transactions_flat", kind="folder"),
Resource(name="AMOUNT_5MIN_PER_TYPE", path="../data_lake/AMOUNT_5MIN_PER_TYPE", kind="folder"),
Resource(name="STATUS_PAR_TRANSACTION", path="../data_lake/STATUS_PAR_TRANSACTION", kind="folder"),
# Add more resources as needed
]
for resource in resources:
    resource.save()
# Start the development server (from the repository root)
cd api_project
python manage.py runserver
The API will be available at:
- API Base: http://127.0.0.1:8000/api
- Admin Interface: http://127.0.0.1:8000/admin/
- API Documentation: http://127.0.0.1:8000/api/docs/#/
Authentication:
- POST /api/token/ - Obtain JWT access token
- POST /api/token/refresh/ - Refresh JWT token
Resources:
- GET /api/resources/ - List all resources (Admin only)
- POST /api/resources/ - Create new resource (Admin only)
- GET /api/resources/{id}/ - Get resource details (Admin only)
- PUT /api/resources/{id}/ - Update resource (Admin only)
- DELETE /api/resources/{id}/ - Delete resource (Admin only)
Access rules:
- GET /api/access-rules/ - List all access rules (Admin only)
- POST /api/access-rules/ - Create access rule (Admin only)
- GET /api/access-rules/{id}/ - Get access rule details (Admin only)
- PUT /api/access-rules/{id}/ - Update access rule (Admin only)
- DELETE /api/access-rules/{id}/ - Delete access rule (Admin only)
Data access:
- GET /api/transactions-flat/ - Access transactions_flat data with advanced filtering, pagination, and field projection (requires read permission)
  Query parameters:
  - page (int): Page number (default: 1)
  - page_size (int): Rows per page (default: 10)
  - country (str): Filter by country
  - status (str): Filter by transaction status
  - category (str): Filter by product category
  - method (str): Filter by payment method
  - amount_gt (float): Amount greater than
  - amount_lt (float): Amount less than
  - rating_gt (int): Customer rating greater than
  - rating_lt (int): Customer rating less than
  - fields (str): Comma-separated list of columns to return
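The README does not show how TransactionsFlatView applies these parameters. One plausible reading, sketched in plain Python over rows already loaded as dicts, applies them in this order: exact-match filters, strict numeric bounds, page slicing, then column projection. Column names are assumed to match the parameter names, and the response shape (`count`, `page`, `page_size`, `results`) and the function name `query_rows` are hypothetical.

```python
import operator
from typing import Any, Dict, Iterable, List

# Hypothetical column names: assumed to match the query-parameter names.
EQUALITY_FILTERS = ("country", "status", "category", "method")
# parameter -> (column, type to compare as, comparison); bounds are strict (> / <).
RANGE_FILTERS = {
    "amount_gt": ("amount", float, operator.gt),
    "amount_lt": ("amount", float, operator.lt),
    "rating_gt": ("rating", int, operator.gt),
    "rating_lt": ("rating", int, operator.lt),
}


def query_rows(rows: Iterable[Dict[str, Any]], params: Dict[str, str]) -> Dict[str, Any]:
    """Filter, paginate and project rows the way the query parameters describe."""
    # 1. Exact-match filters on text columns.
    result = [
        row
        for row in rows
        if all(str(row.get(col)) == params[col] for col in EQUALITY_FILTERS if col in params)
    ]
    # 2. Numeric range filters, e.g. amount_gt=100 keeps rows with amount > 100.
    for name, (col, cast, compare) in RANGE_FILTERS.items():
        if name in params:
            bound = cast(params[name])
            result = [row for row in result if compare(cast(row[col]), bound)]
    # 3. Pagination: 1-based pages, README defaults page=1 and page_size=10.
    page = max(int(params.get("page", 1)), 1)
    page_size = max(int(params.get("page_size", 10)), 1)
    start = (page - 1) * page_size
    page_rows: List[Dict[str, Any]] = result[start:start + page_size]
    # 4. Field projection: keep only the requested columns, in the requested order.
    if "fields" in params:
        wanted = [f.strip() for f in params["fields"].split(",") if f.strip()]
        page_rows = [{f: row[f] for f in wanted if f in row} for row in page_rows]
    return {"count": len(result), "page": page, "page_size": page_size, "results": page_rows}
```

Under these assumptions, a request such as `GET /api/transactions-flat/?country=FR&amount_gt=100&fields=amount,status` would correspond to `query_rows(rows, {"country": "FR", "amount_gt": "100", "fields": "amount,status"})`. Counting matches before slicing lets a client work out how many pages exist.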
Customers:
- GET /api/customers/ - List customers
- POST /api/customers/ - Create customer
- GET /api/customers/{id}/ - Get customer details
- PUT /api/customers/{id}/ - Update customer
- DELETE /api/customers/{id}/ - Delete customer
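The token endpoints listed above can also be called from Python. This is a standard-library-only sketch that avoids a `requests` dependency. It assumes Simple JWT's usual response, where the access token is returned under the `access` key. The helper names `token_request`, `authorized_request` and `fetch_json` are hypothetical.

```python
import json
import urllib.request
from typing import Any

BASE_URL = "http://127.0.0.1:8000/api"


def token_request(username: str, password: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) the POST /api/token/ request."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/token/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def authorized_request(path: str, access_token: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a GET request that carries the JWT as a Bearer token."""
    return urllib.request.Request(
        f"{base_url}{path}",
        headers={"Authorization": f"Bearer {access_token}"},
    )


def fetch_json(req: urllib.request.Request) -> Any:
    """Send the request and decode the JSON body (raises HTTPError on 4xx/5xx)."""
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Against a running server, usage would look like `tokens = fetch_json(token_request("your_username", "your_password"))` followed by `fetch_json(authorized_request("/resources/", tokens["access"]))`. Building the requests separately from sending them keeps the request construction easy to inspect.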
The API uses JWT tokens for authentication. To access protected endpoints:
- Obtain a token:
curl -X POST http://127.0.0.1:8000/api/token/ \
-H "Content-Type: application/json" \
-d '{"username": "your_username", "password": "your_password"}'
- Use the token in subsequent requests (your_access_token is the "access" value from the token response):
curl -X GET http://127.0.0.1:8000/api/resources/ \
-H "Authorization: Bearer your_access_token"
Access levels:
- Admin Users: Full access to all endpoints
- Regular Users: Access based on AccessRule permissions
- Resource Access: Users can only access resources they have been granted access to
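The three access rules above can be read as a deny-by-default check. The following is a plain-Python sketch of that decision under assumptions. "Admin" is taken to mean Django's `is_staff` flag, which is what DRF's `IsAdminUser` checks. The `AccessRule` fields here (`username`, `resource`, `can_read`, `can_write`) are hypothetical and may not match `core.models.AccessRule`.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class User:
    username: str
    is_staff: bool = False  # assumed to mark "Admin Users"


@dataclass(frozen=True)
class AccessRule:
    # Hypothetical fields; the project's real AccessRule model may differ.
    username: str
    resource: str
    can_read: bool = True
    can_write: bool = False


def has_access(user: User, resource: str, action: str, rules: Iterable[AccessRule]) -> bool:
    """Admins always pass; other users need a rule granting this action on this resource."""
    if user.is_staff:
        return True
    for rule in rules:
        if rule.username == user.username and rule.resource == resource:
            if action == "read" and rule.can_read:
                return True
            if action == "write" and rule.can_write:
                return True
    # No matching grant: deny by default.
    return False
```

Denying by default means a new resource stays hidden from regular users until someone explicitly creates a rule for it.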
The API provides access to various data lake resources:
- Transaction Data: Raw and processed transaction data
- Aggregated Data: Time-based aggregations (5-minute intervals)
- Status Data: Transaction status tracking
- Anonymized Data: Privacy-compliant transaction data
Supported file formats:
- CSV: Comma-separated values
- JSONL: JSON Lines format
- Parquet: Columnar storage format
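To take a quick look at a CSV or JSONL file outside the API, for example before registering it as a resource, the standard library is enough. This sketch only parses the first `n` rows and infers the format from the file extension. The function names `iter_rows` and `preview` are hypothetical. Parquet is deliberately left out because reading it requires a third-party library such as pyarrow.

```python
import csv
import json
from itertools import islice
from pathlib import Path
from typing import Any, Dict, Iterator, List, TextIO

SUPPORTED = ("csv", "jsonl")


def iter_rows(stream: TextIO, fmt: str) -> Iterator[Dict[str, Any]]:
    """Yield rows lazily, so only as many rows as requested get parsed."""
    if fmt == "csv":
        yield from csv.DictReader(stream)  # header row becomes the dict keys
    else:  # jsonl: one JSON object per line
        for line in stream:
            if line.strip():  # tolerate blank lines
                yield json.loads(line)


def preview(path: Path, n: int = 5) -> List[Dict[str, Any]]:
    """Return the first n rows of a .csv or .jsonl file."""
    fmt = path.suffix.lstrip(".").lower()
    if fmt not in SUPPORTED:
        raise ValueError(f"unsupported format for this sketch: {fmt!r}")
    with path.open(newline="", encoding="utf-8") as stream:
        return list(islice(iter_rows(stream, fmt), n))
```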
Run the test suite:
# Run all tests
python manage.py test
# Run with coverage (requires the pytest-django and pytest-cov plugins)
pytest --cov=core
For production deployment, update settings.py:
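import os

# Use environment variables for sensitive and deployment-specific values
SECRET_KEY = os.environ['SECRET_KEY']  # raises KeyError at startup if unset
DEBUG = os.environ.get('DEBUG', 'False') == 'True'
ALLOWED_HOSTS = [h for h in os.environ.get('ALLOWED_HOSTS', '').split(',') if h]
# Must come after the DATABASES definition
DATABASES['default']['PASSWORD'] = os.environ.get('DB_PASSWORD')
Create a .env file for production. Django does not load .env files on its own, so export these variables in the server environment or load them with a package such as python-dotenv: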
SECRET_KEY=your-secret-key
DEBUG=False
DB_PASSWORD=your-db-password
ALLOWED_HOSTS=your-domain.com
The project uses Black for code formatting, flake8 for linting, and isort for import sorting:
# Format code
black .
# Check linting
flake8 .
# Sort imports
isort .
To add a new data lake resource:
- Create a new view in core/views.py following the pattern of TransactionsFlatView
- Register a URL route for the view in core/urls.py
- Add the resource to the database via the admin interface
- Grant appropriate access permissions to users
- Update API documentation
To contribute:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
If you encounter any issues or have questions:
- Check the API Documentation
- Review the Django logs for error messages
- Ensure database connectivity and permissions
- Verify data lake file paths and permissions
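A small script can help with the last troubleshooting item by reporting resource paths that are missing or unreadable. This sketch assumes relative paths are resolved against the directory containing manage.py, which matches the ../data_lake/... paths used when loading the sample resources. The function name `check_resource_paths` is hypothetical.

```python
import os
from pathlib import Path
from typing import Dict, List


def check_resource_paths(paths: Dict[str, str], base_dir: Path) -> List[str]:
    """Return human-readable problems; an empty list means every path looks usable."""
    problems = []
    for name, raw in paths.items():
        path = (base_dir / raw).resolve()  # absolute paths in `raw` are kept as-is
        if not path.exists():
            problems.append(f"{name}: {path} does not exist")
        elif not os.access(path, os.R_OK):
            problems.append(f"{name}: {path} is not readable")
        elif path.is_dir() and not os.access(path, os.X_OK):
            problems.append(f"{name}: cannot enter directory {path}")
    return problems
```

From `python manage.py shell` in the api_project directory, it could be fed the registered resources with `check_resource_paths({r.name: r.path for r in Resource.objects.all()}, Path.cwd())`.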
Changelog:
- v1.1.0 - Enhanced data access with filtering and pagination
  - Advanced filtering for transaction data (country, status, amount, rating)
  - Pagination support with configurable page size
  - Field projection to select specific columns
  - Enhanced API documentation with OpenAPI schema
  - Extended JWT token lifetime configuration
- v1.0.0 - Initial release with core functionality
  - JWT authentication
  - Resource-based access control
  - Data lake integration
  - API documentation
Happy coding!