This repository contains materials and code for a Big Data assignment.
Use this README as a template and adapt it to your specific task and technology stack.
- Course / context: Fill in course name or project context here.
- Objective: Briefly describe the main goal of the assignment (e.g., process large datasets, build ETL pipeline, run analytics, etc.).
- Technologies: List main tools here, e.g. Hadoop, Spark, Kafka, Python, Scala, Jupyter, SQL, etc.
Update this section as you add files and folders.
- `README.md` – project description, setup, and usage instructions.
- Add more entries here as your project grows, for example:
  - `data/` – raw and processed datasets (usually excluded from version control).
  - `notebooks/` – exploratory analysis or prototype code.
  - `src/` – main application or job code.
  - `scripts/` – helper scripts for running jobs or managing data.
Describe how to prepare the environment needed to run your assignment.
Adjust or replace the items below to match your stack.
- **Prerequisites**
  - Install a recent version of Python, Java/Scala, or other required languages.
  - Install the big-data frameworks you use (e.g. Spark, Hadoop, Kafka) or ensure access to a cluster.
  - Install any needed package managers (e.g. `pip`, `conda`, `maven`, `sbt`, `npm`).
- **Environment setup**
  - If using Python: create a virtual environment and install dependencies from `requirements.txt` (once it exists).
  - If using Scala/Java: run `mvn install` or `sbt compile` (when build files are added).
  - If using Docker: document which images and `docker-compose` commands to run.
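After setting up the environment, a small preflight script can confirm that the required tools are actually on `PATH`. This is a minimal sketch; the default tool names below assume a Spark-on-JVM stack and should be adjusted to whatever your project really uses.

```python
# Preflight check: report which required executables are missing from PATH.
# The default tool list is an assumption for a Spark stack -- edit to match
# your own prerequisites.
import shutil


def missing_tools(required=("python3", "java", "spark-submit")):
    """Return the subset of required executables not found on PATH."""
    return [tool for tool in required if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools found.")
```

Running this before the first job run catches an unconfigured machine early, instead of failing halfway through a pipeline.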
Document where the data comes from and how to obtain it.
- Source: Describe dataset source (provided by instructor, public dataset URL, etc.).
- Location: Explain where to put data files in this repository (e.g. `data/raw/`, `data/processed/`).
- Size / format: Mention approximate size and formats (CSV, Parquet, JSON, etc.).
- Privacy / ethics: Note any restrictions or anonymization requirements if applicable.
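Once data is in place, it helps to document a quick way to inspect it without loading everything into memory. The sketch below streams a CSV file with the standard library; the file name used in the docstring example is hypothetical.

```python
# Stream a CSV file to report its header and row count without reading
# the whole file into memory (useful for large raw datasets).
import csv


def summarize_csv(path):
    """Return (header, row_count) for a CSV file, e.g. data/raw/events.csv
    (a hypothetical example path)."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])
        rows = sum(1 for _ in reader)
    return header, rows
```

For columnar formats such as Parquet you would swap in the appropriate reader, but the idea is the same: verify shape and size before running full jobs.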
Explain how to execute your jobs, scripts, or notebooks.
- **Basic steps** (example, adjust as needed)
  - Prepare the environment (see Setup).
  - Download or place data into the correct folder.
  - Run the main job or notebook, for example:
    - `python src/main.py`
    - `spark-submit src/job.py`
    - `spark-submit --class MainClass target/app.jar`
  - Open `notebooks/analysis.ipynb` in Jupyter and run all cells.
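A typical entry point such as `src/main.py` accepts its input and output locations on the command line, so the same script works locally and on a cluster. This is only a skeleton; the argument names and default paths are illustrative.

```python
# Minimal entry-point skeleton with configurable input/output paths.
# Argument names and defaults are illustrative, not a fixed convention.
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Run the main job.")
    parser.add_argument("--input", default="data/raw",
                        help="directory containing raw input data")
    parser.add_argument("--output", default="data/processed",
                        help="directory for processed output")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    print(f"Reading from {args.input}, writing to {args.output}")
    # ... actual job logic goes here ...


if __name__ == "__main__":
    main()
```

With this shape, `python src/main.py --input data/raw --output data/processed` overrides the defaults without editing code.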
- **Configuration**
  - Document configuration files or environment variables (e.g. `config.yml`, `.env`).
  - Describe how to set paths for input/output data, cluster addresses, etc.
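One simple convention is to read configuration from environment variables with sensible local defaults, so the code runs unchanged on a laptop and on a cluster. The variable names below (`BDA_INPUT_PATH`, etc.) are made up for illustration.

```python
# Sketch: configuration via environment variables with local defaults.
# The BDA_* variable names are illustrative, not a fixed convention.
import os


def load_config(env=None):
    """Build a config dict from environment variables, falling back to
    defaults that work for a local run."""
    env = os.environ if env is None else env
    return {
        "input_path": env.get("BDA_INPUT_PATH", "data/raw"),
        "output_path": env.get("BDA_OUTPUT_PATH", "data/processed"),
        "spark_master": env.get("BDA_SPARK_MASTER", "local[*]"),
    }
```

Passing `env` explicitly also makes the function easy to test without touching the real environment.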
Summarize how you evaluate success and where to find results.
- Outputs: Describe generated outputs (tables, charts, reports, models, logs, etc.).
- Metrics: List key metrics (e.g. runtime, throughput, accuracy, error rates).
- Reports: Link to any final report or presentation once available.
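Two of the metrics above, runtime and throughput, are easy to capture with a small helper around any processing step. The workload passed to `timed` below is a stand-in; wrap your actual job function instead.

```python
# Sketch: measure runtime and throughput of a processing step.
import time


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def throughput(records, seconds):
    """Records processed per second; guards against a zero elapsed time."""
    return records / seconds if seconds > 0 else float("inf")
```

Logging these two numbers per run gives a cheap baseline for spotting regressions as the data or code grows.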
Use this section to capture important implementation details or decisions.
- Assumptions: List major assumptions about data, infrastructure, or APIs.
- Limitations: Note any known limitations or trade-offs.
- Future work: Ideas for improvement or extension of the assignment.
- License: Add license information here if required (e.g. MIT, proprietary, none).
- Academic integrity: If this is coursework, follow your institution's policies on collaboration and code sharing.