29 changes: 29 additions & 0 deletions .github/workflows/pythonapp.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
name: Python application

on: [push]

jobs:
build:

runs-on: ubuntu-latest

steps:
- uses: actions/checkout@v2
- name: Set up Python 3.8
uses: actions/setup-python@v1
with:
python-version: 3.8
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install -r requirements.txt
- name: Lint with flake8
run: |
pip install flake8
# stop the build if there are Python syntax errors or undefined names
flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
# exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
- name: Test with doctest
run: |
python -m doctest -v app_scripts/*.py sorters/*.py
9 changes: 9 additions & 0 deletions .gitignore
@@ -127,3 +127,12 @@ dmypy.json

# Pyre type checker
.pyre/

# Pycharm files
.idea
.venv

*.csv
*.jpeg
*.png
*.jpg
40 changes: 38 additions & 2 deletions README.md
@@ -1,2 +1,38 @@
# sorting_performance
Try sorting algorithms and evaluate performance by time and number of steps

## Why does this exist?

- try different sorting techniques to see the difference in performance
- understand (if possible) the trade-off between heavy computations and a large number of computations
(a la: would you rather fight 1 horse-sized duck or 100 duck-sized horses?)

## How to run it?

Please run with Python 3 for best results. No special packages are required.
Some testing/coverage-based tests will be written; those may need the
packages listed in the `requirements.txt` file.

### Sorting performance
Try sorting algorithms and evaluate performance by
- total time to solve
- number of steps taken to solve

### Inspiration

- local supermarket is giving out free football player cards
- each card has a number so that unique orders can be established
- this helps in keeping track of what cards we have and which ones we want to trade
- card numbers range from 1 to 250
- arranging all the cards into packs is a daily chore
- **Pack 1**: unique cards (sorted)
- **Pack 2**: extra copies of some cards in pack 1

Doing this on the dining table,
I realized that both my son and I were using various methods
of sorting! We were constantly trying new sorting methods to either
- *speed up the process*
- OR *slow it down and simplify it so that we can do it while chatting or watching cartoons*

## Credits

- [reddit thread](https://www.reddit.com/r/learnpython/comments/exese6/what_are_some_of_the_projects_i_can_start_working/fg7skxp/)
- [P vs NP problem explanation video in youtube uses a sorting based example](https://youtu.be/EHp4FPyajKQ?t=515)
71 changes: 71 additions & 0 deletions app_scripts/create_check_random_number_list.py
@@ -0,0 +1,71 @@
import random


def generate_list(
min_number: int = 1, max_number: int = 1000000, count: int = 1000, uniqued_list: bool = True,
) -> list:
"""
This function will create a list of random numbers.
It accepts min number in list, max number in list and count of numbers in list.
Note; if `count` is None then it defaults to 1000

:param min_number: mininum value of single number in list of random numbers
:param max_number: maximum value of single number in list of random numbers
:param count: number of random numbers expected
:param uniqued_list: boolean of whether or not the returned random list is allowed to have duplicate values or not
:return: list of size `count` of random numbers in random order
"""

if count is None:
count = 1000

random_numbers_list = []

while len(random_numbers_list) < count:

temp = random.randint(min_number, max_number)

if uniqued_list:
if temp in random_numbers_list:
continue

random_numbers_list.append(temp)

if not check_order(random_numbers_list)["random_bool"]:
# If the generated list happens to come out already ordered, regenerate
# until randomness is found (and return that result). Useful in test scenarios.
return generate_list(min_number, max_number, count, uniqued_list)

return random_numbers_list


def check_order(list_of_numbers: list) -> dict:
"""
Takes a list of numbers and returns whether the list
is ordered in an ascending manner or not

:param list_of_numbers: list of numbers to check
:return: dictionary with 1 key `random_bool`.
- Value True means the list is not in ascending order ("random").
- Value False means the list is ordered in an ascending manner.

Doctest

>>> check_order([1,2,3])
{'random_bool': False}

>>> check_order([2,2,3])
{'random_bool': False}

>>> check_order([3,2,3])
{'random_bool': True}

"""

state_of_randomness = {"random_bool": False}

for index, num in enumerate(list_of_numbers[:-1]):
if num > list_of_numbers[index + 1]:
state_of_randomness["random_bool"] = True

return state_of_randomness
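The pair of functions above could also be sketched more compactly with the standard library. This is a hedged alternative, not the module's actual API: `generate_unique_random` and `is_unsorted` are hypothetical names, and `random.sample` assumes `count <= max_number - min_number + 1`.

```python
import random


def generate_unique_random(min_number: int = 1, max_number: int = 1_000_000, count: int = 1000) -> list:
    # random.sample draws without replacement, so uniqueness comes for free
    # (assumes count <= max_number - min_number + 1).
    return random.sample(range(min_number, max_number + 1), count)


def is_unsorted(numbers: list) -> bool:
    # Mirrors check_order: True as soon as any adjacent pair breaks ascending order.
    return any(a > b for a, b in zip(numbers, numbers[1:]))
```

Unlike the retry loop in `generate_list`, `random.sample` never produces duplicates, so only the ordering check may still need a regenerate-and-retry.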
51 changes: 51 additions & 0 deletions app_scripts/print_scripts.py
@@ -0,0 +1,51 @@
import csv
import logging
import uuid
from pathlib import Path


def print_sort_progress(lowest_number: int, step_count: int, debug: bool):
if debug:
logging.info(f"Lowest number in this round = {lowest_number}. Step_count = {step_count}")


def print_sort_results(
method_name: str,
time_taken_to_sort: float,
step_count: int,
sort_state: bool,
matches_known_solution: bool = None,
help_text: str = "",
create_csv: bool = False,
):
result_f_string = f"{method_name} {help_text} took {time_taken_to_sort} seconds to order in {step_count} steps. Check: Sort status = {sort_state}."

if matches_known_solution is not None:
logging.info(
f"{result_f_string} Accurate (against known solution: {matches_known_solution})"
)
else:
logging.info(result_f_string)

if create_csv:

temp = {
"run_id": uuid.uuid4(),
"method_name": method_name,
"time_taken": time_taken_to_sort,
"step_count": step_count,
"sort_state": sort_state,
}

# newline="" stops the csv module from inserting blank lines on Windows
if Path("perf.csv").exists():

with open("perf.csv", "a", newline="") as csv_file:
csv_writer = csv.writer(csv_file)
csv_writer.writerow(list(temp.values()))

else:

with open("perf.csv", "w", newline="") as csv_file:
csv_writer = csv.writer(csv_file)
csv_writer.writerow(list(temp.keys()))
csv_writer.writerow(list(temp.values()))
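The exists/append branch pair above could be collapsed, since mode `"a"` creates the file when it is missing. A minimal sketch of that pattern with a hypothetical `append_result_row` helper (not part of this module):

```python
import csv
from pathlib import Path


def append_result_row(path: str, row: dict) -> None:
    # "a" mode creates the file if missing, so one open covers both the
    # first write and later appends; the header is written only once.
    # newline="" stops the csv module inserting blank lines on Windows.
    file_path = Path(path)
    write_header = not file_path.exists()
    with open(file_path, "a", newline="") as csv_file:
        writer = csv.DictWriter(csv_file, fieldnames=list(row))
        if write_header:
            writer.writeheader()
        writer.writerow(row)
```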
2 changes: 2 additions & 0 deletions requirements.txt
@@ -0,0 +1,2 @@
matplotlib
pandas
167 changes: 167 additions & 0 deletions sort_algorithm_runner.py
@@ -0,0 +1,167 @@
import logging

from app_scripts.create_check_random_number_list import generate_list
from sorters.bubble_sort import bubble_sort as bu_s
from sorters.merge_sort import merge_sort as ms
from sorters.native_sort import native_sort as ns
from sorters.quick_sort import quick_sort as qs
from sorters.selection_sort_1 import selection_sort as ss1
from sorters.selection_sort_2 import selection_sort as ss2

debug = False
count = None
# count = 500

logging.basicConfig(
# filename=f"{my_dir}/sort_runner.log",
level=logging.INFO,
format="%(asctime)s %(levelname)s %(message)s",
)

if debug:
unique_random_list = generate_list(max_number=20, count=6, uniqued_list=True)
duplicate_allowed_random_list = generate_list(max_number=20, count=6, uniqued_list=False)
else:
unique_random_list = generate_list(count=count, uniqued_list=True)
duplicate_allowed_random_list = generate_list(count=count, uniqued_list=False)

known_solution_unique_random_list = ss1(
unique_random_list, debug=debug, help_text="for unique numbers"
)
known_solution_duplicate_allowed_random_list = ss2(
duplicate_allowed_random_list, debug=debug, help_text="for non-unique numbers"
)

bu_s(
unique_random_list,
known_solution_unique_random_list,
debug=debug,
help_text="for unique numbers",
)

bu_s(
duplicate_allowed_random_list,
known_solution_duplicate_allowed_random_list,
debug=debug,
help_text="for non-unique numbers",
)

ms(
unique_random_list,
known_solution_unique_random_list,
debug=debug,
help_text="for unique numbers",
)

ms(
duplicate_allowed_random_list,
known_solution_duplicate_allowed_random_list,
debug=debug,
help_text="for non-unique numbers",
)

ns(
unique_random_list,
known_solution_unique_random_list,
debug=debug,
help_text="for unique numbers",
)

ns(
duplicate_allowed_random_list,
known_solution_duplicate_allowed_random_list,
debug=debug,
help_text="for non-unique numbers",
)

qs(
unique_random_list,
known_solution_unique_random_list,
debug=debug,
help_text="for unique numbers",
)

qs(
duplicate_allowed_random_list,
known_solution_duplicate_allowed_random_list,
debug=debug,
help_text="for non-unique numbers",
)


def create_graph():

for i in range(100):
unique_random_list = generate_list(count=count, uniqued_list=True)
known_solution_unique_random_list = ss1(
unique_random_list, debug=debug, help_text="for unique numbers", create_csv=True
)
bu_s(
unique_random_list,
known_solution_unique_random_list,
debug=debug,
help_text="for unique numbers",
create_csv=True,
)
ms(
unique_random_list,
known_solution_unique_random_list,
debug=debug,
help_text="for unique numbers",
create_csv=True,
)
ns(
unique_random_list,
known_solution_unique_random_list,
debug=debug,
help_text="for unique numbers",
create_csv=True,
)
qs(
unique_random_list,
known_solution_unique_random_list,
debug=debug,
help_text="for unique numbers",
create_csv=True,
)


create_graph()

## Experimental
# import pandas as pd
# import matplotlib.pyplot as plt
# all_data_df = pd.read_csv("perf.csv")
# perf_df = all_data_df.loc[:, ["method_name", "time_taken"]]
# steps_df = all_data_df.loc[:, ["method_name", "step_count"]]
#
# fig, (ax1, ax2) = plt.subplots(1, 2)
#
# perf_mer = perf_df[perf_df["method_name"] == "Merge sort"].loc[:,"time_taken"].to_numpy()
# perf_quk = perf_df[perf_df["method_name"] == "Quick sort"].loc[:,"time_taken"].to_numpy()
# perf_bub = perf_df[perf_df["method_name"] == "Bubble sort"].loc[:,"time_taken"].to_numpy()
# perf_nat = perf_df[perf_df["method_name"] == "Native sort"].loc[:,"time_taken"].to_numpy()
# perf_sel = perf_df[perf_df["method_name"] == "Selection sort 1.0"].loc[:,"time_taken"].to_numpy()
# ax1.set_title("Box plot of performance (lower is better)")
# ax1.set_xlabel("sorters")
# ax1.set_ylabel("time in seconds")
# ax1.set_xticklabels(["Merge", "Quick", "Bubble", "Native", "Selection 1.0"])
# ax1.boxplot((perf_mer, perf_quk, perf_bub, perf_nat, perf_sel))  # order matches the xticklabels
#
#
# steps_mer = steps_df[steps_df["method_name"] == "Merge sort"].loc[:,"step_count"].to_numpy()
# steps_quk = steps_df[steps_df["method_name"] == "Quick sort"].loc[:,"step_count"].to_numpy()
# steps_bub = steps_df[steps_df["method_name"] == "Bubble sort"].loc[:,"step_count"].to_numpy()
# steps_nat = steps_df[steps_df["method_name"] == "Native sort"].loc[:,"step_count"].to_numpy()
# steps_sel = steps_df[steps_df["method_name"] == "Selection sort 1.0"].loc[:,"step_count"].to_numpy()
# ax2.set_title("Box plot of steps (lower is better)")
# ax2.set_xlabel("sorters")
# ax2.set_ylabel("count")
# ax2.set_xticklabels(["Merge", "Quick", "Bubble", "Native", "Selection 1.0"])
# ax2.boxplot((steps_mer, steps_quk, steps_bub, steps_nat, steps_sel))  # order matches the xticklabels

##fig.savefig("x.jpeg", orientation="landscape", bbox_inches="tight")
## fig.show()

# TODO: Heap Sort https://en.wikipedia.org/wiki/Heapsort
# TODO: Bucket Sort (this is the method that I used to sort cards when I need to sort under distraction. Takes longer because more steps required.) https://en.wikipedia.org/wiki/Bucket_sort
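The bucket sort in the TODO above could look something like this minimal sketch for cards numbered 1 to 250. The name `bucket_sort_cards` is hypothetical, and with one bucket per card number this effectively degenerates into a counting sort:

```python
def bucket_sort_cards(cards: list, max_value: int = 250) -> list:
    # One bucket per possible card number (0..max_value); counting the
    # copies in each bucket handles duplicate cards naturally.
    buckets = [0] * (max_value + 1)
    for card in cards:
        buckets[card] += 1
    ordered = []
    for number, copies in enumerate(buckets):
        ordered.extend([number] * copies)
    return ordered
```

More steps than a comparison sort for small hands of cards, but each step is trivial, which matches the "sort while watching cartoons" use case described in the README.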