Skip to content

Latest commit

 

History

History
85 lines (64 loc) · 5.77 KB

File metadata and controls

85 lines (64 loc) · 5.77 KB

Radix Sorts

  • Insertion sort $$\Theta(N)$$ for almost sorted array
    • Doesn't violate rule that sorting requires $$\Omega(N \log{N})$$ compares in worst/general case
    • Almost sorted array $$\neq$$ puppy-cat-dog
  • For $$N$$ items → $$N!$$ possible answers to puppy-cat-dog → if asking yes/no questions, given $$Q$$ possible answers need tree w/ $$\geq \lg{Q}$$ questions in worst case to find correct answer
    • Nature of binary questions → binary decision tree
  • Exploits time to "avoid" $$\log{N}$$
  • Actually still $$\Theta(N \log{N})$$, since OS must maintain sorted list of processes
  • Count based sorting doesn't use comparisons
  • If using Map instead of large array of possible values, answer is less clear
  • If sorting destructively/in-place, sorting array originally given → copy back from sorted array to original array
  • If $$R \geq N$$, less useful since wasting time counting things that don't matter (not in array)
  • Want to keep $$R$$, size of alphabet small
  • Only when $$N$$ very large will counting sort be guaranteed to be faster than quicksort
    • In general, even if $$N \geq R$$, need to run empirical timing tests to see which is better
  • Least Significant Digit Radix Sort
  • Generalize counting sort → make useful even if items aren't from alphabet
  • Generalize counting sort to handle Strings/sequences of characters from an alphabet
    • Can have String/sequence of letters/suits/numbers/etc.
  • Start by sorting from LSD (rightmost digit)
    • Items stay in same relative order from sorting based on previous digit → counting sort = stable
    • Don't need to use counting sort when sorting by digit, can technically use any sort
  • MSD sort generally better
  • Pad empty digits w/ dots → dots $$\prec$$ any other character
  • Need to do counting sort on each digit

LSD Radix Sort vs. Comparison Sorting

  • When comparing for merge sort, only need to look at most significant few characters to determine ordering → most characters never even looked at
  • Essentially comparing $$W$$ w/ $$\log{N}$$ → in general expect $$W > \log{N}$$ for this case
  • Each compare for merge sort takes some amount of time
    • Comparison for highly similar strings (particularly in top characters) or all the same takes time proportional to length of strings
    • Comparisons not constant time if taking width of strings into consideration
  • Usually ignore cost of each compare
    • In this case, relevant because LSD sort takes width into consideration → need to also include $$W$$ in merge sort time
  • Comparing long = constant time on modern computer → compares 2 64-bit numbers in one clock cycle
  • ML inference can be done w/ 8-bit arithmetic
  • ML training takes 64-bit arithmetic
  • Can only tell for sure in boundary/extreme cases
  • $$R$$ subproblems → sort each separately
  • Typically implemented recursively
  • Best Case: All top characters unique → only need to go through top character of each to sort (instead of all characters) → $$\Theta(N + R)$$
    • $$N \leq R$$
  • Worst Case: Have to go through every character (every string/sequence equal) → degenerates to LSD sort → $$\Theta(WN + WR)$$
    • Exactly like LSD sort, except w/ recursion