Overview, Tries

  • Search problem → data structures used = trees & hash tables
  • Sorting problem:
    • Insertion sort into BST = partition sort (quicksort)
    • Selection sort using heap = heapsort
    • Digit-by-digit sorting algorithms:
      • Can use other sorting subroutines when sorting digit-by-digit, not only counting sort
  • Searching via trees (comparisons) uses compareTo(), analogous to comparison-based sorting
    • Search for things by ordering using compareTo()
  • Searching via hash tables uses hashCode() & equals(), analogous to counting/integer sorting
    • "Cheat": exploit space to save time & avoid comparisons
  • Digit-by-digit search = Trie, analogous to radix sorts
  • Trie runtime independent of $$N$$ (# of keys); depends only on key length $$L$$
  • Best case runtime can correspond to failures instead of just successes (e.g. Trie contains())
  • Hash table search runtime includes $$L$$ b/c always need to compute hash code (hashing every digit) & compare keys with equals()
  • If need to search through each child to determine where to traverse down, alphabet size $$R$$ is included in runtime → $$\Theta(RL)$$ worst case search/insert
  • $$L$$ factor included for BST b/c Trie performance based solely on $$L$$ → need to account for $$L$$ for BST as well
    • Need to do $$L$$ characters of comparison in worst case (need to compare every character to determine if match or not)
      • Similar to merge sort vs LSD sort comparison where LSD sort takes $$W$$ into consideration → also need to include $$W$$ in merge sort time
    • Best case $$\Theta(L)$$ = match at root → still need to compare every character to determine match
  • $$L$$ factor also included for Hash Table b/c hash code computation may take into account every character
  • Prefix matching much slower for hash tables (have to go through everything, b/c unordered) and BSTs (may be able to prune some branches b/c of ordering)
    • Very fast for tries
  • $$R$$ = alphabet size
  • Use int value of char to index into Node[]
  • Root's `exists` flag is always false (root corresponds to the empty string, not a stored key)
  • Extremely space hungry b/c every Node allocates $$R$$ links, most of which may never be used
  • If only care about top option, PriorityQueue fine
  • If want all results in ranked order → TreeSet
  • Support character-based operations → prefix searching/matching/approximating/etc.
  • Don't use arrays for sparse maps
  • Arrays better for compact lists (e.g. heaps)
  • May slow down link traversal, but uses much less memory
  • Can also use Hash Table for links
  • Each node has small # of fixed characters/links
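  • All three links used only while traversing existing nodes: left/right find the node matching the current character, middle advances to the next character; when inserting, once a new node is created, the remaining characters are inserted down its middle links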
  • Can have bad TST compared to regular trie when inserting $$N$$ unique single characters in increasing order → height for trie = 1, height for TST = $$N$$
    • Same way regular (unbalanced) BSTs degenerate into linked lists when inserting items in increasing/decreasing order → new item always goes to right/left of leaf
    • Can have balancing TSTs to account for issue
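
The degenerate case above can be reproduced with a minimal TST sketch. The class name `TstHeightDemo`, the `isKey` field, and the `height()` helper are hypothetical names chosen for illustration, and `height()` here counts nodes along the longest path through any of the three links. Empty keys are ignored to keep the sketch short.

```java
import java.util.List;

class TstHeightDemo {
    /** One TST node: a single character plus three links. */
    private static class Node {
        char c;
        boolean isKey;          // true if a key ends at this node
        Node left, mid, right;  // smaller char, next char, larger char
        Node(char c) { this.c = c; }
    }

    private Node root;

    void insert(String key) {
        if (key.isEmpty()) return;  // sketch assumption: no empty keys
        root = insert(root, key, 0);
    }

    private Node insert(Node x, String key, int d) {
        char c = key.charAt(d);
        if (x == null) x = new Node(c);
        if (c < x.c) x.left = insert(x.left, key, d);        // same character, go left
        else if (c > x.c) x.right = insert(x.right, key, d); // same character, go right
        else if (d < key.length() - 1) x.mid = insert(x.mid, key, d + 1); // next character
        else x.isKey = true;
        return x;
    }

    boolean contains(String key) {
        if (key.isEmpty()) return false;
        Node x = root;
        int d = 0;
        while (x != null) {
            char c = key.charAt(d);
            if (c < x.c) x = x.left;
            else if (c > x.c) x = x.right;
            else if (d < key.length() - 1) { x = x.mid; d++; }
            else return x.isKey;
        }
        return false;
    }

    /** Number of nodes on the longest path from the root, following any link. */
    int height() { return height(root); }

    private int height(Node x) {
        if (x == null) return 0;
        return 1 + Math.max(height(x.mid), Math.max(height(x.left), height(x.right)));
    }

    public static void main(String[] args) {
        TstHeightDemo sorted = new TstHeightDemo();
        for (String s : List.of("a", "b", "c", "d", "e")) sorted.insert(s);

        TstHeightDemo mixed = new TstHeightDemo();
        for (String s : List.of("c", "a", "e", "b", "d")) mixed.insert(s);

        System.out.println(sorted.height()); // 5: a chain of right links
        System.out.println(mixed.height());  // 3: same keys, better insertion order
    }
}
```

With the five single-character keys inserted in increasing order, every new node hangs off the previous node's right link, so the height equals the number of keys. Inserting the same keys in a mixed order gives a shallower structure, which is the effect a balancing scheme would aim for.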
  • R-way Array Trie = MSD radix sort in disguise (both process characters left to right, starting from the first)
  • Ternary Search Trie has corresponding sort → 3-way radix quicksort
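
The connection between an R-way array trie and sorting can be seen in a small sketch: keys are placed by indexing a `Node[]` with each character's int value, one character at a time from left to right, and walking the links in index order returns the keys in sorted order. The class name `RWayTrieDemo`, the `isKey` field (the notes' "exists" flag), and `R = 128` (ASCII-only keys) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class RWayTrieDemo {
    private static final int R = 128; // assumption: keys contain only ASCII characters

    private static class Node {
        boolean isKey;                    // true if a key ends at this node
        final Node[] next = new Node[R];  // one slot per character; most stay null
    }

    private final Node root = new Node(); // root.isKey stays false unless "" is inserted

    void insert(String key) {
        Node x = root;
        for (int d = 0; d < key.length(); d++) {
            char c = key.charAt(d);       // char's int value indexes the array
            if (x.next[c] == null) x.next[c] = new Node();
            x = x.next[c];
        }
        x.isKey = true;
    }

    boolean contains(String key) {
        Node x = find(key);
        return x != null && x.isKey;
    }

    /** All keys starting with prefix, in sorted order. */
    List<String> keysWithPrefix(String prefix) {
        List<String> out = new ArrayList<>();
        Node x = find(prefix);
        if (x != null) collect(x, new StringBuilder(prefix), out);
        return out;
    }

    /** Follows s character by character; returns null if we fall off the trie. */
    private Node find(String s) {
        Node x = root;
        for (int d = 0; d < s.length() && x != null; d++) {
            x = x.next[s.charAt(d)];
        }
        return x;
    }

    /** Visits links in increasing character order, so keys come out sorted. */
    private void collect(Node x, StringBuilder path, List<String> out) {
        if (x.isKey) out.add(path.toString());
        for (char c = 0; c < R; c++) {
            if (x.next[c] != null) {
                path.append(c);
                collect(x.next[c], path, out);
                path.deleteCharAt(path.length() - 1);
            }
        }
    }

    public static void main(String[] args) {
        RWayTrieDemo t = new RWayTrieDemo();
        for (String s : List.of("sam", "sad", "sap", "same", "a", "awls")) t.insert(s);
        System.out.println(t.keysWithPrefix("sa")); // [sad, sam, same, sap]
        System.out.println(t.keysWithPrefix(""));   // [a, awls, sad, sam, same, sap]
        System.out.println(t.contains("sa"));       // false: a prefix, not a stored key
    }
}
```

`contains` and `insert` touch at most one node per character of the key, regardless of how many keys are stored. The cost is the `Node[R]` array in every node, which is why sparser link structures may be preferable when memory matters.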