Skip to content

Latest commit

 

History

History
549 lines (453 loc) · 26.5 KB

File metadata and controls

549 lines (453 loc) · 26.5 KB

Midterm 2 Notes

Interfaces

  • Every method in interface must be public (public by default)
  • Variables will be public static final, no instance variables for interfaces

Abstract Classes

  • abstract keyword for abstract methods
  • Can provide any kind of variables & methods
  • Subclasses can only extend one (abstract) class

Autoboxing

  • Arrays never autoboxed/unboxed (e.g. Integer[] cannot be used in place of int[] & vice versa)
  • Can cast int to Integer

Promotion/Primitive Widening

  • Similar thing happens when moving from primitive type w/ narrower range to wider range
    • Value is promoted
    • double wider than int → can pass int as arg to method that declares double param
  • To move from wider format to narrower format, must use casting

Immutability

  • final helps compiler ensure immutability, not guarantee
    • Neither necessary nor sufficient for immutability
    • Can assign value only once (in constructor of class or initializer)
  • Declaring reference final does not make object referred to by reference immutable
    • public final ArrayDeque<String> d = new ArrayDeque<>();
      • Memory box d not allowed to point at any other ArrayDeque, can't be changed to point at different ArrayDeque
      • Referenced ArrayDeque itself can change

Generic Methods

  • Types inferred from type of object passed in

Type Upper Bounds

  • Can use extends keyword as type upper bound
    • Used as statement of fact, doesn't change definition/behavior of generic method parameter

Checked vs Unchecked Exceptions

  • Any subclass of RuntimeException or Error is unchecked
  • All other Throwables are checked

Iteration

  • Instantiate non-static nested class (inner class) → use instance.new
  • Implement Iterable interface to support enhanced for loop
    • iterator() method must return object that implements Iterator interface

Packages

  • Cannot import/use/access code from default package from within different package

Access Control

  • Access based only on static types

Access Control w/ Inheritance & Packages

  • protected modifier allows package members & subclasses to use class member
  • Package private: no modifier → allows classes from same package, but not subclasses to access member

Access Levels

Modifier Class Package Subclass World
public Y Y Y Y
protected Y Y Y N
no modifier Y Y N N
private Y N N N

Access Control at the Top Level

  • Two levels: public, no modifier (package-private)
    • Can't declare top level class as private/protected
  • No such thing as a sub-package, ug.joshh.Animal & ug.joshh.Plant = 2 completely different packages

.equals()

  • Default implementation of .equals() uses ==
  • JUnit assertEquals uses .equals()
  • .equals() parameter must take Object, cast to actual type w/in .equals() method
  • Generally will need:
    • Reference check
    • null check
    • Class check w/ .getClass()
    • Cast to same type
    • Check fields

Asymptotics

  • If function has runtime $$R(N)$$ with order of growth $$\Theta(N^2)$$:
    • $$R(N) \in \Theta(N^2) \ \forall \text{ inputs}$$
    • Running function on input size of 1000 & input size of 10000 may not be large enough $$N$$ to exhibit quadratic behavior (i.e. takes 100 times longer to run)
      • $$\Theta$$ notation may abstract away potentially very large constant factors that may initially entirely determine the overall runtime for small inputs
  • Use arithmetic/geometric sum formulas
    • Arithmetic:
      • $$S = n \cdot \frac{a_{1} + a_{n}}{2}$$
      • $$S = \frac{n}{2} \cdot [2a_{1} + (n - 1) \cdot d] = n \cdot a_{1} + \frac{(n - 1) \cdot n \cdot d}{2}$$
    • Geometric:
      • $$S = \frac{a_{1} \cdot (1 - r^{n})}{1 - r}$$
      • Infinite ($$r &lt; 1$$): $$S = \frac{a_{1}}{1 - r}$$
  • $$\text{lg}(N) = \log_2(N)$$
  • $$\lceil f(N) \rceil \in \Theta(f(N))$$
  • $$\lfloor f(N) \rfloor \in \Theta(f(N))$$
  • May need to state what $$N$$ if not explicitly in problem
  • If $$f(n) \in O(g(n)), 2^{f(n)} \in O(2^{g(n)})$$ → false b/c actual $$g(n)$$ could be scaled less than $$f(n)$$

Using Potentials for Amortization

  • Amortized Analysis
  • Associate potential $$\Phi_i \geq 0$$ w/ $$i\text{th}$$ operation that tracks "saved up" time from cheap operations for spending on expensive ones, starting from $$\Phi_0 = 0$$
  • Define cost of $$i\text{th}$$ operation as $$a_i = c_i + \Phi_{i + 1} - \Phi_i$$
    • $$a_i = \text{deposit}$$
    • $$c_i = \text{withdrawal/real cost of operation}$$
    • $$\Phi_{i + 1} - \Phi_i = \text{total holdings}$$
  • On cheap operations, pick $$a_i &gt; c_i$$$$\Phi$$
  • On expensive operations, pick $$a_i : \Phi_i &gt; 0$$
  • Goals for ArrayList: $$a_i \in \Theta(1) \text{ and } \Phi_i \geq 0\ \forall\ i$$
    • Requires choosing $$a_i : \Phi_i &gt; c_i$$
    • Cost for operations is 1 for non-powers of 2, & $$2i + 1$$ for powers of 2
    • For high cost ops, need $$\sim 2i + 1$$ in bank, have previous $$\frac{i}{2}$$ operations to reach balance
      • $$\frac{i}{2} \cdot (a_{i} - 1) - 2i \geq 0$$
      • $$a_{i} \geq 5$$
    • ArrayList amortized $$\Theta(1)$$ insertion/removal if utilizing geometric resizing
  • On average, each op takes constant time, arrays = good lists
    • Rigorously show by overestimating constant time of each operation & proving resulting potential never $$&lt; 0$$

Disjoint Sets

Implementation Constructor connect isConnected
QuickFindDS $$\Theta(N)$$ $$\Theta(N)$$ $$\Theta(1)$$
QuickUnionDS $$\Theta(N)$$ $$O(N)$$ $$O(N)$$
WeightedQuickUnionDS $$\Theta(N)$$ $$O(\log{N})$$ $$O(\log{N})$$
  • Tree height = # of edges from root of deepest node
    • Root has depth 0

Weighted Quick Union

  • Modify quick-union to avoid tall trees
    • Track tree size (number of elements), also works similarly for height
    • Always link root of smaller tree to larger tree
  • Max depth of any item is $$\log{N}$$
    • Depth of element x only increases when tree T1 that contains x linked below other tree T2
      • Occurs only when weight(T2) >= weight(T1)
      • Size of resulting tree is at least doubled
      • Depth of element x incremented only when weight(T2) >= weight(T1) & resulting tree at least doubled in size
    • Tree containing x can double in size at most $$\log_{2}{N}$$ times, when starting from just 1 element
      • $$1 \cdot 2^x = N$$
      • $$x = \log_{2}{N}$$
    • Max depth starts at 0 for only 1 element (root) → increments at most $$\log_{2}{N}$$ times, $$\therefore$$ max depth = $$\log_{2}{N}$$

Path Compression

  • When doing isConnected(15, 10), tie all nodes seen to the root
    • Additional cost insignificant (same order of growth)
  • Kinda memoization/dynamic programming?
  • Path compression results in union/connected operations very close to amortized constant time
  • $$M$$ operations on $$N$$ nodes = $$O(N + M \lg^{*}{N})$$
  • $$\log^{*}{n} = \text{iterated/super logarithm}$$
  • Tighter bound: $$O(N + M \alpha(N))$$, where $$\alpha$$ is the inverse Ackermann function



Trees, BSTs

Tree

  • Constraint: Exactly one path between any 2 nodes

BST

  • Consequence of rules = no duplicate keys allowed in BST
  • Random inserts take on average only $$\Theta(\log{N})$$ each
  • Insertion of random data yields bushy BST
    • On random data, order of growth for get/put operations = logarithmic
  • Randomly deleting and inserting from tree changes height from $$\Theta(\log{N})$$ to $$\Theta(\sqrt{N})$$
    • Hibbard deletion results in $$\Theta(\sqrt{N})$$ order of growth
  • Deletion not commutative

TreeMap/TreeSet

  • BST internally
  • Sorted/ordered Map
  • Self-balancing, $$\log{N}$$ height
  • Insertion/removal/searching → $$\log{N}$$

Balanced BSTs

B-Tree

  • B-tree of order $$M = 4$$ also called 2-3-4 tree (or 2-4 tree)
    • # of children node can have, (e.g. 2-3-4 tree node may have 2, 3, or 4 children)
  • B-tree of order $$M = 3$$ also called 2-3 tree

Perfect Balance & Logarithmic Height

  • Max # of splitting operations per insert: $$\sim H$$
    • Time per insert/contains: $$\Theta(H) = \Theta(\log{N})$$

Tree Rotation

  • Preserves search tree property
  • Given arbitrarily unbalanced tree, $$\exists$$ sequence of rotations that will yield balanced tree
  • Balanced search tree = tree $$\propto \log{N}$$ w/in constant factor of 2

Left-Leaning Red Back Tree (LLRB)

  • Every path from root to leaf has same # of black links
    • Imposes balance on LLRB
    • Black edges in LLRB connect 2-3 nodes in 2-3 tree
    • 2-3 tree balanced on black edges → LLRB also balanced on black edges
      • Guaranteed logarithmic performance for insert
  • Walking along red edges analogous to walking through elements of stuffed node in B-tree
  • # of red edges used on any given path from root to bottom of tree constrained
  • At most $$M - 1$$ red edges for every black edge along path
    • Height along any given path in red-black tree at most $$M\log{N}$$
    • $$\forall$$ 2-3 tree (which is balanced), $$\exists$$ corresponding red-black tree that has depth $$\leq 2 \cdot \text{depth of 2-3 tree}$$
  • Searching LLRB tree for key just like BST
    • Red edges only matter in insertions
    • Red edges just like black edges for searching

Maintaining Isometry Through Rotations

  • $$\exists$$ isometry between 2-3 tree & LLRB
  • Implementation of LLRB based on maintaining isometry
  • When performing LLRB operations, pretend as if 2-3 tree
  • Preservation of isometry involves tree rotations

Preserving Isometry After Addition/Insertion Operations

  • Violations for 2-3 trees:
    • Existence of 4-nodes
  • Operations for fixing 2-3 tree violations:
    • Splitting 4-node
  • Violations for LLRBs:
    • 2 red children
    • 2 consecutive red links
    • Right red child (wrong representation)
  • Operations for fixing LLRB tree violations:
    • Tree rotations & color flips

Summary

  • 2-3 & 2-3-4 trees have perfect balance
    • Height guaranteed logarithmic
    • After insert/delete → at most 1 split operation per level of tree
      • Height logarithmic → $$O(\log{N})$$ splits
      • insert/delete $$O(\log{N})$$
    • Hard to implement
  • LLRBs mimic 2-3 tree behavior using color flipping & tree rotation
    • Height guaranteed logarithmic
    • After insert/delete → at most 1 color flip or rotation per level of tree
      • Height logarithmic → $$O(\log{N})$$ flips/rotations
      • insert/delete $$O(\log{N})$$
    • Easier to implement, constant factor faster than 2-3 or 2-3-4 tree

Hashing

Hash Tables

  • Never store mutable objects in HashSet or HashMap
  • Never override equals w/out also overriding hashCode

Hash Functions

  • Computing hash function consists of 2 steps:
    1. Compute hashCode (integer between $$-2^{31}$$ & $$2^{31} - 1$$
    2. Computing index = hashCode $$\mod M$$

Default hashCodes()

  • All Objects have hashCode() function
  • Default: returns this (address of object)

Negative .hashCodes in Java

  • In Java, -1 % 4 == -1 → use Math.floorMod instead
  • When finding bucket to insert into, use bucketNum = (o.hashCode() & 0x7FFFFFFF) % M
    • M = # of buckets
    • Can flip negative bit or use Math.floorMod

Good hashCodes()

  • Wide variety/range of objects ("random")
  • Deterministic
  • If 2 objects equal by .equals() → must have same hashCode()
  • Efficiently computed
  • Doesn't use mutable fields

.equals()

@Override
public boolean equals(Object other) {
    if (this == o) {
        return true;
    }

    if (o == null || getClass() != o.getClass()) {
        return false;
    }

    SimpleOomage that = (SimpleOomage) o;

    return (red == that.red) && (green == that.green) && (blue == that.blue); // check equality of fields
}

Summary

  • W/ good hashCode() & resizing, operations are $$\Theta(1)$$ amortized
  • Store & retrieval does not require items to be Comparable (unlike (balanced) BST)
Data Structure contains(x) insert(x)
Linked List $$\Theta(N)$$ $$\Theta(N)$$
Bushy BSTs (used by TreeSet) $$\Theta(\log{N})$$ $$\Theta(\log{N})$$
Unordered Array $$\Theta(N)$$ $$\Theta(N)$$
Hash Table (used by HashSet) $$\Theta(1)$$ $$\Theta(1)$$

Priority Queue Interface

/** (Min) Priority Queue: Allowing tracking & removal of smallest item in priority queue */
public interface MinPQ<Item> {
    public void add(Item x);
    public Item getSmallest();
    public Item removeSmallest();
    public int size();
}
  • Only allows interaction w/ smallest item at any given time → better performance than List/Set
  • Useful for keeping track of "smallest", "largest", "best", etc. seen so far

Heaps

  • Binary min-heap: Binary tree that is complete & obeys min-heap property
    • Min-heap property: Every node $$\leq$$ both its children
    • Complete: Missing items only at bottom level (if any), all nodes as far left as possible
  • Add to end of heap temporarily
  • Swim up hierarchy
  • Swap last item in heap into root
  • Sink down hierarchy (promote "better" successor)

Tree Representation

  • Store keys in array, offset everything by 1 spot
  • Leave spot 0 empty
  • Makes computation of children/parents "nicer"
    • leftChild(k) = k * 2
    • rightChild(k) = k * 2 + 1
    • parent(k) = k / 2

BST → Heap

  • Swim (heapify up) all nodes in level order (analogous to adding element to heap)
  • Sink all nodes in reverse level order (analogous to removing element from heap)

Heap Implementation of a Priority Queue

Operation Ordered Array Bushy BST (items w/ same priority hard to handle) Hash Table Heap
add $$\Theta(N)$$ $$\Theta(\log{N})$$ $$\Theta(1)$$ $$\Theta(\log{N})$$
getSmallest $$\Theta(1)$$ $$\Theta(\log{N})$$ $$\Theta(N)$$ $$\Theta(1)$$
removeSmallest $$\Theta(N)$$ $$\Theta(\log{N})$$ $$\Theta(N)$$ $$\Theta(\log{N})$$
  • Position in tree/heap = priority
  • Heap is $$\log{N}$$ time amortized (resize backing array)
    • Worst case inserting $$N$$ items:
      • $$\log{1} + \log{2} + \ldots + \log{N} + 2 + 4 + 8 + 16 + \ldots + 2N = \log{N!} + 4N - 4 = N \log{N} + 4N - 4$$
      • Averaged over $$N$$ items → $$\log{N} + 4 - \frac{4}{N}$$$$\Theta(\log{N})$$
  • BST can have constant getSmallest by keeping pointer to smallest
  • Heaps handle duplicate priorities much more naturally than BSTs
  • Array based heaps take less memory

Data Structures Summary

Data Structure Storage Operation(s) Primary Retrieval Operation Retrieve By:
List add(key)
insert(key, index)
get(index) index
Map put(key, value) get(key) key identity
Set add(key) containsKey(key) key identity
PQ add(key) getSmallest() key order/size
Disjoint Sets conenct(int1, int2) isConnected(int1, int2) 2 int values

Traversals

Tree Traversal

Range Finding

  • Instead of traversing entire tree, only want to look at items in certain range

Pruning & findInRange Runtime

  • Pruning: Restrict search to only nodes containing path(s) to answer(s)
  • Runtime

Spatial Trees

  • Generalization of BST
  • Think of items as oriented in particular 2D direction (e.g NW/NE/SW/SE instead of binary </>)
  • Can have mutliple different quadtrees for same set of data
    • Just like BST, insertion order determines topology of QuadTree
  • If on boundary line, usually assume = → >
  • Pruning
    • Analogous to binary search on BST (eliminate unnecessary search spaces)
    • Ignore branches if no possible way branch could contain wanted item
  • Directed: Edges have notion of direction (one-way)
    • Acyclic: No cycles (lead back to start)
    • Cyclic: $$\exists \geq 1 \text{ cycle}$$
  • Undirected: No notion of directionality, can traverse either way (bidirectional?)
    • Acyclic: No cycles (lead back to start) w/out reusing any edges
    • Cyclic: $$\exists \geq 1 \text{ path that leads to start w/out reusing any edges}$$
  • Any graph w/ cycle = cyclic
    • If not → acyclic
  • Terminology
  • Graph Processing Problems

Graph Representations

  • Implementation
  • Properties
    • Guaranteed to reach every reachable node
    • Runs in $$O(V + E)$$ time
      • Every edge used at most once, total # of vertex considerations = # of edges
        • # of times need to consider vertex $$&lt;=$$ # of edges incident on it
        • May be faster for some problems which quit early on some stopping condition (e.g. connectivity)
  • Postorder = return order
  • Runtime $$\Theta(V + E)$$
    • Each vertex visited exactly once, each visit = constant time
    • Each edge considered once
  • Space $$\Theta(V)$$
    • Recursive call stack depth $$\leq V$$
    • Space of recursive algorithm = depth of call stack
  • Level-order/BFS for vertices at same distance from source can be in any permutation
  • If $$\exists$$ multiple paths to a vertex, BFS always visits based on closest path
  • Traversals & Graph Problems
    • Detect cycle by running DFS → $$\exists$$ cycle if node already on active call stack encountered, runtime $$\Theta(E + V)$$
  • Indegree 0 vertices → $$\Theta(E + V)$$, have to look at $$V$$ lists w/ total length $$E$$
  • Only works if graph is acyclic → $$\varnothing$$ if cyclic b/c can't have circular dependencies
    • No indegree 0 vertices → cyclic → $$\varnothing$$
  • Topological ordering only possible iff graph is directed acyclic graph (no directed cycles)
  • Reverse postorder
  • Linearizes graph
  • Graph Problems

Implementation

public class DepthFirstOrder {

    private boolean[] marked;
    private Stack<Integer> reversePostorder; // using stack analogous to reversing list b/c LIFO

    public DepthFirstOrder(Digraph G) {
        reversePostorder = new Stack<>();
        marked = new boolean[G.V()];
        for (int v = 0; v < G.V(); v++) {
            if (!marked[v]) {
                dfs(G, v);
            }
        }
    }

    private void dfs(Digraph G, int v) {
        marked[v] = true;
        for (int w : G.adj(v)) {
            if (!marked[w]) {
                dfs(G, w);
            }
        }
        reversePostorder.push(v); // after each DFS is done, "visit" vertex by pushing on stack
    }
}
  • Works even when starting from vertices not indegree 0

Regular Expressions

Regular Expressions in Java

  • Default String method matches matches entire String, but not substrings
  • group(0) = first match of entire pattern, entire match
    • group(i), i > 0 = parenthesized groupings

Java

Maps

  • Map returns null when calling get on key not in Map
  • Map can have keys w/ null values