Midterm 2 Notes

Interfaces

Every method in interface must be public (public by default)
Variables will be public static final, no instance variables for interfaces

Abstract Classes

abstract keyword for abstract methods
Can provide any kind of variables & methods
Subclasses can only extend one (abstract) class

Autoboxing

Arrays never autoboxed/unboxed (e.g. Integer[] cannot be used in place of int[] & vice versa)
Can cast int to Integer

Promotion/Primitive Widening

Similar thing happens when moving from primitive type w/ narrower range to wider range
- Value is promoted
- double wider than int → can pass int as arg to method that declares double param
To move from wider format to narrower format, must use casting

Immutability

final helps compiler ensure immutability, not guarantee
- Neither necessary nor sufficient for immutability
- Can assign value only once (in constructor of class or initializer)
Declaring reference final does not make object referred to by reference immutable
- public final ArrayDeque<String> d = new ArrayDeque<>();
  - Memory box d not allowed to point at any other ArrayDeque, can't be changed to point at different ArrayDeque
  - Referenced ArrayDeque itself can change

Generic Methods

Types inferred from type of object passed in

Type Upper Bounds

Can use extends keyword as type upper bound
- Used as statement of fact, doesn't change definition/behavior of generic method parameter

Checked vs Unchecked Exceptions

Any subclass of RuntimeException or Error is unchecked
All other Throwables are checked

Iteration

Instantiate non-static nested class (inner class) → use instance.new
Implement Iterable interface to support enhanced for loop
- iterator() method must return object that implements Iterator interface

Packages

Cannot import/use/access code from default package from within different package

Access Control

Access based only on static types

Access Control w/ Inheritance & Packages

protected modifier allows package members & subclasses to use class member
Package private: no modifier → allows classes from same package, but not subclasses to access member

Access Levels

Modifier	Class	Package	Subclass	World
`public`	Y	Y	Y	Y
`protected`	Y	Y	Y	N
no modifier	Y	Y	N	N
`private`	Y	N	N	N

Access Control at the Top Level

Two levels: public, no modifier (package-private)
- Can't declare top level class as private/protected
No such thing as a sub-package, ug.joshh.Animal & ug.joshh.Plant = 2 completely different packages

`.equals()`

Default implementation of .equals() uses ==
JUnit assertEquals uses .equals()
.equals() parameter must take Object, cast to actual type w/in .equals() method
Generally will need:
- Reference check
- null check
- Class check w/ .getClass()
- Cast to same type
- Check fields

Asymptotics

If function has runtime $$R(N)$$ with order of growth $$\Theta(N^2)$$:
- $$R(N) \in \Theta(N^2) \ \forall \text{ inputs}$$
- Running function on input size of 1000 & input size of 10000 may not be large enough $$N$$ to exhibit quadratic behavior (i.e. takes 100 times longer to run)
  - $$\Theta$$ notation may abstract away potentially very large constant factors that may initially entirely determine the overall runtime for small inputs
Use arithmetic/geometric sum formulas
- Arithmetic:
  - $$S = n \cdot \frac{a_{1} + a_{n}}{2}$$
  - $$S = \frac{n}{2} \cdot [2a_{1} + (n - 1) \cdot d] = n \cdot a_{1} + \frac{(n - 1) \cdot n \cdot d}{2}$$
- Geometric:
  - $$S = \frac{a_{1} \cdot (1 - r^{n})}{1 - r}$$
  - Infinite ($$r < 1$$): $$S = \frac{a_{1}}{1 - r}$$
$$\text{lg}(N) = \log_2(N)$$
$$\lceil f(N) \rceil \in \Theta(f(N))$$
$$\lfloor f(N) \rfloor \in \Theta(f(N))$$
May need to state what $$N$$ if not explicitly in problem
If $$f(n) \in O(g(n)), 2^{f(n)} \in O(2^{g(n)})$$ → false b/c actual $$g(n)$$ could be scaled less than $$f(n)$$

Using Potentials for Amortization

Amortized Analysis
Associate potential $$\Phi_i \geq 0$$ w/ $$i\text{th}$$ operation that tracks "saved up" time from cheap operations for spending on expensive ones, starting from $$\Phi_0 = 0$$
Define cost of $$i\text{th}$$ operation as $$a_i = c_i + \Phi_{i + 1} - \Phi_i$$
- $$a_i = \text{deposit}$$
- $$c_i = \text{withdrawal/real cost of operation}$$
- $$\Phi_{i + 1} - \Phi_i = \text{total holdings}$$
On cheap operations, pick $$a_i > c_i$$ → $$\Phi$$ ↑
On expensive operations, pick $$a_i : \Phi_i > 0$$
Goals for ArrayList: $$a_i \in \Theta(1) \text{ and } \Phi_i \geq 0\ \forall\ i$$
- Requires choosing $$a_i : \Phi_i > c_i$$
- Cost for operations is 1 for non-powers of 2, & $$2i + 1$$ for powers of 2
- For high cost ops, need $$\sim 2i + 1$$ in bank, have previous $$\frac{i}{2}$$ operations to reach balance
  - $$\frac{i}{2} \cdot (a_{i} - 1) - 2i \geq 0$$
  - $$a_{i} \geq 5$$
- ArrayList amortized $$\Theta(1)$$ insertion/removal if utilizing geometric resizing
On average, each op takes constant time, arrays = good lists
- Rigorously show by overestimating constant time of each operation & proving resulting potential never $$< 0$$

Disjoint Sets

Implementation	Constructor	`connect`	`isConnected`
`QuickFindDS`	$$\Theta(N)$$	$$\Theta(N)$$	$$\Theta(1)$$
`QuickUnionDS`	$$\Theta(N)$$	$$O(N)$$	$$O(N)$$
`WeightedQuickUnionDS`	$$\Theta(N)$$	$$O(\log{N})$$	$$O(\log{N})$$

Tree height = # of edges from root of deepest node
- Root has depth 0

Weighted Quick Union

Modify quick-union to avoid tall trees
- Track tree size (number of elements), also works similarly for height
- Always link root of smaller tree to larger tree
Max depth of any item is $$\log{N}$$
- Depth of element x only increases when tree T1 that contains x linked below other tree T2
  - Occurs only when weight(T2) >= weight(T1)
  - Size of resulting tree is at least doubled
  - Depth of element x incremented only when weight(T2) >= weight(T1) & resulting tree at least doubled in size
- Tree containing x can double in size at most $$\log_{2}{N}$$ times, when starting from just 1 element
  - $$1 \cdot 2^x = N$$
  - $$x = \log_{2}{N}$$
- Max depth starts at 0 for only 1 element (root) → increments at most $$\log_{2}{N}$$ times, $$\therefore$$ max depth = $$\log_{2}{N}$$

Path Compression

When doing isConnected(15, 10), tie all nodes seen to the root
- Additional cost insignificant (same order of growth)
Kinda memoization/dynamic programming?
Path compression results in union/connected operations very close to amortized constant time
$$M$$ operations on $$N$$ nodes = $$O(N + M \lg^{*}{N})$$
$$\log^{*}{n} = \text{iterated/super logarithm}$$
- Inverse of super exponentiation
Tighter bound: $$O(N + M \alpha(N))$$, where $$\alpha$$ is the inverse Ackermann function

Trees, BSTs

Tree

Constraint: Exactly one path between any 2 nodes

BST

Consequence of rules = no duplicate keys allowed in BST
Random inserts take on average only $$\Theta(\log{N})$$ each
Insertion of random data yields bushy BST
- On random data, order of growth for get/put operations = logarithmic
Randomly deleting and inserting from tree changes height from $$\Theta(\log{N})$$ to $$\Theta(\sqrt{N})$$
- Hibbard deletion results in $$\Theta(\sqrt{N})$$ order of growth
Deletion not commutative

`TreeMap`/`TreeSet`

BST internally
Sorted/ordered Map
Self-balancing, $$\log{N}$$ height
Insertion/removal/searching → $$\log{N}$$

Balanced BSTs

B-Tree

B-tree of order $$M = 4$$ also called 2-3-4 tree (or 2-4 tree)
- # of children node can have, (e.g. 2-3-4 tree node may have 2, 3, or 4 children)
B-tree of order $$M = 3$$ also called 2-3 tree

Perfect Balance & Logarithmic Height

Max # of splitting operations per insert: $$\sim H$$
- Time per insert/contains: $$\Theta(H) = \Theta(\log{N})$$

Tree Rotation

Preserves search tree property
Given arbitrarily unbalanced tree, $$\exists$$ sequence of rotations that will yield balanced tree
Balanced search tree = tree $$\propto \log{N}$$ w/in constant factor of 2

Left-Leaning Red Back Tree (LLRB)

Every path from root to leaf has same # of black links
- Imposes balance on LLRB
- Black edges in LLRB connect 2-3 nodes in 2-3 tree
- 2-3 tree balanced on black edges → LLRB also balanced on black edges
  - Guaranteed logarithmic performance for insert
Walking along red edges analogous to walking through elements of stuffed node in B-tree
# of red edges used on any given path from root to bottom of tree constrained
At most $$M - 1$$ red edges for every black edge along path
- Height along any given path in red-black tree at most $$M\log{N}$$
- $$\forall$$ 2-3 tree (which is balanced), $$\exists$$ corresponding red-black tree that has depth $$\leq 2 \cdot \text{depth of 2-3 tree}$$
Searching LLRB tree for key just like BST
- Red edges only matter in insertions
- Red edges just like black edges for searching

Maintaining Isometry Through Rotations

$$\exists$$ isometry between 2-3 tree & LLRB
Implementation of LLRB based on maintaining isometry
When performing LLRB operations, pretend as if 2-3 tree
Preservation of isometry involves tree rotations

Preserving Isometry After Addition/Insertion Operations

Violations for 2-3 trees:
- Existence of 4-nodes
Operations for fixing 2-3 tree violations:
- Splitting 4-node
Violations for LLRBs:
- 2 red children
- 2 consecutive red links
- Right red child (wrong representation)
Operations for fixing LLRB tree violations:
- Tree rotations & color flips

Summary

2-3 & 2-3-4 trees have perfect balance
- Height guaranteed logarithmic
- After insert/delete → at most 1 split operation per level of tree
  - Height logarithmic → $$O(\log{N})$$ splits
  - insert/delete $$O(\log{N})$$
- Hard to implement
LLRBs mimic 2-3 tree behavior using color flipping & tree rotation
- Height guaranteed logarithmic
- After insert/delete → at most 1 color flip or rotation per level of tree
  - Height logarithmic → $$O(\log{N})$$ flips/rotations
  - insert/delete $$O(\log{N})$$
- Easier to implement, constant factor faster than 2-3 or 2-3-4 tree

Hashing

Hash Tables

Never store mutable objects in HashSet or HashMap
Never override equals w/out also overriding hashCode

Hash Functions

Computing hash function consists of 2 steps:
1. Compute hashCode (integer between $$-2^{31}$$ & $$2^{31} - 1$$
2. Computing index = hashCode $$\mod M$$

Default `hashCodes()`

All Objects have hashCode() function
Default: returns this (address of object)

Negative `.hashCode`s in Java

In Java, -1 % 4 == -1 → use Math.floorMod instead
When finding bucket to insert into, use bucketNum = (o.hashCode() & 0x7FFFFFFF) % M
- M = # of buckets
- Can flip negative bit or use Math.floorMod

Good `hashCodes()`

Wide variety/range of objects ("random")
Deterministic
If 2 objects equal by .equals() → must have same hashCode()
Efficiently computed
Doesn't use mutable fields

`.equals()`

@Override
public boolean equals(Object other) {
    if (this == o) {
        return true;
    }

    if (o == null || getClass() != o.getClass()) {
        return false;
    }

    SimpleOomage that = (SimpleOomage) o;

    return (red == that.red) && (green == that.green) && (blue == that.blue); // check equality of fields
}

Summary

W/ good hashCode() & resizing, operations are $$\Theta(1)$$ amortized
Store & retrieval does not require items to be Comparable (unlike (balanced) BST)

Data Structure	`contains(x)`	`insert(x)`
Linked List	$$\Theta(N)$$	$$\Theta(N)$$
Bushy BSTs (used by `TreeSet`)	$$\Theta(\log{N})$$	$$\Theta(\log{N})$$
Unordered Array	$$\Theta(N)$$	$$\Theta(N)$$
Hash Table (used by `HashSet`)	$$\Theta(1)$$	$$\Theta(1)$$

Priority Queues & Heaps

Priority Queue Interface

/** (Min) Priority Queue: Allowing tracking & removal of smallest item in priority queue */
public interface MinPQ<Item> {
    public void add(Item x);
    public Item getSmallest();
    public Item removeSmallest();
    public int size();
}

Only allows interaction w/ smallest item at any given time → better performance than List/Set
Useful for keeping track of "smallest", "largest", "best", etc. seen so far

Heaps

Binary min-heap: Binary tree that is complete & obeys min-heap property
- Min-heap property: Every node $$\leq$$ both its children
- Complete: Missing items only at bottom level (if any), all nodes as far left as possible

Insertion

Add to end of heap temporarily
Swim up hierarchy

Delete Min

Swap last item in heap into root
Sink down hierarchy (promote "better" successor)

Tree Representation

Store keys in array, offset everything by 1 spot
Leave spot 0 empty
Makes computation of children/parents "nicer"
- leftChild(k) = k * 2
- rightChild(k) = k * 2 + 1
- parent(k) = k / 2

BST → Heap

Swim (heapify up) all nodes in level order (analogous to adding element to heap)
Sink all nodes in reverse level order (analogous to removing element from heap)

Heap Implementation of a Priority Queue

Operation	Ordered Array	Bushy BST (items w/ same priority hard to handle)	Hash Table	Heap
`add`	$$\Theta(N)$$	$$\Theta(\log{N})$$	$$\Theta(1)$$	$$\Theta(\log{N})$$
`getSmallest`	$$\Theta(1)$$	$$\Theta(\log{N})$$	$$\Theta(N)$$	$$\Theta(1)$$
`removeSmallest`	$$\Theta(N)$$	$$\Theta(\log{N})$$	$$\Theta(N)$$	$$\Theta(\log{N})$$

Position in tree/heap = priority
Heap is $$\log{N}$$ time amortized (resize backing array)
- Worst case inserting $$N$$ items:
  - $$\log{1} + \log{2} + \ldots + \log{N} + 2 + 4 + 8 + 16 + \ldots + 2N = \log{N!} + 4N - 4 = N \log{N} + 4N - 4$$
  - Averaged over $$N$$ items → $$\log{N} + 4 - \frac{4}{N}$$ → $$\Theta(\log{N})$$
BST can have constant getSmallest by keeping pointer to smallest
Heaps handle duplicate priorities much more naturally than BSTs
Array based heaps take less memory

Data Structures Summary

Search Data Structures

Data Structure	Storage Operation(s)	Primary Retrieval Operation	Retrieve By:
List	`add(key)` `insert(key, index)`	`get(index)`	`index`
Map	`put(key, value)`	`get(key)`	key identity
Set	`add(key)`	`containsKey(key)`	key identity
PQ	`add(key)`	`getSmallest()`	key order/size
Disjoint Sets	`conenct(int1, int2)`	`isConnected(int1, int2)`	2 int values

Advanced Trees, including Geometric

Traversals

Tree Traversal

Level Order
- Traverse top-to-bottom, left-to-right
- Nodes "visited" in given order
Depth First Traversals
- Preorder (root, left, right), Inorder (left, root, right), Postorder (left, right, root)
  - A Weird Trick
  - Preorder Traversal Runtime
Level Order Traversal
- Iterative Deepening
  - Runtime $$\Theta(N)$$ b/c each new level doubles amount of work done & # of nodes visited
    - Exponential in height of tree, $$\Theta(2^{H})$$ → height logarithmic in # of nodes, $$H = \Theta(\log{N})$$ → overall runtime linear in # of nodes, $$\Theta(N)$$
  - Spindly Runtime $$\Theta(N^{2})$$
  - Tree Height & Runtime

Range Finding

Instead of traversing entire tree, only want to look at items in certain range

Pruning & `findInRange` Runtime

Pruning: Restrict search to only nodes containing path(s) to answer(s)
Runtime

Spatial Trees

Handling Multidimensional Data: Quadtrees

Generalization of BST
Think of items as oriented in particular 2D direction (e.g NW/NE/SW/SE instead of binary </>)
Can have mutliple different quadtrees for same set of data
- Just like BST, insertion order determines topology of QuadTree
If on boundary line, usually assume = → >
Pruning
- Analogous to binary search on BST (eliminate unnecessary search spaces)
- Ignore branches if no possible way branch could contain wanted item

Graphs

Graph Types

Directed: Edges have notion of direction (one-way)
- Acyclic: No cycles (lead back to start)
- Cyclic: $$\exists \geq 1 \text{ cycle}$$
Undirected: No notion of directionality, can traverse either way (bidirectional?)
- Acyclic: No cycles (lead back to start) w/out reusing any edges
- Cyclic: $$\exists \geq 1 \text{ path that leads to start w/out reusing any edges}$$
Any graph w/ cycle = cyclic
- If not → acyclic
Terminology
Graph Processing Problems

Graph Representations

Common Simplification
Degree = # of edges incident on graph node
Graph API
Adjacency Matrix
- Runtime
- Rows = start nodes, columns = end nodes
Edge Sets
Adjacency Lists
- Runtime
  - If sparse graph (few edges) → $$\Theta(V)$$
  - If dense graph (many edges) → $$\Theta(E)$$
- Bare-Bones Undirected Graph Implementation
Summary

Depth First Traversal

Implementation
Properties
- Guaranteed to reach every reachable node
- Runs in $$O(V + E)$$ time
  - Every edge used at most once, total # of vertex considerations = # of edges
    - # of times need to consider vertex $$<=$$ # of edges incident on it
    - May be faster for some problems which quit early on some stopping condition (e.g. connectivity)
Postorder = return order

Graph Traversals

Graph Problems

Runtime $$\Theta(V + E)$$
- Each vertex visited exactly once, each visit = constant time
- Each edge considered once
Space $$\Theta(V)$$
- Recursive call stack depth $$\leq V$$
- Space of recursive algorithm = depth of call stack

Graph Traversals

Level-order/BFS for vertices at same distance from source can be in any permutation
If $$\exists$$ multiple paths to a vertex, BFS always visits based on closest path
Traversals & Graph Problems
- Detect cycle by running DFS → $$\exists$$ cycle if node already on active call stack encountered, runtime $$\Theta(E + V)$$

Topological Sorting

Indegree 0 vertices → $$\Theta(E + V)$$, have to look at $$V$$ lists w/ total length $$E$$
Only works if graph is acyclic → $$\varnothing$$ if cyclic b/c can't have circular dependencies
- No indegree 0 vertices → cyclic → $$\varnothing$$
Topological ordering only possible iff graph is directed acyclic graph (no directed cycles)
Reverse postorder
Linearizes graph
Graph Problems

Implementation

public class DepthFirstOrder {

    private boolean[] marked;
    private Stack<Integer> reversePostorder; // using stack analogous to reversing list b/c LIFO

    public DepthFirstOrder(Digraph G) {
        reversePostorder = new Stack<>();
        marked = new boolean[G.V()];
        for (int v = 0; v < G.V(); v++) {
            if (!marked[v]) {
                dfs(G, v);
            }
        }
    }

    private void dfs(Digraph G, int v) {
        marked[v] = true;
        for (int w : G.adj(v)) {
            if (!marked[w]) {
                dfs(G, w);
            }
        }
        reversePostorder.push(v); // after each DFS is done, "visit" vertex by pushing on stack
    }
}

Works even when starting from vertices not indegree 0

Breadth First Search

Finding Level-Order

Regular Expressions

Introducing the Regular Expression
[] = set

Regular Expressions in Java

Default String method matches matches entire String, but not substrings
group(0) = first match of entire pattern, entire match
- group(i), i > 0 = parenthesized groupings

Java

Maps

Map returns null when calling get on key not in Map
Map can have keys w/ null values

FilesExpand file tree

mt2.md

Latest commit

History

mt2.md

File metadata and controls

Midterm 2 Notes

Interfaces

Abstract Classes

Autoboxing

Promotion/Primitive Widening

Immutability

Generic Methods

Type Upper Bounds

Checked vs Unchecked Exceptions

Iteration

Packages

Access Control

Access Control w/ Inheritance & Packages

Access Levels

Access Control at the Top Level

.equals()

Asymptotics

Using Potentials for Amortization

Disjoint Sets

Weighted Quick Union

Path Compression

Trees, BSTs

Tree

BST

TreeMap/TreeSet

Balanced BSTs

B-Tree

Perfect Balance & Logarithmic Height

Tree Rotation

Left-Leaning Red Back Tree (LLRB)

Maintaining Isometry Through Rotations

Preserving Isometry After Addition/Insertion Operations

Summary

Hashing

Hash Tables

Hash Functions

Default hashCodes()

Negative .hashCodes in Java

Good hashCodes()

.equals()

Summary

Priority Queue Interface

Heaps

Tree Representation

BST → Heap

Heap Implementation of a Priority Queue

Data Structures Summary

Traversals

Tree Traversal

Range Finding

Pruning & findInRange Runtime

Spatial Trees

Graph Representations

Implementation

Regular Expressions

Regular Expressions in Java

Java

Maps

`.equals()`

`TreeMap`/`TreeSet`

Default `hashCodes()`

Negative `.hashCode`s in Java

Good `hashCodes()`

`.equals()`

Pruning & `findInRange` Runtime