Lecture 7: Partitioning & Linear-Time Selection [02/02/2026]

Course: Advanced Data Structures and Algorithms


1. The Partition Problem [00:01:46]

  • Goal: Rearrange an array $A[p..r]$ around a pivot $x$ such that all elements $\le x$ are on the left, $x$ is in the middle, and all elements $> x$ are on the right.
  • Lomuto Partition (Incremental):
    1. Choose the first element $A[p]$ as the pivot $x$.
    2. Maintain an index $q$ (end of the $\le x$ region, initially $q = p$) and a scanning index $j$ that runs from $p+1$ to $r$.
    3. If $A[j] \le x$, swap $A[q+1]$ with $A[j]$ and increment $q$.
    4. At the end, swap the pivot $A[p]$ with $A[q]$ to place it at the boundary.
  • Complexity: $O(n)$ time, since each element is examined exactly once.
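
The Lomuto steps above can be sketched in Python as follows. This is a minimal illustration, not the lecture's own code: the function name `lomuto_partition` and the 0-based, inclusive bounds `p` and `r` are assumptions made for the example.

```python
def lomuto_partition(a, p, r):
    """Partition a[p..r] (inclusive) in place around the pivot a[p].

    Returns the final index q of the pivot.
    """
    x = a[p]   # step 1: pivot is the first element
    q = p      # step 2: a[p+1..q] holds the elements <= x (empty at start)
    for j in range(p + 1, r + 1):
        if a[j] <= x:
            # step 3: grow the "<= x" region by one and move a[j] into it
            q += 1
            a[q], a[j] = a[j], a[q]
    # step 4: put the pivot at the boundary between the two regions
    a[p], a[q] = a[q], a[p]
    return q


data = [5, 2, 8, 1, 9, 3]
q = lomuto_partition(data, 0, len(data) - 1)
print(q, data)  # 3 [3, 2, 1, 5, 9, 8]
```

The loop touches each index once, which is where the linear running time comes from.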

2. Order Statistics (Selection Problem) [00:36:04]

  • Definition: Find the $k$th smallest element in an array of $n$ distinct integers.
  • Naive Recursive Solution:
    • Partition the array and check the pivot's rank $q$.
    • Case 1: $q=k$. Return $A[q]$.
    • Case 2: $q>k$. Recurse on the left portion $A[1..q-1]$.
    • Case 3: $q<k$. Recurse on the right portion $A[q+1..n]$ with new rank $k-q$.
  • The Bottleneck [01:12:46]: If the pivot choice $A[1]$ is consistently the minimum or maximum, the pruning is ineffective (only 1 element is removed per call). The partition costs then sum to $n + (n-1) + \dots + 1$, which gives a Worst-Case Complexity of $O(n^2)$.
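
A minimal sketch of the naive selection and its bottleneck, assuming distinct values and a 1-based rank $k$ as in the notes. The names `partition` and `naive_select` are hypothetical, and the recursion is written as a loop (one iteration per recursive call) so that the number of partition rounds can be counted.

```python
def partition(a, p, r):
    """Lomuto partition of a[p..r] around a[p]; returns the pivot's index."""
    x, q = a[p], p
    for j in range(p + 1, r + 1):
        if a[j] <= x:
            q += 1
            a[q], a[j] = a[j], a[q]
    a[p], a[q] = a[q], a[p]
    return q


def naive_select(values, k):
    """Return (k-th smallest value, number of partition rounds), k = 1..n."""
    a = list(values)            # work on a copy
    p, r = 0, len(a) - 1
    rounds = 0
    while True:
        q = partition(a, p, r)
        rounds += 1
        rank = q - p + 1        # rank of the pivot within a[p..r]
        if rank == k:           # case 1: the pivot is the answer
            return a[q], rounds
        if rank > k:            # case 2: continue in the left portion
            r = q - 1
        else:                   # case 3: continue right with rank k - rank
            k -= rank
            p = q + 1


print(naive_select([7, 2, 9, 4, 1], 3))    # (4, 4)
print(naive_select(range(1, 101), 100))    # (100, 100)
```

On the already sorted input every pivot is the minimum of its range, so finding the maximum of 100 elements takes 100 partition rounds, each over a range only one element shorter than the last.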

3. The "Median of Medians" Innovation [01:32:46]

To guarantee $O(n)$ worst-case time, the pivot must be chosen "cleverly," so that every partition is guaranteed to discard a constant fraction of the elements.

  • The Strategy:
    1. Divide $n$ elements into groups of 5.
    2. Sort each group of 5 (takes $O(1)$ per group, $O(n)$ total).
    3. Extract the median of each group.
    4. Median of Medians: Recursively call the Selection algorithm on the $\lceil n/5 \rceil$ medians to find the median of the medians ($x$).
    5. Use $x$ as the pivot for the main Partition.
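
The five steps can be sketched in Python as below. This is an illustrative version under stated assumptions, not the lecture's implementation: the name `mom_select` is hypothetical, the values are assumed distinct (as in the problem definition), and the partition is done with list comprehensions rather than in place to keep the example short.

```python
def mom_select(values, k):
    """Return the k-th smallest (k = 1..n) of the distinct values given."""
    a = list(values)
    if len(a) <= 5:
        # Base case: a constant-size input is simply sorted.
        return sorted(a)[k - 1]

    # Steps 1-3: split into groups of 5, sort each, take each group's median.
    groups = [sorted(a[i:i + 5]) for i in range(0, len(a), 5)]
    medians = [g[(len(g) - 1) // 2] for g in groups]

    # Step 4: recursively select the median of the medians.
    x = mom_select(medians, (len(medians) + 1) // 2)

    # Step 5: partition around x (not in place; assumes distinct values).
    left = [v for v in a if v < x]
    right = [v for v in a if v > x]
    q = len(left) + 1           # rank of x in the current input

    if q == k:
        return x
    if q > k:
        return mom_select(left, k)
    return mom_select(right, k - q)


# A permutation of 0..100 (37 is coprime to 101); the median is 50.
vals = [(i * 37) % 101 for i in range(101)]
print(mom_select(vals, 51))  # 50
```

An in-place variant would reuse the Lomuto partition after swapping $x$ into the first position; the list-based form trades extra memory for readability.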

4. Mathematical Proof of $O(n)$ [01:47:14]

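  • Pruning Guarantee: At least half of the $\lceil n/5 \rceil$ group medians are $\le x$, and each of those medians brings along 2 smaller elements from its own group, so roughly $3 \cdot n/10 = 3n/10$ elements are $\le x$; symmetrically, roughly $3n/10$ elements are $\ge x$. Hence the Median of Medians $x$ is guaranteed to be greater than about 30% of the elements and smaller than about 30% of the elements, which leaves at most about $7n/10$ elements on either side of the partition.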
  • The Recurrence Relation: $$G(n) \le 3n + G(n/5) + G(7n/10)$$
    • $3n$: Cost of sorting groups and partitioning.
    • $G(n/5)$: Cost of finding the median of medians.
    • $G(7n/10)$: Cost of the recursive call on the remaining $\approx 70\%$ of the data.
  • Closed-Form Solution [01:53:47]: Using Strong Induction, it is proven that $G(n) \le 30n$.
  • Key Criterion [02:08:33]: For a recurrence $G(n) \le cn + G(pn) + G(qn)$ to be linear, the sum of fractions $p+q$ must be strictly less than 1.
    • Here: $1/5 + 7/10 = 9/10 < 1$.
    • Note: Grouping by 3 gives $1/3 + 2/3 = 1$, so the criterion fails and the recurrence only yields $O(n \log n)$, not linear time.
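
As a sketch of the induction step behind the $30n$ bound (ignoring floors and ceilings, and assuming the hypothesis $G(m) \le 30m$ for all $m < n$):

$$G(n) \le 3n + 30 \cdot \frac{n}{5} + 30 \cdot \frac{7n}{10} = 3n + 6n + 21n = 30n$$

The same calculation explains the Key Criterion. Guessing $G(n) \le an$ in $G(n) \le cn + G(pn) + G(qn)$ requires

$$cn + apn + aqn \le an \iff a \ge \frac{c}{1 - p - q}$$

which has a finite solution only when $p + q < 1$. With $c = 3$ and $p + q = 9/10$, the smallest valid constant is $a = 3 / (1/10) = 30$.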

5. Administrative: Assignment Sync [01:21:43]

  • Communication Gap: Many students missed the official mail due to asynchronous registration and technical hiccups with SML accounts.
  • The Fix:
    • Deadline: Extended to February 10th.
    • Protocol: TAs will post to a new group email (CS5800-2026) and the Google Drive simultaneously.
  • Philosophical Encouragement [02:20:01]: The instructor urges students to value "Natural Intelligence" (critical thinking, spotting weaknesses, and fine-tuning designs) over automated solutions.

Key Terms Corrected:

  • Lomuto partitioning -> Lomuto Partitioning
  • median of median -> Median of Medians
  • order statistics -> Order Statistics
  • strong induction -> Strong Induction
  • natural intelligence -> Natural Intelligence