Some notes on algorithms, following the book Introduction to Algorithms and based on UVA's course.
asymptotics

- big-oh: O(g): functions that grow no faster than g - upper bound, runs in time less than g
    - $f(n) \leq c \cdot g(n)$ for some c, large n
    - the set of functions s.t. there exist c, k > 0 with $0 \leq f(n) \leq c \cdot g(n)$ for all n > k
- big-theta: Θ(g): functions that grow at the same rate as g
    - big-oh(g) and big-omega(g) - an asymptotically tight bound
- big-omega: Ω(g): functions that grow at least as fast as g
    - $f(n) \geq c \cdot g(n)$ for some c, large n
- example: f = 57n + 3
    - O(n^2) - or anything bigger
    - Θ(n)
    - Ω(n^0.5) - or anything smaller
    - input must be positive
    - we always analyze the worst-case run time
- little-omega: ω(g): functions that grow strictly faster than g
- little-o: o(g): functions that grow strictly slower than g
- we write f(n) ∈ O(g(n)), not f(n) = O(g(n))
- all of these relations are reflexive and transitive, but only Θ is symmetric
    - Θ defines an equivalence relation
- adding two functions: the growth rate is $O(\max(g_1(n), g_2(n)))$ (same for sequential code)
- master theorem for recurrences of the form $T(n) = a \cdot T(n/b) + f(n)$, with $c = \log_b a$:
    - if $f(n) = O(n^{c-\epsilon})$ for some $\epsilon > 0$, then $T(n) = \Theta(n^c)$
    - if $f(n) = \Theta(n^c)$, then $T(n) = \Theta(n^c \log n)$
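A worked instance of the second case (not from the original notes): merge sort's recurrence has $a = b = 2$, so $c = \log_2 2 = 1$ and $f(n) = n = \Theta(n^c)$, giving

$$T(n) = 2\,T(n/2) + n \implies T(n) = \Theta(n \log n)$$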
 
- over a bounded number of elements, almost everything is constant time
 
recursion

- moving down/right on an NxN grid - each path has length (N-1) + (N-1)
    - we must move right exactly N-1 times
    - ans = (N-1+N-1 choose N-1) - see the first sketch after this list
- for recursion, if a list is declared outside a static recursive method, it shouldn't be static
 
- generate permutations - recursive, add the char at each spot - see the second sketch after this list
- think hard about the base case before starting
    - look for lengths that you know
    - look for symmetry
- n-queens - one array of length n, go row by row
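A minimal Java sketch of the grid-path count above, with the direct recursion next to the closed form (method names are my own):

```java
// number of down/right paths across a rows-by-cols grid
static long countPaths(int rows, int cols) {
    if (rows == 1 || cols == 1) return 1;     // base case: a single row or column
    return countPaths(rows - 1, cols) + countPaths(rows, cols - 1);
}

// closed form (2N-2 choose N-1), built incrementally so each division is exact
static long countPathsClosed(int n) {
    long ans = 1;
    for (int i = 1; i <= n - 1; i++)
        ans = ans * (n - 1 + i) / i;          // after step i, ans = C(n-1+i, i)
    return ans;
}
```

For an NxN grid, countPaths(N, N) and countPathsClosed(N) agree.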
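And a sketch of the add-a-char-at-each-spot permutation recursion (again, naming is mine):

```java
import java.util.ArrayList;
import java.util.List;

// all permutations of s: permute the tail, then insert the head char at each spot
static List<String> perms(String s) {
    List<String> out = new ArrayList<>();
    if (s.isEmpty()) { out.add(""); return out; } // base case: one empty permutation
    char c = s.charAt(0);
    for (String p : perms(s.substring(1)))
        for (int i = 0; i <= p.length(); i++)     // every insertion spot
            out.add(p.substring(0, i) + c + p.substring(i));
    return out;
}
```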
 
dynamic programming

```java
// returns max value for a knapsack of capacity W, weights wt, values val
static int knapSack(int W, int[] wt, int[] val) {
    int n = wt.length;
    int[][] K = new int[n + 1][W + 1];
    // build table K[][] in bottom-up manner:
    // K[i][w] = best value using the first i items with capacity w
    for (int i = 0; i <= n; i++) {
        for (int w = 0; w <= W; w++) {
            if (i == 0 || w == 0)          // base case: no items or no capacity
                K[i][w] = 0;
            else if (wt[i - 1] <= w)       // max of including item i-1 vs. excluding it
                K[i][w] = Math.max(val[i - 1] + K[i - 1][w - wt[i - 1]], K[i - 1][w]);
            else                           // weight too large to include
                K[i][w] = K[i - 1][w];
        }
    }
    return K[n][W];
}
```
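As a quick check with inputs of my own choosing: knapSack(50, new int[]{10, 20, 30}, new int[]{60, 100, 120}) returns 220, taking the weight-20 and weight-30 items.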
min-cut / max-flow
hungarian

- assign N things to N targets, each pairing with an associated cost, minimizing the total cost
max-flow

- a list of pipes is given, with different flow capacities. These pipes are connected at their endpoints. What is the maximum amount of water that you can route from a given starting point to a given ending point? (a sketch follows)
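The notes don't spell out an algorithm, but one standard approach is Edmonds-Karp (BFS-based Ford-Fulkerson) over a residual capacity matrix; a minimal sketch, with cap mutated into residual capacities:

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Queue;

// max flow from s to t; cap[u][v] is pipe capacity and is mutated into residuals
static int maxFlow(int[][] cap, int s, int t) {
    int n = cap.length, flow = 0;
    while (true) {
        int[] parent = new int[n];
        Arrays.fill(parent, -1);
        parent[s] = s;
        Queue<Integer> q = new ArrayDeque<>();
        q.add(s);
        while (!q.isEmpty() && parent[t] == -1) {   // BFS for a shortest augmenting path
            int u = q.remove();
            for (int v = 0; v < n; v++)
                if (parent[v] == -1 && cap[u][v] > 0) { parent[v] = u; q.add(v); }
        }
        if (parent[t] == -1) return flow;           // no augmenting path left: done
        int push = Integer.MAX_VALUE;
        for (int v = t; v != s; v = parent[v])      // bottleneck along the path
            push = Math.min(push, cap[parent[v]][v]);
        for (int v = t; v != s; v = parent[v]) {    // update residual capacities
            cap[parent[v]][v] -= push;
            cap[v][parent[v]] += push;
        }
        flow += push;
    }
}
```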
 
sorting

- you can assume w.l.o.g. that all input numbers are unique
- comparison sorting requires Ω(n log n) comparisons (proof via decision tree: a binary tree distinguishing all n! input orderings has height at least log(n!) = Ω(n log n))
- considerations: worst case, average case, behavior in practice, input distribution, stability (input order is preserved for equal keys), in-situ (in-place), stack depth, having to read/write to disk (disk is much slower), parallelizability, online (handles more data coming in)
- adaptive - changes its behavior based on the input (e.g. bubble sort stops early on nearly-sorted data)
 
comparison-based

bubble sort

- keep swapping adjacent out-of-order pairs

```
repeat until a pass makes no swaps:
    for i = 1:n-1
        if a[i+1] < a[i]
            swap(a, i, i+1)
```

- keep a flag that tells whether a pass did no swaps - if so, done
- number of passes ~ how far elements are from their final positions
- O(n^2)
 
odd-even sort

- swap out-of-order adjacent pairs at even positions
- then swap out-of-order adjacent pairs at odd positions
- parallelizable
 
selection sort

- move the largest remaining element to the current position

```
for i = n:-1:2
    jmax = 1
    for j = 2:i
        if a[j] > a[jmax]
            jmax = j
    swap(a, i, jmax)
```

- O(n^2)
 
insertion sort

- insert each item into the already-sorted prefix

```
for i = 2:n
    insert a[i] into sorted a[1..(i-1)], shifting larger elements right
```

- O(n^2); O(nk) where k is the max distance of an element from its final position
- best when the input is almost sorted
 
heap sort

- insert everything into a heap
- repeatedly remove the max
- can do it in place by storing the heap in the array itself
- can use any height-balanced tree instead of a heap
    - traverse the tree to get the order
    - ex. B-tree: multi-rotations occur infrequently, O(log n) height
- O(n log n)
 
smooth sort

- adaptive heapsort
- a collection of heaps (each one a factor larger than the one before)
- can add and remove in essentially constant time if the data is in order
 
merge sort

- split into smaller arrays, sort them, merge
- T(n) = 2T(n/2) + n = O(n log n)
- stable, parallelizable (if parallel, not in place)
 
quicksort

- split on a pivot: smaller elements to the left, larger to the right
- O(n log n) average, O(n^2) worst case
- O(log n) stack space
 
shell sort

- generalizes insertion sort
- insertion-sort all items i apart, where i starts big and then becomes small
    - fully sorted after the last pass (i = 1)
- O(n^2), O(n^(3/2)), ... - the exact complexity depends on the gap sequence and is unknown for many of them
    - no matter what, it can't beat the Ω(n log n) comparison lower bound
- not used much in practice
 
not comparison-based

counting sort

- use values as array indices into a new array
- keep a count of the number of occurrences of each value
- for specialized data only - needs small values
- O(n + k) time, O(k) space, for values in a range of size k (a sketch follows)
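A minimal sketch, assuming values lie in 0..k-1:

```java
// counting sort: tally each value, then emit values in order
static int[] countingSort(int[] a, int k) {
    int[] count = new int[k];
    for (int x : a) count[x]++;                  // O(n) tally
    int[] out = new int[a.length];
    int idx = 0;
    for (int v = 0; v < k; v++)                  // O(k) sweep over the value range
        while (count[v]-- > 0) out[idx++] = v;
    return out;
}
```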
 
bucket sort

- spread the data into buckets based on value
- sort the buckets
- O(n + k) time
- buckets could be trees
 
radix sort

- sort on each digit in turn, least significant first
- use a stable sort for each digit
    - like bucket sort, run d times
- O(d·n) time, O(k + n) space (a sketch follows)
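A least-significant-digit sketch for non-negative ints, using a stable counting pass per base-10 digit:

```java
// LSD radix sort: stable counting sort on each digit, least significant first
static void radixSort(int[] a) {
    int max = 0;
    for (int x : a) max = Math.max(max, x);
    for (int exp = 1; max / exp > 0; exp *= 10) {
        int[] count = new int[10];
        for (int x : a) count[(x / exp) % 10]++;
        for (int d = 1; d < 10; d++) count[d] += count[d - 1];  // prefix sums
        int[] out = new int[a.length];
        for (int i = a.length - 1; i >= 0; i--)  // walk backwards to keep stability
            out[--count[(a[i] / exp) % 10]] = a[i];
        System.arraycopy(out, 0, a, 0, a.length);
    }
}
```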
 
meta sort

- like quicksort, but O(n log n) worst case
- run quicksort and mergesort in parallel
    - stop when either one stops
- there is overhead, but it doesn't affect the big-oh analysis
- average and worst case = O(n log n)
 
sorting overview

- in exceptional cases insertion sort or radix sort are much better than the generic quicksort / mergesort / heapsort answers
- to merge sorted arrays a and b - start from the back (a sketch follows)
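A sketch of the back-to-front merge, assuming (as in the usual interview setup) that a has enough trailing slots to hold b:

```java
// merge sorted b into sorted a, filling from the back; m = number of real values in a
static void mergeFromBack(int[] a, int m, int[] b) {
    int i = m - 1, j = b.length - 1, k = m + b.length - 1;
    while (j >= 0)                 // once b is exhausted, a's prefix is already in place
        a[k--] = (i >= 0 && a[i] > b[j]) ? a[i--] : b[j--];
}
```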
 
searching

- binary search can't do better than linear if there are duplicates
- if the data is too large for memory, we need an external sort (sort parts of it and write them back to file)
- write binary search recursively (a sketch follows this list)
    - use low <= val and high >= val so you get correct bounds
    - binary search over strings with empty strings mixed in - make sure there is a non-empty element to compare at the end
- "a".compareTo("b") is -1
- we always round up for these
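A recursive sketch, following the low/high bounds advice above:

```java
// binary search for key in sorted a[lo..hi]; returns its index or -1
static int binarySearch(int[] a, int key, int lo, int hi) {
    if (lo > hi) return -1;               // empty range: not found
    int mid = lo + (hi - lo) / 2;         // avoids overflow of lo + hi
    if (a[mid] == key) return mid;
    return (a[mid] < key) ? binarySearch(a, key, mid + 1, hi)
                          : binarySearch(a, key, lo, mid - 1);
}
```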
- finding the minimum is Ω(n)
    - pf: assume some element was ignored; that element could have been the minimum
    - simple algorithm - keep track of the best so far
    - thm: n/2 comparisons are necessary, because each comparison involves only 2 elements
    - thm: n-1 comparisons are necessary - need to keep track of the knowledge gained
        - every non-min element must win at least once (moving it from unknown to known)
 
 
- find min and max (a sketch follows this list)
    - the naive solution takes 2n-2 comparisons
    - instead, pairwise compare all elements into an array of maxes and an array of mins = n/2 comparisons
        - then scan the min array and the max array = 2(n/2 - 1) more
    - 3n/2 - 2 comparisons are sufficient (and necessary)
        - pf: 4 categories (not tested, only won, only lost, both)
        - not tested -> won or lost = n/2 comparisons
        - won or lost -> both = n/2 - 1 each
        - therefore 3n/2 - 2 comparisons are necessary
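A sketch of the pairwise idea in roughly 3n/2 comparisons (3 per pair of elements):

```java
// find min and max together: one comparison decides each pair, then the
// smaller goes against min and the larger against max
static int[] minMax(int[] a) {
    int i, min, max;
    if (a.length % 2 == 0) {                  // even length: seed from the first pair
        min = Math.min(a[0], a[1]);
        max = Math.max(a[0], a[1]);
        i = 2;
    } else {                                  // odd length: seed both from a[0]
        min = max = a[0];
        i = 1;
    }
    for (; i + 1 < a.length; i += 2) {
        if (a[i] < a[i + 1]) {                // 1 comparison decides the pair...
            min = Math.min(min, a[i]);        // ...then 1 against min
            max = Math.max(max, a[i + 1]);    // ...and 1 against max
        } else {
            min = Math.min(min, a[i + 1]);
            max = Math.max(max, a[i]);
        }
    }
    return new int[]{min, max};
}
```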
 
 
- find max and next-to-max
    - thm: n - 2 + log(n) comparisons are sufficient
    - consider an elimination tournament - pairwise compare elements repeatedly
        - 2nd best must have lost to the best at some point - look for it among the best's log(n) opponents
 
 
- selection - find the ith largest integer
    - repeatedly finding the median finds the ith largest
    - finding the median in linear time yields ith-largest in linear time
        - T(n) = T(n/2) + M(n), where M(n) is the time to find the median
    - quickselect - partition around a pivot and recur on one side (a sketch follows this list)
        - average time linear, worst case O(n^2)
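A quickselect sketch with a Lomuto partition (0-based k; the pivot choice here is arbitrary):

```java
// returns the k-th smallest element of a[lo..hi] (k is 0-based over the whole array)
static int quickselect(int[] a, int lo, int hi, int k) {
    int pivot = a[hi], i = lo;
    for (int j = lo; j < hi; j++)             // Lomuto partition around a[hi]
        if (a[j] < pivot) { int t = a[i]; a[i] = a[j]; a[j] = t; i++; }
    int t = a[i]; a[i] = a[hi]; a[hi] = t;    // pivot lands at its final index i
    if (k == i) return a[i];
    return (k < i) ? quickselect(a, lo, i - 1, k)
                   : quickselect(a, i + 1, hi, k);
}
```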
 
 
- median in linear time - quickly eliminate a constant fraction and repeat
    - partition into n/5 groups of 5
        - sort each group high to low
        - find the median of each group
        - compute the median of the medians recursively
        - move groups with larger medians to the right, groups with smaller medians to the left
    - now we know 3/10 of the elements are larger than the median of medians
        - and 3/10 of the elements are smaller
    - partition all elements around the median of medians
        - recur like quickselect
        - guarantees each partition contains at most 7n/10 elements
    - T(n) = T(n/5) + T(7n/10) + O(n)
        - since n/5 + 7n/10 = 9n/10 < n, superadditivity (f(x) + f(y) ≤ f(x+y) for f(n) = cn) gives T(n) ≤ T(9n/10) + O(n), which solves to O(n)
 
computational geometry

- range queries
    - input: n points (vectors), with preprocessing allowed
    - output: the number of points within any query rectangle
    - 1D
        - a range query is a pair of binary searches
        - O(log n) time per query
        - O(n) space, O(n log n) preprocessing time
    - 2D
        - subtract out the rectangles you don't want
        - add back what you doubly subtracted
        - so we only need rectangles anchored at the origin (see the formula below)
    - nD
        - form regions by laying a grid over all the points
        - precompute southwest counts for all regions - different ways to do this, with tradeoffs between space and time
        - O(log n) time per query (after precomputing) - binary search on x and y
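Writing $SW(x, y)$ for the precomputed count of points with coordinates at most $(x, y)$ (my notation), the 2D inclusion-exclusion above is:

$$\#\{p : x_1 < p_x \le x_2,\ y_1 < p_y \le y_2\} = SW(x_2, y_2) - SW(x_1, y_2) - SW(x_2, y_1) + SW(x_1, y_1)$$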
 
 
- polygon-point intersection
    - polygon - a closed sequence of segments
    - simple polygon - has no self-intersections
    - thm (Jordan) - a simple polygon partitions the plane into 3 regions: interior, exterior, boundary
    - convex polygon - an intersection of half-planes
    - polytope - a higher-dimensional polygon
    - raycasting (a sketch follows this list)
        - an odd number of intersections means interior; even means exterior
        - check for tangent lines and rays through corners
        - O(n) time per query, O(1) space and preprocessing time
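A standard crossing-number sketch of the raycast test (the classic even-odd rule; the degenerate tangent/corner cases still need the care noted above):

```java
// point-in-polygon by raycasting: count edges crossed by a horizontal ray
// from (x, y); an odd crossing count means the point is interior
static boolean inside(double[] px, double[] py, double x, double y) {
    boolean in = false;
    for (int i = 0, j = px.length - 1; i < px.length; j = i++) {
        boolean spans = (py[i] > y) != (py[j] > y);   // edge straddles the ray's height
        if (spans && x < (px[j] - px[i]) * (y - py[i]) / (py[j] - py[i]) + px[i])
            in = !in;                                 // ray crosses this edge
    }
    return in;
}
```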
 
    - convex case
        - preprocessing
            - find an interior point p (pick a vertex or average the vertices)
            - partition the polygon into wedges (slicing through vertices) w.r.t. p
            - sort the wedges by polar angle
        - query
            - find the containing wedge (look up by angle)
            - test interior / exterior
                - check the triangle - cast a ray from p through the query point, see if it crosses the edge
        - O(log n) time per query (we binary search the wedges)
        - O(n) space and O(n log n) preprocessing time
    - non-convex case
        - preprocessing
            - sort the vertices by x
            - form vertical slices
            - partition each slice into trapezoids (a triangle is a degenerate trapezoid)
            - sort each slice's trapezoids by y
        - query
            - find the containing slice
            - find the trapezoid within the slice
            - report interior / exterior
        - O(log n) time per query (two binary searches)
        - O(n^2) space and O(n^2) preprocessing time
 
- convex hull
    - input: a set of n points
    - output: the smallest containing convex polygon
    - simple solution 1 - Jarvis's march (gift wrapping)
    - simple solution 2 - Graham's scan
    - mergehull
        - partition into two sets - compute the MergeHull of each set
        - merge the two resulting CHs:
            - pick the point p with least x
            - form angle-monotone chains w.r.t. p
            - merge the chains into one angle-sorted list
            - run Graham's scan to form the CH
        - T(n) = 2T(n/2) + n = O(n log n)
        - generalizes to higher dimensions
        - parallelizes
 
    - quickhull (like quicksort)
        - find the right-most and left-most points
            - partition the points along this line
        - find the points farthest from the line - together these form a quadrilateral
            - eliminate all points inside the quadrilateral
        - recurse on the 4 remaining regions
        - concatenate the resulting CHs
        - O(n log n) expected time
        - O(n^2) worst-case time - e.g. points on a circle
        - generalizes to higher dimensions, parallelizes
    - lower bound - CH requires Ω(n log n) comparisons
        - pf: reduce sorting to convex hull
        - consider an arbitrary set of x_i to be sorted
        - lift each x_i onto the parabola, i.e. the point (x_i, x_i^2) - could be any convex function
        - compute the convex hull of the lifted points - they all lie on the hull, connected in x order
        - from the convex hull we can read off the sorted x_i => the convex hull did the sorting, so it needs at least n log n comparisons
        - corollary - Graham's scan is optimal
    - Chan's convex hull algorithm - O(n log h), where h is the hull size
        - assume we know the hull size m = h
        - partition the points into n/m sets of m each
        - compute each set's hull (Graham's scan), then wrap around the mini-hulls (Jarvis's march, with a binary search per mini-hull); if the guess m turns out too small, square it and retry
 
 
- convex polygon diameter
- Voronoi diagrams - input: n points - takes O(n log n) time to compute
    
    - problems it solves
        - Voronoi cell - the set of points closer to a given point than to all the others; forms a convex polygon
        - generalizes to other metrics (not just Euclidean distance)
        - a Voronoi cell is unbounded if and only if its point is on the convex hull
            - corollary - given the Voronoi diagram, the convex hull can be computed in linear time
        - a Voronoi diagram has at most 2n-5 vertices and 3n-6 edges
        - every nearest neighbor of a point defines an edge of the Voronoi diagram
            - corollary - all nearest neighbors can be computed from the Voronoi diagram in linear time
            - corollary - nearest-neighbor search in O(log n) time using planar subdivision search (binary search in 2D)
        - connecting the points of neighboring Voronoi cells forms a triangulation (the Delaunay triangulation)
        - the Delaunay triangulation maximizes the minimum angle over all triangulations - no long slivery triangles
            - the Euclidean minimum spanning tree is a subset of the Delaunay triangulation (so it can be computed easily)
 
    - calculating the Voronoi diagram
        - discrete case / bitmap - expand breadth-first waves from all points at once
            - time is O(bitmap size)
            - time is independent of the number of points
        - intersecting half-planes
            - the Voronoi cell of a point is the intersection of all half-planes induced by the perpendicular bisectors w.r.t. the other points
            - use convex-polygon intersection to intersect the half-planes (O(n log n) time per cell)
        - can be computed in O(n log n) total time
            - idea: divide and conquer
                - but merging is complex
            - or a sweep line using parabolas (Fortune's algorithm)