Give an $O(n\lg n)$-time algorithm to find the longest monotonically increasing subsequence of a sequence of $n$ numbers. ($\textit{Hint:}$ Observe that the last element of a candidate subsequence of length $i$ is at least as large as the last element of a candidate subsequence of length $i - 1$. Maintain candidate subsequences by linking them through the input sequence.)
The algorithm $\text{LONG-MONOTONIC}(A)$ prints a longest monotonically increasing subsequence of $A$, where $A$ has length $n$.

The algorithm works as follows: we create a new array $B$ such that $B[i]$ holds the smallest value that ends a monotonically increasing subsequence of length $i$ seen so far, and a new array $C$ such that $C[i]$ holds a monotonically increasing subsequence of length $i$ ending in that smallest value.

To analyze the running time, observe that the entries of $B$ are in sorted order, so the search for the largest index $j$ with $B[j] < A[i]$ can be performed by binary search in $O(\lg n)$ time. Since every other line in the for loop takes constant time, the total running time is $O(n\lg n)$.
LONG-MONOTONIC(A)
let B[1..n] be a new array where every value = ∞
let C[1..n] be a new array
L = 1
for i = 1 to n
if A[i] < B[1]
B[1] = A[i]
C[1].head.key = A[i]
else
let j be the largest index of B such that B[j] < A[i]
B[j + 1] = A[i]
C[j + 1] = C[j]
INSERT(C[j + 1], A[i])
if j + 1 > L
L = L + 1
print C[L]
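Below is a Python sketch in the spirit of $\text{LONG-MONOTONIC}$ (assuming nothing beyond the standard library): `tail_vals[k]` plays the role of $B[k + 1]$, and, following the hint, predecessor links into $A$ replace the explicit lists $C[i]$, so each iteration costs only one binary search.

```python
from bisect import bisect_left

# Sketch of the O(n lg n) strategy above: tail_vals[k] is the smallest value that
# ends an increasing subsequence of length k + 1 found so far, tail_idx[k] is its
# index in A, and prev[] links each candidate back through the input sequence.
def longest_increasing_subsequence(A):
    tail_vals, tail_idx = [], []           # the role of array B
    prev = [-1] * len(A)                   # predecessor links (the hint's idea)
    for i, x in enumerate(A):
        k = bisect_left(tail_vals, x)      # binary search, O(lg n)
        prev[i] = tail_idx[k - 1] if k > 0 else -1
        if k == len(tail_vals):            # a new, longer candidate
            tail_vals.append(x)
            tail_idx.append(i)
        else:                              # smaller last element for length k + 1
            tail_vals[k], tail_idx[k] = x, i
    # walk the predecessor links back from the end of the longest candidate
    out, j = [], tail_idx[-1] if tail_idx else -1
    while j != -1:
        out.append(A[j])
        j = prev[j]
    return out[::-1]

print(longest_increasing_subsequence([5, 1, 6, 2, 3, 8, 4]))   # [1, 2, 3, 4]
```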
Show that the solution of $T(n) = T(\lceil n / 2 \rceil) + 1$ is $O(\lg n)$.
We guess $T(n) \le c\lg(n - a)$,
$$ \begin{aligned} T(n) & \le c\lg(\lceil n / 2 \rceil - a) + 1 \\ & \le c\lg((n + 1) / 2 - a) + 1 \\ & = c\lg((n + 1 - 2a) / 2) + 1 \\ & = c\lg(n + 1 - 2a) - c\lg 2 + 1 & (c \ge 1) \\ & \le c\lg(n + 1 - 2a) & (a \ge 1) \\ & \le c\lg(n - a). \end{aligned} $$
Show that no compression scheme can expect to compress a file of randomly chosen $8$-bit characters by even a single bit. ($\textit{Hint:}$ Compare the number of possible files with the number of possible encoded files.)
If every possible character is equally likely, then, when constructing the Huffman code, we end up with a complete binary tree of depth $8$, one leaf for each of the $2^8 = 256$ characters. This means that every character, regardless of what it is, is represented using $8$ bits.

This is exactly as many bits as were originally used to represent those characters, so the total length of the file does not decrease at all.
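For the hint's counting argument, a brief sketch: there are $2^{8n}$ distinct files of $n$ $8$-bit characters, while the number of encoded files that are even one bit shorter is at most

$$\sum_{i = 0}^{8n - 1} 2^i = 2^{8n} - 1 < 2^{8n},$$

so no lossless (injective) scheme can map every file to a strictly shorter encoding.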
Prove that for any directed graph $G$, we have $((G^\text T)^{\text{SCC}})^\text T = G^{\text{SCC}}$. That is, the transpose of the component graph of $G^\text T$ is the same as the component graph of $G$.
First observe that $C$ is a strongly connected component of $G$ if and only if it is a strongly connected component of $G^\text T$. Thus the vertex sets of $G^{\text{SCC}}$ and $(G^\text T)^{\text{SCC}}$ are the same, which implies the vertex sets of $((G^\text T)^\text{SCC})^\text T$ and $G^{\text{SCC}}$ are the same. It suffices to show that their edge sets are the same. Suppose $(v_i, v_j)$ is an edge in $((G^\text T)^{\text{SCC}})^\text T$. Then $(v_j, v_i)$ is an edge in $(G^\text T)^{\text{SCC}}$. Thus there exist $x \in C_j$ and $y \in C_i$ such that $(x, y)$ is an edge of $G^\text T$, which implies $(y, x)$ is an edge of $G$. Since components are preserved, this means that $(v_i, v_j)$ is an edge in $G^{\text{SCC}}$. For the opposite implication we simply note that for any graph $G$ we have $(G^\text T)^{\text T} = G$.
The set-partition problem takes as input a set $S$ of numbers. The question is whether the numbers can be partitioned into two sets $A$ and $\bar A = S - A$ such that $\sum_{x \in A} x = \sum_{x \in \bar A} x$. Show that the set-partition problem is $\text{NP-complete}$.
(Omit!)
Prove that, after the procedure $\text{INITIALIZE-PREFLOW}(G, s)$ terminates, we have $s.e \le -|f^*|$, where $f^*$ is a maximum flow for $G$.
(Removed)
State the maximum-flow problem as a linear-programming problem.
$$ \begin{array}{lll} \max & \sum\limits_{v \in V} f(s, v) - \sum\limits_{v \in V} f(v, s) & \\ \text{s.t.} & 0 \le f(u, v) \le c(u, v) & \text{for each } u, v \in V, \\ & \sum\limits_{v \in V} f(v, u) - \sum\limits_{v \in V} f(u, v) = 0 & \text{for each } u \in V - \{s, t\}. \end{array} $$
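As a numerical illustration, here is a minimal sketch that feeds this LP to a solver, assuming SciPy is available; the five-edge network and its capacities are made up for the example.

```python
from scipy.optimize import linprog

# Tiny made-up network: edges (s,a), (s,b), (a,b), (a,t), (b,t) with capacities
# 3, 2, 1, 2, 3.  Variables are the flows on these five edges, in that order;
# flows on non-edges are simply not modeled.
c = [-1, -1, 0, 0, 0]                 # maximize f(s,a) + f(s,b)  (linprog minimizes)
A_eq = [[1, 0, -1, -1, 0],            # conservation at a: flow in - flow out = 0
        [0, 1,  1,  0, -1]]           # conservation at b
b_eq = [0, 0]
bounds = [(0, 3), (0, 2), (0, 1), (0, 2), (0, 3)]   # capacity constraints

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
print(-res.fun, res.x)                # maximum flow value 5.0 and the edge flows
```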
Show the parenthesis structure of the depth-first search of Figure 22.4.
The parentheses structure of the depth-first search of Figure 22.4 is $(u(v(y(xx)y)v)u)(w(zz)w)$.
Let $G = (V, E)$ be a weighted, directed graph with positive weight function $w: E \rightarrow \{1, 2, \ldots, W\}$ for some positive integer $W$, and assume that no two vertices have the same shortest-path weights from source vertex $s$. Now suppose that we define an unweighted, directed graph $G' = (V \cup V', E')$ by replacing each edge $(u, v) \in E$ with $w(u, v)$ unit-weight edges in series. How many vertices does $G'$ have? Now suppose that we run a breadth-first search on $G'$. Show that the order in which the breadth-first search of $G'$ colors vertices in $V$ black is the same as the order in which Dijkstra's algorithm extracts the vertices of $V$ from the priority queue when it runs on $G$.
Each edge $(u, v) \in E$ is subdivided into $w(u, v)$ unit-weight edges, which introduces $w(u, v) - 1$ new vertices, so $G'$ has $|V| + \sum_{(u, v) \in E} (w(u, v) - 1) = |V| + \sum_{(u, v) \in E} w(u, v) - |E|$ vertices.
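As a quick sanity check on this count, with a made-up example of $|V| = 3$ and two edges of weights $3$ and $2$: the edges contribute $2$ and $1$ new vertices respectively, and the formula agrees,

$$|V| + \sum_{(u, v) \in E} w(u, v) - |E| = 3 + (3 + 2) - 2 = 6.$$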
Write recursive versions of $\text{TREE-MINIMUM}$ and $\text{TREE-MAXIMUM}$.
TREE-MINIMUM(x)
if x.left != NIL
return TREE-MINIMUM(x.left)
else return x
TREE-MAXIMUM(x)
if x.right != NIL
return TREE-MAXIMUM(x.right)
else return x
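A minimal Python rendering of the two recursive procedures, assuming a simple node object with `key`, `left`, and `right` attributes (the class below is illustrative):

```python
# Recursive minimum/maximum in a binary search tree: keep descending left
# (resp. right) until there is no such child.
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def tree_minimum(x):
    return tree_minimum(x.left) if x.left is not None else x

def tree_maximum(x):
    return tree_maximum(x.right) if x.right is not None else x

root = Node(5, Node(2, Node(1)), Node(8, None, Node(9)))
print(tree_minimum(root).key, tree_maximum(root).key)   # 1 9
```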
A perfect matching is a matching in which every vertex is matched. Let $G = (V, E)$ be an undirected bipartite graph with vertex partition $V = L \cup R$, where $|L| = |R|$. For any $X \subseteq V$, define the neighborhood of $X$ as
$$N(X) = \{y \in V: (x, y) \in E \text{ for some } x \in X\},$$
that is, the set of vertices adjacent to some member of $X$. Prove Hall's theorem: there exists a perfect matching in $G$ if and only if $|A| \le |N(A)|$ for every subset $A \subseteq L$.
First suppose there exists a perfect matching in $G$. Then for any subset $A \subseteq L$, each vertex of $A$ is matched with a neighbor in $R$, and since it is a matching, no two such vertices are matched with the same vertex in $R$. Thus, there are at least $|A|$ vertices in the neighborhood of $A$.
Now suppose that $|A| \le |N(A)|$ for all $A \subseteq L$. Run Ford-Fulkerson on the corresponding flow network. The flow is increased by $1$ each time an augmenting path is found, so it suffices to show that this happens $|L|$ times. Suppose the while loop has run fewer than $|L|$ times but there is no augmenting path. Then fewer than $|L|$ edges from $L$ to $R$ carry flow $1$.

Let $v_1 \in L$ be such that no edge from $v_1$ to a vertex in $R$ has nonzero flow. By assumption, $v_1$ has at least one neighbor $v_1' \in R$. If any of $v_1$'s neighbors is connected to $t$ in $G_f$ then there is an augmenting path, so assume this is not the case. Thus $(v_1', t)$ is saturated, and there must be some edge $(v_2, v_1')$ with flow $1$. By assumption, $|N(\{v_1, v_2\})| \ge 2$, so there exists $v_2' \ne v_1'$ such that $v_2' \in N(\{v_1, v_2\})$. If $(v_2', t)$ is an edge in the residual network, we're done: either $v_2'$ is a neighbor of $v_1$, in which case $s, v_1, v_2', t$ is a path in $G_f$, or $v_2'$ is a neighbor of $v_2$, in which case $s, v_1, v_1', v_2, v_2', t$ is a path in $G_f$. Otherwise $(v_2', t)$ is saturated, so some edge $(v_3, v_2')$ with $v_3 \in L$ carries flow $1$, and hence $(v_2', v_3)$ is an edge of $G_f$. Specifically, $v_3 \ne v_1$ since $(v_3, v_2')$ has flow $1$ while $v_1$ carries no flow, and $v_3 \ne v_2$ since $(v_2, v_1')$ has flow $1$, so no more flow can leave $v_2$ without violating conservation of flow. Again by our hypothesis, $|N(\{v_1, v_2, v_3\})| \ge 3$, so there is another neighbor $v_3' \in R$.

Continuing in this fashion, we keep growing the set of neighbors $v_i'$, expanding each time we find that $(v_i', t)$ is not an edge of $G_f$. This cannot happen $|L|$ times: since we have run the Ford-Fulkerson while loop fewer than $|L|$ times, some edge into $t$ still has residual capacity in $G_f$. Thus the process must stop at some vertex $v_k'$, and we obtain an augmenting path
$$s, v_1, v_1', v_2, v_2', v_3, \ldots, v_k, v_k', t,$$
contradicting our assumption that there was no such path. Therefore the while loop runs at least $|L|$ times. By Corollary 26.3 the flow strictly increases with each augmentation, by $|f_p|$. By Theorem 26.10, $|f_p|$ is an integer; in particular, it equals $1$ here, since all capacities are $1$. This implies that $|f| \ge |L|$. It is clear that $|f| \le |L|$, so we must have $|f| = |L|$. By Corollary 26.11 this is the cardinality of a maximum matching. Since $|L| = |R|$, any maximum matching must be a perfect matching.
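For concreteness, here is a minimal Python sketch of the augmenting-path computation the proof appeals to, specialized to bipartite matching; the flow network with source $s$ and sink $t$ is left implicit, and the function names and tiny example graph are illustrative.

```python
# Repeatedly look for an augmenting path from an unmatched left vertex,
# re-matching already-matched left vertices along the way.
def maximum_bipartite_matching(adj, n_left, n_right):
    match_right = [-1] * n_right          # match_right[r] = left vertex matched to r

    def augment(l, seen):
        for r in adj[l]:
            if r in seen:
                continue
            seen.add(r)
            # r is free, or the left vertex currently matched to r can be re-matched
            if match_right[r] == -1 or augment(match_right[r], seen):
                match_right[r] = l
                return True
        return False

    size = sum(augment(l, set()) for l in range(n_left))
    return size, match_right

# A perfect matching exists here, consistent with Hall's condition holding.
adj = [[0, 1], [0], [1, 2]]               # neighbors in R of each vertex in L
print(maximum_bipartite_matching(adj, 3, 3))   # (3, [1, 0, 2])
```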
Use a recursion tree to determine a good asymptotic upper bound on the recurrence $T(n) = 4T(n / 2 + 2) + n$. Use the substitution method to verify your answer.
The subproblem size for a node at depth $i$ is roughly $n / 2^i$; the $+2$ terms add only $O(1)$ to each subproblem size, which does not affect the asymptotic bound.
Thus, the tree has $\lg n + 1$ levels and $4^{\lg n} = n^2$ leaves.
The total cost over all nodes at depth $i$, for $i = 0, 1, 2, \ldots, \lg n - 1$, is $4^i(n / 2^i + 2) = 2^i n + 2 \cdot 4^i$.
$$ \begin{aligned} T(n) & = \sum_{i = 0}^{\lg n - 1} (2^i n + 2 \cdot 4^i) + \Theta(n^2) \\ & = \sum_{i = 0}^{\lg n - 1} 2^i n + \sum_{i = 0}^{\lg n - 1} 2 \cdot 4^i + \Theta(n^2) \\ & = \frac{2^{\lg n} - 1}{2 - 1}n + 2 \cdot \frac{4^{\lg n} - 1}{4 - 1} + \Theta(n^2) \\ & = (2^{\lg n} - 1)n + \frac{2}{3} (4^{\lg n} - 1) + \Theta(n^2) \\ & = (n - 1)n + \frac{2}{3}(n^2 - 1) + \Theta(n^2) \\ & = \Theta(n^2). \end{aligned} $$
We guess $T(n) \le c(n^2 - dn)$,
$$ \begin{aligned} T(n) & = 4T(n / 2 + 2) + n \\ & \le 4c[(n / 2 + 2)^2 - d(n / 2 + 2)] + n \\ & = 4c(n^2 / 4 + 2n + 4 - dn / 2 - 2d) + n \\ & = cn^2 + 8cn + 16c - 2cdn - 8cd + n \\ & = cn^2 - cdn + 8cn + 16c - cdn - 8cd + n \\ & = c(n^2 - dn) - (cd - 8c - 1)n - (d - 2) \cdot 8c \\ & \le c(n^2 - dn), \end{aligned} $$
where the last step holds for $cd - 8c - 1 \ge 0$ and $d \ge 2$.
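As a quick, informal numeric check of the $\Theta(n^2)$ bound (assuming a base case $T(n) = n$ for $n \le 4$, which the recurrence itself does not specify), the ratio $T(n) / n^2$ settles near a constant:

```python
from functools import lru_cache

# Evaluate the recurrence exactly under the assumed base case; note that
# n // 2 + 2 < n only once n > 4, so the recursion terminates.
@lru_cache(maxsize=None)
def T(n):
    return n if n <= 4 else 4 * T(n // 2 + 2) + n

for k in range(8, 25, 4):
    n = 2 ** k
    print(n, T(n) / n ** 2)   # ratios stabilize, consistent with Theta(n^2)
```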
Show that using a single bit to store each vertex color suffices by arguing that the $\text{BFS}$ procedure would produce the same result if lines 5 and 14 were removed.
The textbook introduces the $\text{GRAY}$ color only for pedagogical purposes, to distinguish the $\text{GRAY}$ vertices (those that have been discovered and enqueued) from the $\text{BLACK}$ vertices (those that have already been dequeued). The $\text{BFS}$ procedure never acts on this distinction: the only color test it performs is whether a vertex is $\text{WHITE}$.

Therefore, a single bit per vertex, recording whether the vertex is $\text{WHITE}$ or non-$\text{WHITE}$, suffices to store each vertex color.
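A minimal Python sketch supporting this argument (the adjacency lists and names are illustrative): $\text{BFS}$ stores only one boolean per vertex, i.e., $\text{WHITE}$ versus non-$\text{WHITE}$.

```python
from collections import deque

# BFS needs only to know whether a vertex has been discovered; that single
# boolean plays the role of "non-WHITE".
def bfs(adj, s):
    n = len(adj)
    discovered = [False] * n          # the single bit per vertex
    dist = [None] * n
    parent = [None] * n
    discovered[s], dist[s] = True, 0
    q = deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if not discovered[v]:     # the only color test BFS ever makes
                discovered[v] = True
                dist[v] = dist[u] + 1
                parent[v] = u
                q.append(v)
    return dist, parent

adj = [[1, 2], [0, 3], [0, 3], [1, 2]]
print(bfs(adj, 0))    # ([0, 1, 1, 2], [None, 0, 0, 1])
```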
Professor Gekko has always dreamed of inline skating across North Dakota. He plans to cross the state on highway U.S. 2, which runs from Grand Forks, on the eastern border with Minnesota, to Williston, near the western border with Montana. The professor can carry two liters of water, and he can skate $m$ miles before running out of water. (Because North Dakota is relatively flat, the professor does not have to worry about drinking water at a greater rate on uphill sections than on flat or downhill sections.) The professor will start in Grand Forks with two full liters of water. His official North Dakota state map shows all the places along U.S. 2 at which he can refill his water and the distances between these locations.
The professor's goal is to minimize the number of water stops along his route across the state. Give an efficient method by which he can determine which water stops he should make. Prove that your strategy yields an optimal solution, and give its running time.
The greedy strategy solves this problem optimally: from the current position, skate to the farthest water stop that is at most $m$ miles away. In particular, the first stop is the farthest point from the starting position that is at most $m$ miles away. The problem exhibits optimal substructure, since once we have chosen a first stopping point $p$, we solve the subproblem assuming we start at $p$; combining the two plans yields an optimal solution by the usual cut-and-paste argument. Now we must show that this greedy first stopping point is contained in some optimal solution. Let $O$ be any optimal solution in which the professor stops at positions $o_1, o_2, \dots, o_k$, and let $g_1$ denote the farthest stopping point reachable from the start. Since $g_1 \ge o_1$, we have $o_2 - g_1 \le o_2 - o_1 \le m$, so we may replace $o_1$ by $g_1$ to create a modified solution $G$; in other words, the professor can still make it to every position in $G$ without running out of water. Since $G$ has the same number of stops as $O$, we conclude that $g_1$ is contained in some optimal solution, and therefore the greedy strategy works. With the stops given in order along the highway, the strategy makes a single pass over them, so it runs in $O(n)$ time for $n$ water stops.
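A minimal Python sketch of this greedy rule; the function name, positions, and range below are made up for illustration.

```python
# From the current position, skate to the farthest water stop at most m miles
# away, until the finish is within m miles.  One pass over the stops: O(n).
def choose_water_stops(stops, finish, m):
    """stops: sorted positions of refill points (miles from the start)."""
    chosen, position, i = [], 0, 0
    while position + m < finish:
        best = None
        while i < len(stops) and stops[i] <= position + m:
            best = stops[i]          # farthest stop reachable on the current fill
            i += 1
        if best is None:
            raise ValueError("no reachable water stop; the trip is impossible")
        chosen.append(best)
        position = best
    return chosen

print(choose_water_stops([30, 80, 100, 160, 200], finish=240, m=100))   # [100, 200]
```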
Show that the solution to $T(n) = 2T(\lfloor n / 2 \rfloor + 17) + n$ is $O(n\lg n)$.
We guess $T(n) \le c(n - a)\lg(n - a)$,
$$ \begin{aligned} T(n) & \le 2c(\lfloor n / 2 \rfloor + 17 - a)\lg(\lfloor n / 2 \rfloor + 17 - a) + n \\ & \le 2c(n / 2 + 17 - a)\lg(n / 2 + 17 - a) + n \\ & = c(n + 34 - 2a)\lg\frac{n + 34 - 2a}{2} + n \\ & = c(n + 34 - 2a)\lg(n + 34 - 2a) - c(n + 34 - 2a) + n & (c > 1, n > n_0 = f(a)) \\ & \le c(n + 34 - 2a)\lg(n + 34 - 2a) & (a \ge 34) \\ & \le c(n - a)\lg(n - a). \end{aligned} $$
What is an optimal Huffman code for the following set of frequencies, based on
the first $8$ Fibonacci numbers?
$$a:1 \quad b:1 \quad c:2 \quad d:3 \quad e:5 \quad f:8 \quad g:13 \quad h:21$$
Can you generalize your answer to find the optimal code when the frequencies are the first $n$ Fibonacci numbers?
$$ \begin{array}{c|l} a & 1111111 \\ b & 1111110 \\ c & 111110 \\ d & 11110 \\ e & 1110 \\ f & 110 \\ g & 10 \\ h & 0 \end{array} $$
GENERALIZATION
In what follows we use $a_i$ to denote the $i$-th Fibonacci number. To avoid any confusion, we stress that we consider the Fibonacci sequence beginning $1, 1$, i.e., $a_1 = a_2 = 1$.
Let us consider a set of $n$ symbols $\Sigma = \{c_i \mid 1 \le i \le n\}$ such that $c_i.freq = a_i$ for each $i$, and write $code(c)$ for the codeword assigned to a symbol $c \in \Sigma$ by the run of HUFFMAN($\Sigma$). We shall prove that the Huffman code produced for this set of symbols is the following code (for $n = 8$ it is exactly the table above):

$$code(c_1) = \underbrace{11 \cdots 1}_{n - 1}, \qquad code(c_i) = \underbrace{11 \cdots 1}_{n - i}0 \quad \text{for } 2 \le i \le n.$$
First we state two technical claims, both easily proven by induction: for every $i \ge 1$ we have $\sum_{k = 1}^{i} a_k = a_{i + 2} - 1$ and $a_{i + 1} \le a_{i + 2} - 1$. Following the good manners of our field, we leave the proofs to the reader :-)
Consider the tree $T_n$ defined inductively as follows: $T_1$ is a single leaf holding $c_1$, and for each $i \ge 1$, the root of $T_{i + 1}$ has the leaf holding $c_{i + 1}$ as its $0$-child and (the root of) $T_i$ as its $1$-child.
We shall prove that $T_n$ is the tree produced by the run of HUFFMAN($\Sigma$).
KEY CLAIM: For each $1 \le i < n$, the node $z$ constructed in the $i$-th iteration of the for loop of HUFFMAN($\Sigma$) is exactly (the root of) $T_{i + 1}$, and just after the $i$-th iteration the content of the priority queue is $Q = (a_{i + 2}, T_{i + 1}, a_{i + 3}, \dots, a_n)$, with $a_{i + 2}$ being its minimal element. (Since we prefer not to overload the notation, we simply note that for $i = n - 1$ this should be read as $Q = (T_n)$.)
PROOF OF KEY CLAIM by induction on $i$.
The KEY CLAIM tells us that just after the last iteration of the for loop we have $Q = (T_n)$, and therefore line 9 of HUFFMAN returns $T_n$ as the result. One can easily check that the code given at the beginning is exactly the code corresponding to the code tree $T_n$.
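To see the pattern concretely, here is a small Python sketch (names are illustrative, not from CLRS) that builds the Huffman tree for the first $n$ Fibonacci frequencies with a binary heap and reports each symbol's codeword length; for $n = 8$ the lengths come out as $7, 7, 6, 5, 4, 3, 2, 1$, matching the table above.

```python
import heapq
from itertools import count

# Build a Huffman tree and report codeword lengths (leaf depths) per symbol.
def huffman_code_lengths(freqs):
    tiebreak = count()                       # keeps heap entries comparable
    # each heap entry carries a dict: symbol index -> current depth in its subtree
    heap = [(f, next(tiebreak), {i: 0}) for i, f in enumerate(freqs)]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, d1 = heapq.heappop(heap)
        f2, _, d2 = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}   # one level deeper
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]

def fibs(n):
    a, b, out = 1, 1, []
    for _ in range(n):
        out.append(a)
        a, b = b, a + b
    return out

print(huffman_code_lengths(fibs(8)))   # depths 7, 7, 6, 5, 4, 3, 2, 1 for symbols 0..7
```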
Show that the class $P$, viewed as a set of languages, is closed under union, intersection, concatenation, complement, and Kleene star. That is, if $L_1, L_2 \in P$, then $L_1 \cup L_2 \in P$, $L_1 \cap L_2 \in P$, $L_1L_2 \in P$, $\bar L_1 \in P$, and $L_1^* \in P$.
(Omit!)
Solve the instance of the scheduling problem given in Figure 16.7, but with each penalty $w_i$ replaced by $80 - w_i$.
$$ \begin{array}{c|ccccccc} a_i & 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ \hline d_i & 4 & 2 & 4 & 3 & 1 & 4 & 6 \\ w_i & 10 & 20 & 30 & 40 & 50 & 60 & 70 \end{array} $$
We begin by greedily building an independent set in the corresponding matroid, considering the tasks in order of decreasing penalty, i.e., adding first the tasks that are most costly to leave incomplete. So we add tasks $7, 6, 5, 4, 3$. Then, in order to schedule task $1$ or $2$, we would have to leave a more costly task incomplete. So, our final schedule is $\langle 5, 3, 4, 6, 7, 1, 2 \rangle$, with a total penalty of only $w_1 + w_2 = 30$.
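A small Python sketch of this greedy computation (the function and variable names are mine, not from the text), run on the deadlines and modified penalties above; it reports an optimal schedule with total penalty $30$. Ties among accepted tasks with equal deadlines may be ordered differently than in the schedule above; any such order meets the deadlines and incurs the same penalty.

```python
# Greedy matroid scheduling: consider tasks by decreasing penalty, keep a task
# if the accepted set stays independent, then order accepted tasks by deadline.
def schedule(deadlines, penalties):
    n = len(deadlines)
    tasks = sorted(range(n), key=lambda i: -penalties[i])   # most costly first

    def independent(task_set):
        # independent iff, for every t, at most t of its tasks have deadline <= t
        return all(sum(1 for i in task_set if deadlines[i] <= t) <= t
                   for t in range(1, n + 1))

    accepted = []
    for i in tasks:
        if independent(accepted + [i]):
            accepted.append(i)
    rejected = [i for i in range(n) if i not in accepted]
    order = sorted(accepted, key=lambda i: deadlines[i]) + rejected
    penalty = sum(penalties[i] for i in rejected)
    return [i + 1 for i in order], penalty     # report 1-indexed task numbers

print(schedule(deadlines=[4, 2, 4, 3, 1, 4, 6],
               penalties=[10, 20, 30, 40, 50, 60, 70]))
# ([5, 4, 6, 3, 7, 1, 2], 30) -- an optimal schedule with penalty 30
```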
Give a dynamic-programming solution to the $0$-$1$ knapsack problem that runs in $O(nW)$ time, where $n$ is the number of items and $W$ is the maximum weight of items that the thief can put in his knapsack.
Suppose we know that a particular item of weight $w$ is in the solution. Then we must solve the subproblem on $n - 1$ items with maximum weight $W - w$. Thus, to take a bottom-up approach we must solve the $0$-$1$ knapsack problem for all items and all possible weights smaller than $W$. We'll build an $(n + 1) \times (W + 1)$ table of values where the rows are indexed by item and the columns are indexed by total weight. (The first row and the first column of the table are dummy entries.)

For row $i$, column $j$, we decide whether or not it would be advantageous to include item $i$ in the knapsack by comparing the total value of a knapsack including items $1$ through $i - 1$ with maximum weight $j$, and the total value of including items $1$ through $i - 1$ with maximum weight $j - i.weight$ plus the value of item $i$. To solve the problem, we simply examine the $(n, W)$ entry of the table to determine the maximum value we can achieve. To read off the items we include, start with entry $(n, W)$ and proceed as follows: if entry $(i, j)$ equals entry $(i - 1, j)$, don't include item $i$ and examine entry $(i - 1, j)$ next; otherwise, include item $i$ and examine entry $(i - 1, j - i.weight)$ next. See the algorithm below for the construction of the table:
0-1-KNAPSACK(n, W)
Initialize an (n + 1) by (W + 1) table K
for i = 1 to n
K[i, 0] = 0
for j = 1 to W
K[0, j] = 0
for i = 1 to n
for j = 1 to W
if j < i.weight
K[i, j] = K[i - 1, j]
else
K[i, j] = max(K[i - 1, j], K[i - 1, j - i.weight] + i.value)
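A brief Python rendering of the same table-filling and read-off procedure; the sample values, weights, and $W$ below are made up for illustration.

```python
# Bottom-up 0-1 knapsack in O(nW) time, plus the walk-back that recovers
# which items were included.
def knapsack_01(values, weights, W):
    n = len(values)
    K = [[0] * (W + 1) for _ in range(n + 1)]        # dummy row 0 and column 0
    for i in range(1, n + 1):
        for j in range(1, W + 1):
            if j < weights[i - 1]:
                K[i][j] = K[i - 1][j]
            else:
                K[i][j] = max(K[i - 1][j],
                              K[i - 1][j - weights[i - 1]] + values[i - 1])
    # read off the chosen items by walking back through the table
    chosen, j = [], W
    for i in range(n, 0, -1):
        if K[i][j] != K[i - 1][j]:
            chosen.append(i)                          # item i (1-indexed) is included
            j -= weights[i - 1]
    return K[n][W], sorted(chosen)

print(knapsack_01(values=[60, 100, 120], weights=[10, 20, 30], W=50))  # (220, [2, 3])
```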
Suppose that we construct a binary search tree by repeatedly inserting distinct values into the tree. Argue that the number of nodes examined in searching for a value in the tree is one plus the number of nodes examined when the value was first inserted into the tree.
The nodes examined while searching for the value are exactly the nodes that were examined when the value was first inserted (the path from the root down to the value's parent), plus the node containing the value itself; that node did not yet exist at insertion time, so it was not examined then. Hence the search examines one more node than the insertion did.