Introduction to Algorithm Assignment 2 Solution

   

  1. You use a hash function $h$ to hash $n$ distinct keys into an array $T$ of length $m$. Assuming independent uniform hashing, what is the expected number of collisions? More precisely, what is the expected cardinality of $\{\, \{k_1, k_2\} : k_1 \ne k_2 \text{ and } h(k_1) = h(k_2) \,\}$?

Solution:

Under the assumption of independent uniform hashing, we use linearity of expectation.

Fix any ordering $k_1, k_2, \dots, k_n$ of the keys. Let $X_i$ be the number of indices $j > i$ such that $h(k_j) = h(k_i)$; that is, $X_i$ counts the colliding pairs whose lower-indexed key is $k_i$, so every colliding pair is counted exactly once. Since any two distinct keys collide with probability $1/m$, linearity of expectation gives $E[X_i] = \sum_{j > i} \Pr(h(k_j) = h(k_i)) = \sum_{j > i} 1 / m = (n - i) / m$. Summing over all $i$, the expected number of collisions is

$$\sum_{i = 1}^n \frac{n - i}{m} = \frac{n^2 - n(n + 1) / 2}{m} = \frac{n^2 - n}{2m}.$$
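
The formula can be sanity-checked by brute force for small $n$ and $m$: enumerate all $m^n$ equally likely hash assignments and average the number of colliding pairs. A quick sketch (`expected_collisions` is just an illustrative helper name):

```python
from itertools import product
from fractions import Fraction

def expected_collisions(n, m):
    """Average, over all m**n equally likely hash assignments,
    of the number of unordered colliding pairs {k_i, k_j}."""
    total = 0
    for h in product(range(m), repeat=n):  # one hash value per key
        total += sum(h[i] == h[j] for i in range(n) for j in range(i + 1, n))
    return Fraction(total, m ** n)

# Agrees with (n^2 - n) / (2m) for every small case we try.
for n, m in [(2, 2), (3, 4), (4, 3)]:
    assert expected_collisions(n, m) == Fraction(n * n - n, 2 * m)
```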

   

  1. Consider inserting the keys $10, 22, 31, 4, 15, 28, 17, 88, 59$ into a hash table of length $m = 11$ using open addressing. Illustrate the result of inserting these keys using linear probing with $h(k, i) = (k + i) \mod m$ and using double hashing with $h_1(k) = k$ and $h_2(k) = 1 + (k \mod (m - 1))$.

Solution:

Column $T_t$ shows the table after the $(t+1)$-st key has been inserted. Each insertion starts with probe index $i = 0$; upon a collision we try $i = 1, 2, \dots, m - 1 = 10$ until an empty slot is found.

  • Linear probing:

    $$ \begin{array}{r|ccccccccc} h(k, i) = (k + i) \mod 11 & T_0 & T_1 & T_2 & T_3 & T_4 & T_5 & T_6 & T_7 & T_8 \\ \hline 0 \mod 11 & & 22 & 22 & 22 & 22 & 22 & 22 & 22 & 22 \\ 1 \mod 11 & & & & & & & & 88 & 88 \\ 2 \mod 11 & & & & & & & & & \\ 3 \mod 11 & & & & & & & & & \\ 4 \mod 11 & & & & 4 & 4 & 4 & 4 & 4 & 4 \\ 5 \mod 11 & & & & & 15 & 15 & 15 & 15 & 15 \\ 6 \mod 11 & & & & & & 28 & 28 & 28 & 28 \\ 7 \mod 11 & & & & & & & 17 & 17 & 17 \\ 8 \mod 11 & & & & & & & & & 59 \\ 9 \mod 11 & & & 31 & 31 & 31 & 31 & 31 & 31 & 31 \\ 10 \mod 11 & 10 & 10 & 10 & 10 & 10 & 10 & 10 & 10 & 10 \end{array} $$

  • Double hashing:

    $$ \begin{array}{r|ccccccccc} h(k, i) = (k + i(1 + k \mod 10)) \mod 11 & T_0 & T_1 & T_2 & T_3 & T_4 & T_5 & T_6 & T_7 & T_8 \\ \hline 0 \mod 11 & & 22 & 22 & 22 & 22 & 22 & 22 & 22 & 22 \\ 1 \mod 11 & & & & & & & & & \\ 2 \mod 11 & & & & & & & & & 59 \\ 3 \mod 11 & & & & & & & 17 & 17 & 17 \\ 4 \mod 11 & & & & 4 & 4 & 4 & 4 & 4 & 4 \\ 5 \mod 11 & & & & & 15 & 15 & 15 & 15 & 15 \\ 6 \mod 11 & & & & & & 28 & 28 & 28 & 28 \\ 7 \mod 11 & & & & & & & & 88 & 88 \\ 8 \mod 11 & & & & & & & & & \\ 9 \mod 11 & & & 31 & 31 & 31 & 31 & 31 & 31 & 31 \\ 10 \mod 11 & 10 & 10 & 10 & 10 & 10 & 10 & 10 & 10 & 10 \end{array} $$
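
Both tables can be reproduced with a short simulation (a sketch; `open_address_insert` is an illustrative helper, not a CLRS routine):

```python
def open_address_insert(keys, m, probe):
    """Insert keys into a size-m table, probing with probe(k, i) mod m."""
    table = [None] * m
    for k in keys:
        for i in range(m):
            slot = probe(k, i) % m
            if table[slot] is None:  # first empty slot in the probe sequence
                table[slot] = k
                break
    return table

keys = [10, 22, 31, 4, 15, 28, 17, 88, 59]
m = 11
linear = open_address_insert(keys, m, lambda k, i: k + i)
double = open_address_insert(keys, m, lambda k, i: k + i * (1 + k % (m - 1)))
```

The final columns of the two tables correspond to `linear` and `double`, read from slot 0 to slot 10.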

   

  1. Consider a hash table with 9 slots and the hash function $h(k) = k \mod 9$. Demonstrate what happens upon inserting the keys $5, 28, 19, 15, 20, 33, 12, 17, 10$ with collisions resolved by chaining.

Solution:

Let us number our slots $0, 1, \dots, 8$.

Then the resulting hash table looks like the following:

$$ \begin{array}{c|l} h(k) & \text{keys} \\ \hline 0 \mod 9 & \\ 1 \mod 9 & 10 \to 19 \to 28 \\ 2 \mod 9 & 20 \\ 3 \mod 9 & 12 \\ 4 \mod 9 & \\ 5 \mod 9 & 5 \\ 6 \mod 9 & 33 \to 15 \\ 7 \mod 9 & \\ 8 \mod 9 & 17 \end{array} $$
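
The table can be reproduced in a few lines (a sketch; inserting at the head of each chain follows the CLRS CHAINED-HASH-INSERT convention):

```python
def chaining_insert(keys, m):
    """Hash with h(k) = k mod m, resolving collisions by chaining."""
    table = [[] for _ in range(m)]
    for k in keys:
        table[k % m].insert(0, k)  # new key goes to the front of its chain
    return table

table = chaining_insert([5, 28, 19, 15, 20, 33, 12, 17, 10], 9)
```

Slot 1 ends up as the chain $10 \to 19 \to 28$ and slot 6 as $33 \to 15$, matching the table above.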

   

  1. Suppose that we use an open-addressed hash table of size $m$ to store $n \le m / 2$ items.

    a. Assuming uniform hashing, show that for $i = 1, 2, \ldots, n$, the probability is at most $2^{-k}$ that the $i$th insertion requires strictly more than $k$ probes.

Solution:

From the text we know that:

$$ \Pr\{ X \ge i \} = \Pr\{ X > i - 1 \} = \frac{n}{m} \cdot \frac{n-1}{m-1} \cdots \frac{n-i+2}{m-i+2} \le \left( \frac{n}{m} \right)^{i-1} = \alpha^{i-1} $$

Since we know that $n \le m/2$, we know that:

$$ \Pr\{X > k\} = \Pr\{X \ge k+1 \} \le \left( \frac{n}{m} \right)^k \le \left( \frac{m}{2m} \right)^k = \left( \frac{1}{2} \right)^k = 2^{-k} $$

b. Show that for $i = 1, 2, \ldots, n$, the probability is $O(1 / n^2)$ that the $i$th insertion requires more than $2\lg n$ probes.

Solution:

Substituting $k = 2\lg{n} = \lg{n^2}$ into the bound from part (a):

$$ \Pr\{X > \lg{n^2}\} \le 2^{-\lg{n^2}} = \frac{1}{n^2} = O(1/n^2) $$

Let the random variable $X_i$ denote the number of probes required by the $i$th insertion. You have shown in part (b) that $\Pr\{X_i > 2\lg n\} = O(1 / n^2)$. Let the random variable $X = \max_{1 \le i \le n} X_i$ denote the maximum number of probes required by any of the $n$ insertions.

c. Show that $\Pr\{X > 2\lg n\} = O(1 / n)$.

Solution:

$$ \begin{aligned} \Pr\{X > 2\lg{n}\} &= \Pr\{\bigcup_{i=1}^n \left( X_i > 2\lg{n} \right) \} && \\ &\le \sum_{i=1}^{n} \Pr\{X_i > 2\lg{n} \} && \text{since } \Pr\{A \cup B\} \le \Pr\{A\} + \Pr\{B\} \\ &\le \sum_{i=1}^{n} \frac{1}{n^2} && \text{because of (b)} \\ &= \frac{n}{n^2} \\ &= O(1/n) \end{aligned} $$

d. Show that the expected length $\mathrm{E}[X]$ of the longest probe sequence is $O(\lg n)$.

Solution:

$$ \begin{aligned} E[X] &= \sum_{k=1}^{n} k \Pr \{ X = k \} \\ &= \sum_{k=1}^{\lceil 2\lg{n} \rceil} k \Pr\{X = k\} + \sum_{k = \lceil 2\lg{n} \rceil + 1}^n k \Pr\{X = k\} \\ &\le \sum_{k=1}^{\lceil 2\lg{n} \rceil} \lceil 2\lg{n} \rceil \cdot \Pr\{X = k\} + \sum_{k = \lceil 2\lg{n} \rceil + 1}^n n \cdot \Pr\{X = k\} \\ &= \lceil 2\lg{n} \rceil \Pr\{X \le \lceil 2\lg{n} \rceil\} + n \Pr\{X > \lceil 2\lg{n} \rceil\} \\ &\le \lceil 2\lg{n} \rceil \cdot 1 + n \cdot O(1/n) && \text{by part (c)} \\ &= O(\lg n). \end{aligned} $$

   

  1. Show, by means of a counterexample, that the following “greedy” strategy does not always determine an optimal way to cut rods. Define the density of a rod of length $i$ to be $p_i / i$, that is, its value per inch. The greedy strategy for a rod of length $n$ cuts off a first piece of length $i$, where $1 \le i \le n$, having maximum density. It then continues by applying the greedy strategy to the remaining piece of length $n - i$.

Solution:

The counterexample:

$$ \begin{array}{c|cccc} \text{length $i$} & 1 & 2 & 3 & 4 \\ \hline \text{price $p_i$} & 1 & 20 & 33 & 36 \\ p_i / i & 1 & 10 & 11 & 9 \end{array} $$

For $n = 4$, the greedy strategy first cuts a piece of length $3$ (the maximum density, $11$), leaving a piece of length $1$, for total revenue $33 + 1 = 34$. The optimal solution cuts two pieces of length $2$, for revenue $20 + 20 = 40$.
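
The gap can be checked directly (a sketch; `greedy_cut` and `optimal_cut` are illustrative helper names, the latter a transcription of CLRS BOTTOM-UP-CUT-ROD):

```python
def greedy_cut(n, p):
    """Greedy: repeatedly cut off the piece of maximum density p[i] / i."""
    revenue = 0
    while n > 0:
        i = max(range(1, n + 1), key=lambda i: p[i] / i)
        revenue += p[i]
        n -= i
    return revenue

def optimal_cut(n, p):
    """Bottom-up rod-cutting DP; r[j] is the best revenue for length j."""
    r = [0] * (n + 1)
    for j in range(1, n + 1):
        r[j] = max(p[i] + r[j - i] for i in range(1, j + 1))
    return r[n]

p = [0, 1, 20, 33, 36]  # p[i] = price of a rod of length i
```

On this table `greedy_cut(4, p)` returns 34 while `optimal_cut(4, p)` returns 40.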

   

  1. Give an $O(n)$-time dynamic-programming algorithm to compute the nth Fibonacci number. Draw the subproblem graph. How many vertices and edges are in the graph?

Solution:

There are $n + 1$ vertices in the subproblem graph, i.e., $v_0, v_1, \dots, v_n$.

  • Vertices $v_0$ and $v_1$ have no outgoing edges.
  • Each vertex $v_i$ for $2 \le i \le n$ has two outgoing edges, to $v_{i-1}$ and $v_{i-2}$.

Thus, there are $2n - 2$ edges in the subproblem graph.
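
A memoized $O(n)$ implementation, as a sketch (assuming the convention $F_0 = 0$, $F_1 = 1$); each vertex $v_i$ of the subproblem graph corresponds to one cached call:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    """Top-down DP: each subproblem fib(i) is solved once, so O(n) time."""
    if n < 2:
        return n  # base cases: fib(0) = 0, fib(1) = 1
    return fib(n - 1) + fib(n - 2)  # the two outgoing edges of vertex v_n
```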

   

  1. Find an optimal parenthesization of a matrix-chain product whose sequence of dimensions is $\langle 5, 10, 3, 12, 5, 50, 6 \rangle$.

Solution: \(((5 \times 10)(10 \times 3))(((3 \times 12)(12 \times 5))((5 \times 50)(50 \times 6))).\)
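
The answer can be computed with a direct transcription of MATRIX-CHAIN-ORDER (a sketch; the reconstruction helper `parens` is illustrative):

```python
import math

def matrix_chain_order(p):
    """p has length n + 1; matrix A_i is p[i-1] x p[i]. Returns m (minimum
    scalar multiplications) and s (optimal split points), indexed from 1."""
    n = len(p) - 1
    m = [[0] * (n + 1) for _ in range(n + 1)]
    s = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(2, n + 1):           # chain length
        for i in range(1, n - length + 2):
            j = i + length - 1
            m[i][j] = math.inf
            for k in range(i, j):            # try every split point
                q = m[i][k] + m[k + 1][j] + p[i - 1] * p[k] * p[j]
                if q < m[i][j]:
                    m[i][j], s[i][j] = q, k
    return m, s

def parens(s, i, j):
    """Reconstruct the optimal parenthesization from the s table."""
    if i == j:
        return f"A{i}"
    k = s[i][j]
    return "(" + parens(s, i, k) + parens(s, k + 1, j) + ")"

m, s = matrix_chain_order([5, 10, 3, 12, 5, 50, 6])
```

For this instance `m[1][6]` is 2010 and `parens(s, 1, 6)` gives `((A1A2)((A3A4)(A5A6)))`, the parenthesization above.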

   

  1. Give a recursive algorithm $\text{MATRIX-CHAIN-MULTIPLY}(A, s, i, j)$ (algorithm in 26 page of Week 6) that actually performs the optimal matrix-chain multiplication, given the sequence of matrices $\langle A_1, A_2, \ldots ,A_n \rangle$, the $s$ table computed by $\text{MATRIX-CHAIN-ORDER}$, and the indices $i$ and $j$. (The initial call would be $\text{MATRIX-CHAIN-MULTIPLY}(A, s, 1, n)$.)

Solution:

def matrix_chain_multiply(A, s, i, j):
    """
    Args:
      A: The sequence of matrices, indexed from 1 (A[1], ..., A[n]).
      s: The split table computed by MATRIX-CHAIN-ORDER.
      i: The left index of the subchain.
      j: The right index of the subchain.

    Returns:
      The product A[i] A[i+1] ... A[j], multiplied in the optimal order.
    """
    if i == j:
        return A[i]
    k = s[i][j]                               # optimal place to split the chain
    b = matrix_chain_multiply(A, s, i, k)     # compute A[i..k]
    c = matrix_chain_multiply(A, s, k + 1, j) # compute A[k+1..j]
    return b @ c                              # matrix product (e.g. numpy arrays)

   

  1. Determine the cost and structure of an optimal binary search tree for a set of $n = 7$ keys with the following probabilities.
\[\begin{array}{c|cccccccc} i & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ \hline p_i & & 0.04 & 0.06 & 0.08 & 0.02 & 0.10 & 0.12 & 0.14 \\ q_i & 0.06 & 0.06 & 0.06 & 0.06 & 0.05 & 0.05 & 0.05 & 0.05 \end{array}\]

Solution:

The expected search cost is $3.12$.

$$ \begin{aligned} & \text{$k_5$ is the root} \\ & \text{$k_2$ is the left child of $k_5$} \\ & \text{$k_1$ is the left child of $k_2$} \\ & \text{$d_0$ is the left child of $k_1$} \\ & \text{$d_1$ is the right child of $k_1$} \\ & \text{$k_3$ is the right child of $k_2$} \\ & \text{$d_2$ is the left child of $k_3$} \\ & \text{$k_4$ is the right child of $k_3$} \\ & \text{$d_3$ is the left child of $k_4$} \\ & \text{$d_4$ is the right child of $k_4$} \\ & \text{$k_7$ is the right child of $k_5$} \\ & \text{$k_6$ is the left child of $k_7$} \\ & \text{$d_5$ is the left child of $k_6$} \\ & \text{$d_6$ is the right child of $k_6$} \\ & \text{$d_7$ is the right child of $k_7$} \end{aligned} $$
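
A sketch of the CLRS OPTIMAL-BST procedure applied to this instance (table names follow the text, with tables indexed from 1):

```python
import math

def optimal_bst(p, q, n):
    """p[1..n] are key probabilities, q[0..n] dummy-key probabilities.
    Returns the expected-cost table e and the root table."""
    e = [[0.0] * (n + 1) for _ in range(n + 2)]
    w = [[0.0] * (n + 1) for _ in range(n + 2)]
    root = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 2):                 # empty subtrees
        e[i][i - 1] = q[i - 1]
        w[i][i - 1] = q[i - 1]
    for length in range(1, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            e[i][j] = math.inf
            w[i][j] = w[i][j - 1] + p[j] + q[j]
            for r in range(i, j + 1):         # try each key as the subtree root
                t = e[i][r - 1] + e[r + 1][j] + w[i][j]
                if t < e[i][j]:
                    e[i][j], root[i][j] = t, r
    return e, root

p = [0, 0.04, 0.06, 0.08, 0.02, 0.10, 0.12, 0.14]
q = [0.06, 0.06, 0.06, 0.06, 0.05, 0.05, 0.05, 0.05]
e, root = optimal_bst(p, q, 7)
```

Here `e[1][7]` evaluates to 3.12 and `root[1][7]` to 5, agreeing with the tree above; the full structure is recovered by recursing on the `root` table.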

   

  1. What is an optimal Huffman code for the following set of frequencies, based on the first $8$ Fibonacci numbers?
\[a:1 \quad b:1 \quad c:2 \quad d:3 \quad e:5 \quad f:8 \quad g:13 \quad h:21\]

Can you generalize your answer to find the optimal code when the frequencies are the first $n$ Fibonacci numbers?

\[\begin{array}{c|l} a & 1111111 \\ b & 1111110 \\ c & 111110 \\ d & 11110 \\ e & 1110 \\ f & 110 \\ g & 10 \\ h & 0 \end{array}\]

Solution:

In what follows we use $a_i$ to denote the $i$-th Fibonacci number. To avoid any confusion we stress that we consider the Fibonacci sequence beginning $1, 1$, i.e., $a_1 = a_2 = 1$.

Let us consider a set of $n$ symbols $\Sigma = \{c_i ~|~ 1 \le i \le n \}$ such that for each $i$ we have $c_i.freq = a_i$. We shall prove that the Huffman code for this set of symbols given by the run of algorithm HUFFMAN from CLRS is the following code:

  • $code(c_n) = 0$
  • $code(c_{i - 1}) = 1\,code(c_i)$ for $3 \le i \le n$ (i.e., we take the codeword for $c_i$ and prepend a $1$); equivalently, $code(c_i) = 1^{n - i}0$ for $2 \le i \le n - 1$
  • $code(c_1) = 1^{n - 1}$

By $code(c)$ we mean the codeword assigned to the symbol $c$ by the run of HUFFMAN($\Sigma$), for any $c \in \Sigma$.
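
A heap-based sketch of HUFFMAN on these frequencies (the exact bit assignment depends on tie-breaking, but the codeword lengths always match the table above):

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Build a Huffman code with a min-priority queue, as in CLRS HUFFMAN.
    freqs maps symbol -> frequency; returns symbol -> codeword."""
    tiebreak = count()  # keeps heap entries comparable when frequencies tie
    heap = [(f, next(tiebreak), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # two least-frequent subtrees
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):         # internal node: recurse on children
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"
    walk(heap[0][2], "")
    return codes

fib_freqs = {"a": 1, "b": 1, "c": 2, "d": 3, "e": 5, "f": 8, "g": 13, "h": 21}
codes = huffman_codes(fib_freqs)
```

For the Fibonacci frequencies every merge combines the accumulated subtree with the next symbol, so the codeword lengths come out as $7, 7, 6, 5, 4, 3, 2, 1$ for $a$ through $h$.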