Posts tagged math

Lately, I've come across several programming competitions with a strong mathematical component. You would think that this would be my strong suit, but I actually struggled on a few of them. For this reason, I'll write about some of the solutions here.

Sharing Candies

Here's Sharing Candies from CodeChef.

Alvin and Berto have gotten tired of eating chocolates, so now they have decided to eat candies instead.

Alvin has $A$ apple candies, and Berto has $B$ banana candies. (I know, they have weird tastes.) Alvin and Berto always wants the split of candies to be as fair as possible. The problem is, Alvin only wants apple candies and Berto only wants banana candies!

Here comes Chef to the rescue! Chef bought an infinite number of candy packs. There are two types of packs:

Packs containing exactly $C$ apple candies.

Packs containing exactly $D$ banana candies.

Chef wants to give some (could be zero) apple candy packs to Alvin and some (could be zero) banana candy packs to Berto in such a way that the absolute difference between the number of candies they have is minimized. What is this minimum absolute difference?

Note that Chef doesn't want to open any pack; he gives each pack in full.

Let $x$ be the number of packs of apple candies that Chef gives Alvin and $y$ be the number of packs of banana candies. The absolute difference can be written \begin{equation} \left|(A + xC) - (B + yD)\right| = \left|(A - B) + (xC - yD)\right|. \end{equation}

Let $d$ be the greatest common denominator of $C$ and $D$. Then, by Bézout's identity, we have that \begin{equation} xC - yD = kd \end{equation} for any $k$ and some $x$ and $y$. Moreover, every integer solution $(x^\prime,y^\prime)$ belongs to the set \begin{equation} S_{x,y,k} = \left\{\left(x + l\frac{D}{d}, y + l\frac{C}{d}\right) : l \in \mathbb{Z}\right\}. \end{equation}

By choosing large $l$, we can have both $x^\prime \geq 0$ and $y ^\prime \geq 0$, so we have a nonnegative integer solutions. Thus, our solution is \begin{equation} \min\left(\left|A - B\right| \bmod d, d - \left(\left|A - B\right| \bmod d\right)\right) \end{equation} since those are two numbers closet to $0$ we can get by adding or subtracting $k$ to $A - B$. Note that we don't actually have to calculate $x$ and $y$, so we just find $d$ with the Euclidean algorithm. Here's the code.

#include <algorithm>
#include <cmath>
#include <climits>
#include <iostream>

using namespace std;

long long computeGcd(long long a, long long b) {
  if (a < b) swap(a, b);
  // now a >= b
  if (b == 0) return a;
  long long q = a/b;
  a -= q*b;
  return computeGcd(b, a);
}

long long minimizeDifference(long long A, long long B,
                             long long C, long long D) {
  long long delta = A > B ? A - B : B - A;
  long long gcd = computeGcd(C, D);  
  return min(delta % gcd, gcd - (delta % gcd));
}

int main(int argc, char *argv[]) {
  ios::sync_with_stdio(false); cin.tie(NULL);
  int T; cin >> T;
  for (int t = 0; t < T; ++t) {
    long long A, B, C, D;
    cin >> A >> B >> C >> D;
    cout << minimizeDifference(A, B, C, D) << '\n';
  }
  cout << flush;
  return 0;
}

Bear and Tower of Cubes

Here's Bear and Tower of Cubes from Codeforces.

Limak is a little polar bear. He plays by building towers from blocks. Every block is a cube with positive integer length of side. Limak has infinitely many blocks of each side length.

A block with side a has volume $a^3$. A tower consisting of blocks with sides $a_1,a_2,\ldots,a_k$ has the total volume $a_1^3 + a_2^3 + \cdots + a_k^3$.

Limak is going to build a tower. First, he asks you to tell him a positive integer $X$ — the required total volume of the tower. Then, Limak adds new blocks greedily, one by one. Each time he adds the biggest block such that the total volume doesn't exceed $X$.

Limak asks you to choose $X$ not greater than $m$. Also, he wants to maximize the number of blocks in the tower at the end (however, he still behaves greedily). Secondarily, he wants to maximize $X$.

Can you help Limak? Find the maximum number of blocks his tower can have and the maximum $X \leq m$ that results this number of blocks.

The key observation to make that I missed is realize that if $a$ is the greatest integer such that $a^3 \leq m$, then we should choose the first block to have length either $a$ or $a-1$.

For a proof of this fact, suppose that we choose our first block to be of length $b$. Then, it must be true that $b^3 \leq X \lneq (b+1)^3$. Our new volume limit is then $X - b^3$ after choosing a block of length $b$. Now, note that \begin{equation} 0 \leq X - b^3 < (b+1)^3 - b^3 = 3b^2 + 3b = 3b(b+1), \end{equation} In this way, the upper bound of our new volume limit $m^\prime = X - b^3$ increases as a function of $b$. Thus, it makes sense to choose $b$ as large as possible. But if we choose $b = a$, then $0 \leq m^\prime \leq m - a^3$ since it's possible that $m$ is not much larger than $a^3$.

Thus, for every volume limit we choose a block of length $a$ or $a - 1$. Since the volume limit decreases quickly, we can just use a brute force recursive solution and keep track of the number of blocks and total volume as we go. Here's the code.

#include <cmath>
#include <iostream>
#include <utility>

using namespace std;

pair<int, long long> findMaxBlocksAndVolumeHelper(long long volumeLimit, 
                                                  int currentBlocks,
                                                  long long currentVolume) {
  if (volumeLimit == 0) return make_pair(currentBlocks, currentVolume);
  long long maxA = floor(cbrt(volumeLimit));
  pair<int, long long> bigBlockState = findMaxBlocksAndVolumeHelper(volumeLimit - maxA*maxA*maxA,
                                                                    currentBlocks + 1,
                                                                    currentVolume + maxA*maxA*maxA);
  if (maxA > 1) {
    pair<int, long long> smallBlockState = findMaxBlocksAndVolumeHelper(maxA*maxA*maxA - 1 - (maxA - 1)*(maxA - 1)*(maxA - 1),
                                                                        currentBlocks + 1,
                                                                        currentVolume + (maxA - 1)*(maxA - 1)*(maxA - 1));
    if (smallBlockState.first > bigBlockState.first ||
        (smallBlockState.first == bigBlockState.first && smallBlockState.second > bigBlockState.second)) 
      return smallBlockState;
  } 
  return bigBlockState;

}

pair<int, long long> findMaxBlocksAndVolume(long long volumeLimit) {
  return findMaxBlocksAndVolumeHelper(volumeLimit, 0, 0);
}

int main(int argc, char *argv[]) {
  ios::sync_with_stdio(false); cin.tie(NULL);
  long long M; cin >> M;
  pair<int, long long> blockVolume = findMaxBlocksAndVolume(M);
  cout << blockVolume.first << ' ' << blockVolume.second << endl;
  return 0;
}

Longest Increasing Subsequences

In another, CodeChef problem Longest Increasing Subsequences, we make use of Ternary numerical system.

Chef recently learned about the classic Longest Increasing Subsequence problem. However, Chef found out that while the length of the longest increasing subsequence is unique, the longest increasing subsequence itself is not necessarily unique; for example, in the array $[1, 3, 2, 4]$, there are two longest increasing subsequences: $[1, 3, 4]$ and $[1, 2, 4]$.

Chef decided to investigate on this more, and now he has sufficient mastery in it that he was able to come up with a problem:

Given an integer $K$, output an integer $N$ and a permutation of the array $[1, 2,\ldots, N]$ such that there are exactly $K$ longest increasing subsequences. Chef also requires that $1 \leq N \leq 100$, otherwise he found the problem is too easy.

In case there are multiple possible answers, any one will be accepted.

Consider blocks of decreasing sequences $B_1, B_2,\ldots,B_n$, where for all $x \in B_i$ and $y \in B_j$, $x < y$ if $i < j$. For example, we might have $n = 3$ and \begin{align*} B_1 &= [5,4,3,2,1] \\ B_2 &= [8,7,6] \\ B_3 &= [10,9]. \end{align*} If we concatenate these sequences, we have a permutation of $[1,2,...,10]$ such that length of a longest increasing sequence is 3. We can make any such sequence by choosing exactly one number from each block. Thus, the number of such sequences is $5 \times 3 \times 2 = 30$ in this case.

Now imagine all the blocks are of the same size, say $k$. We want to maximize the number of sequences that we can encode with such blocks. Fix $N$. Then, we have about $N/k$ blocks that can represent $f_N(k) = k^{N/k}$ longest increasing sequences with this strategy. We have that \begin{equation} f_N^\prime(k) = N\exp\left(\frac{N}{k}\log k\right) \frac{1 - \log k}{k^2}, \end{equation} so $f$ is maximized when at $e$. Since $k$ must be an integer, we set $k = \lceil e \rceil = 3$. Thus, our block size is fixed to be size $3$.

Now, suppose that we write \begin{equation} K = d_0 + d_13 + d_23^2 + \cdots + d_n3^n = \sum_{j = 0}^nd_j3^j, \end{equation} where $d_n \neq 0$. Suppose that $K \geq 9$, too. Then, we have $n$ blocks, $B_1,B_2,\ldots,B_n$ of size 3. If $d_n = 2$, we actually let $B_1$ be of size $6$. In this way, by concatenating $B_1,B_2,\ldots,B_n$ we have $d_n3^n$ longest increasing sequences.

Now suppose that $d_j > 0$. We can add $d_j3^j$ sequences by inserting a sequence between $B_{n - j}$ and $B_{n - j + 1}$. If $d_j = 1$, we need to add an increasing sequence of length $n - j$ such that all the numbers are greater than those in $B_{n-j}$ but less than those in $B_{n - j + 1}$. If $d_j = 2$, we add an increasing sequence of length $n - j + 1$ with the same properties, but we transpose the first numbers. As an example, consider $K = 71 = 2 \cdot 3^0 + 2 \cdot 3^1 + 3^2 + 2 \cdot 3^3$. We'll have 3 blocks. Since no $d_j = 0$, we need to interleave a sequence between every block. Thus, our complete sequence will be $$ \overbrace{[14,13,12,11,10,9]}^{B_1} [8] \overbrace{[17,16,15]}^{B_2} [6,5,7] \overbrace{[20,19,18]}^{B_3} [2,1,3,4], $$ which gives exactly $K = 71$ longest increasing sequences.

When $K < 9$, we can simple just use the sequence $[K,K-1,\ldots, 1]$. Here's the code.

#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>

using namespace std;

const int MAX_N = 100;

vector<int> makeSequence(int K) {
  if (K <= 8) { // subsequences of length 1
    vector<int> sequence(K);
    for (int i = 0; i < K; ++i) sequence[i] = K - i;
    return sequence;
  }
  int bitCount = 0;
  vector<int> base3K;
  do {
    base3K.push_back(K % 3);
    K /= 3;
  } while (K > 0);  
  /* maxBit this is the length of the longest increasing subsequence, 
   * K = d_0*3^0 + d_1*3^1 + ... + d_maxBit*3^maxBit
   */
  int maxBit = base3K.size() - 1;   
  int N = 3*maxBit;
  if (base3K.back() == 2) N += 3;
  for (int i = 0; i <= maxBit - 1; ++i) {
    if (base3K[i] > 0) {
      N += maxBit - i;
      if (base3K[i] == 2) ++N;
    }
  }
  vector<int> sequence(N);
  int currentIdx = 1;
  int endCursor = N;  
  for (int i = 0; i < maxBit; ++i) { // interleave to get other ternary
    if (base3K[i] > 0) {
      int j = endCursor - (maxBit - i);
      if (base3K[i] == 2) --j;
      for (; j < endCursor; ++j) {
        sequence[j] = currentIdx++;
      }
      endCursor -= maxBit - i;
      if (base3K[i] == 2) { // if digit is 2
        --endCursor;
        swap(sequence[endCursor], sequence[endCursor + 1]);
      }      
    }
    sequence[--endCursor] = N - 2;
    sequence[--endCursor] = N - 1;
    sequence[--endCursor] = N;
    N -= 3;
  }
  if (endCursor > 0) { // first digit in base 3 is 2    
    sequence[--endCursor] = N - 2; 
    sequence[--endCursor] = N - 1;
    sequence[--endCursor] = N;
    N -= 3;
    swap(sequence[0], sequence[3]);
    swap(sequence[1], sequence[4]);
    swap(sequence[2], sequence[5]);
  }
  return sequence;
}

int main(int argc, char *argv[]) {
  ios::sync_with_stdio(false); cin.tie(NULL);
  int T; cin >> T;
  for (int t = 0; t < T; ++t) {
    int K; cin >> K;
    vector<int> sequence = makeSequence(K);
    cout << sequence.size() << '\n';
    copy(sequence.begin(), sequence.end() - 1, ostream_iterator<int>(cout, " "));
    cout << sequence.back() << '\n'; 
  }
  cout << flush;
  return 0;
}

Chef and Super Array

Another problem from CodeChef is Chef and Super Array.

Chef has a an array $A$ consisting of $N$ elements. He wants to add some elements into the array as per the below mentioned process.

After each minute, Chef iterates over the array in order from left to right, and takes every two neighbouring pair of elements, say $x$ and $y$, he adds a new element $x + y$ in the middle of elements $x$ and $y$.

For example, if initial array $A = \{1, 6, 9\}$.

After first minute, the array A will be equal to $\{1, \mathbf{7}, 6, \mathbf{15}, 9\}$. Please note that the elements shown in the bold font are the newly added elements during first minute. As you can observe that $7 = 1 + 6$, and $15 = 6 + 9$.

After second minute, the array will be $\{1, \mathbf{8}, 7, \mathbf{13}, 6, \mathbf{21}, 15, \mathbf{24}, 9\}$. Once again, elements added during the second minute, are shown in bold.

Chef wants to know the sum of elements between $x$th and $y$th positions in the array $A$ (i.e. $A_x + A_{x + 1} + \cdots + A_y$) after $m$ minutes. As the answer could be large, output it modulo $10^9+7 = 1000000007$. Please note that we use $1$-based indexing in the problem.

The important thing to note in this problem is the recursive structure of it. Consider an array $A = \{A_1,A_2,\ldots,A_N\}$. Let $A_i = a$ and $A_{i+1} = b$. Let $S_{a,b}(k)$ be the sum of elements between $a$ and $b$ after $k$ steps of the process. Clearly, $S_{a,b}(0) = 0$. Now, suppose $c$ is between $a$ and $b$. After a step, c appears $3$ times, and we add an additional $a$ and $b$. For example, $\{a,c,b\}$ becomes $\{a, a + c, c, b + c, b\}$. Thus, $S_{a,b}(k + 1) = 3S_{a,b}(k) + (a + b)$. Since we can write $S_{a,b}(0) = (3^0 - 1)\frac{a + b}{2},$ we have that in general, \begin{equation} S_{a,b}(k) = \left(3^k - 1\right)\frac{a + b}{2}. \label{eqn:sum_between} \end{equation}

Now if we use $0$-indexing, element $i$ of array $A$ has index $i2^m$ after $m$ steps. Suppose that we want to sum elements up to index $x$. If $i2^m \leq x$, we can simply use Equation \ref{eqn:sum_between} with $a = A_{i-1}$ and $b = A_i$.

What about the case when $(i-1)2^m \leq x < i2^m$ in general? Let $a = A_{i-1}$, $b = A_i$ and $c = a + b$. Set $x^\prime = x - (i-1)2^m$. Let $T_{a,b,m,x^\prime}$ be the sum of elements with indices from $(i-1)2^m$ to $x$. We have that \begin{equation} T_{a,b,m,x^\prime} = \begin{cases} a, &x^\prime = 0 \\ a + \left(3^m - 1\right)\frac{a + b}{2}, &x^\prime = 2^m - 1 \\ T_{a,c,m - 1, x^\prime}, &x^\prime < 2^{m - 1} \\ a + \left(3^{m - 1} - 1\right)\frac{a + c}{2} + T_{a,c,m - 1, x^\prime - 2^{m - 1}}, &x^\prime \geq 2^{m - 1} \end{cases} \end{equation} since we can imagine that the process starts after $1$ step and reaches this point after $m - 1$ steps. Here's the code for this.

#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>

using namespace std;

const int MOD = 1000000007;
const int MAX_M = 30;

long long p2[MAX_M + 1];
long long p3[MAX_M + 1];

/* consider the subarray newA[k],...,newA[l] after m minutes
 * where newA[k] = a, newA[l] = b
 */
int subSum(int i, int a, int b, int m) {
  if (i == 0) return a;
  if (i == p2[m] - 1) return (a + (a + b)*(p3[m] - 1)/2) % MOD;
  if (i < p2[m - 1]) return subSum(i, a, a + b, m - 1);
  return (subSum(p2[m - 1] - 1, a, a + b, m - 1) + subSum(i - p2[m - 1], a + b, b, m - 1)) % MOD;
}

// find the sum of newA[0], newA[1],...,newA[i]
int cumulativeSum(long long i, const vector<int> &A, int m) {
  if (i < 0) return 0;
  int idx = 0;
  int S = 0;
  // in new array A[idx] is located at index idx*2^m
  while (idx*p2[m] <= i) { 
    S += subSum(min(p2[m] - 1, i - idx*p2[m]), A[idx], A[idx + 1], m);
    S %= MOD;
    ++idx;
  }
  return S;
}

int sum(const vector<int> &A, int m, long long x, long long y) {
  int S = cumulativeSum(y, A, m) - cumulativeSum(x - 1, A, m);
  S %= MOD;
  if (S < 0) S += MOD;
  return S;
}

int main(int argc, char *argv[]) {
  ios::sync_with_stdio(false); cin.tie(NULL);
  // precompute powers of 2 and 3
  p2[0] = 1LL; p3[0] = 1LL;
  for (int i = 1; i <= MAX_M; ++i) {
    p2[i] = p2[i - 1]*2LL; p3[i] = p3[i - 1]*3LL;
  }
  // run through test cases
  int T; cin >> T;
  for (int t = 0; t < T; ++t) {
    int N, m;
    long long x, y;
    cin >> N >> m;
    cin >> x >> y;
    --x; --y; // 0-index
    vector<int> A; A.reserve(N + 1);
    for (int i = 0; i < N; ++i) {
      int a; cin >> a; A.push_back(a);
    }
    A.push_back(0); // dummy number padding
    cout << sum(A, m, x, y) << '\n';
  }
  cout << flush; 
  return 0;
}

Tags: codechef, math, algorithm, codeforces

Recently, I came across two problems that required the convex hull trick. Often, we have a set of lines, and given a time $t$, we want to find out which line has the maximal (or minimal value). To make this more concrete, here's the first problem in which I came across this technique, Cyclist Race.

Chef has been invited to be a judge for a cycling race.

You may think about cyclists in the race as points on a line. All the cyclists start at the same point with an initial speed of zero. All of them move in the same direction. Some cyclists may force their cycles to speed up. The change of speed takes place immediately. The rest of the time, they move with a constant speed. There are N cyclists in total, numbered from 1 to N.

In order to award points to the participants, Chef wants to know how far the cyclists have reached at certain points of time. Corresponding to the above two, there will be two types of queries.

Change the speed of cyclist i at some time.

Ask for the position of the race leader at some time.

Return the results of the second type of query.

Now, it's not too hard to see that the distance traveled by each cyclist is a piecewise linear function. For each query of the second type, we could just evaluate all these peicewise linear functions and take the max, but there's a better way.

In this particular problem, the speed is always increasing. So if you look at the thick lines in the title picture that indicate which cyclist is in the lead, it forms the bottom of a convex hull, hence the name, the convex hull trick. The distance of the lead cyclist is also piecewise linear, so the goal becomes to merge the piecewise linear functions of all the cyclist into one.

Essentially, we'll want to create a vector $\left((x_0, l_0), (x_1,l_1),\ldots,(x_{n-1},l_{n-1})\right)$, where $l_i$ is a line, and $x_i$ is the $x$-coordinate at which the line becomes relavant. That is, line $l_i$ has the maximal value on the interval $\left[x_i, x_{i+1}\right]$, where $x_n = \infty$. In this way, if we sort our vector by $x_i$, then to find the maximal value at $x$, we can do a binary search.

To build this vector, we to first have all our lines sorted in ascending order by slope. We iterate through our lines, but we only want to keep some of them. Let us call our convex hull $C$. Initialize $C = \left((-\infty, l_0)\right)$. Now, suppose we encouter line $l$ and we need to make a decision. Recall that the slope $l$ will need be greater than or equal to any line line in $C$. There are several cases.

If $C$ has at least two lines, consider the previous line $l^\prime$ and the line before $l^\prime$, $l^{\prime\prime}$. $l^\prime$ becomes relevant at the intersection with $l^{\prime\prime}$ at say $x^\prime$. If $l$ intersects $l^{\prime\prime}$ at $x$ and $x \leq x^\prime$ $l$ becomes relevant before $l^\prime$, so we remove $l^\prime$. Repeat until we remove as many unnecessary lines as possible. Then, if append to our vector $(x,l)$ where $x$ is the $x$-coordinate of the intersection of $l$ and $l^\prime$.
If $C$ has only one line and $l$ has the same slope as that line, then only keep the line with the bigger $y$-intercept.
If $C$ has only one line and $l$ has a larger slope, then append to our vector $(x,l)$ where $x$ is the $x$-coordinate of the $l$ and the one line in $C$.

See the title picture for an example. We'd initialize with the green line $\left(-\infty, y = 0.1x\right)$. Now, we're in case 3, so we'd add the blue line $(0, y = 0.2x)$ since the two lines intersect at $(0,0)$. For the thick red line, we find ourselves in case 1. We'll pop off the blue line and find ourselves in case 3 again, so now our vector is $C = \left((-\infty, y= 0.1x),(0,y=x/3)\right)$.

The next line we encounter is the green line $y = 0.5x - 0.8$. We're in case 1, but we can't pop off any lines since its intersection with other two lines is far to the right, so we simply append this line and its intersection with the thick red line, so we have $$C = \left((-\infty, y= 0.1x), (0,y=x/3), (4.8,y=0.5x-0.8)\right).$$ Our next line is thick blue line, $y = 2x/3 - 1.4$. This intersects the thick red line at $x = 4.2$, while the thick pervious green line intersects at $x =4.8$, so we'll pop off the green line $y = 0.5x-0.5$, and push the thick blue line on to the stack and get $C = \left((-\infty, y= 0.1x), (0,y=x/3), (4.2,y=2x/3-1.4)\right)$. Finally, we encounter the thick green line and our convex hull is $$ C = \left((-\infty, y= 0.1x), (0,y=x/3), (4.2,y=2x/3-1.4),(5.4, y=2x-8.6)\right). $$

Hopefully, it's clear that the natural data structure to keep track of the lines is a vector, which we'll use as a stack. Then, adding new lines is just a constant time operation. The most time-consuming operation is the initial sorting of the lines, so constructing the convex hull is $O(N\log N)$, where $N$ is the number of lines. Evaluating the distance at some $x$ is $O(\log N)$ using binary search as we discussed earlier. If we have $M$, queries the total running time will be $O\left((N + M)\log N \right)$.

Here's the code for this problem.

#include <algorithm>
#include <iostream>
#include <utility>
#include <vector>

using namespace std;

class Line {
private:
  long double m, b; // y = mx + b
public:
  Line(long double m, long double b) : m(m), b(b) {}
  long double& slope() { return m; }
  long double& yIntercept() { return b; }  
  pair<long double, long double> intersect(Line other) {
    long double x = (this -> b - other.yIntercept())/(other.slope() - this -> m);
    long double y = (this -> m)*x + this -> b;
    return make_pair(x, y);
  }
  double getY(long double x) { return m*x + b; }
};

vector<pair<long double, Line>> makeConvexHull(vector<Line> &lines) {
  int N = lines.size(); 
  vector<pair<long double, Line>> convexHull; convexHull.reserve(N);
  if (N == 0) return convexHull;
  convexHull.emplace_back(0, lines.front());
  for (int i = 1; i < N; ++i) {
    // pop stack while new line's interesection with second to last line is to the left of last line
    // note that slopes are strictly increasing
    while (convexHull.size() > 1 &&
           lines[i].intersect(convexHull[convexHull.size() - 2].second).first <= convexHull.back().first) {
      convexHull.pop_back();
    }
    // check intersection with x = 0 when there's only 1 line
    if (convexHull.size() == 1 && lines[i].yIntercept() >= convexHull.back().first) convexHull.pop_back();
    convexHull.emplace_back(lines[i].intersect(convexHull.back().second).first, lines[i]);
  }
  return convexHull;
}

int main(int argc, char *argv[]) {
  ios::sync_with_stdio(false); cin.tie(NULL);
  int N, Q; cin >> N >> Q;      // number of cyclists and queries
  // current speed, time change, distance traveled so far
  vector<Line> lines; // y = mx + b, first is m, second is b
  lines.emplace_back(0,0);
  vector<pair<int, long long>> cyclists(N, make_pair(0, 0)); // first is speed v, second is x, where x + vt is location at time t
  vector<long long> queryTimes;
  for (int q = 0; q < Q; ++q) {
    int queryType; cin >> queryType;
    long long time;
    switch(queryType) {
    case 1:                     // speed change
      int cyclist, newSpeed;
      cin >> time >> cyclist >> newSpeed;
      --cyclist;   // convert to 0 indexing
      // suppose current function is x + vt
      // new function is x + v*time + (t - time)*newSpeed
      // (x + v*time - time*newSpeed) + newSpeed*t
      // = (x + (v-newSpeed)*time) + newSpeed*t
      cyclists[cyclist].second += time*(cyclists[cyclist].first - newSpeed);
      cyclists[cyclist].first = newSpeed;      
      lines.emplace_back(newSpeed, cyclists[cyclist].second);
      break;
    case 2:                     // leader position
      cin >> time;
      queryTimes.push_back(time);
      break;
    }
  }
  // sort slopes in ascending order
  sort(lines.begin(), lines.end(),
       [](Line &a, Line &b) -> bool {
         if (a.slope() != b.slope()) return a.slope() < b.slope();
         return a.yIntercept() < b.yIntercept();
       }); 
  if (lines.size() == 1) {      // there will always be at least 1 line
    for (long long t : queryTimes) {
      cout << (long long) lines.back().getY(t) << '\n';
    }
  } else {
    // eliminate irrelevant lines, first is time where the line becomes relevant
    vector<pair<long double, Line>> convexHull = makeConvexHull(lines); 
    // since times are strictly increasing just walk through lines 1 by 1
    int cursor = 0;
    for (long long t : queryTimes) {
      while (cursor + 1 < convexHull.size() && convexHull[cursor + 1].first <= t) ++cursor;
      cout << (long long) convexHull[cursor].second.getY(t) << '\n';
    }
  }
  cout << flush;
  return 0;
}

In this particular problem, negative times make no sense, so we can start at $x = 0$ for our convex hull.

Applications to Dynamic Programming

In certain cases, we can apply this trick to a dynamic programming problem to significantly improve the running time. Consider the problem Levels and Regions.

Radewoosh is playing a computer game. There are $N$ levels, numbered $1$ through $N$. Levels are divided into $K$ regions (groups). Each region contains some positive number of consecutive levels.

The game repeats the the following process:

If all regions are beaten then the game ends immediately. Otherwise, the system finds the first region with at least one non-beaten level. Let $X$ denote this region.

The system creates an empty bag for tokens. Each token will represent one level and there may be many tokens representing the same level.

For each already beaten level $i$ in the region $X$, the system adds $t_i$ tokens to the bag (tokens representing the $i$-th level).

Let $j$ denote the first non-beaten level in the region $X$. The system adds $t_j$ tokens to the bag.

Finally, the system takes a uniformly random token from the bag and a player starts the level represented by the token. A player spends one hour and beats the level, even if he has already beaten it in the past.

Given $N$, $K$ and values $t_1,t_2,\ldots,t_N$, your task is to split levels into regions. Each level must belong to exactly one region, and each region must contain non-empty consecutive set of levels. What is the minimum possible expected number of hours required to finish the game?

It's not obvious at all how the convex hull trick should apply here. Well, at least it wasn't obvious to me. Let's do some math first. Consider the case where we only have 1 region first consisting of levels $1,2,\ldots,n$. Let $T_i$ be the time at which we finish level $i$. Write \begin{equation} T_n = T_1 + (T_2 - T_1) + (T_3 - T_2) + \cdots + (T_n - T_{n-1}). \label{eqn:expectation_Tn} \end{equation}

$T_1 = 1$ since we'll just put $t_i$ tokens in the bag and draw from those $t_i$ tokens, so $t_i/t_i = 1$, so we always play and beat the first level immediately. Now $T_{i} - T_{i-1}$ is the time that it takes to be level $i$ given that levels $1,2,\ldots,i-1$ are beaten. This has a geometric distribution. Every time we try to beat level $i$, we'll put in $t_1 + t_2 + \cdots + t_i$ tokens in the back and try to get one of the $t_i$ tokens labeled $i$. The probability of doing so is \begin{equation} p = \frac{t_i}{\sum_{j=1}^i t_j}. \end{equation} Thus, we'll have that \begin{align} \mathbb{E}(T_i - T_{i-1}) &= p + 2(1-p)p + 3(1-p)p + \cdots = \sum_{k=1}^\infty k(1-p)^{k-1}p \nonumber\\ &= \frac{p}{\left(1-(1-p)\right)^2} = \frac{1}{p} \nonumber\\ &= \frac{\sum_{j=1}^i t_j}{t_i}. \label{eqn:expectation_delta_T} \end{align}

Now by linearity of expectation, applying Equation \ref{eqn:expectation_delta_T} to Equation \ref{eqn:expectation_Tn} will give us that \begin{equation} \mathbb{E}T_n = \sum_{i = 1}^n\frac{\sum_{j=1}^i t_j}{t_i}. \label{eqn:expection_T_n} \end{equation}

Now, define $E_{k,n}$ denote the minimum expected time to beat $n$ levels if we split them into $k$ regions. Note that each region must have at least 1 level, so this is only well-defined if $n \geq k$. Now, the levels must be beaten in order, so imagine putting levels $t_l,t_{l+1},\ldots,t_n$ into the last region. Since each region must have at least 1 level, we'll have that $k \leq l \leq n$, which gives us the recurrence relation \begin{equation} E_{k,n} = \inf_{k \leq l \leq n}\left\{E_{k - 1, l - 1} + \sum_{i = l}^n\frac{\sum_{j=l}^i t_j}{t_i}\right\} \label{eqn:expectation_recurrence} \end{equation} by Equation \ref{eqn:expection_T_n}.

Now, suppose we define \begin{equation} S_k = \sum_{i=1}^k t_i ~\text{and}~ R_k = \sum_{i=1}^k \frac{1}{t_i}, \label{eqn:sum_dp} \end{equation} which leads to \begin{equation} E_{1,n} = \mathbb{E}T_n = \sum_{i=1}^n\frac{S_i}{t_i} \label{eqn:base_expectation} \end{equation} Equation \ref{eqn:expectation_recurrence} becomes \begin{align} E_{k,n} &= \inf_{k \leq l \leq n}\left\{E_{k - 1, l - 1} + \sum_{i = l}^n\frac{\sum_{j=l}^i t_j}{t_i}\right\} \nonumber\\ &= \inf_{k \leq l \leq n}\left\{E_{k - 1, l - 1} + \sum_{i = l}^n\frac{S_i - S_{l-1}}{t_i}\right\} \nonumber\\ &= \inf_{k \leq l \leq n}\left\{E_{k - 1, l - 1} + \sum_{i = l}^n\frac{S_i}{t_i} - S_{l-1}\left(R_n - R_{l-1}\right)\right\} \nonumber\\ &= \inf_{k \leq l \leq n}\left\{E_{k - 1, l - 1} + \left(E_{1,n} - E_{1,l-1}\right) - S_{l-1}\left(R_n - R_{l-1}\right)\right\} \nonumber\\ &= E_{1,n} + \inf_{k \leq l \leq n}\left\{\left(E_{k - 1, l - 1} - E_{1,l-1} + S_{l-1}R_{l-1}\right) - S_{l-1}R_n\right\} \label{eqn:expectation_line} \end{align} by Equations \ref{eqn:base_expectation} and \ref{eqn:sum_dp}.

Do you see the lines of decreasing slope in Equation \ref{eqn:expectation_line}, yet? It's okay if you don't. Look at the expression inside the $\inf$. The $y$-intercept is in parentheses and the slope is $-S_{l-1}$ which is decreasing with $l$. So index our lines by $l$.

Fix $k$. Imagine that we have calculated $E_{j,n}$ for all $j < k$. Now, we're going to attempt to calculate $E_{k,k},E_{k,k+1},\ldots, E_{k,N}$ in that order. To calculate $E_{k,n}$ we only need to consider the lines $l = k,\ldots,n$. The intercept does not vary as a function of $n$, so we can add lines one-by-one. Before calculating $E_k,n$, we'll add the line with slope $-S_{n-1}$. Now our answer will be $E_{K,N}$.

In this way, it simplifies a $O(KN^2)$ problem into a $O(KN\log N)$ problem. Notice that here, instead of building our convex hull before making any queries like in the previous problem, we dynamically update it. Here's the code with some differences since $0$-indexing was used in the code.

#include <algorithm>
#include <iomanip>
#include <iostream>
#include <utility>
#include <vector>

using namespace std;

class Line {
private:
  long double m, b; // y = mx + b
public:
  Line(long double m, long double b) : m(m), b(b) {}
  long double slope() { return m; }
  long double& yIntercept() { return b; }  
  pair<long double, long double> intersect(Line other) {
    long double x = (this -> b - other.yIntercept())/(other.slope() - this -> m);
    long double y = (this -> m)*x + this -> b;
    return make_pair(x, y);
  }
  double getY(long double x) { return m*x + b; }
};

int main(int argc, char *argv[]) {
  ios::sync_with_stdio(false); cin.tie(NULL);
  int N, K; cin >> N >> K; // number of levels and regions
  vector<int> T; T.reserve(N);  // tokens
  for (int i = 0; i < N; ++i) {
    int t; cin >> t;
    T.push_back(t);
  }
  vector<long long> S; S.reserve(N); // cumulative sum of tokens
  vector<long double> R; R.reserve(N); // cumulative sum of token reciprocals
  S.push_back(T.front());
  R.push_back(1.0/T.front());
  for (int n = 1; n < N; ++n) {
    S.push_back(S.back() + T[n]);
    R.push_back(R.back() + 1.0L/T[n]);
  }
  /* let eV[k][n] be the minimum expected time to beat
   * levels 0,1,...,n-1 spread across regions 0,1,...,k-1
   */ 
  vector<vector<long double>> eV(K, vector<long double>(N)); // only indices eV[k][n] with n >= k are valid
  eV.front().front() = 1;
  for (int n = 1; n < N; ++n) { // time to beat each level is a geometric distribution
    eV[0][n] = eV[0][n-1] + ((long double) S[n])/T[n];
  }
  /* eV[k][n] = min(eV[k-1][l-1] + (S[l]-S[l-1])/t_l + ... + (S[n] - S[l-1])/t_{n}),
   * where we vary k <= l < n. That is, we're putting everything with index at least l
   * in the last region. Note that each region needs at least 1 level. This simplifes to
   * eV[k][n] = min(eV[k-1][l-1] + eV[0][n] - eV[0][l-1]  - S[l-1](R[n] - R[l-1])
   *          = eV[0][n] + min{(eV[k-1][l-1] - eV[0][l-1] + S[l-1]R[l-1]) - S[l-1]R[n]},
   * Thus, for each l we have a line with slope -S[l - 1].
   * -S[l-1] is strictly decreasing and R[n] is strictly increasing.
   * Use the convex hull trick.
   */ 
  vector<pair<long double, Line>> convexHull; convexHull.reserve(N);
  for (int k = 1; k < K; ++k) {    
    /* in convex hull we add lines in order of decreasing slope, 
     * the the convexHull[i].first is the x-coordinate where 
     * convexHull[i].second and convexHull[i-1].second intersect
     */
    convexHull.clear();
    int cursor = 0;
    for (int n = k; n < N; ++n) { // calculate eV[k][n]
      /* add lines l = k,...,n to build convex hull
       * loop invariant is that lines l = k,...,n-1 have been added, so just add line l = n
       */
      long double slope = -S[n - 1];
      long double yIntercept = eV[k - 1][n - 1] - eV[0][n - 1] + S[n - 1]*R[n - 1];
      Line l(slope, yIntercept);
      while (convexHull.size() > 1 && 
             convexHull.back().first >= l.intersect(convexHull[convexHull.size() - 2].second).first) {
        convexHull.pop_back(); // get rid of irrelevant lines by checking that line intersection is to the left
      }
      // check intesection with x = 0 if no lines left
      if (convexHull.size() == 1 && l.yIntercept() <= convexHull.back().second.yIntercept()) convexHull.pop_back();
      convexHull.emplace_back(convexHull.empty() ? 0 : l.intersect(convexHull.back().second).first, l); // add line
      /* binary search for the correct line since they are sorted by x intersections
       * lower bound is old cursor since x coordinate of intersection is monotonically decreasing
       * and R[n] is increasing, too
       */ 
      if (cursor >= convexHull.size()) cursor = convexHull.size() - 1;
      cursor = upper_bound(convexHull.begin() + cursor, convexHull.end(), R[n], 
                           [](long double x, pair<long double, Line> l) { return x < l.first; }) - convexHull.begin();
      --cursor; // since we actually want the last index that is less than or equal to
      eV[k][n] = eV[0][n] + convexHull[cursor].second.getY(R[n]);

    }
  }
  cout << fixed;
  cout << setprecision(10);
  cout << eV.back().back() << endl;
  return 0;
}

By the way the drawing was done with GeoGebra. It's a pretty cool way to do math.

Tags: dynamic programming, math, algorithm, probability, codeforces, codechef

One of the features of Snapstream Searcher is matrix search. One can search for a whole list of terms over a date range and receive a matrix of co-occurrences, where a co-occurrence is defined as two terms being mentioned in the same program within a specified number of characters.

One way to visualize such data is as a graph. Each country is a node. Between each pair of nodes, we place an edge which is weighted according to the strength of their relationship. We'd suspect that countries that frequently co-occur will form clusters.

Spring Embedding

To accomplish this clustering, I used spring embedding. Suppose we have $N$ nodes, labeled from $1$ to $N$. Between nodes $i$ and $j$, we place a string of length $l_{ij}$ with spring constant $k_{ij}$. Recall that Hooke's law states that the force needed to stretch or compress the spring to a length $d$ is $F(d) = k_{ij}(d - l_{ij})$, which implies that spring has potential energy $$ E(d) = \frac{1}{2}k_{ij}(d-l_{ij})^2. $$ Suppose each node $i$ has position $(x_i,y_i)$ and node $j$ has position $(x_j,y_j)$. Define the distance between two nodes $$ d(i,j) = \sqrt{(x_j-x_i)^2 + (y_j-y_i)^2}. $$ The total energy of the system is then \begin{equation} E = \frac{1}{2}\sum_{i=1}^{N-1}\sum_{j=i+1}^{N}k_{ij}\left(d(i,j) - l_{ij}\right)^2 \end{equation} If we call $l_{ij}$ the ideal length of the spring, deviations from the ideal length lead to higher energy. We want to choose $(x_i, y_i)$ for all $i = 1,2,\ldots,N$ such that the total energy is minimized. We can do this with a steepest gradient descent method. Much to my surprise, I actually used something from my Numerical Analysis class.

Implementation Issues

To actually implement this algorithm a couple of issues must be addressed:

computing the $l_{ij}$ as a function of occurrences and co-occurrences, and
normalizing $l_{ij}$ so the nodes fit on the canvas.

For the first issue, if there are a lot of co-occurrences, we want the nodes to be more closely linked. But nodes that are mentioned frequently like USA would have the most co-occurrences with every other node by chance since it appears most frequently in general. To address this issue some normalization is done. Let $S$ be the total number of occurrences of all search terms. Let $R_i$ be the number of occurrences of term $i$ and $C_j$ the number of occurrences of term $j$. Finally, let $M_{ij}$ be the number of co-occurrences of terms $i$ and $j$. We define \begin{equation} A_{ij} = \frac{\left(M_{ij}/S\right)}{\left(R_i/S\right)\left(C_j/S\right)} = \frac{SM_{ij}}{R_iC_j}, \end{equation} which you can intuitively think of as the proportion of co-occurrences we actaully observed over the number of co-occurrences that we would expect if the occurrences were independent of each other.

Now, since more co-occurrences than expected should lead to a smaller ideal distance, we define \begin{equation} l_{ij} = \frac{1}{c + A_{ij}}, \end{equation} where we chose $c = 0.01$.

Note that $l_{ij} \in (0,1/c)$ which is between $0$ and $10$ in the case that $c = 0.01.$ To plot this we need to translate this distance into pixels. We don't want the minimum distance to be too small because than the nodes will be on top of each other. Nor do we want the max distance to be too big since the nodes will fall off the canvas. We apply an affine transformation from $l_{ij}$ to $[L, U].$ Let $l^* = \max_{i,j}l_{ij}$ and $l_* = \min_{i,j}l_{ij},$ and define \begin{equation} l_{ij}^\prime = L + \frac{l_{ij} - l_*}{l^* - l_*}(U - L). \end{equation} I simply chose to fix $L = 80.$ To choose $U$, I used a technique that I learned from programming contests. We want to vary $U$ until all the nodes fit in the canvas. Running the spring embedding algorithm is very expensive however, so we can't try all possible values of $U$. Thus, we do a binary search and find the largest $U$ such that all the nodes fit. This solves the second issue.

You can this algorithm implemented at Country Relationships. Play around with the graph, and let me know what you think!

Analysis

If you look at Country Relationships, right away you see a cluster of Middle Eastern countries with Russia, France, Belgium, and Israel as bridge to the United States. China also places a central role and is closely related to Vietnam and Japan.

If you change the time period to December 2015 and click Spring Embed again to re-cluster, inexplicably the Philippines and Colombia are strongly related. Recall that Steve Harvey mixed up the winners of Miss Universe 2015.

In January 2016, North Korea appears, for it claimed to test a hydrogen bomb. Brazil grows larger with as the fear of the Zika virus takes grip.

In February 2016, Cuba becomes larger due to President Obama announcing that he'll visit. Brazil also gets even larger as Zika fears intensify.

In March 2016, Belgium becomes very large due to the terrorist attack. Of course, Ireland makes its debut thanks to St. Patrick's Day.

In April 2016, Ecuador appears due to the earthquake. It so happens that Japan had earthquakes, too. News programs often group earthquake reporting together, so Ecuador and Japan appear to be closely related.

Try making your own graph by doing a matrix search here!

Tags: math, data viz

I recently added equation numbering to my blog. To see it in action, here's one of my homework exercises. I like this problem because I actually found it useful to diagonalize to exponentiate the matrix.

Let $\{X_n\}$ be the random walk on the space $\{0,1,2,3,4\},$ with $0$ and $4$ absorbing and with $p(i,i) = 1/2,$ $p(i,i+1) = p(i,i-1)= 1/4$ for $1 \leq i \leq 3.$ Let $\tau$ denote the absorption time. The graph can be seen in the title picture.

Part A

What is the limit as $n \rightarrow \infty$ of the law $X_n$ conditioned on non-absorption, that is, $$\lim_{n \rightarrow \infty}\mathcal{L}(X_n \mid \tau > n)?$$

Solution

We have that $\displaystyle \boxed{ \lim_{n \rightarrow \infty}\mathcal{L}\left(X_n \mid \tau > n\right) = \begin{cases} 1 - \frac{1}{\sqrt{2}}, &X_n \in \{1,3\} \\ \sqrt{2} - 1, &X_n = 2. \end{cases}}$

To see this, without loss of generality, let $X_0 = 2.$ We can do this since if $X_0 = 1$ or $X_0 = 3,$ then $\mathbb{P}(X_m = X_0,~\forall m \leq n \mid \tau > n) \rightarrow 0$ as $n \rightarrow \infty.$ Then, we can apply the Markov property.

Now, define $p_n^*(x,y) = \mathbb{P}(X_{n} = y \mid \tau > n,~X_0 = 2).$ We have that
\begin{equation} p^*_1(2,y) = \begin{cases} 1/4, &y \in \{1,3\} \\ 1/2, &y = 2. \end{cases} ~\text{and, in general,}~ p_n^*(2,y) = \begin{cases} a_n, &y \in \{1,3\} \\ b_n = 1 - 2a_n, &y = 2 \end{cases} \label{eqn:3a_a_def} \end{equation} by symmetry, where $a_1 = 1/4$ and $b_1 = 1/2 = 1 - 2a_1.$

Now, we have that \begin{align} a_{n+1} = \mathbb{P}(X_{n+1} = 1 \mid \tau > n + 1,~X_0 = 2) &= p_{n+1}^*(2,1) \nonumber\\ &= \frac{p_n^*(2,1)p(1,1) + p_n^*(2,2)p(2,1)}{\mathbb{P}(\tau > n + 1 \mid X_0 = 2, \tau > n)} \nonumber\\ &= \frac{p_n^*(2,1)p(1,1) + p_n^*(2,2)p(2,1)}{1 - p_n^*(2,1)p(1,0) - p_n^*(2,3)p(3,4)} \nonumber\\ &= \frac{a_n(1/2) + b_n(1/4)}{1 - (2a_n/4)} \nonumber\\ &= \frac{a_n(1/2) + (1-2a_n)(1/4)}{(4 - 2a_n)/4} \nonumber\\ &= \frac{1}{4 - 2a_n}. \label{eqn:3a_a_recurs} \end{align} $a_n$ converges by induction because $$ |a_{n+1} - a_n| = \left|\frac{1}{4-2a_n} - \frac{1}{4-a_{n-1}}\right| = \left|\frac{4 - 2a_{n-1} - 4 + 2a_n}{(4-2a_{n})(4-2a_{n-1})}\right| \leq \frac{1}{2}\left|a_n-a_{n-1}\right| $$ since $0 \leq a_n \leq 1$ for all $n.$ Solving for $a$ in \begin{equation} a = \frac{1}{4-2a} \Rightarrow 2a^2 - 4a + 1 = 0 \Rightarrow a = 1 \pm \frac{1}{\sqrt{2}}. \label{eqn:3a_a} \end{equation}
Thus, we must have that $a_n \rightarrow 1 - \frac{1}{\sqrt{2}}$ since $a_n \leq 1$ for all $n.$ Then, we have that $b_n \rightarrow b = 1 - 2a = \sqrt{2} - 1.$

Part B

What is the limiting distribution conditioned on never getting absorbed, that is, $$ \lim_{n \rightarrow \infty}\lim_{M \rightarrow \infty} \mathcal{L}(X_n \mid \tau > M)? $$

Solution

We have that $\displaystyle\boxed{\lim_{n \rightarrow \infty}\lim_{M \rightarrow \infty}\mathcal{L}(X_n \mid \tau > M)= \begin{cases}1/4, &X_n \in \{1,3\}\\ 1/2, &X_n = 2.\end{cases}}$

Again, we can assume without loss of generality that $X_0 = 2.$ Fix $n \in \mathbb{N}.$ From Equation \ref{eqn:3a_a_def} in the previous part, we know $\mathbb{P}(X_n = 2\mid \tau > n) = 1-2a_{n},$ where we define $a_0 = 0.$ Now, suppose, we know $\mathbb{P}(X_n = 2 \mid \tau > M)$ for $M \geq n.$ Then, by Bayes' theorem, \begin{align} \mathbb{P}(X_n = 2 \mid \tau > M + 1) &= \mathbb{P}(X_n = 2 \mid \tau > M + 1, \tau > M) \nonumber\\ &= \frac{\mathbb{P}(\tau > M + 1 \mid X_n = 2, \tau > M)\mathbb{P}(X_n = 2 \mid \tau > M)}{\mathbb{P}(\tau > M + 1 \mid \tau > M)}. \label{eqn:3b_bayes} \end{align} Calculating the two unknown factors, we have that in the denominator, \begin{align} \mathbb{P}(\tau > M + 1 \mid \tau > M) &= \sum_{x=1}^3 \mathbb{P}(\tau > M + 1, X_M = x \mid \tau > M) \nonumber\\ &= \frac{3}{4}\left(\mathbb{P}(X_M = 1 \mid \tau > M) + \mathbb{P}(X_M = 3 \mid \tau > M)\right) + \mathbb{P}(X_M = 2 \mid \tau > M)\nonumber\\ &= \frac{3}{4}\left(a_{M} + a_{M}\right) + (1-2a_{M}) \nonumber\\ &= 1 - \frac{1}{2}a_{M}, \label{eqn:3b_tau} \end{align} and in the numerator, \begin{align} \mathbb{P}(\tau > M + 1 \mid X_n = 2, \tau > M) &= \sum_{x=1}^3\mathbb{P}(\tau > M + 1, X_M = x \mid X_n = 2, \tau > M)\nonumber\\ &= \sum_{x=1}^3\mathbb{P}(\tau > M + 1, X_M = x \mid X_n = 2, \tau > M)\nonumber\\ &= \frac{3}{2}\mathbb{P}(X_M = 1 \mid X_n = 2, \tau > M) + \mathbb{P}(X_M = 2 \mid X_n = 2, \tau > M) \nonumber\\ &= \frac{3}{2}a_{M - n} + 1 - 2a_{M - n} \nonumber\\ &= 1 - \frac{1}{2}a_{M - n}, \label{eqn:3b_bayes_num} \end{align} where we use that $\mathbb{P}(\tau > M + 1, X_M = 1 \mid X_n = 2, \tau > M) = \mathbb{P}(\tau > M + 1, X_M = 3 \mid X_n = 2, \tau > M),$ and the fact that $\mathbb{P}(X_M = x \mid X_n = 2, \tau > M)$ is the probability of a chain starting at $2$ that doesn't hit an absorbing state in $M - n$ transistions.

Putting together Equations \ref{eqn:3b_bayes_num}, \ref{eqn:3b_tau}, and \ref{eqn:3b_bayes}, we have that \begin{equation*} \mathbb{P}(X_n = 2 \mid \tau > M + 1) = \frac{1 - \frac{1}{2}a_{M-n}}{1 - \frac{1}{2}a_{M}}\mathbb{P}(X_n = 2 \mid \tau > M), \end{equation*} and by induction, we'll have that \begin{equation*} \mathbb{P}(X_n = 2 \mid \tau > M) = (1 - 2a_{n}) \prod_{k=n}^{M - 1} \frac{1 - \frac{1}{2}a_{k-n}}{1 - \frac{1}{2}a_{k}}, \end{equation*} where the product is just $1 - 2a_{n}$ if $M = n.$ For large enough $M,$ when $k \geq 2n$ the factors in the numerator will cancel, so we will have that for $M \geq 2n$ that \begin{align} \mathbb{P}(X_n = 2 \mid \tau > M) &= (1 - 2a_{n}) \prod_{k=0}^{n - 1}\left(1-\frac{1}{2}a_{k}\right) \prod_{k=M - n}^{M-1}\frac{1}{1-\frac{1}{2}a_{k}} \nonumber\\ &= (1 - 2a_{n}) \prod_{k=0}^{n - 1}\left(4-2a_{k}\right) \prod_{k=M - n}^{M-1}\frac{1}{4-2a_{k}} \nonumber\\ &= (1 - 2a_{n}) \prod_{k=1}^{n}\frac{1}{a_{k}} \prod_{k=M - n + 1}^{M}a_k \label{eqn:3b_p2} \end{align} by the recursive definition of $a_n$ in Equation \ref{eqn:3a_a_recurs}.

Taking the limit as $M \rightarrow \infty,$ we have that \begin{equation} \lim_{M \rightarrow \infty} \mathbb{P}(X_n = 2 \mid \tau > M) = (1 - 2a_{n})a^{n}\prod_{k=1}^{n}\frac{1}{a_{k}}. \label{eqn:3b_p_lim} \end{equation} by Equation \ref{eqn:3a_a}, where $\displaystyle a = 1 - \frac{1}{\sqrt{2}} = \frac{1}{2 + \sqrt{2}}.$

Define $c_{-1} = 0$ and $c_0 = 1/4,$ so $a_0 = 0 = \frac{c_{-1}}{c_0}.$ In general, we can write $a_n = \frac{c_{n-1}}{c_n},$ where $c_n = -2c_{n-2} + 4c_{n-1}$ for $k \geq 1.$ To see this, by definition of $a_n$ in Equation \ref{eqn:3a_a_recurs}, \begin{equation} a_{n + 1} = \frac{1}{4 - 2a_n} = \frac{1}{4 - 2\frac{c_{n-1}}{c_{n}}} = \frac{c_n}{-2c_{n-1} + 4c_n} = \frac{c_n}{c_{n+1}}. \label{eqn:3b_a_c} \end{equation}

Now, with Equation \ref{eqn:3b_a_c}, Equation \ref{eqn:3b_p_lim} becomes \begin{equation} \lim_{M \rightarrow \infty} \mathbb{P}(X_n = 2 \mid \tau > M) = (1 - 2a_{n})a^{n}\prod_{k=1}^{n}\frac{c_k}{c_{k-1}} = (1 - 2a_{n})a^{n}\frac{c_{n}}{c_0} \label{eqn:3b_p_lim_2} \end{equation}

Note, that we can rewrite $c_n$ by exponentiating a matrix: \begin{align} \begin{pmatrix} c_{n} \\ c_{n + 1} \end{pmatrix} &= \begin{pmatrix} 0 & 1 \\ -2 & 4 \end{pmatrix}^n \begin{pmatrix} c_0 \\ c_1 \end{pmatrix} \nonumber\\ &= \begin{pmatrix} 1 & 1 \\ 2 + \sqrt{2} & 2 - \sqrt{2} \end{pmatrix} \begin{pmatrix} 2 + \sqrt{2} & 0 \\ 0 & 2 - \sqrt{2} \end{pmatrix}^n \begin{pmatrix} \frac{1}{2}-\frac{1}{\sqrt{2}} & \frac{1}{2\sqrt{2}} \\ \frac{1}{2}+\frac{1}{\sqrt{2}} & -\frac{1}{2\sqrt{2}} \end{pmatrix} \begin{pmatrix} 1/4 \\ 1 \end{pmatrix} \nonumber\\ &= \begin{pmatrix} \frac{1}{8}\left((2+\sqrt{2})^n(1+\sqrt{2}) + (2-\sqrt{2})^n(1-\sqrt{2})\right) \\ \frac{1}{8}\left((2+\sqrt{2})^{n+1}(1+\sqrt{2}) + (2-\sqrt{2})^{n+1}(1-\sqrt{2})\right) \end{pmatrix} \label{eqn:3b_c_matrix} \end{align} by diagonalization. Using Equation \ref{eqn:3b_c_matrix}, Equation \ref{eqn:3b_p_lim_2} becomes \begin{equation} \lim_{M \rightarrow \infty} \mathbb{P}(X_n = 2 \mid \tau > M) = \frac{1}{2}(1 - 2a_{n})\left((1+\sqrt{2}) + \left(\frac{2-\sqrt{2}}{2+\sqrt{2}}\right)^n(1-\sqrt{2})\right), \label{eqn:3b_p_lim_3} \end{equation} where we recall that $a = \frac{1}{2 + \sqrt{2}}.$ $|2-\sqrt{2}| < 1$ and so, taking the limit as $n \rightarrow \infty$ of Equation \ref{eqn:3b_p_lim_3}, the second term goes to $0$: \begin{equation} \lim_{n\rightarrow 0}\lim_{M \rightarrow \infty} \mathbb{P}(X_n = 2 \mid \tau > M) = \frac{1-2a}{2}(1+\sqrt{2}) = \frac{\sqrt{2} - 1}{2}(1+\sqrt{2}) = \frac{1}{2}. \label{eqn:3b_p_lim_lim} \end{equation} And finally, by symmetry, \begin{align} \lim_{n\rightarrow 0}\lim_{M \rightarrow \infty} \mathbb{P}(X_n = 1 \mid \tau > M) &= \lim_{n\rightarrow 0}\lim_{M \rightarrow \infty} \mathbb{P}(X_n = 3 \mid \tau > M) \nonumber\\ &= \frac{1 - \lim_{n\rightarrow 0}\lim_{M \rightarrow \infty} \mathbb{P}(X_n = 2 \mid \tau > M)}{2} \nonumber\\ &= \frac{1}{4}. \label{eqn:3b_p_1_3} \end{align} Equations \ref{eqn:3b_p_lim_lim} and \ref{eqn:3b_p_1_3} combine to give us the limiting distribution.

Graphs in $\LaTeX$

Here's the code for the graph. I used a package called tikz.

\documentclass[10pt]{article}

\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}

\usepackage{tikz}
\usetikzlibrary{arrows}

\usepackage[active, tightpage]{preview}
\PreviewEnvironment{tikzpicture}

\renewcommand\PreviewBbAdjust {-0bp -25bp 0bp 25bp}

\begin{document}
\begin{tikzpicture}[->,>=stealth',shorten >=1pt,auto,node distance=1.75cm,
  thick,main node/.style={circle,draw}]

  \node[main node] (0) {$0$};
  \node[main node] (1) [right of=0] {$1$};
  \node[main node] (2) [right of=1] {$2$};
  \node[main node] (3) [right of=2] {$3$};
  \node[main node] (4) [right of=3] {$4$};

  \path[every node/.style={font=\sffamily}]
  (0) edge [loop above] node {$1$} (0)
  (0)
  (1) edge [bend right] node [below] {$1/4$} (2)
  edge [loop above] node {$1/2$} (1)
  edge node [below] {$1/4$} (0)
  (1)
  (2) edge [bend right] node [above] {$1/4$} (1)
  edge [loop below] node {$1/2$} (2)
  edge [bend left] node [above] {$1/4$} (3)
  (2)
  (3) edge [bend left] node [below] {$1/4$} (2)
  edge [loop above] node {$1/2$} (3)
  edge node [below] {$1/4$} (4)
  (3)
  (4) edge [loop above] node {$1$} (4);      
\end{tikzpicture}
\end{document}

Then, to convert the graph to a png file with a transparent background, I used ImageMagick.

convert -density 300 graph.pdf -format png graph.png

Tags: math, meta, probability

After getting a perfect score in the first round, I crashed and burned in round 2, and only managed to get 1 problem out of 4 correct. For what it's worth, my solution on the 2nd problem got over 80% of the test cases right. For the 3rd problem, I managed to finish a correct solution 10 minutes after the contest was over. As punishment for such a dismal performance, I'm forcing myself to write up the solutions to the problems of this round.

Boomerang Decoration

This problem was the easiest as it was the only one that I got full credit for. Here's a link to the problem statement, Boomerang Decoration.

I employed a pretty common strategy. Basically, there are $N$ spots to paint. Let us index them $0,1,\ldots,N-1.$ Now, consider a pivot point $p.$ We will paint everything $i < p$ from the left using a prefix. We will paint everything $i \geq p$ from the right using a suffix. Let $L(p)$ be the number of prefix paintings we need if we pivot at point $p.$ Let $R(p)$ be the number of suffix paintings we need if we pivot at point $p.$ Then, we have that our solution is $$\min_{p \in \{0,1,\ldots,N\}} \max\left(L(p), R(p)\right),$$ so we just need to compute $L(p)$ and $R(p)$, which we can do with a recurrence relation and dynamic programming.

Let $x_0,x_1,\ldots,x_{N-1}$ be the initial left half of the boomerang. Let $y_0,y_1,\ldots,y_{N-1}$ be the right half of the boomerang that we're trying transform the left side into. Let $L^*(p)$ be the number of blocks we've seen so far, where a block is defined as a contiguous sequence of letters. Clearly, $L(0) = L^*(p) = 0$ since we're not painting anything in that case. Then, for $p = 1,2,\ldots,N$, \begin{equation} L^*(p) = \begin{cases} 1 &\text{if}~p=1 \\ L^*(p - 1) &\text{if}~x_{p-1} = x_{p-2} \\ L^*(p - 1) + 1 &\text{if}~x_{p-1} \neq x_{p-2}, \end{cases} ~\text{and}~ L(p) = \begin{cases} L(p-1) &\text{if}~x_{p-1} = y_{p-1} \\ L^*(p) &\text{if}~x_{p-1} \neq y_{p-1}. \end{cases} \end{equation} since if the letters match, there is no need to paint, and if they don't we only need to paint once for each block.

Similarly, we define $R^*(p)$ as the number of blocks seen from the right. $R(N) = R^*(N) = 0$ since the $N$th index doesn't actually exist. Then, for $p = N-1,N-2,\ldots,0$, \begin{equation} R^*(p) = \begin{cases} 1 &\text{if}~p=N-1 \\ R^*(p + 1) &\text{if}~x_{p} = x_{p+1} \\ R^*(p + 1) + 1 &\text{if}~x_{p} \neq x_{p+1}, \end{cases} ~\text{and}~ R(p) = \begin{cases} R(p+1) &\text{if}~x_{p} = y_{p} \\ R^*(p) &\text{if}~x_{p} \neq y_{p}. \end{cases} \end{equation}

Thus, our run time is $O(N)$. Here's the code that implements this idea.

#include <algorithm>
#include <climits>
#include <iostream>
#include <string>
#include <vector>

using namespace std;

int countSteps(const string &left, const string &right) {
  int N = left.length();
  vector<int> leftSteps; leftSteps.reserve(N + 1);
  leftSteps.push_back(0);
  int leftBlocks = 0;
  for (int i = 0; i < N; ++i) {
    if (i == 0 || right[i] != right[i - 1]) ++leftBlocks;
    if (left[i] == right[i]) {
      leftSteps.push_back(leftSteps.back());
    } else {
      leftSteps.push_back(leftBlocks);
    }
  }
  vector<int> rightSteps(N + 1, 0);
  int rightBlocks = 0;
  for (int i = N - 1; i >= 0; --i) {
    if (i == N - 1 || right[i] != right[i + 1]) ++rightBlocks;
    if (left[i] == right[i]) {
      rightSteps[i] = rightSteps[i + 1];
    } else {
      rightSteps[i] = rightBlocks;
    }
  } 
  int minSteps = INT_MAX;
  for (int i = 0; i <= N; ++i) { 
    // paint everything strictly to the left, paint everything to right including i
    minSteps = min(minSteps, max(leftSteps[i], rightSteps[i]));
  }
  return minSteps;  
}

int main(int argc, char *argv[]) {
  ios::sync_with_stdio(false); cin.tie(NULL);
  int T; cin >> T;
  for (int t = 1; t <= T; ++t) {
    int N; cin >> N;
    string left, right;
    cin >> left >> right;
    cout << "Case #" << t << ": "
         << countSteps(left, right)
         << '\n';
  }
  cout << flush;
  return 0;  
}

Carnival Coins

This problem is a probability problem that also makes use of dynamic programming and a recurrence relation. Here's the problem statement, Carnival Coins. I probably spent too long worrying about precision and trying to find a closed-form solution.

In any case, for this problem, given $N$ coins, we need to calculate the binomial distribution for all $n = 0,1,2,\ldots,N$ with probability $p$. Fix $p \in [0,1]$. Let $X_{n,k}$ be the probability $\mathbb{P}(X_n = k),$ where $X_n \sim \operatorname{Binomial}(n,p)$, that is, it is the number of heads if we flip $n$ coins. We use a similar idea to counting an unordered set of $k$ objects from $n$ objects without replacement in Counting Various Things.

Clearly, $\mathbb{P}(X_{n,k}) = 0$ if $k > n$. Also, $\mathbb{P}(X_{0,0}) = 1$. Now let $n \geq 1.$ Consider the $n$th coin. It's heads with probability $p$ and tails with probability $p - 1$, so for $k = 0,1,\ldots,n$, we have that \begin{equation} X_{n,k} = \begin{cases} (1-p)X_{n-1,0} &\text{if}~k=0 \\ (1-p)X_{n-1,k} + pX_{n-1,k-1} &\text{if}~k=1,2,\ldots,n-1 \\ pX_{n-1,n-1} &\text{if}~k=n \end{cases} \end{equation} since if we flip tails, we must have $k$ heads in the first $n-1$ coins, and if we flip heads, we must have $k - 1$ heads in the first $n$ coins.

Now, the problem states that we win if we get more that $K$ coins, too, so we really need the tail distribution. Define $Y_{n,k} = \mathbb{P}(X_n \geq k)$. Then, $Y_{n,n} = X_{n,n}$ since we can't have more than $n$ heads, and for $k = 0,1,\ldots,n-1$, \begin{equation} Y_{n,k} = X_{n,k} + Y_{n,k+1}. \end{equation}

We can compute this all in $O(N^2)$ time. I was hesistant to do this calculate since $p$ is a double, and I was afraid of the loss of precision, but it turns out using a long double table works.

Now, suppose we want to maximize expected value with $N$ coins. We can play all $N$ coins at once. Then, our probability of winning is $Y_{N,K}.$ Our second option is to break up our coins into two groups of size say $m$ and $N-m$. These two groups may further be broken up into more groups. Suppose we know the optimal strategy for $n = 1,2,\ldots,N-1$ coins. Let $E[n]$ be the maximum expected value when playing with $n$ coins. The maximum expected value of playing with the two groups, $m$ coins and $N-m$ coins, is $E[m] + E[N-m]$ by linearity of expectation.

This strategy only makes sense if both of the groups are of size at least $K$. Clearly, $E[K] = Y_{K,K} = X_{K,K}.$ Then, for all $n = 0,1,2, \ldots, N,$ we have \begin{equation} E[n] = \begin{cases} 0, &\text{if}~n < K \\ Y_{K,K}, &\text{if}~n = K \\ \max\left(Y_{n,K}, \sup\left\{E[m] + E[n-m] : m = K,K+1,\ldots,\lfloor n/2\rfloor\right\}\right), &\text{if}~n = K + 1,\ldots,N. \end{cases} \end{equation}

Our solution is $E[N]$. Since $N \geq K$, running time is $O(N^2)$. Here's the code.

#include <iostream>
#include <iomanip>
#include <vector>

using namespace std;

long double P[3001][3001];      // pre allocate memory for probability

double computeExpectedPrizes(int N, int K, double p) {
  // P[n][k] = P(X_n >= k), where X_n ~ Binomial(n, p)
  // first calculate P(X_n = k)
  P[0][0] = 1;
  P[1][0] = 1 - p; P[1][1] = p;
  for (int n = 2; n <= N; ++n) {
    P[n][0] = P[n-1][0]*(1-p);
    for (int k = 1; k < N; ++k) {
      // probability of hitting k when nth coin is heads and nth coin is tails
      P[n][k] = p*P[n-1][k-1] + (1-p)*P[n-1][k];
    }
    P[n][n] = P[n-1][n-1]*p;
  }
  // make cumulative
  for (int n = 1; n <= N; ++n) {
    for (int k = n - 1; k >= 0; --k) P[n][k] += P[n][k+1];
  }

  vector<long double> maxExpectedValue(N + 1, 0); // maxExpectedValue[n] is max expected value for n coins
  // two cases: all coins in 1 group or coins in more than 1 group
  for (int n = 0; n <= N; ++n) maxExpectedValue[n] = P[n][K]; // put all the coins in 1 group
  for (int n = 1; n <= N; ++n) {
    // loop invariant is that we know maxExpectedValue for 0,...,n - 1
    for (int m = K; m <= n/2; ++m) { // just do half by symmetry
      // split coins into two parts, play separately with each part
      maxExpectedValue[n] = max(maxExpectedValue[n], 
                                maxExpectedValue[m] + maxExpectedValue[n - m]);
    }
  }  
  return maxExpectedValue.back();
}

int main(int argc, char *argv[]) {
  ios::sync_with_stdio(false); cin.tie(NULL);
  int T; cin >> T;
  cout << fixed;
  cout << setprecision(9);
  for (int t = 1; t <= T; ++t) {
    int N, K;                   // coins and goal
    cin >> N >> K;
    double p; cin >> p;         // probability of coin landing heads
    cout << "Case #" << t << ": "
         << computeExpectedPrizes(N, K, p)
         << '\n';
  }
  cout << flush;
  return 0;
}

Snakes and Ladders

This problem was the one I finished 10 minutes after the contest ended. I had everything right, but for some reason, I got stuck on deriving a fairly simple recurrence relation in the last 10 minutes. Here's the problem statement, Snakes and Ladders.

A couple of key insights must be made here.

Since a snake occurs between ladders of the same height, so group them by height.
Taller ladders obstruct snakes, so process the ladders by descending height, and store obstructions as we go.
Deal with the ladders in unobstructed blocks, so sort them by position to put ladders in contiguous blocks

Now, the cost of feeding a snake is the square of its length, which makes processing each unobstructed block a little bit tricky. This is where I got stuck during the contest. The naive way is to compute all the pairwise distances and square them. This isn't fast enough. Here's a better method.

Let $x_1,x_2,\ldots,x_N$ be the position of our ladders, such that $x_1 \leq x_2 \leq \cdots \leq x_N$. Now, for $n \geq 2,$ let $$ C_n = \sum_{k=1}^{n-1} (x_n - x_k)^2 ~\text{and}~ S_n = \sum_{k=1}^{n-1} (x_n - x_k) ,$$ so the total cost of feeding this blocks is $C = \sum_{n=2}^N C_n$. We have that \begin{align*} C_n &= \sum_{k=1}^{n-1} (x_n - x_k)^2 = \sum_{k=1}^{n-1}\left((x_n - x_{n-1}) + (x_{n-1} - x_k)\right)^2 \\ &= \sum_{k=1}^{n-1}\left[(x_n - x_{n-1})^2 + 2(x_n-x_{n-1})(x_{n-1} - x_k) + (x_{n-1}-x_k)^2\right]\\ &= C_{n-1} + (n-1)(x_n - x_{n-1})^2 + 2(x_n-x_{n-1})\sum_{k=1}^{n-1}(x_{n-1} - x_k)\\ &= C_{n-1} + (n-1)(x_n - x_{n-1})^2 + 2(x_n-x_{n-1})S_{n-1} \end{align*} since the last term drops out the summation when $k = n - 1$. Then, we can update $S_n = S_{n-1} + (n-1)(x_n - x_{n-1}).$ We let $C_1 = S_1 = 0.$ Thus, we can compute $C_n$ in $O(1)$ time if we already know $C_{n-1}.$

Since we only look at each ladder once, the biggest cost is sorting, so the running time is $O(N\log N)$, where $N$ is the number of ladders. Here's the code.

#include <algorithm>
#include <iostream>
#include <map>
#include <set>
#include <vector>

using namespace std;

const int MOD = 1000000007;

int computeFeedingCost(const map<int, vector<int>> &ladders) {  
  set<int> blockEnds; blockEnds.insert(1000000000); // block delimiter
  long long cost = 0;
  // go through heights in decreasing order
  for (map<int, vector<int>>::const_reverse_iterator hIt = ladders.crbegin(); hIt != ladders.crend(); ++hIt) { 
    int currentLadder = 0;    
    int N = (hIt -> second).size(); // number of ladders at this height
    for (int blockEnd : blockEnds) {  // go block by block, where blocks are delimited by blockEnds vector
      int blockStart = currentLadder; // remember the beginning of the block
      long long xSum = 0;
      long long xSquaredSum = 0;
      while (currentLadder < N && (hIt -> second)[currentLadder] <= blockEnd) {
        if (currentLadder > blockStart) {
          // difference in position from this ladder to previous ladder
          long long xDiff = (hIt -> second)[currentLadder] - (hIt -> second)[currentLadder-1]; 
          xSquaredSum += (currentLadder - blockStart)*(xDiff*xDiff) + 2*xDiff*xSum; xSquaredSum %= MOD;
          xSum += (currentLadder - blockStart)*xDiff; xSum %= MOD;
          cost += xSquaredSum; cost %= MOD; 
        }
        if ((hIt -> second)[currentLadder] == blockEnd) {
          break;                // start next block from this ladder
        } else {
          ++currentLadder;
        }
      }
    }
    for (int newBlockEnd : hIt -> second) blockEnds.insert(newBlockEnd);
  }
  return cost;
}

int main(int argc, char *argv[]) {
  ios::sync_with_stdio(false); cin.tie(NULL);
  int T; cin >> T;
  for (int t = 1; t <= T; ++t) {
    int N;                      // number of ladders
    cin >> N;
    map<int, vector<int>> ladders; // ladders by height
    for (int n = 0; n < N; ++n) {
      int x, h; cin >> x >> h;  // ladder position and height
      ladders[h].push_back(x);
    }
    for (map<int, vector<int>>::iterator it = ladders.begin(); it != ladders.end(); ++it) {
      // within each height sort by position
      sort((it -> second).begin(), (it -> second).end());
    }
    cout << "Case #" << t << ": "
         << computeFeedingCost(ladders)
         << '\n';
  }
  cout << flush;
  return 0;
}

Costly Labels

This problem is much more involved. I'll write about it in a separate post, Assignment Problem and the Hungarian Method.

Tags: competitive programming, algorithm, math

In probability, math contests, and programming contests, we often need to count. Here, I'll write about a few cases that I see pretty often. Before we jump into things, recall the binomial coefficient and various ways of calculating it: $$ {n \choose k} = \frac{n!}{n!(n-k)!} = {n - 1 \choose k } + {n - 1 \choose k - 1 }, $$ where $n \geq k$ and is $0$ if $n < k.$ Thus, we compute the binomial coefficient with dynamic programming by using a triangular array: \begin{matrix} 1 \\ 1 & 1 \\ 1 & 2 & 1\\ 1 & 3 & 3 & 1\\ 1 & 4 & 6 & 4 & 1\\ \vdots & \vdots & \vdots & \ddots & \ddots & \ddots \end{matrix} where if we define $$ X_{n,k} = \begin{cases} {n \choose k}, &k \leq n \\ 0, &k > n, \end{cases} $$ then ${n \choose k} = X_{n,k} = X_{n-1,k} + X_{n-1,k-1}.$

We will see that we can count many things in this manner.

From $n$ objects with replacement

Some cases when we draw from $n$ objects with replacement are an infinite deck of cards, assigning types or categories, or drawing from $\{1,2,3,4,5,6\}$ by rolling a dice.

Ordered set of $k$ objects

If we are sampling $k$ objects from $n$ objects with replacement, each of the $k$ objects has $n$ possibilities. Thus, there are are $\boxed{n^k}$ possibilities.

For example, if $k$ distinct people are buying from a selection of $n$ ice cream flavors, there are $n^k$ different ways, this group of people can order. Another common situation is a sequence of $k$ bits. In this case $n = 2,$ so there are $2^k$ possible sequences.

Unordered set of $k$ objects

Another way to think of this is putting $k$ balls into $n$ bins, where the balls are not distinct. The method for solving this problem is also known as stars and bars.

Imagine each ball as a star, so we have $k$ of them. Along with the stars we have $n - 1$ bars. Now arrange these $(n-1) + k$ objects in any order. For example, take $$\star\star \mid \mid \star \mid \star\star\star.$$ This order would correspond to $2$ balls in the bin $1$, $0$ balls in bin $2$, $1$ ball in bin $3$, and $3$ balls in bin $4$. Thus, we have $(n-1 + k)!$ orderings if the objects were distinct. Since the $k$ balls and $n-1$ bars are identical, we divide by $(n-1)!$ and $k!$. Thus, the number of possible sets is $$ \frac{(n-1 + k)!}{(n-1)!k!} = \boxed{{n + k - 1\choose k}.} $$

From $n$ objects without replacement

This scenario common occurs where we have a finite collection of objects such as a deck of cards.

Unordered set of $k$ objects

We might see this situation when counting the number of $5$-card hands in poker for instance. The order that you draw the cards doesn't matter.

If we have $n$ cards, for the first card there are $n$ possibilities. For the next card, there are $n-1$ possibilities. Thus, if we draw $k$ cards, we have $n(n-1)\cdots(n-k+1) = n!/(n-k)!$ possible draws. Since the order doesn't matter, we divide by $k!.$ Thus, the count is $$\frac{n!}{(n-k)!k!} = \boxed{{n \choose k}.}$$

This makes the formula $${n \choose k} = {n-1 \choose k} + {n - 1 \choose k -1}$$ for computing the binomial coefficient intuitive. Imagine trying to choose $k$ objects from $n$ objects. We can either include or not include the $n$th object. If we don't include the $n$th object we choose $k$ objects from the first $n-1$ objects, which gives us the ${n-1 \choose k}$ term. If we do include the $n$th object then, we only choose $k-1$ objects from the first $n-1$ objects, which gives us the ${n-1 \choose k - 1}$ term.

Another common use of the of binomial coefficient is counting paths. Suppose we are on a grid, we can only make right and down moves, and we are trying to get from $A$ to $B$, where $B$ is $k$ moves to the right and $l$ moves downward. Our set of $n = k + l$ objects is $\{1, 2,\ldots,n\}.$ We choose $k$ indices to make a right move, and the rest of the moves will be down. Then, the number of paths is ${n \choose k}.$

Ordered set of $k$ objects

In this case, we care about the order of the cards we draw. From the discussion above, it's calculated the same way as an unordered set of $k$ objects except we don't divide by $k!$, so the number of possible draws is $$ n(n-1)\cdots(n-k+1) = \boxed{\frac{n!}{(n-k)!} = (n)_k,} $$ where the I have used the Pochhammer symbol to denote the falling factorial.

Recontres Numbers

These count the number of permutations with a certain number of fixed points. A permutation on $n$ elements is an element of the symmetric group $S_n$. For those without a background in algebra, it is essentially a way of rearranging $n$ objects. From our discussion of above, there are $n!$ ways to do so. For example, $$\sigma = \begin{pmatrix} 1 & 2 & 3 & 4\\ 3 & 2 & 4 & 1\\ \end{pmatrix}$$ reorders $(1,2,3,4)$ as $(3,2,4,1)$. $\sigma$ can be thought of as a function $\sigma(1) = 3$, $\sigma(2) = 2$, $\sigma(3) = 4$, and $\sigma(4) = 1$. Since for only $x = 2$ do we have $\sigma(x) = x$, there is only $1$ fixed point.

Now let $D_{n,k}$ be the number of permutations of $n$ objects with $k$ fixed points. Let $D_{0,0} = 1$. Clearly, $D_{1,0} = 0$ since $\sigma(1) = 1$ is the only possible permutation for $n = 1.$ Then, we have the recursive formula for $k > 0$ $$ D_{n,k} = {n \choose k}D_{n-k,0}, $$ which can be thought of as taking an unordered set of $k$ points to be fixed from $\{1,2,\ldots,n\}$, hence the ${n \choose k}$. For the remaining $n - k$ points, we have no fixed points because we want exactly $k$ fixed points.

So if we know $D_{0,0}, D_{1,0},\ldots,D_{n,0}$, we can calculate $D_{n+1,1}, D_{n+1,2},\ldots, D_{n+1,n+1}$. Then, for $k = 0$, since there are $(n+1)!$ total permutations, we have that $$ D_{n+1,0} = (n+1)! - \sum_{k=1}^{n+1}D_{n+1,k}. $$ There's a better way to calculate $D_{n,0}$, though, which I learned at CTY. These permutations with no fixed points are called derangements. Clearly, $D_{0,0} = 1$ and $D_{1,0} = 0$.

Now assume $n \geq 2$. Focus on element $n$. A permutation can be thought of as disjoint cycles. Recall the notation in abstract algebra, where we may write $$\sigma = \begin{pmatrix}1 & 2 & 3\end{pmatrix}\begin{pmatrix}4 & 5\end{pmatrix}\in S_5,$$ which gives us $2$ cycles, one of length $3$ and the other of length $2$. A cycle of length $1$ is a fixed point, so $n$ is part of a cycle of length $2$ or more than $2$. In the case that $n$ is part of a cycle of length $2$, there are $n-1$ options for the other element in the cycle. The number of ways to permute the remaining elements is $D_{n-2,0}$. In the case that $n$ is part of a cycle of length greater than $2$, we can consider permuting the first $n-1$ elements with no fixed points. For each such permutation, we have $n - 1$ elements after which we can insert element $n$, so it becomes part of an existing cycle. In this way, we have that $$ D_{n,0} = (n-1)\left(D_{n-1,0} + D_{n-2,0}\right). $$

Again, we have a triangular array \begin{matrix} 1 \\ 0 & 1 \\ 1 & 0 & 1\\ 2 & 3 & 0 & 1\\ 9 & 8 & 6 & 0 & 1\\ \vdots & \vdots & \vdots & \ddots & \ddots & \ddots \end{matrix}

One helpful way to visualize this process that I like is to imagine a dance class with $n$ couples. After each dance everyone has to find a new partner. There are $D_{n,k}$ ways that $k$ couples stay the same.

Bell Numbers

The Bell numbers count the ways to partition a set. Consider the set $S = \{1,2,3\}$. The possible nonempty subsets are $\{1\}$, $\{2\}$, $\{3\}$, $\{1,2\}$, $\{2,3\}$, $\{1,3\}$, and $\{1,2,3\}$. A partition would be a group of disjoint nonempty subsets such that each element of $S$ is an element of some subset in the partition. Thus, our partitions are $\left\{\{a\},\{b\},\{c\}\right\}$, $\left\{\{a\},\{b,c\}\right\}$, $\left\{\{a, c\},\{b\}\right\}$, $\left\{\{a, b\},\{c\}\right\}$, and $\left\{\{a, b,c\}\right\}$.

Let $B_n$ be number of ways to partition a set of $n$ objects. $B_0 = 1$, $B_1 = 1$, $B_2 = 2$ and $B_3 = 5$ for example. In general to calculate $B_{n+1}$, we have the recurrence relation $$ \boxed{B_{n+1} = \sum_{k=0}^n{n \choose k}B_{n-k} = \sum_{k=0}^n{n \choose k}B_{k}} $$ since ${n \choose k} = {n \choose n-k}$. To see this, consider partitions of the set $\{1,2,\ldots,n,n+1\}$. In the partition there is a subset that contains $n+1$, say $S$. We can have $|S| = k + 1 \in \{1,2,\ldots,n,n+1\}$. Clearly, $n+1 \in S$. Choosing the other $k$ elements of $S$ amounts to selecting an unordered set from $\{1,2,\ldots,n\}$, hence the ${n \choose k}$ factor in each term. For the remaining $n + 1 - (k+1) = n - k$ objects there are $B_{n-k}$ ways to partition them. Thus, we have the terms ${n \choose k}B_{n-k}$. We avoid double counting since the partitions corresponding to each term are disjoint because $n+1$ is in a subset of different size.

Catalan Numbers

Consider strings of $n$ $0$s and $n$ $1$s, so the string has length $2n$. Call this set $\Omega^{2n}$. Let $C_n$ be the number of such strings where no initial substring has more $1$s than $0$s. Formally, $$C_n = \left|\left\{ X = (x_1,x_2,\ldots,x_{2n}) \in \{0,1\}^{2n} : \sum_{i=1}^{2n} x_i = n, \sum_{i=1}^k x_i \leq \frac{k}{2}, \forall k \in \{1,2,\ldots,2n\}\right\}\right|.$$ $C_n$ is the $n$th Catalan number.

To calculate these numbers first note that $C_0 = 1$ and that every $X \in \Omega^{2(n+1)}$ for $n \geq 1$ can be written $$X = (0,X_1,1,X_2),$$ where $X_1$ and $X_2$ are elements of of $\Omega^{2k}$ and $\Omega^{2(n-k)}$, respectively for some $k \in \{0,1,\ldots,n\}$. Such a form is unique. To see this, let $X = (x_1,x_2,\ldots,x_{2n},x_{2n+1},x_{2n+2})$. Note that by the defintion, the first number in the string must be a $0$. Since the total numbers of $0$s and $1$s in the sequence must be equal, there exists an even index $j$ such that $\sum_{i=1}^j x_i = j/2$. Fix $j$ to be the smallest such index. We must have that $x_j = 1$ since otherwise the defintion would have been violated as $\sum_{i=1}^{j-1}x_i = j/2 > (j-1)/2$.

Then, we'll have $$X = (x_1 = 0,X_1,x_j = 1,X_2),$$ where $X_1$ is a string of length $2k = j-2$ and $X_2$ has length $2n + 2 - j = 2n-2k$. We show that $X_1 \in \Omega^{2k}$ and $X_2 \in \Omega^{2(n-k)}$. Since there are an equal number of $0$s and $1$s at index $j$, $X_1$ must have an equal number of $0$s and $1$s. If at any point $1 \leq l \leq 2k$, we have that $\sum_{i=2}^{l + 1}x_i > l/2$ then $\sum_{i=1}^{l+1}x_i \geq (l+1)/2$, which implies that $X \not\in \Omega^{2(n+1)}$ or that there is an index smaller than $j$ such that the initial substring has an equal number of $0$s and $1$s since $l+1 \leq j-1$. Both are a contradiction so we have $X_1 \in \Omega^{2k}$. Showing $X_2 \in \Omega^{2(n-k)}$ is similar. We have that $X_2$ must have an equal number of $0$s and $1$s in order for the whole string to have an equal number of $0$s and $1$s. If for any $1 \leq l \leq 2(n-k)$, we have that $\sum_{i=j+1}^{j+l}x_i > l/2$, then $$ \sum_{i=1}^{j+l}x_i = \sum_{i=1}^{j}x_i + \sum_{i=j+1}^{j+l}x_i = \frac{j}{2} + \sum_{i=j+1}^{j+l}x_i > \frac{j}{2} + \frac{l}{2} = \frac{j+l}{2}, $$ which implies that $X \not\in \Omega^{2(n+1)}$, which is a contradiction.

Thus, we have our desired result that $X = (x_1 = 0,X_1,x_j = 1,X_2)$, where $X_1 \in \Omega^{2k}$ and $X_2 \in \Omega^{2(n-k)}$, where $k \in \{0,1,\ldots,n\}$. Varying $k$, we come upon the recurrence relation $$\boxed{C_{n+1} = \sum_{k=0}^nC_kC_{n-k}.}$$ This is a pretty nice solution, but we can actually do better and find a closed-form solution.

Consider the generating function $$ c(x) = \sum_{n=0}^\infty C_nx^n = 1 + \sum_{n=1}^\infty C_nx^n = 1 + x\sum_{n=0}^\infty C_{n+1}x^n. $$ Substituting in the recurrence relation, for $C_{n+1}$, we have that \begin{align*} c(x) &= 1 + x\sum_{n=0}^\infty C_{n+1}x^n = 1 + x\sum_{n=0}^\infty \sum_{k=0}^nC_kC_{n-k}x^n \\ &= 1 + x\sum_{n=0}^\infty x^n\sum_{k=0}^nC_kC_{n-k} = 1 + x\left[\sum_{n=0}^\infty C_n x^n\right]^2 \\ &= 1 + x[c(x)]^2. \end{align*} Solving for $c(x)$ with the quadratic formula, we find that $$ c(x) = \frac{1 \pm \sqrt{1-4x}}{2x} = \frac{2}{1\mp\sqrt{1-4x}}. $$ Since $c(0) = 1$, $$\displaystyle c(x) = \frac{1 - \sqrt{1-4x}}{2x} = \frac{1}{2x}\left(1 - \sqrt{1-4x}\right).$$

Consider the Taylor series of $f(y) = \sqrt{1+y}.$ By induction, $$f^{(n)}(y) = (-1)^{n+1}\frac{\prod_{k=0}^{n-1}(2k-1)}{2^n}(1+y)^{-(2n-1)/2} \Rightarrow f^{(n)}(0) = (-1)^{n+1}\frac{\prod_{k=0}^{n-1}(2k-1)}{2^n}.$$ Moreover, \begin{align*} f^{(n)}(0) &= (-1)^{n+1}\frac{\prod_{k=0}^{n-1}(2k-1)}{2^n} = (-1)^{n+1}\frac{\prod_{k=0}^{n-1}(2k-1)}{2^n}\cdot \frac{2^n n!(2n-1)}{2^n n!(2n-1)} \\ &= \frac{(-1)^{n+1}}{4^n(2n-1)}\frac{(2n)!}{n!}. \end{align*}

Thus, we have that $$ f(y) = \sqrt{1+y} = 1 + \sum_{n=1}^\infty \frac{(-1)^{n+1}}{4^n(2n-1)}\frac{(2n)!}{n!n!}y^n, $$ so we have \begin{align*} c(x) &= \frac{1}{2x}(1 + f(-4x)) = \frac{1}{2x}\left(\sum_{n=1}^\infty \frac{(-1)^{n}}{4^n(2n-1)}\frac{(2n)!}{n!n!}(-4x)^n\right) \\ &= \frac{1}{2x}\left(\sum_{n=1}^\infty \frac{(-1)^{2n}}{(2n-1)}\frac{(2n)!}{n!n!}x^n\right) = \sum_{n=1}^\infty \frac{1}{2n(2n-1)}\frac{(2n)!}{(n-1)!n!}x^{n-1} \\ &= \sum_{n=1}^\infty \frac{1}{n}\frac{(2n-2)!}{(n-1)!(n-1)!}x^{n-1} = \sum_{n=1}^\infty \frac{1}{n}{2(n-1) \choose n-1 }x^{n-1} \\ &= \sum_{n=0}^\infty \frac{1}{n+1}{2n \choose n}x^{n} = \sum_{n=0}^\infty C_nx^{n}, \end{align*} so $\displaystyle \boxed{C_n = \frac{1}{n+1}{2n \choose n}.}$

Tags: probability, combinatorics, math

One of the more interesting problems that I've come across recently is to calculate the distribution of the last time a simple random walk is at $0.$

Let $X_1,X_2,\ldots,X_{2n}$ be independent, indentically distributed random variables such that $P(X_i = 1) = P(X_i = -1) = 1/2.$ Define $S_k = \sum_{i=1}^k X_i.$ Then, we have a path $$(0,0) \rightarrow (1,S_1) \rightarrow (2,S_2) \rightarrow \cdots \rightarrow (2n,S_{2n}).$$ Define the random variable $L_{2n} = \sup\{ k \leq 2n : S_k = 0\}.$ We want the distribution of $L_{2n}.$

Note that we have that \begin{equation} \mathbb{P}(S_{2n} = 0) = 2^{-2n}{2n \choose n} \end{equation} since we have $n$ positive steps and $n$ negative steps.

Let $\displaystyle N_{n,x} ={n \choose (n+x)/2}$ denote the number of paths from $(0,0)$ to $(n,x)$ since $(n+x)/2$ positive steps implies there are $(n-x)/2$ negative steps, and $(n+x)/2 - (n-x)/2 = x$. Note that $n + x$ must be even for this to be well defined. If $n + x$ is not even, then $N_{n,x} = 0$ since $x$ must have the same parity as $n.$ First, we prove the reflection principle.

Reflection Principle

If $x,y > 0,$ the number of paths that from $(0,x)$ to $(n,y)$ that are $0$ at some time, that is, they touch the $x$-axis is equal to the total number of paths from $(0,-x)$ to $(n,y),$ which is $N_{n,y+x}.$ Therefore, the number of paths from $(0,x)$ to $(n,y)$ that do not touch $0$ is $N_{n,|y-x|} - N_{n,y+x}.$

We can establish a one-to-one correspondence between the set $A$, the paths from $(0,x)$ to $(n,y)$ that are $0$ at some time and the set $B$ the paths from $(0,-x)$ to $(n,y)$.

Consider any path $P$ in $A$. $P$ must include the point $(m,0),$ where $0 < m < n$. Fix $m$ to be the greatest such integer. We construct a path $Q_1$ from $(0,-x)$ to $(m,0)$ by going in the opposite direction as $P.$ We construct a path $Q_2$ from $(m,0)$ to $(n,y)$ by mirroring $P$. Thus, we have that $Q = Q_1 \cup Q_2 \in B.$

Now consider any path $Q$ in $B$. Since paths are continuous, $Q$ must cross the $x$-axis, so $Q$ includes a point $(m,0)$, where $0 < m < n.$ Fix $m$ to be the greatest such integer. We construct $P_1,$ a path from $(0,x)$ to $(m,0)$ by going in the opposite direction as $Q$. We construct $P_2$ by mirroring $Q$. Thus, we have that $P = P_1 \cup P_2 \in A.$

So, we have established a one-to-one correspondence, and therefore, we have proven $|A| = |B|.$

Symmetry of Zeroes

$\mathbb{P}(S_1 \neq 0, S_2 \neq 0,\ldots,S_{2n} \neq 0) = \mathbb{P}(S_{2n} = 0).$

First note that $$ \mathbb{P}(S_1 \neq 0,\ldots,S_{2n} \neq 0) = \mathbb{P}(S_1 > 0,\ldots,S_{2n} > 0) + \mathbb{P}(S_1 < 0,\ldots,S_{2n} < 0) $$ since we can never have the path touch $0.$ Also note that the two terms are equal, so \begin{equation} \mathbb{P}(S_1 \neq 0, S_2 \neq 0,\ldots,S_{2n} \neq 0) = 2\mathbb{P}(S_1 > 0, S_2 > 0,\ldots,S_{2n} > 0). \end{equation} Now, note that \begin{equation} \mathbb{P}(S_1 > 0, S_2 > 0,\ldots,S_{2n} > 0) = \sum_{r=1}^{n}\mathbb{P}(S_1 > 0, S_2 > 0,\ldots,S_{2n} = 2r) \end{equation} since we have taken an even number of steps.

To calculate $\mathbb{P}(S_1 > 0, S_2 > 0,\ldots,S_{2n} = 2r),$ we note that $X_1 = 1$ since $S_1 > 0$. Then, the number of paths from $(1,1)$ to $(2n,2r)$ that do not touch $0$ by the Reflection Principle is $N_{2n-1,2r-1} - N_{2n-1,2r+1}$. Thus, \begin{align*} \mathbb{P}(S_1 > 0, S_2 > 0,\ldots,S_{2n} = 2r) &= \left(\frac{1}{2}\right)\left(\frac{1}{2^{2n-1}}\right)\left(N_{2n-1,2r-1} - N_{2n-1,2r+1}\right) \\ &= \left(\frac{1}{2^{2n}}\right)\left(N_{2n-1,2r-1} - N_{2n-1,2r+1}\right). \end{align*}

So, we have that \begin{align*} \mathbb{P}(S_1 > 0, S_2 > 0,\ldots,S_{2n} > 0) &= \left(\frac{1}{2^{2n}}\right) \sum_{r=1}^n\left(N_{2n-1,2r-1} - N_{2n-1,2r+1}\right) \\ &= \frac{1}{2^{2n}}N_{2n-1,1} = \frac{1}{2^{2n}}{2n - 1 \choose n}\\ &= \frac{1}{2^{2n}}\frac{(2n-1)!}{n!(n-1)!} = \frac{1}{2}\frac{1}{2^{2n}}\frac{(2n)!}{n!n!} \\ &= \frac{1}{2}\mathbb{P}(S_{2n} = 0) \end{align*} by telescoping since $N_{2n-1,2n+1} = 0$ and substituting in the previous equations.

Recalling the earlier equation, we have the desired result, $$ \mathbb{P}(S_1 \neq 0, S_2 \neq 0,\ldots,S_{2n} \neq 0) = 2\mathbb{P}(S_1 > 0, S_2 > 0,\ldots,S_{2n} > 0) = \mathbb{P}(S_{2n} = 0). $$

Distribution of Last Zero Visit

Let $L_{2n}$ be the random variable whose value is the last time the simple random walk visited $0.$ Formally, $$L_{2n} = \sup\left\{2k : k \in \{0,\ldots,n\},~\sum_{i=1}^{2k} X_i = 0\right\}.$$ Then, we have that $$\mathbb{P}(L_{2n} = 2k) = 2^{-2n}{2k \choose k}{2n-2k \choose n-k}.$$

To see this, we have that \begin{align*} \mathbb{P}(L_{2n} = 2k) &= \mathbb{P}(S_{2k} = 0, S_{2k+1} \neq 0, \ldots,S_{2n} \neq 0) \\ &= \mathbb{P}(S_{2k} = 0, S_{2k+1} - S_{2k} \neq 0, \ldots,S_{2n} - S_{2k} \neq 0) \\ &= \mathbb{P}(S_{2k} = 0)\mathbb{P}(S_1 \neq 0, \ldots,S_{2n-2k} \neq 0) \\ &= \mathbb{P}(S_{2k} = 0)\mathbb{P}(S_{2n-2k} = 0), \end{align*} where the last equaility is by Symmetry of Zeroes. Thus, we have that $$\mathbb{P}(L_{2n} = 2k) = \mathbb{P}(S_{2k} = 0)\mathbb{P}(S_{2n-2k} = 0) =2^{-2n}{2k \choose k}{2n-2k \choose n-k}$$ as desired.

Conculusion

Let's look at what this distribution looks like when $n = 15.$ We have symmetric distribution about $15$, so the mean is $15.$ However, the distribution is U-shaped, and the most likely values are very far from the mean. Thus, to take an analogy from sports, if two evenly matched teams were to played against each other multiple times over the course of a season, the most likely scenario is for one team team to lead the other team the entire season. So, saying that team A led team B the entire season is an almost meaningless statistic.

Tags: math, probability, combinatorics, stat

Recently, I saw an interesting problem about graphs. For any integer $p \geq 2$, we call a graph a $p$-graph if it has the property that every group of $p$ vertices has exactly $1$ common neighbor, that is, each one of these $p$ vertices has an edge to some other common vertex $x$. We consider finite undirected graphs with no loops.

The $2$-graphs are actually called Friendship graphs, and they must be triangles with $1$ point in common.

So, let us first look at $3$-graphs. Consider a $3$-graph $G$. Suppose we have vertices $a$, $b$, and $c$. By assumption, they have a common neighbor $x$, so it looks something like this.

Now $a$, $b$, and $x$ must have a common neighbor. Suppose that it's not $c$. Then, we have something like this.

Now, the subgraph containing $a$, $d$, and $x$ is complete. If they have a common neighbor, then $a$, $d$, and $x$ plus that common neighbor is complete, so we have a complete graph with $4$ vertices. Call their common neighbor $v$. If $v \neq b$, we have this situation. But now, $a$, $b$, and $v$ have two common neighbors, $x$ and $d$, so in fact, $v = b$, so the complete graph is formed from $a$, $b$, $d$, and $x$.

By symmetry, we'll have that $a$ and $c$ are part of a complete graph, too, which gives us this.

But now, $b$, $c$, and $d$ have $2$ common neighbors, $a$ and $x$, which is a contradiction. Thus, $d = c$ in the first place, so we actually have this.

Since this graph is isomorphic to the $a$, $b$, $d$, $v$, and $x$ subgraph earlier, we have that $a$, $c$, and $x$ have common neighbor $b$, so the only possible $3$-graph is the complete graph with $4$ vertices.

Now, the the rest of cases will follow by induction. For our induction hypothesis, assume that for $p = 3,\ldots, n - 1$, the only possible $p$-graph is the complete graph with $p + 1$ vertices. We have just proved the base case $p=3$.

Now, consider a $n$-graph for some $n > 3$. Call this graph $G$. Consider $n$ vertices, and let us call them $v_1, v_2, \ldots, v_n$. They have common neighbor $x$. Consider the subgraph consisting of only friends of $x$. Note that this graph excludes $x$. We'll call it $G_x$. Now, $G_x$ is a $n-1$-graph. To see this, note that any set of $n-1$ vertices of $G_x$ plus $x$ itself will have exactly $1$ common neighbor. Call it $y$. Since $y$ is neighbor of $x$, we must have that $y \in G_x$. Thus, $G_x$ is a $n-1$-graph, and by our induction hypothesis, $G_x$ is a complete graph with $n$ vertices. Since every vertex in $G_x$ is connected to $x$, then $G_x \cup \{x\}$ is a complete graph with $n+1$ verties.

Now, we show that $G = G_x \cup \{x\}$. Suppose otherwise for a contradiction, so there is a $y \in G$ such that $y \not\in G_x \cup \{x\}$. $y$ and $x$ have a common neighbor, so $y$ is connected with some $v_i$. But $v_i \in G_x \cup \{x\}$, which is a complete $n+1$ graph, so $v_i$ is the common neighbor of $x$ and $v_1,\ldots,v_{i-1},v_{i+1},\ldots, v_n$. We can consider $G_{v_i}$, the subgraph consisting of only friends of $v_i$. For the same reason that $G_x$ is a complete graph with $n$ vertices, $G_{v_i}$ is also a complete graph with $n$-vertices. But we have that $x,y,v_1,\ldots,v_{i-1},v_{i+1},\ldots,v_n \in G_{v_i}$, which is $n+1$ vertices, so we have a contradiction. Thus, we have that $G = G_x \cup \{x\}$. So, $G$ is a complete graph with $n+1$ vertices. This proves our induction step.

All in all, we have that for $p \geq 3$, the only possible $p$ graph is the complete graph with $p+1$ vertices.

Tags: graph theory, math

Recently, I ran into a problem that forced me to recall a lot of old knowledge and use it in new ways, President and Roads. The problem is this:

We have $V$ vertices and $E$ edges. $2 \leq V \leq 10^5$, and $1 \leq E \leq 10^5$. Index these vertices $0, 1, \ldots, V - 1$. Edges are directed and weighted, and thus, can be written as $e_i = (a_i,b_i,l_i)$, where an edge goes from vertex $a_i$ to vertex $b_i$ with length $l_i$, where $1 \leq l_i \leq 10^6$. We're given a source $s$ and a target $t$. For each edge $e_i$, we ask:

If we take the shortest path from $s$ to $t$, will we always travel along $e_i$? If not, can we decrease $l_i$ to a positive integer so that we will always travel along $e_i$? If so, how much do we need to decrease $l_i$?

Shortest Path

Obviously, we need to find the shortest path. My first thought was to use Floyd-Warshall, since I quickly realized that just the distances from the $s$ wouldn't be enough. Computing the shortest path for all pairs is overkill, however. Let $d(a,b)$ be the distance between two vertices. An edge $e_i = (a_i,b_i,l_i)$ will belong to a shortest path if $$d(s,a_i) + l_i + d(b_i,t)$$ equals to the length of the shortest path. Thus, we only need the distances from the $s$ and to $t$. We can find the distance from $s$ and to $t$ by running Dijkstra's algorithm twice. To find the distances to $t$, we'll need to reverse the edges and pretend $t$ is our source. There's still the problem that in the most simple implementation of Djiktra's, you need to find the vertex of minimum distance each time by linear search, which leads to an $O(V^2)$ algorithm. With a little bit of work, we can bring it down to $O\left((E + V)\log V\right)$ with a priority queue. Unfortunately, the built-in C++ and Java implementations do not support the decrease key operation. That leaves us with two options:

The easiest is to reinsert a vertex into the priority queue, and keep track of already visited vertices. If we see the vertex a second time, throw it away.
Implement our own priority queue.

Of course, I chose to implement my own priority queue based on a binary heap with an underlying hash map to support the decrease key operation. If I was really ambitious, I'd implement a Fibonnaci heap, to achieve $O\left(E + V\log V\right)$ running time. See my C++ implementation below.

I use it in Djikstra's algorithm here:

vector<pair<long long, unordered_set<int>>> computeShortestPath(int s, const vector<unordered_map<int, pair<int, int>>> &edgeList) {
  int N = edgeList.size();
  vector<pair<long long, unordered_set<int>>> distance(N, make_pair(LLONG_MAX/3, unordered_set<int>()));
  phillypham::priority_queue pq(N);
  distance[s].first = 0;
  pq.push(s, 0);
  while (!pq.empty()) {  
    int currentVertex = pq.top(); pq.pop();
    long long currentDistance = distance[currentVertex].first;
    // relax step
    for (pair<int, pair<int, int>> entry : edgeList[currentVertex]) {
      int nextVertex = entry.first; int length = entry.second.first;
      long long newDistance = currentDistance + length;
      if (distance[nextVertex].first > newDistance) {
        distance[nextVertex].first = newDistance;
        distance[nextVertex].second.clear();
        distance[nextVertex].second.insert(currentVertex);
        if (pq.count(nextVertex)) {
          pq.decrease_key(nextVertex, newDistance);
        } else {
          pq.push(nextVertex, newDistance);
        }
      } else if (distance[nextVertex].first == newDistance) {        
        distance[nextVertex].second.insert(currentVertex);
      }
    }
  }
  return distance;
}

computeShortestPath returns a vector of pairs, where each pair consists of the distance from $s$ and the possible previous vertices in a shortest path. Thus, I've basically recreated the graph only including the edges that are involved in some shortest path.

Counting Paths

The next task is to figure out which edges are necessary. There could be multiple shortest paths, so if the shortest path has length $L$, given edge $e_i = (a_i, b_i, l_i)$, the condition $$d(s,a_i) + l_i + d(b_i,t) = L$$ is necessary but not sufficient. First, we need to make sure $e_i$ is distinct as there are no restrictions on duplicate edges. Secondly, every path must pass through $a_i$ and $b_i$. And finally, we need to make sure that $\left|P(b_i)\right| = 1$, where $P(b_i)$ is the set of vertices that come directly before vertex $b_i$ on a shortest path because you could imagine something like this scenario:

The shortest path is of length 4, and we have two options $s \rightarrow a \rightarrow c \rightarrow b \rightarrow t$ and $s \rightarrow a \rightarrow b \rightarrow t$. Every shortest path goes through $a$ and $b$, but the actual edge between $a$ and $b$ is not necessary to achieve the shortest path. Since we already computed the parents of $b_i$ when running computeShortestPath, it's simple to check that $b_i$ only has one parent.

Now, the only thing left to do is count the number of shortest paths that go through each vertex. For any vertex, $v$, let $w$ be the number of ways to reach $v$ from $s$ and $x$ be the number ways to reach $t$ from $v$. Then, the number of paths that pass through $v$ is $wx$. Now one could use something like depth-first search to compute $w_j$ for every vertex $j$. We'd start at $s$, take every path, and increment $w_j$ every time that we encouter $j$. This is too slow, though, as we'll visit every node multiple times. If there are $10^9$ paths to $t$, we'll take all $10^9$.

We can solve this problem by using a priority queue again using a similar idea to Dijkstra's. The idea is that for a paricular vertex $v$, $$w_v = \sum_{u \in P(v)} w_u,$$ where you may recall that $P(v)$ consists of all the parents of $v$. So, we just need to calculate all the number of paths from $s$ to the parents first. Since every edge has positive length, all the parents will have shorter distance from $s$, thus, we can use a priority queue to get the vertices in increasing order of distance from $s$. Every time we visit a node, we'll update all its children, and we'll never have to visit that node again, so we can compute the number of paths in $O\left((E + V)\log V\right)$ time. See the code:

const int MOD_A = 1000000207;
const int MOD_B = 1000000007;
const int MOD_C = 999999733;

tuple<int, int, int> addPathTuples(tuple<int, int, int> a, tuple<int, int, int> b) {
  return make_tuple((get<0>(a) + get<0>(b)) % MOD_A, (get<1>(a) + get<1>(b)) % MOD_B, (get<2>(a) + get<2>(b)) % MOD_C);
}

tuple<int, int, int> multiplyPathTuples(tuple<int, int, int> a, tuple<int, int, int> b) {
  long long a0 = get<0>(a), a1 = get<1>(a), a2 = get<2>(a);
  return make_tuple((a0*get<0>(b)) % MOD_A, (a1*get<1>(b)) % MOD_B, (a2*get<2>(b)) % MOD_C);
}

vector<tuple<int, int, int>> countPaths(int s, 
                                        const vector<pair<long long, unordered_set<int>>> &distances,
                                        const vector<pair<long long, unordered_set<int>>> &children) {
  // assume only edges that make shortest paths are included
  int N = children.size();
  vector<tuple<int, int, int>> pathCounts(N, make_tuple(0, 0, 0)); // store as tuple, basically modular hash function
  pathCounts[s] = make_tuple(1, 1, 1);
  phillypham::priority_queue pq(N);
  pq.push(s, 0);
  while (!pq.empty()) {
    int currentVertex = pq.top(); pq.pop();  
    for (int nextVertex : children[currentVertex].second) {
      pathCounts[nextVertex] = addPathTuples(pathCounts[currentVertex], pathCounts[nextVertex]);
      if (!pq.count(nextVertex)) {
        pq.push(nextVertex, distances[nextVertex].first);
      }
    }    
  }
  return pathCounts;
}

$x$, the number of paths to $t$ from $v$ can be computed in much the same way by reversing edges and starting at $t$.

Now, you might notice that instead of returning the number of paths, I return a vector of tuple<int, int, int>. This is because the number of paths is huge, exceeding what can fit in unsigned long long, which is $2^{64} - 1 = 18446744073709551615$. We don't actually care about the number of paths, though, just that the number of paths is equal to the total number of paths. Thus, we can hash the number of paths. I chose to hash the paths by modding it by three large prime numbers close to $10^9$. See the complete solution below.

Hashing and Chinese Remainder Theorem

At one level, doing this seems somewhat wrong. If there is a collision, I'll get the wrong answer, so I decided to do some analysis on the probability of a collision. It occured to me that hashing is actually just an equivalence relation, where two keys will belong to the same bucket if they are equivalent, so buckets are equivalence classes.

Now, I choose 3 large primes $p_1 = 1000000207$, $p_2 = 1000000007$, and $p_3 = 999999733$, each around $10^9$. We'll run into an error if we get two numbers $x$ and $y$ such that \begin{align*} x &\equiv y \bmod p_1 \\ x &\equiv y \bmod p_2 \\ x &\equiv y \bmod p_3 \end{align*} since $x$ and $y$ will have the same hash. Fix $y$. Let $N = p_1p_2p_3$, $y \equiv a_1 \bmod p_1$, $y \equiv a_2 \bmod p_2$, and $y \equiv a_3 \bmod p_3$. Then, by the Chinese Remainder Theorem, for $x$ to have the same hash, we must have $$ x \equiv a_1\left(\frac{N}{p_1}\right)^{-1}_{p_1}\frac{N}{p_1} + a_2\left(\frac{N}{p_2}\right)^{-1}_{p_2}\frac{N}{p_2} + a_3\left(\frac{N}{p_3}\right)^{-1}_{p_3}\frac{N}{p_3} \bmod N,$$ where $$\left(\frac{N}{p_i}\right)^{-1}_{p_i}$$ is the modular inverse with respect to $p_i$, which we can find by the extended Euclidean algorithm as mentioned here.

We'll have that \begin{align*} \left(\frac{N}{p_1}\right)^{-1}_{p_1} &\equiv 812837721 \bmod p_1 \\ \left(\frac{N}{p_2}\right)^{-1}_{p_2} &\equiv 57354015 \bmod p_2 \\ \left(\frac{N}{p_3}\right)^{-1}_{p_3} &\equiv 129808398 \bmod p_3, \end{align*} so if $a_1 = 10$, $a_2 = 20$, and $a_3 = 30$, then the smallest such $x$ that would be equivalent to $y$ is $x = 169708790167673569857984349$. In general, \begin{align*} x &= 169708790167673569857984349 + Nk \\ &= 169708790167673569857984349 + 999999946999944310999613117k, \end{align*} for some $k \in \mathbb{Z}$, which leaves us with only an approximately $10^{-27}$ chance of being wrong. Thus, for all intents and purposes, our algorithm will never be wrong.

Binary Heap Priority Queue

namespace phillypham {
  class priority_queue {
  private:
    int keysSize;
    vector<int> keys;
    vector<long long> values;
    unordered_map<int, int> keyToIdx;
    int parent(int idx) {
      return (idx + 1)/2 - 1;
    }
    int left(int idx) {
      return 2*(idx+1) - 1;
    }
    int right(int idx) {
      return 2*(idx+1);
    }
    void heap_swap(int i, int j) {
      if (i != j) {
        keyToIdx[keys[j]] = i;
        keyToIdx[keys[i]] = j;
        swap(values[j], values[i]);
        swap(keys[j], keys[i]);
      }
    }
    void max_heapify(int idx) {
      int lIdx = left(idx);
      int rIdx = right(idx);
      int smallestIdx = idx;
      if (lIdx < keysSize && values[lIdx] < values[smallestIdx]) {
        smallestIdx = lIdx;
      }
      if (rIdx < keysSize && values[rIdx] < values[smallestIdx]) {
        smallestIdx = rIdx;
      }
      if (smallestIdx != idx) {
        heap_swap(smallestIdx, idx);
        max_heapify(smallestIdx);
      } 
    }

    void min_heapify(int idx)  {
      while (idx > 0 && values[parent(idx)] > values[idx]) {
        heap_swap(parent(idx), idx);
        idx = parent(idx);
      }
    }

  public:
    priority_queue(int N) {
      keysSize = 0;
      keys.clear(); keys.reserve(N);
      values.clear(); values.reserve(N);
      keyToIdx.clear(); keyToIdx.reserve(N);
    }

    void push(int key, long long value) {
      // if (keyToIdx.count(key)) throw logic_error("key " + ::to_string(key) + " already exists");
      int idx = keysSize; ++keysSize;
      if (keysSize > keys.size()) {
        keys.push_back(key);
        values.push_back(value);
      } else {
        keys[idx] = key;
        values[idx] = value;
      }   
      keyToIdx[key] = idx;
      min_heapify(idx);      
    }

    void increase_key(int key, long long value) {
      // if (!keyToIdx.count(key)) throw logic_error("key " + ::to_string(key) + " does not exist");
      // if (values[keyToIdx[key]] > value) throw logic_error("value " + ::to_string(value) + " is not an increase");
      values[keyToIdx[key]] = value;
      max_heapify(keyToIdx[key]);
    }

    void decrease_key(int key, long long value) {
      // if (!keyToIdx.count(key)) throw logic_error("key " + ::to_string(key) + " does not exist");
      // if (values[keyToIdx[key]] < value) throw logic_error("value " + ::to_string(value) + " is not a decrease");
      values[keyToIdx[key]] = value;
      min_heapify(keyToIdx[key]);       
    }

    void pop() {
      if (keysSize > 0) {
        heap_swap(0, --keysSize);
        keyToIdx.erase(keys[keysSize]);       
        if (keysSize > 0) max_heapify(0);
      } else {
        throw logic_error("priority queue is empty");
      }
    }

    int top() {
      if (keysSize > 0) {
        return keys.front();
      } else {
        throw logic_error("priority queue is empty");
      }
    }    

    int size() {
      return keysSize;
    }

    bool empty() {
      return keysSize == 0;
    }

    int at(int key) {
      return values[keyToIdx.at(key)];
    }

    int count(int key) {
      return keyToIdx.count(key);
    }

    string to_string() {
      ostringstream out;
      copy(keys.begin(), keys.begin() + keysSize, ostream_iterator<int>(out, " "));
      out << '\n';
      copy(values.begin(), values.begin() + keysSize, ostream_iterator<int>(out, " "));
      return out.str();
    }    
  };
}

Solution

int main(int argc, char *argv[]) {
  ios::sync_with_stdio(false); cin.tie(NULL);
  int N, M, s, t; // number of nodes, edges, source, and target
  cin >> N >> M >> s >> t;
  --s; --t;                     // 0 indexing
  vector<tuple<int, int, int>> edges;
  vector<unordered_map<int, pair<int, int>>> edgeList(N);
  vector<unordered_map<int, pair<int, int>>> reverseEdgeList(N);
  for (int m = 0; m < M; ++m) { // read in edges
    int a, b, l;
    cin >> a >> b >> l;
    --a; --b;
    edges.emplace_back(a, b, l);
    if (!edgeList[a].count(b) || edgeList[a][b].first > l) {
      edgeList[a][b] = make_pair(l, 1);
      reverseEdgeList[b][a] = make_pair(l, 1);
    } else if (edgeList[a][b].first == l) {
      ++edgeList[a][b].second;
      ++reverseEdgeList[b][a].second;
    }
  } 
  vector<pair<long long, unordered_set<int>>> distanceFromSource = computeShortestPath(s, edgeList);
  vector<pair<long long, unordered_set<int>>> distanceFromTarget = computeShortestPath(t, reverseEdgeList);
  vector<tuple<int, int, int>> pathCounts = countPaths(s, distanceFromSource, distanceFromTarget);
  vector<tuple<int, int, int>> backPathCounts = countPaths(t, distanceFromTarget, distanceFromSource);
  for (int i = 0; i < N; ++i) pathCounts[i] = multiplyPathTuples(pathCounts[i], backPathCounts[i]);

  long long shortestDistance = distanceFromSource[t].first;
  tuple<int, int, int> shortestPaths = pathCounts[s];
  for (tuple<int, int, int> edge : edges) {
    long long pathDistance = distanceFromSource[get<0>(edge)].first + get<2>(edge) + distanceFromTarget[get<1>(edge)].first;
    if (pathDistance == shortestDistance && // path is shortest
        pathCounts[get<0>(edge)] == shortestPaths && // every path goes through from node
        pathCounts[get<1>(edge)] == shortestPaths && // every path goes through to node
        distanceFromSource[get<1>(edge)].second.size() == 1 && // only paths come from the from node
        edgeList[get<0>(edge)][get<1>(edge)].second == 1) {
      cout << "YES" << '\n';
    } else if (pathDistance - get<2>(edge) + 1 < shortestDistance) {
      cout << "CAN" << ' ' << (pathDistance - shortestDistance) + 1 << '\n';
    } else {
      // which will happen if the city is unconnected since the edge won't be big enough to overcome the big distance
      cout << "NO" << '\n';
    }
  }
  cout << flush;
  return 0;
}

Tags: algorithm, competitive programming, graph theory, computer science, math, number theory

One popular model for classification is nearest neighbors. It's broadly applicable and unbiased as it makes no assumptions about the generating distribution of the data. Suppose we have $N$ observations. Each observation can be written as $(\mathbf{x}_i,y_i)$, where $0 \leq i \leq N$ and $\mathbf{x}_i = (x_{i1},x_{i2},\ldots,x_{ip})$, so we have $p$ features, and $y_i$ is the class to which $\mathbf{x}_i$ belongs. Let $y_i \in C$, where $C$ is a set of possible classes. If we were given a new observation $\mathbf{x}$. We would find the $k$ closest $(\mathbf{x}_i, y_i)$, say $(\mathbf{x}_{j_1}, y_{j_1}), (\mathbf{x}_{j_2}, y_{j_2}),\ldots, (\mathbf{x}_{j_k}, y_{j_k})$. To classify $\mathbf{x}$, we would simply take a majority vote among these $k$ closest points. While simple and intuitive, as we will see, nearest neighbors runs into problems when $p$ is large.

Consider this problem:

Consider $N$ data points uniformly distributed in a $p$-dimensional unit ball centered at the origin. Find the median distance from the origin of the closest data point among the $N$ points.

Let the median distance be $d(p, N)$. First to keep things simple consider a single data point, so $N = 1$. The volume of a $p$-dimensional ball of radius $r$ is proportional to $r^p$, so $V(r) = Kr^p$. Let $d$ be the distance of the point, so $P(d \leq d(p,1)) = 0.5$. Viewing probability as volume, we imagine a smaller ball inside the larger ball, so \begin{align*} \frac{1}{2} &= P(d \leq d(p, 1)) \\ &= \frac{V(d(p,1))}{V(1)} \\ &= d(p,1)^p \\ &\Rightarrow d(p,1) = \left(\frac{1}{2}\right)^{1/p}, \end{align*} and in general, $P(d \leq t) = t^p$, where $0 \leq t \leq 1$. For example when $p = 1$, we have

For $p=2$,

Now, consider the case when we have $N$ data points, $x_1, x_2, \ldots, x_N$. The distance of the closest point is $$d = \min\left(\Vert x_1 \Vert, \Vert x_2 \Vert, \ldots, \Vert x_N \Vert\right).$$ Thus, we'll have \begin{align*} \frac{1}{2} &= P(d \leq d(p,N)) \\ &= P(d > d(p,N)),~\text{since $P(d \leq d(p,N)) + P(d > d(p,N)) = 1$} \\ &= P\left(\Vert x_1\Vert > d(p,N)\right)P\left(\Vert x_2\Vert > d(p,N)\right) \cdots P\left(\Vert x_N\Vert > d(p,N)\right) \\ &= \prod_{i=1}^N \left(1 - P\left(\Vert x_i \Vert \leq d(p,N)\right)\right) \\ &= \left(1 - d(p,N)^p\right)^N,~\text{since $x_i$ are i.i.d and $P(\Vert x_i\Vert \leq t) = t^p$}. \end{align*} And so, \begin{align*} \frac{1}{2} &= \left(1 - d(p,N)^p\right)^N \\ \Rightarrow 1 - d(p,N)^p &= \left(\frac{1}{2}\right)^{1/N} \\ \Rightarrow d(p,N)^p &= 1 - \left(\frac{1}{2}\right)^{1/N}. \end{align*} Finally, we obtain $$\boxed{d(p,N) = \left(1 - \left(\frac{1}{2}\right)^{1/N}\right)^{1/p}}.$$

So, what does this equation tell us? As the dimension $p$ increases, the distance goes to 1, so all points become far away from the origin. But as $N$ increases the distance goes to 0, so if we collect enough data, there will be a point closest to the origin. But note that as $p$ increases, we need an exponential increase in $N$ to maintain the same distance.

Let's relate this to the nearest neighbor method. To make a good prediction on $\mathbf{x}$, perhaps, we need a training set point that is within distance 0.1 from $\mathbf{x}$. We would need 7 data points for there to be a greater than 50% chance of such a point existing if $p = 1$. See how $N$ increases as $p$ increases.

p	N
1	7
2	69
3	693
4	6932
5	69315
10	6,931,471,232
15	$\scriptsize{6.937016 \times 10^{14}}$

The increase in data needed for there to be a high probability that there is a point close to $\mathbf{x}$ is exponential. We have just illustrated the curse of dimensionality. So, in high dimensions, other methods must be used.

Tags: machine learning, stat, probability, math

<<< Previous page

Next page >>>

About Me

Posts tagged math

Applications to Dynamic Programming

Spring Embedding

Implementation Issues

Analysis

Part A

Solution

Part B

Solution

Graphs in $\LaTeX$

From $n$ objects with replacement

Ordered set of $k$ objects

Unordered set of $k$ objects

From $n$ objects without replacement

Unordered set of $k$ objects

Ordered set of $k$ objects

Bell Numbers

Catalan Numbers

Distribution of Last Zero Visit

Conculusion

Shortest Path

Counting Paths

Hashing and Chinese Remainder Theorem