# Posts tagged *math*

Consider the game of Nim. The best way to become acquainted is to play Nim: The Game, which I've coded up here. Win some games!

## Solving Nim

Nim falls under a special class of games, in which, we have several independent games (each pile is its own game), perfect information (you and your opponent know everything), and sequentiality (the game always ends). It turns out every game of this type is equivalent, and we can use Nim as a model on how to solve them.

The most general way to find the optimal strategy is exhaustive enumeration, which we can do recursively.

```
class GameLogic:
def get_next_states(self, current_state):
pass
def is_losing_state(self, state):
pass
def can_move_to_win(gameLogic, current_state):
## take care of base case
if gameLogic.is_losing_state(current_state):
return False
## try all adjacent states, these are our opponent's states
next_states = gameLogic.get_next_states(current_state)
for next_state in next_states:
if can_move_to_win(gameLogic, next_state) is False:
return True # one could possibly return the next move here
## otherwise, we always give our opponent a winning state
return False
```

Of course exhaustive enumeration quickly becomes infeasible. However, the fact that there exists this type of recursive algorithm informs us that we should be looking at induction to solve Nim.

One way to get some intuition at the solution is to think of heaps of just 1 object. No heaps mean that we've already lost, and 1 heap means that we'll win. 2 heaps must reduce to 1 heap, which puts our opponent in a winning state, so we lose. 3 heaps must reduce to 2 heaps, so we'll win. Essentially, we'll win if there are an odd number of heaps.

Now, if we think about representing the number of objects in a heap as a binary number, we can imagine each binary digit as its own game. For example, imagine heaps of sizes, $(27,16,8,2,7)$. Represented as binary, this looks like \begin{align*} 27 &= (11011)_2 \\ 16 &= (10000)_2 \\ 8 &= (01000)_2 \\ 2 &= (00010)_2 \\ 7 &= (00111)_2. \end{align*} The columns corresponding to the $2$s place and $4$s place have an odd number of $1$s, so we can put our opponent in a losing state by removing $6$ objects from the last heap. \begin{align*} 27 &= (11011)_2 \\ 16 &= (10000)_2 \\ 8 &= (01000)_2 \\ 2 &= (00010)_2 \\ 7 - 6 = 1 &= (00001)_2. \end{align*}

Now, the XOR of all the heaps is $0$, so there are an even number of $1$s in each column. As we can see, we have a general algorithm for doing this. Let $N_1,N_2,\ldots,N_K$ be the size of our heaps. If $S = N_1 \oplus N_2 \oplus \cdots \oplus N_K \neq 0$, then $S$ has a nonzero digit. Consider the leftmost nonzero digit. Since $2^{k+1} > \sum_{j=0}^k 2^j$, we can remove objects such that this digit changes, and we can chose any digits to the right to be whatever is necessary so that the XOR of the heap sizes is $0$. Here's a more formal proof.

### Proof that if $N_1 \oplus N_2 \oplus \cdots \oplus N_K = 0$ we're in a losing state

Suppose there are $K$ heaps. Define $N_k^{(t)}$ to be the number of objects in heap $k$ at turn $t$. Define $S^{(t)} = N_1^{(t)} \oplus N_2^{(t)} \oplus \cdots \oplus N_K^{(t)}$.

We lose when $N_1 = N_2 = \cdots = N_k = 0$, so if $S^{(t)} \neq 0$, the game is still active, and we have not lost. By the above algorithm, we can make it so that $S^{(t+1)} = 0$ for our opponent. Then, any move that our opponent does must necessarily make it so that $S^{(t+2)} \neq 0$. In this manner, our opponent always has $S^{(t + 2s + 1}) = 0$ for all $s$. Thus, either they lose, or they give us a state such that $S^{(t+2s)} \neq 0$, so we never lose. Since the game must end, eventually our opponent loses.

## Sprague-Grundy Theorem

Amazingly this same idea can be applied to a variety of games that meets certain conditions through the Sprague-Grundy Theorem. I actually don't quite under the Wikipedia article, but this is how I see it.

We give every indepedent game a *nimber*, meaning that it is equivalent to a heap of that size in Nim. Games that are over are assigned the nimber $0$. To find the nimber of a non-terminating game position, we look at the nimbers of all the game positions that we can move to. The nimber of this position is smallest nonnegative integer strictly greater than all the nimbers that we can move to. So if we can move to the nimbers $\{0,1,3\}$, the nimber is $2$.

At any point of the game, each of our $K$ independent games has a nimber $N_k$. We're in a losing state if and only if $S = N_1 \oplus N_2 \oplus \cdots \oplus N_K = 0$.

For the $\Leftarrow$ direction, first suppose that $S^{(t)} = N_1^{(t)} \oplus N_2^{(t)} \oplus \cdots \oplus N_K^{(t)} = 0$. Because of the way that the nimber's are defined we any move that we do changes the nimber of exactly one of the independent games, so $N_k^{(t)} \neq N_k^{(t + 1)}$, for the next game positions consist of nimbers $1,2,\ldots, N_k^{(t)} - 1$. If we can move to $N_k^{(t)}$, then actually the nimber of the current game position is $N_k^{(t)} + 1$, a contradiction. Thus, we have ensured that $S^{(t + 1)} \neq 0$. Since we can always move to smaller nimbers, we can think of nimbers as the number of objects in a heap. In this way, the opponent applies the same algorithm to ensure that $S^{(t + 2s)} = 0$, so we can never put the opponent in a terminating position. Since the game must end, we'll eventually be in the terminating position.

For the $\Rightarrow$ direction, suppose that we're in a losing state. We prove the contrapositive. If $S^{(t)} = N_1^{(t)} \oplus N_2^{(t)} \oplus \cdots \oplus N_K^{(t)} \neq 0$, the game has not terminated, and we can make it so that $S^{(t+1)} = 0$ by construction of the nimbers. Thus, our opponent is always in a losing state, and since the game must terminate, we'll win.

Now, this is all very abstract, so let's look at an actual problem.

## Floor Division Game

Consider the Floor Division Game from CodeChef. This is the problem that motivated me to learn about all this. Let's take a look at the problem statement.

Henry and Derek are waiting on a room, eager to join the Snackdown 2016 Qualifier Round. They decide to pass the time by playing a game.

In this game's setup, they write $N$ positive integers on a blackboard. Then the players take turns, starting with Henry. In a turn, a player selects one of the integers, divides it by $2$, $3$, $4$, $5$ or $6$, and then takes the floor to make it an integer again. If the integer becomes $0$, it is erased from the board. The player who makes the last move wins.

Henry and Derek are very competitive, so aside from wanting to win Snackdown, they also want to win this game. Assuming they play with the optimal strategy, your task is to predict who wins the game.

The independent games aren't too hard to see. We have $N$ of them in the form of the positive integers that we're given. The numbers are written clearly on the blackboard, so we have perfect information. This game is sequential since the numbers must decrease in value.

A first pass naive solutions uses dynamic programming. Each game position is a nonnegative integer. $0$ is the terminating game position, so its nimber is $0$. Each game position can move to at most $5$ new game positions after dividing by $2$, $3$, $4$, $5$ or $6$ and taking the floor. So if we know the nimbers of these $5$ game positions, it's a simple matter to compute the nimber the current game position. Indeed, here's such a solution.

```
#include <iostream>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>
using namespace std;
long long computeNimber(long long a, unordered_map<long long, long long> &nimbers) {
if (nimbers.count(a)) return nimbers[a];
if (a == 0LL) return 0LL;
unordered_set<long long> moves;
for (int d = 2; d <= 6; ++d) moves.insert(a/d);
unordered_set<long long> neighboringNimbers;
for (long long nextA : moves) {
neighboringNimbers.insert(computeNimber(nextA, nimbers));
}
long long nimber = 0;
while (neighboringNimbers.count(nimber)) ++nimber;
return nimbers[a] = nimber;
}
string findWinner(const vector<long long> &A, unordered_map<long long, long long> &nimbers) {
long long cumulativeXor = 0;
for (long long a : A) {
cumulativeXor ^= computeNimber(a, nimbers);
}
return cumulativeXor == 0 ? "Derek" : "Henry";
}
int main(int argc, char *argv[]) {
ios::sync_with_stdio(false); cin.tie(NULL);
int T; cin >> T;
unordered_map<long long, long long> nimbers;
nimbers[0] = 0;
for (int t = 0; t < T; ++t) {
int N; cin >> N;
vector<long long> A; A.reserve(N);
for (int n = 0; n < N; ++n) {
long long a; cin >> a;
A.push_back(a);
}
cout << findWinner(A, nimbers) << '\n';
}
cout << flush;
return 0;
}
```

Unfortunately, in a worst case scenario, we'll have to start computing nimbers from $0$ and to $\max\left(A_1,A_2,\ldots,A_N\right)$. Since $1 \leq A_n \leq 10^{18}$ can be very large, this solution breaks down. We can speed this up with some mathematical induction.

To me, the pattern was not obvious at all. To help discover it, I generated the nimbers for smaller numbers. This is what I found.

Game Positions | Nimber | Numbers in Interval |
---|---|---|

$[0,1)$ | $0$ | $1$ |

$[1,2)$ | $1$ | $1$ |

$[2,4)$ | $2$ | $2$ |

$[4,6)$ | $3$ | $2$ |

$[6,12)$ | $0$ | $6$ |

$[12,24)$ | $1$ | $12$ |

$[24,48)$ | $2$ | $24$ |

$[48,72)$ | $3$ | $24$ |

$[72,144)$ | $0$ | $72$ |

$[144,288)$ | $1$ | $144$ |

$[288,576)$ | $2$ | $288$ |

$[576, 864)$ | $3$ | $288$ |

$[864,1728)$ | $0$ | $864$ |

$[1728,3456)$ | $1$ | $1728$ |

$[3456,6912)$ | $2$ | $3456$ |

$[6912, 10368)$ | $3$ | $3456$ |

$\vdots$ | $\vdots$ | $\vdots$ |

Hopefully, you can start to see some sort of pattern here. We have repeating cycles of $0$s, $1$2, $2$s, and $3$s. Look at the starts of the cycles $0$, $6$, $72$, and $864$. The first cycle is an exception, but the following cycles all start at $6 \cdot 12^k$ for some nonnegative integer $k$.

Also look at the number of $0$s, $1$2, $2$s, and $3$s. Again, with the exception of the transition from the first cycle to the second cycle, the quantity is multiplied by 12. Let $s$ be the cycle start and $L$ be the cycle length. Then, $[s, s + L/11)$ has nimber $0$, $[s + L/11, s + 3L/11)$ has nimber $1$, $[s + 3L/11, s + 7L/11)$ has nimber $2$, and $[s + 7L/11, s + L)$ has nimber $3$.

Now, given that the penalty for wrong submissions on CodeChef is rather light, you could code this up and submit it now, but let's try to rigorously prove it. It's true when $s_0 = 6$ and $L_0 = 66$. So, we have taken care of the base case. Define $s_k = 12^ks_0$ and $L_k = 12^kL_0$. Notice that $L_k$ is always divisible by $11$ since $$L_{k} = s_{k+1} - s_{k} = 12^{k+1}s_0 - 12^{k}s_0 = 12^{k}s_0(12 - 1) = 11 \cdot 12^{k}s_0.$$

Our induction hypothesis is that this pattern holds in the interval $[s_k, s_k + L_k)$. Now, we attack the problem with case analysis.

- Suppose that $x \in \left[s_{k+1}, s_k + \frac{1}{11}L_{k+1}\right)$. We have that $s_{k+1} = 12s_k$ and $L_{k+1} = 12L_k = 12\cdot 11s_k$, so we have that \begin{align*} 12s_k \leq &~~~x < 2 \cdot 12s_k = 2s_{k+1} \\ \left\lfloor\frac{12}{d}s_k\right\rfloor \leq &\left\lfloor\frac{x}{d}\right\rfloor < \left\lfloor\frac{2}{d}s_{k+1}\right\rfloor \\ 2s_k \leq &\left\lfloor\frac{x}{d}\right\rfloor < s_{k+1} \\ s_k + \frac{1}{11}L_k \leq &\left\lfloor\frac{x}{d}\right\rfloor < s_{k+1} \end{align*} since $2 \leq d \leq 6$, and $L_k = 11 \cdot 12^{k}s_0 = 11s_k$. Thus, $\left\lfloor\frac{x}{d}\right\rfloor$ falls entirely in the previous cycle, but it's large enough so it never has the nimber $0$. Thus, the smallest nonnegative integer among next game positions is $0$, so $x$ has the nimber $0$.
- Now, suppose that $x \in \left[s_{k+1} + \frac{1}{11}L_{k+1}, s_{k+1} + \frac{3}{11}L_{k+1}\right)$. Then, we have that \begin{align*} s_{k+1} + \frac{1}{11}L_{k+1} \leq &~~~x < s_{k+1} + \frac{3}{11}L_{k+1} \\ 2s_{k+1} \leq &~~~x < 4s_{k+1} \\ 2 \cdot 12s_{k} \leq &~~~x < 4 \cdot 12s_{k} \\ \left\lfloor \frac{2 \cdot 12}{d}s_{k}\right\rfloor \leq &\left\lfloor \frac{x}{d} \right\rfloor < \left\lfloor\frac{4 \cdot 12}{d}s_{k}\right\rfloor \\ s_k + 3s_k = 4s_k \leq &\left\lfloor \frac{x}{d} \right\rfloor < 24s_k = 12s_k + 12s_k = s_{k+1} + s_{k+1} \\ s_k + \frac{3}{11}L_k \leq &\left\lfloor \frac{x}{d} \right\rfloor < s_{k+1} + \frac{1}{11}L_{k+1}. \end{align*} Thus, $\left\lfloor \frac{x}{d} \right\rfloor$ lies in both the previous cycle and current cycle. In the previous cycle it lies in the part with nimbers $2$ and $3$. In the current cycle, we're in the part with nimber $0$. In fact, if we let $d = 2$, we have that \begin{equation*} s_{k+1} \leq \left\lfloor \frac{x}{2} \right\rfloor < s_{k+1} + \frac{1}{11}L_{k+1}, \end{equation*} so we can always reach a game position with nimber $0$. Since we can never reach a game position with nimber $1$, this game position has nimber $1$.
- Suppose that $x \in \left[s_{k+1} + \frac{3}{11}L_{k+1}, s_{k+1} + \frac{7}{11}L_{k+1}\right)$. We follow the same ideas, here. \begin{align*} s_{k+1} + \frac{3}{11}L_{k+1} \leq &~~~x < s_{k+1} + \frac{7}{11}L_{k+1} \\ 4s_{k+1} \leq &~~~x < 8s_{k+1} \\ 8s_{k} \leq &\left\lfloor \frac{x}{d} \right\rfloor < 4s_{k+1} \\ s_{k} + \frac{7}{11}L_k \leq &\left\lfloor \frac{x}{d} \right\rfloor < s_{k+1} + \frac{3}{11}L_{k+1}, \end{align*} so $\left\lfloor \frac{x}{d} \right\rfloor$ falls in the previous cycle where the nimber is $3$ and in the current cycle where the nimbers are $0$ and $1$. By fixing $d = 2$, we have that \begin{equation} s_{k+1} + \frac{1}{11}L_{k+1} = 2s_{k+1} \leq \left\lfloor \frac{x}{2} \right\rfloor < 4s_{k+1} = s_{k+1} + frac{3}{11}L_{k+1}, \end{equation} so we can always get to a number where the nimber is $1$. By fixing $d = 4$, we \begin{equation} s_{k+1} \leq \left\lfloor \frac{x}{4} \right\rfloor < 2s_{k+1} = s_{k+1} + \frac{1}{11}L_{k+1}, \end{equation} so we can always get to a number where the nimber is $0$. Since a nimber of $2$ is impossible to reach, the current nimber is $2$.
- Finally, the last case is $x \in \left[s_{k+1} + \frac{7}{11}L_{k+1}, s_{k+1} + L_{k+1}\right)$. This case is actually a little different. \begin{align*} s_{k+1} + \frac{7}{11}L_{k+1} \leq &~~~x < s_{k+1} + L_{k+1} \\ 8s_{k+1} \leq &~~~x < 12s_{k+1} \\ s_{k+1} < \left\lfloor \frac{4}{3}s_{k+1} \right\rfloor \leq &\left\lfloor \frac{x}{d} \right\rfloor < 6s_{k+1}, \end{align*} so we're entirely in the current cycle now. If we use, $d = 6$, then \begin{equation*} s_{k+1} < \left\lfloor \frac{4}{3}s_{k+1} \right\rfloor \leq \left\lfloor \frac{x}{6} \right\rfloor < 2s_{k+1} = s_{k+1} + \frac{1}{11}L_{k+1}, \end{equation*} so we can reach nimber $0$. If we use $d = 4$, we have \begin{equation*} s_{k+1} + \frac{1}{11}L_{k+1} = 2s_{k+1} \leq \left\lfloor \frac{x}{4} \right\rfloor < 3s_{k+1} = s_{k+1} + \frac{2}{11}L_{k+1}, \end{equation*} which gives us a nimber of $1$. Finally, if we use $d = 2$, \begin{equation*} s_{k+1} + \frac{3}{11}L_{k+1} \leq 4s_{k+1} \leq\left\lfloor \frac{x}{4} \right\rfloor < 6s_{k+1} = s_{k+1} + \frac{5}{11}L_{k+1}, \end{equation*} so we can get a nimber of $2$, too. This is the largest nimber that we can get since $d \geq 2$, so $x$ must have nimber $3$.

This covers all the cases, so we're done. Here's the complete code for a $O\left(\max(A_k)\log N\right)$ solution.

```
#include <cmath>
#include <iostream>
#include <string>>
#include <vector>
using namespace std;
long long computeNimber(long long a) {
if (a < 6) { // exceptional cases
if (a < 1) return 0;
if (a < 2) return 1;
if (a < 4) return 2;
return 3;
}
unsigned long long cycleStart = 6;
while (12*cycleStart <= a) cycleStart *= 12;
if (a < 2*cycleStart) return 0;
if (a < 4*cycleStart) return 1;
if (a < 8*cycleStart) return 2;
return 3;
}
string findWinner(const vector<long long> &A) {
long long cumulativeXor = 0;
for (long long a : A) cumulativeXor ^= computeNimber(a);
return cumulativeXor == 0 ? "Derek" : "Henry";
}
int main(int argc, char *argv[]) {
ios::sync_with_stdio(false); cin.tie(NULL);
int T; cin >> T;
for (int t = 0; t < T; ++t) {
int N; cin >> N;
vector<long long> A; A.reserve(N);
for (int n = 0; n < N; ++n) {
long long a; cin >> a;
A.push_back(a);
}
cout << findWinner(A) << '\n';
}
cout << flush;
return 0;
}
```

To avoid facing the ticking clock, I've decided to read a textbook and start a new project Machine Learning: a Probabilistic Perspective in Python. I've been meaning to learn more about the math behind machine learning and the Python ecosystem for scientific computing, so I've decided to code up various solutions for the textbook in Jupyter notebooks. Here are a couple of the more interesting exercises.

## Cross Validation and Nearest Neighbors

A classic problem in machine learning that is often used to benchmark classification algorithms is digit recognition. The standard dataset is The MNIST database of handwritten digits. A digit is represented as $26 \times 26$ grid of grayscale pixels like the following examples.

Surprisingly, a simple nearest neighbor approach works well despite the curse of dimensionality. When using $1$ neighbor and euclidean distance, we find that the error rate is only 3.8% on 1,000 test cases. Here are some correct predictions.

Of course we get some predictions wrong. Here are some.

The notebook to generate these figures is here, KNN classifier on shuffled MNIST data.

### Model Selection

How should we know how many neighbors to use? For a parametric models, we can pick our models using a maximum likelihood estimate (MLE) or a Bayesian approach, using either the posterior mean or maximum a posteriori probability (MAP) estimate. However, nearest neighbors is a nonparametric model since the number of parameters increases with the amount of data.

To measure model performance, we use empirical measures, that is, we directly measure prediction accuracy. There are a few ways of going about this.

- We can test the accuracy on the data that we trained on. This is called the
*training set error*. Generally, it's not a very good metric since it's prone to overfitting. For instance, using 1 neighbor leads to 0% error. - We can withhold some of our data and call it the
*test set*. We fit our model on the data not in the test set and calculate the misclassification rate on the test set. This is the*test set error*. It generally gives a good idea how the model will perform with new data, but it cannot be used when you have a limited number of test cases and cannot afford to throw away data. - A compromise is the
*cross validation error*. Here, we split our data into $k$ folds. For each fold, we fit our model on the remaining $k - 1$ folds and use the chosen fold as our test set. In this way, we compute $k$ different error rates. Then, we take the average of those error rates.

All three approaches were tried and compared in this notebook, CV for KNN.

The test set error and cross validation error are quite close, which indicates that the cross validation error is a good proxy for the test set error when faced with limited data. The training set error will almost always underpredict the true error rate and is often vulnerable to overfitting. In this rare case, all three error rates actually agree that using 1 neighbor is optional, but usually, model that achieves the best training set error is overfitted to the data and predicts future test cases badly.

## Discriminant Analysis

Some of the simplest models are the various flavors of discriminant analysis. Suppose we have classes $k = 1,2,\ldots,K$. We make $N$ obervations $(\mathbf{x}_i, y_i)$, where $i \in \{1,2,\ldots, N\}$ and $y_i \in \{1,2,\ldots,K\}$. $\mathbf{x}_i$ is our data and $y_i$ is our class label. Based off some new data $\mathbf{x}^\prime$, we would want to predict its class label $y^\prime$.

Discriminant analysis makes the strong assumption that $\mathbf{x}$ is generated from a multivariate normal distribution. The distribution has different parameters for each class, so if $\mathbf{x}$ belongs to class $y = k$, $\left(\mathbf{x} \mid y = k\right) \sim \mathcal{N}(\boldsymbol\mu_k,\Sigma_k)$. An example would be trying to predict the gender of a person after observing their height and weight.

Using the MLE, we can fit a multivariate normal to males as I do in the notebook Whitening versus standardizing.

We can do a similar thing for our female observations. Thus, given an observation $\mathbf{x}$, we have estimated the distribution of $\mathbf{x} \mid y = k$. Now, the distribution of $y$ is just a multinomial distribution, with parameters $\boldsymbol\pi \in \left\{(\pi_1,\pi_2,\ldots, pi_K) \in \mathbb{R}^K : \sum_{k=1}^Kx_k = 1,~0 < x_k < 1\right\}$. The MLE estimate for $\boldsymbol\pi$ is just \begin{equation} \boldsymbol{\hat{\pi}} = \left(\frac{N_1}{N}, \frac{N_2}{N},\ldots,\frac{N_K}{N}\right), \end{equation} where $N_k$ is the number of our observations of class $k$.

Let $f(\mathbf{x} \mid \boldsymbol\mu,\Sigma)$ be the probability density function of a multivariate normal with mean $boldsymbol\mu$ and covariance $\Sigma$. Now, we apply Bayes' rule which gives us that \begin{align} p(y = k \mid \mathbf{x}) &= \frac{p(\mathbf{x} \mid y = k)p(y=k)}{\sum_{k^\prime = 1}^Kp(\mathbf{x} \mid y = k^\prime)p(y=k^\prime)}\nonumber\\ &= \frac{f(\mathbf{x} \mid \boldsymbol{\hat{\mu}}_k, \hat{\Sigma}_k)\frac{N_k}{N}}{\sum_{k^\prime = 1}^K f(\mathbf{x} \mid \boldsymbol{\hat{\mu}}_{k^\prime}, \hat{\Sigma}_{k^\prime})\frac{N_{k^\prime}}{N}} \nonumber\\ &= \frac{f(\mathbf{x} \mid \boldsymbol{\hat{\mu}}_k, \hat{\Sigma}_k)N_k}{\sum_{k^\prime = 1}^K f(\mathbf{x} \mid \boldsymbol{\hat{\mu}}_{k^\prime}, \hat{\Sigma}_{k^\prime})N_{k^\prime}}. \end{align}

This gives us the quadratic disriminant analysis (QDA) classifier. Now, we can fix $\Sigma_k = \Sigma$ for all $k$, which leads to further cancellations. This gives the linear discriminant analysis (LDA) classifier. The names quadractic and linear come from the shape of boundaries as seen in the notebook LDA/QDA on height/weight data.

In this case, they lead to the same misclassification rate.

See more notebooks as I add them at Machine Learning: a Probabilistic Perspective in Python.

Lately, I've come across several programming competitions with a strong mathematical component. You would think that this would be my strong suit, but I actually struggled on a few of them. For this reason, I'll write about some of the solutions here.

## Sharing Candies

Here's Sharing Candies from CodeChef.

Alvin and Berto have gotten tired of eating chocolates, so now they have decided to eat candies instead.

Alvin has $A$ apple candies, and Berto has $B$ banana candies. (I know, they have weird tastes.) Alvin and Berto always wants the split of candies to be as fair as possible. The problem is, Alvin only wants apple candies and Berto only wants banana candies!

Here comes Chef to the rescue! Chef bought an infinite number of candy packs. There are two types of packs:

- Packs containing exactly $C$ apple candies.
- Packs containing exactly $D$ banana candies.
Chef wants to give some (could be zero) apple candy packs to Alvin and some (could be zero) banana candy packs to Berto in such a way that the absolute difference between the number of candies they have is minimized. What is this minimum absolute difference?

Note that Chef doesn't want to open any pack; he gives each pack in full.

Let $x$ be the number of packs of apple candies that Chef gives Alvin and $y$ be the number of packs of banana candies. The absolute difference can be written \begin{equation} \left|(A + xC) - (B + yD)\right| = \left|(A - B) + (xC - yD)\right|. \end{equation}

Let $d$ be the greatest common denominator of $C$ and $D$. Then, by Bézout's identity, we have that \begin{equation} xC - yD = kd \end{equation} for any $k$ and some $x$ and $y$. Moreover, every integer solution $(x^\prime,y^\prime)$ belongs to the set \begin{equation} S_{x,y,k} = \left\{\left(x + l\frac{D}{d}, y + l\frac{C}{d}\right) : l \in \mathbb{Z}\right\}. \end{equation}

By choosing large $l$, we can have both $x^\prime \geq 0$ and $y ^\prime \geq 0$, so we have a nonnegative integer solutions. Thus, our solution is \begin{equation} \min\left(\left|A - B\right| \bmod d, d - \left(\left|A - B\right| \bmod d\right)\right) \end{equation} since those are two numbers closet to $0$ we can get by adding or subtracting $k$ to $A - B$. Note that we don't actually have to calculate $x$ and $y$, so we just find $d$ with the Euclidean algorithm. Here's the code.

```
#include <algorithm>
#include <cmath>
#include <climits>
#include <iostream>
using namespace std;
long long computeGcd(long long a, long long b) {
if (a < b) swap(a, b);
// now a >= b
if (b == 0) return a;
long long q = a/b;
a -= q*b;
return computeGcd(b, a);
}
long long minimizeDifference(long long A, long long B,
long long C, long long D) {
long long delta = A > B ? A - B : B - A;
long long gcd = computeGcd(C, D);
return min(delta % gcd, gcd - (delta % gcd));
}
int main(int argc, char *argv[]) {
ios::sync_with_stdio(false); cin.tie(NULL);
int T; cin >> T;
for (int t = 0; t < T; ++t) {
long long A, B, C, D;
cin >> A >> B >> C >> D;
cout << minimizeDifference(A, B, C, D) << '\n';
}
cout << flush;
return 0;
}
```

## Bear and Tower of Cubes

Here's Bear and Tower of Cubes from Codeforces.

Limak is a little polar bear. He plays by building towers from blocks. Every block is a cube with positive integer length of side. Limak has infinitely many blocks of each side length.

A block with side a has volume $a^3$. A tower consisting of blocks with sides $a_1,a_2,\ldots,a_k$ has the total volume $a_1^3 + a_2^3 + \cdots + a_k^3$.

Limak is going to build a tower. First, he asks you to tell him a positive integer $X$ — the required total volume of the tower. Then, Limak adds new blocks greedily, one by one. Each time he adds the biggest block such that the total volume doesn't exceed $X$.

Limak asks you to choose $X$ not greater than $m$. Also, he wants to maximize the number of blocks in the tower at the end (however, he still behaves greedily). Secondarily, he wants to maximize $X$.

Can you help Limak? Find the maximum number of blocks his tower can have and the maximum $X \leq m$ that results this number of blocks.

The key observation to make that I missed is realize that if $a$ is the greatest integer such that $a^3 \leq m$, then we should choose the first block to have length either $a$ or $a-1$.

For a proof of this fact, suppose that we choose our first block to be of length $b$. Then, it must be true that $b^3 \leq X \lneq (b+1)^3$. Our new volume limit is then $X - b^3$ after choosing a block of length $b$. Now, note that \begin{equation} 0 \leq X - b^3 < (b+1)^3 - b^3 = 3b^2 + 3b = 3b(b+1), \end{equation} In this way, the upper bound of our new volume limit $m^\prime = X - b^3$ increases as a function of $b$. Thus, it makes sense to choose $b$ as large as possible. But if we choose $b = a$, then $0 \leq m^\prime \leq m - a^3$ since it's possible that $m$ is not much larger than $a^3$.

Thus, for every volume limit we choose a block of length $a$ or $a - 1$. Since the volume limit decreases quickly, we can just use a brute force recursive solution and keep track of the number of blocks and total volume as we go. Here's the code.

```
#include <cmath>
#include <iostream>
#include <utility>
using namespace std;
pair<int, long long> findMaxBlocksAndVolumeHelper(long long volumeLimit,
int currentBlocks,
long long currentVolume) {
if (volumeLimit == 0) return make_pair(currentBlocks, currentVolume);
long long maxA = floor(cbrt(volumeLimit));
pair<int, long long> bigBlockState = findMaxBlocksAndVolumeHelper(volumeLimit - maxA*maxA*maxA,
currentBlocks + 1,
currentVolume + maxA*maxA*maxA);
if (maxA > 1) {
pair<int, long long> smallBlockState = findMaxBlocksAndVolumeHelper(maxA*maxA*maxA - 1 - (maxA - 1)*(maxA - 1)*(maxA - 1),
currentBlocks + 1,
currentVolume + (maxA - 1)*(maxA - 1)*(maxA - 1));
if (smallBlockState.first > bigBlockState.first ||
(smallBlockState.first == bigBlockState.first && smallBlockState.second > bigBlockState.second))
return smallBlockState;
}
return bigBlockState;
}
pair<int, long long> findMaxBlocksAndVolume(long long volumeLimit) {
return findMaxBlocksAndVolumeHelper(volumeLimit, 0, 0);
}
int main(int argc, char *argv[]) {
ios::sync_with_stdio(false); cin.tie(NULL);
long long M; cin >> M;
pair<int, long long> blockVolume = findMaxBlocksAndVolume(M);
cout << blockVolume.first << ' ' << blockVolume.second << endl;
return 0;
}
```

## Longest Increasing Subsequences

In another, CodeChef problem Longest Increasing Subsequences, we make use of Ternary numerical system.

Chef recently learned about the classic Longest Increasing Subsequence problem. However, Chef found out that while the length of the longest increasing subsequence is unique, the longest increasing subsequence itself is not necessarily unique; for example, in the array $[1, 3, 2, 4]$, there are two longest increasing subsequences: $[1, 3, 4]$ and $[1, 2, 4]$.

Chef decided to investigate on this more, and now he has sufficient mastery in it that he was able to come up with a problem:

Given an integer $K$, output an integer $N$ and a permutation of the array $[1, 2,\ldots, N]$ such that there are exactly $K$ longest increasing subsequences. Chef also requires that $1 \leq N \leq 100$, otherwise he found the problem is too easy.

In case there are multiple possible answers, any one will be accepted.

Consider blocks of decreasing sequences $B_1, B_2,\ldots,B_n$, where for all $x \in B_i$ and $y \in B_j$, $x < y$ if $i < j$. For example, we might have $n = 3$ and \begin{align*} B_1 &= [5,4,3,2,1] \\ B_2 &= [8,7,6] \\ B_3 &= [10,9]. \end{align*} If we concatenate these sequences, we have a permutation of $[1,2,...,10]$ such that length of a longest increasing sequence is 3. We can make any such sequence by choosing exactly one number from each block. Thus, the number of such sequences is $5 \times 3 \times 2 = 30$ in this case.

Now imagine all the blocks are of the same size, say $k$. We want to maximize the number of sequences that we can encode with such blocks. Fix $N$. Then, we have about $N/k$ blocks that can represent $f_N(k) = k^{N/k}$ longest increasing sequences with this strategy. We have that \begin{equation} f_N^\prime(k) = N\exp\left(\frac{N}{k}\log k\right) \frac{1 - \log k}{k^2}, \end{equation} so $f$ is maximized when at $e$. Since $k$ must be an integer, we set $k = \lceil e \rceil = 3$. Thus, our block size is fixed to be size $3$.

Now, suppose that we write \begin{equation} K = d_0 + d_13 + d_23^2 + \cdots + d_n3^n = \sum_{j = 0}^nd_j3^j, \end{equation} where $d_n \neq 0$. Suppose that $K \geq 9$, too. Then, we have $n$ blocks, $B_1,B_2,\ldots,B_n$ of size 3. If $d_n = 2$, we actually let $B_1$ be of size $6$. In this way, by concatenating $B_1,B_2,\ldots,B_n$ we have $d_n3^n$ longest increasing sequences.

Now suppose that $d_j > 0$. We can add $d_j3^j$ sequences by inserting a sequence between $B_{n - j}$ and $B_{n - j + 1}$. If $d_j = 1$, we need to add an increasing sequence of length $n - j$ such that all the numbers are greater than those in $B_{n-j}$ but less than those in $B_{n - j + 1}$. If $d_j = 2$, we add an increasing sequence of length $n - j + 1$ with the same properties, but we transpose the first numbers. As an example, consider $K = 71 = 2 \cdot 3^0 + 2 \cdot 3^1 + 3^2 + 2 \cdot 3^3$. We'll have 3 blocks. Since no $d_j = 0$, we need to interleave a sequence between every block. Thus, our complete sequence will be $$ \overbrace{[14,13,12,11,10,9]}^{B_1} [8] \overbrace{[17,16,15]}^{B_2} [6,5,7] \overbrace{[20,19,18]}^{B_3} [2,1,3,4], $$ which gives exactly $K = 71$ longest increasing sequences.

When $K < 9$, we can simple just use the sequence $[K,K-1,\ldots, 1]$. Here's the code.

```
#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>
using namespace std;
const int MAX_N = 100;
vector<int> makeSequence(int K) {
if (K <= 8) { // subsequences of length 1
vector<int> sequence(K);
for (int i = 0; i < K; ++i) sequence[i] = K - i;
return sequence;
}
int bitCount = 0;
vector<int> base3K;
do {
base3K.push_back(K % 3);
K /= 3;
} while (K > 0);
/* maxBit this is the length of the longest increasing subsequence,
* K = d_0*3^0 + d_1*3^1 + ... + d_maxBit*3^maxBit
*/
int maxBit = base3K.size() - 1;
int N = 3*maxBit;
if (base3K.back() == 2) N += 3;
for (int i = 0; i <= maxBit - 1; ++i) {
if (base3K[i] > 0) {
N += maxBit - i;
if (base3K[i] == 2) ++N;
}
}
vector<int> sequence(N);
int currentIdx = 1;
int endCursor = N;
for (int i = 0; i < maxBit; ++i) { // interleave to get other ternary
if (base3K[i] > 0) {
int j = endCursor - (maxBit - i);
if (base3K[i] == 2) --j;
for (; j < endCursor; ++j) {
sequence[j] = currentIdx++;
}
endCursor -= maxBit - i;
if (base3K[i] == 2) { // if digit is 2
--endCursor;
swap(sequence[endCursor], sequence[endCursor + 1]);
}
}
sequence[--endCursor] = N - 2;
sequence[--endCursor] = N - 1;
sequence[--endCursor] = N;
N -= 3;
}
if (endCursor > 0) { // first digit in base 3 is 2
sequence[--endCursor] = N - 2;
sequence[--endCursor] = N - 1;
sequence[--endCursor] = N;
N -= 3;
swap(sequence[0], sequence[3]);
swap(sequence[1], sequence[4]);
swap(sequence[2], sequence[5]);
}
return sequence;
}
int main(int argc, char *argv[]) {
ios::sync_with_stdio(false); cin.tie(NULL);
int T; cin >> T;
for (int t = 0; t < T; ++t) {
int K; cin >> K;
vector<int> sequence = makeSequence(K);
cout << sequence.size() << '\n';
copy(sequence.begin(), sequence.end() - 1, ostream_iterator<int>(cout, " "));
cout << sequence.back() << '\n';
}
cout << flush;
return 0;
}
```

## Chef and Super Array

Another problem from CodeChef is Chef and Super Array.

Chef has a an array $A$ consisting of $N$ elements. He wants to add some elements into the array as per the below mentioned process.

After each minute, Chef iterates over the array in order from left to right, and takes every two neighbouring pair of elements, say $x$ and $y$, he adds a new element $x + y$ in the middle of elements $x$ and $y$.

For example, if initial array $A = \{1, 6, 9\}$.

- After first minute, the array A will be equal to $\{1, \mathbf{7}, 6, \mathbf{15}, 9\}$. Please note that the elements shown in the bold font are the newly added elements during first minute. As you can observe that $7 = 1 + 6$, and $15 = 6 + 9$.
- After second minute, the array will be $\{1, \mathbf{8}, 7, \mathbf{13}, 6, \mathbf{21}, 15, \mathbf{24}, 9\}$. Once again, elements added during the second minute, are shown in bold.
Chef wants to know the sum of elements between $x$th and $y$th positions in the array $A$ (i.e. $A_x + A_{x + 1} + \cdots + A_y$) after $m$ minutes. As the answer could be large, output it modulo $10^9+7 = 1000000007$. Please note that we use $1$-based indexing in the problem.

The important thing to note in this problem is the recursive structure of it. Consider an array $A = \{A_1,A_2,\ldots,A_N\}$. Let $A_i = a$ and $A_{i+1} = b$. Let $S_{a,b}(k)$ be the sum of elements between $a$ and $b$ after $k$ steps of the process. Clearly, $S_{a,b}(0) = 0$. Now, suppose $c$ is between $a$ and $b$. After a step, c appears $3$ times, and we add an additional $a$ and $b$. For example, $\{a,c,b\}$ becomes $\{a, a + c, c, b + c, b\}$. Thus, $S_{a,b}(k + 1) = 3S_{a,b}(k) + (a + b)$. Since we can write $S_{a,b}(0) = (3^0 - 1)\frac{a + b}{2},$ we have that in general, \begin{equation} S_{a,b}(k) = \left(3^k - 1\right)\frac{a + b}{2}. \label{eqn:sum_between} \end{equation}

Now if we use $0$-indexing, element $i$ of array $A$ has index $i2^m$ after $m$ steps. Suppose that we want to sum elements up to index $x$. If $i2^m \leq x$, we can simply use Equation \ref{eqn:sum_between} with $a = A_{i-1}$ and $b = A_i$.

What about the case when $(i-1)2^m \leq x < i2^m$ in general? Let $a = A_{i-1}$, $b = A_i$ and $c = a + b$. Set $x^\prime = x - (i-1)2^m$. Let $T_{a,b,m,x^\prime}$ be the sum of elements with indices from $(i-1)2^m$ to $x$. We have that \begin{equation} T_{a,b,m,x^\prime} = \begin{cases} a, &x^\prime = 0 \\ a + \left(3^m - 1\right)\frac{a + b}{2}, &x^\prime = 2^m - 1 \\ T_{a,c,m - 1, x^\prime}, &x^\prime < 2^{m - 1} \\ a + \left(3^{m - 1} - 1\right)\frac{a + c}{2} + T_{a,c,m - 1, x^\prime - 2^{m - 1}}, &x^\prime \geq 2^{m - 1} \end{cases} \end{equation} since we can imagine that the process starts after $1$ step and reaches this point after $m - 1$ steps. Here's the code for this.

```
#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>
using namespace std;
const int MOD = 1000000007;
const int MAX_M = 30;
long long p2[MAX_M + 1];
long long p3[MAX_M + 1];
/* consider the subarray newA[k],...,newA[l] after m minutes
* where newA[k] = a, newA[l] = b
*/
int subSum(int i, int a, int b, int m) {
if (i == 0) return a;
if (i == p2[m] - 1) return (a + (a + b)*(p3[m] - 1)/2) % MOD;
if (i < p2[m - 1]) return subSum(i, a, a + b, m - 1);
return (subSum(p2[m - 1] - 1, a, a + b, m - 1) + subSum(i - p2[m - 1], a + b, b, m - 1)) % MOD;
}
// find the sum of newA[0], newA[1],...,newA[i]
int cumulativeSum(long long i, const vector<int> &A, int m) {
if (i < 0) return 0;
int idx = 0;
int S = 0;
// in new array A[idx] is located at index idx*2^m
while (idx*p2[m] <= i) {
S += subSum(min(p2[m] - 1, i - idx*p2[m]), A[idx], A[idx + 1], m);
S %= MOD;
++idx;
}
return S;
}
int sum(const vector<int> &A, int m, long long x, long long y) {
int S = cumulativeSum(y, A, m) - cumulativeSum(x - 1, A, m);
S %= MOD;
if (S < 0) S += MOD;
return S;
}
int main(int argc, char *argv[]) {
ios::sync_with_stdio(false); cin.tie(NULL);
// precompute powers of 2 and 3
p2[0] = 1LL; p3[0] = 1LL;
for (int i = 1; i <= MAX_M; ++i) {
p2[i] = p2[i - 1]*2LL; p3[i] = p3[i - 1]*3LL;
}
// run through test cases
int T; cin >> T;
for (int t = 0; t < T; ++t) {
int N, m;
long long x, y;
cin >> N >> m;
cin >> x >> y;
--x; --y; // 0-index
vector<int> A; A.reserve(N + 1);
for (int i = 0; i < N; ++i) {
int a; cin >> a; A.push_back(a);
}
A.push_back(0); // dummy number padding
cout << sum(A, m, x, y) << '\n';
}
cout << flush;
return 0;
}
```

Recently, I came across two problems that required the convex hull trick. Often, we have a set of lines, and given a time $t$, we want to find out which line has the maximal (or minimal value). To make this more concrete, here's the first problem in which I came across this technique, Cyclist Race.

Chef has been invited to be a judge for a cycling race.

You may think about cyclists in the race as points on a line. All the cyclists start at the same point with an initial speed of zero. All of them move in the same direction. Some cyclists may force their cycles to speed up. The change of speed takes place immediately. The rest of the time, they move with a constant speed. There are N cyclists in total, numbered from 1 to N.

In order to award points to the participants, Chef wants to know how far the cyclists have reached at certain points of time. Corresponding to the above two, there will be two types of queries.

- Change the speed of cyclist i at some time.
- Ask for the position of the race leader at some time.
Return the results of the second type of query.

Now, it's not too hard to see that the distance traveled by each cyclist is a piecewise linear function. For each query of the second type, we could just evaluate all these peicewise linear functions and take the max, but there's a better way.

In this particular problem, the speed is always increasing. So if you look at the thick lines in the title picture that indicate which cyclist is in the lead, it forms the bottom of a convex hull, hence the name, the convex hull trick. The distance of the lead cyclist is also piecewise linear, so the goal becomes to merge the piecewise linear functions of all the cyclist into one.

Essentially, we'll want to create a vector $\left((x_0, l_0), (x_1,l_1),\ldots,(x_{n-1},l_{n-1})\right)$, where $l_i$ is a line, and $x_i$ is the $x$-coordinate at which the line becomes relavant. That is, line $l_i$ has the maximal value on the interval $\left[x_i, x_{i+1}\right]$, where $x_n = \infty$. In this way, if we sort our vector by $x_i$, then to find the maximal value at $x$, we can do a binary search.

To build this vector, we to first have all our lines sorted in ascending order by slope. We iterate through our lines, but we only want to keep some of them. Let us call our convex hull $C$. Initialize $C = \left((-\infty, l_0)\right)$. Now, suppose we encouter line $l$ and we need to make a decision. Recall that the slope $l$ will need be greater than or equal to any line line in $C$. There are several cases.

- If $C$ has at least two lines, consider the previous line $l^\prime$ and the line before $l^\prime$, $l^{\prime\prime}$. $l^\prime$ becomes relevant at the intersection with $l^{\prime\prime}$ at say $x^\prime$. If $l$ intersects $l^{\prime\prime}$ at $x$ and $x \leq x^\prime$ $l$ becomes relevant before $l^\prime$, so we remove $l^\prime$. Repeat until we remove as many unnecessary lines as possible. Then, if append to our vector $(x,l)$ where $x$ is the $x$-coordinate of the intersection of $l$ and $l^\prime$.
- If $C$ has only one line and $l$ has the same slope as that line, then only keep the line with the bigger $y$-intercept.
- If $C$ has only one line and $l$ has a larger slope, then append to our vector $(x,l)$ where $x$ is the $x$-coordinate of the $l$ and the one line in $C$.

See the title picture for an example. We'd initialize with the green line $\left(-\infty, y = 0.1x\right)$. Now, we're in case 3, so we'd add the blue line $(0, y = 0.2x)$ since the two lines intersect at $(0,0)$. For the thick red line, we find ourselves in case 1. We'll pop off the blue line and find ourselves in case 3 again, so now our vector is $C = \left((-\infty, y= 0.1x),(0,y=x/3)\right)$.

The next line we encounter is the green line $y = 0.5x - 0.8$. We're in case 1, but we can't pop off any lines since its intersection with other two lines is far to the right, so we simply append this line and its intersection with the thick red line, so we have $$C = \left((-\infty, y= 0.1x), (0,y=x/3), (4.8,y=0.5x-0.8)\right).$$ Our next line is thick blue line, $y = 2x/3 - 1.4$. This intersects the thick red line at $x = 4.2$, while the thick pervious green line intersects at $x =4.8$, so we'll pop off the green line $y = 0.5x-0.5$, and push the thick blue line on to the stack and get $C = \left((-\infty, y= 0.1x), (0,y=x/3), (4.2,y=2x/3-1.4)\right)$. Finally, we encounter the thick green line and our convex hull is $$ C = \left((-\infty, y= 0.1x), (0,y=x/3), (4.2,y=2x/3-1.4),(5.4, y=2x-8.6)\right). $$

Hopefully, it's clear that the natural data structure to keep track of the lines is a vector, which we'll use as a stack. Then, adding new lines is just a constant time operation. The most time-consuming operation is the initial sorting of the lines, so constructing the convex hull is $O(N\log N)$, where $N$ is the number of lines. Evaluating the distance at some $x$ is $O(\log N)$ using binary search as we discussed earlier. If we have $M$, queries the total running time will be $O\left((N + M)\log N \right)$.

Here's the code for this problem.

```
#include <algorithm>
#include <iostream>
#include <utility>
#include <vector>
using namespace std;
class Line {
private:
long double m, b; // y = mx + b
public:
Line(long double m, long double b) : m(m), b(b) {}
long double& slope() { return m; }
long double& yIntercept() { return b; }
pair<long double, long double> intersect(Line other) {
long double x = (this -> b - other.yIntercept())/(other.slope() - this -> m);
long double y = (this -> m)*x + this -> b;
return make_pair(x, y);
}
double getY(long double x) { return m*x + b; }
};
vector<pair<long double, Line>> makeConvexHull(vector<Line> &lines) {
int N = lines.size();
vector<pair<long double, Line>> convexHull; convexHull.reserve(N);
if (N == 0) return convexHull;
convexHull.emplace_back(0, lines.front());
for (int i = 1; i < N; ++i) {
// pop stack while new line's interesection with second to last line is to the left of last line
// note that slopes are strictly increasing
while (convexHull.size() > 1 &&
lines[i].intersect(convexHull[convexHull.size() - 2].second).first <= convexHull.back().first) {
convexHull.pop_back();
}
// check intersection with x = 0 when there's only 1 line
if (convexHull.size() == 1 && lines[i].yIntercept() >= convexHull.back().first) convexHull.pop_back();
convexHull.emplace_back(lines[i].intersect(convexHull.back().second).first, lines[i]);
}
return convexHull;
}
int main(int argc, char *argv[]) {
ios::sync_with_stdio(false); cin.tie(NULL);
int N, Q; cin >> N >> Q; // number of cyclists and queries
// current speed, time change, distance traveled so far
vector<Line> lines; // y = mx + b, first is m, second is b
lines.emplace_back(0,0);
vector<pair<int, long long>> cyclists(N, make_pair(0, 0)); // first is speed v, second is x, where x + vt is location at time t
vector<long long> queryTimes;
for (int q = 0; q < Q; ++q) {
int queryType; cin >> queryType;
long long time;
switch(queryType) {
case 1: // speed change
int cyclist, newSpeed;
cin >> time >> cyclist >> newSpeed;
--cyclist; // convert to 0 indexing
// suppose current function is x + vt
// new function is x + v*time + (t - time)*newSpeed
// (x + v*time - time*newSpeed) + newSpeed*t
// = (x + (v-newSpeed)*time) + newSpeed*t
cyclists[cyclist].second += time*(cyclists[cyclist].first - newSpeed);
cyclists[cyclist].first = newSpeed;
lines.emplace_back(newSpeed, cyclists[cyclist].second);
break;
case 2: // leader position
cin >> time;
queryTimes.push_back(time);
break;
}
}
// sort slopes in ascending order
sort(lines.begin(), lines.end(),
[](Line &a, Line &b) -> bool {
if (a.slope() != b.slope()) return a.slope() < b.slope();
return a.yIntercept() < b.yIntercept();
});
if (lines.size() == 1) { // there will always be at least 1 line
for (long long t : queryTimes) {
cout << (long long) lines.back().getY(t) << '\n';
}
} else {
// eliminate irrelevant lines, first is time where the line becomes relevant
vector<pair<long double, Line>> convexHull = makeConvexHull(lines);
// since times are strictly increasing just walk through lines 1 by 1
int cursor = 0;
for (long long t : queryTimes) {
while (cursor + 1 < convexHull.size() && convexHull[cursor + 1].first <= t) ++cursor;
cout << (long long) convexHull[cursor].second.getY(t) << '\n';
}
}
cout << flush;
return 0;
}
```

In this particular problem, negative times make no sense, so we can start at $x = 0$ for our convex hull.

## Applications to Dynamic Programming

In certain cases, we can apply this trick to a dynamic programming problem to significantly improve the running time. Consider the problem Levels and Regions.

Radewoosh is playing a computer game. There are $N$ levels, numbered $1$ through $N$. Levels are divided into $K$ regions (groups). Each region contains some positive number of consecutive levels.

The game repeats the the following process:

- If all regions are beaten then the game ends immediately. Otherwise, the system finds the first region with at least one non-beaten level. Let $X$ denote this region.
- The system creates an empty bag for tokens. Each token will represent one level and there may be many tokens representing the same level.

- For each already beaten level $i$ in the region $X$, the system adds $t_i$ tokens to the bag (tokens representing the $i$-th level).
- Let $j$ denote the first non-beaten level in the region $X$. The system adds $t_j$ tokens to the bag.
- Finally, the system takes a uniformly random token from the bag and a player starts the level represented by the token. A player spends one hour and beats the level, even if he has already beaten it in the past.
Given $N$, $K$ and values $t_1,t_2,\ldots,t_N$, your task is to split levels into regions. Each level must belong to exactly one region, and each region must contain non-empty consecutive set of levels. What is the minimum possible expected number of hours required to finish the game?

It's not obvious at all how the convex hull trick should apply here. Well, at least it wasn't obvious to me. Let's do some math first. Consider the case where we only have 1 region first consisting of levels $1,2,\ldots,n$. Let $T_i$ be the time at which we finish level $i$. Write \begin{equation} T_n = T_1 + (T_2 - T_1) + (T_3 - T_2) + \cdots + (T_n - T_{n-1}). \label{eqn:expectation_Tn} \end{equation}

$T_1 = 1$ since we'll just put $t_i$ tokens in the bag and draw from those $t_i$ tokens, so $t_i/t_i = 1$, so we always play and beat the first level immediately. Now $T_{i} - T_{i-1}$ is the time that it takes to be level $i$ given that levels $1,2,\ldots,i-1$ are beaten. This has a geometric distribution. Every time we try to beat level $i$, we'll put in $t_1 + t_2 + \cdots + t_i$ tokens in the back and try to get one of the $t_i$ tokens labeled $i$. The probability of doing so is \begin{equation} p = \frac{t_i}{\sum_{j=1}^i t_j}. \end{equation} Thus, we'll have that \begin{align} \mathbb{E}(T_i - T_{i-1}) &= p + 2(1-p)p + 3(1-p)p + \cdots = \sum_{k=1}^\infty k(1-p)^{k-1}p \nonumber\\ &= \frac{p}{\left(1-(1-p)\right)^2} = \frac{1}{p} \nonumber\\ &= \frac{\sum_{j=1}^i t_j}{t_i}. \label{eqn:expectation_delta_T} \end{align}

Now by linearity of expectation, applying Equation \ref{eqn:expectation_delta_T} to Equation \ref{eqn:expectation_Tn} will give us that \begin{equation} \mathbb{E}T_n = \sum_{i = 1}^n\frac{\sum_{j=1}^i t_j}{t_i}. \label{eqn:expection_T_n} \end{equation}

Now, define $E_{k,n}$ denote the minimum expected time to beat $n$ levels if we split them into $k$ regions. Note that each region must have at least 1 level, so this is only well-defined if $n \geq k$. Now, the levels must be beaten in order, so imagine putting levels $t_l,t_{l+1},\ldots,t_n$ into the last region. Since each region must have at least 1 level, we'll have that $k \leq l \leq n$, which gives us the recurrence relation \begin{equation} E_{k,n} = \inf_{k \leq l \leq n}\left\{E_{k - 1, l - 1} + \sum_{i = l}^n\frac{\sum_{j=l}^i t_j}{t_i}\right\} \label{eqn:expectation_recurrence} \end{equation} by Equation \ref{eqn:expection_T_n}.

Now, suppose we define \begin{equation} S_k = \sum_{i=1}^k t_i ~\text{and}~ R_k = \sum_{i=1}^k \frac{1}{t_i}, \label{eqn:sum_dp} \end{equation} which leads to \begin{equation} E_{1,n} = \mathbb{E}T_n = \sum_{i=1}^n\frac{S_i}{t_i} \label{eqn:base_expectation} \end{equation} Equation \ref{eqn:expectation_recurrence} becomes \begin{align} E_{k,n} &= \inf_{k \leq l \leq n}\left\{E_{k - 1, l - 1} + \sum_{i = l}^n\frac{\sum_{j=l}^i t_j}{t_i}\right\} \nonumber\\ &= \inf_{k \leq l \leq n}\left\{E_{k - 1, l - 1} + \sum_{i = l}^n\frac{S_i - S_{l-1}}{t_i}\right\} \nonumber\\ &= \inf_{k \leq l \leq n}\left\{E_{k - 1, l - 1} + \sum_{i = l}^n\frac{S_i}{t_i} - S_{l-1}\left(R_n - R_{l-1}\right)\right\} \nonumber\\ &= \inf_{k \leq l \leq n}\left\{E_{k - 1, l - 1} + \left(E_{1,n} - E_{1,l-1}\right) - S_{l-1}\left(R_n - R_{l-1}\right)\right\} \nonumber\\ &= E_{1,n} + \inf_{k \leq l \leq n}\left\{\left(E_{k - 1, l - 1} - E_{1,l-1} + S_{l-1}R_{l-1}\right) - S_{l-1}R_n\right\} \label{eqn:expectation_line} \end{align} by Equations \ref{eqn:base_expectation} and \ref{eqn:sum_dp}.

Do you see the lines of decreasing slope in Equation \ref{eqn:expectation_line}, yet? It's okay if you don't. Look at the expression inside the $\inf$. The $y$-intercept is in parentheses and the slope is $-S_{l-1}$ which is decreasing with $l$. So index our lines by $l$.

Fix $k$. Imagine that we have calculated $E_{j,n}$ for all $j < k$. Now, we're going to attempt to calculate $E_{k,k},E_{k,k+1},\ldots, E_{k,N}$ in that order. To calculate $E_{k,n}$ we only need to consider the lines $l = k,\ldots,n$. The intercept does not vary as a function of $n$, so we can add lines one-by-one. Before calculating $E_k,n$, we'll add the line with slope $-S_{n-1}$. Now our answer will be $E_{K,N}$.

In this way, it simplifies a $O(KN^2)$ problem into a $O(KN\log N)$ problem. Notice that here, instead of building our convex hull before making any queries like in the previous problem, we dynamically update it. Here's the code with some differences since $0$-indexing was used in the code.

```
#include <algorithm>
#include <iomanip>
#include <iostream>
#include <utility>
#include <vector>
using namespace std;
class Line {
private:
long double m, b; // y = mx + b
public:
Line(long double m, long double b) : m(m), b(b) {}
long double slope() { return m; }
long double& yIntercept() { return b; }
pair<long double, long double> intersect(Line other) {
long double x = (this -> b - other.yIntercept())/(other.slope() - this -> m);
long double y = (this -> m)*x + this -> b;
return make_pair(x, y);
}
double getY(long double x) { return m*x + b; }
};
int main(int argc, char *argv[]) {
ios::sync_with_stdio(false); cin.tie(NULL);
int N, K; cin >> N >> K; // number of levels and regions
vector<int> T; T.reserve(N); // tokens
for (int i = 0; i < N; ++i) {
int t; cin >> t;
T.push_back(t);
}
vector<long long> S; S.reserve(N); // cumulative sum of tokens
vector<long double> R; R.reserve(N); // cumulative sum of token reciprocals
S.push_back(T.front());
R.push_back(1.0/T.front());
for (int n = 1; n < N; ++n) {
S.push_back(S.back() + T[n]);
R.push_back(R.back() + 1.0L/T[n]);
}
/* let eV[k][n] be the minimum expected time to beat
* levels 0,1,...,n-1 spread across regions 0,1,...,k-1
*/
vector<vector<long double>> eV(K, vector<long double>(N)); // only indices eV[k][n] with n >= k are valid
eV.front().front() = 1;
for (int n = 1; n < N; ++n) { // time to beat each level is a geometric distribution
eV[0][n] = eV[0][n-1] + ((long double) S[n])/T[n];
}
/* eV[k][n] = min(eV[k-1][l-1] + (S[l]-S[l-1])/t_l + ... + (S[n] - S[l-1])/t_{n}),
* where we vary k <= l < n. That is, we're putting everything with index at least l
* in the last region. Note that each region needs at least 1 level. This simplifes to
* eV[k][n] = min(eV[k-1][l-1] + eV[0][n] - eV[0][l-1] - S[l-1](R[n] - R[l-1])
* = eV[0][n] + min{(eV[k-1][l-1] - eV[0][l-1] + S[l-1]R[l-1]) - S[l-1]R[n]},
* Thus, for each l we have a line with slope -S[l - 1].
* -S[l-1] is strictly decreasing and R[n] is strictly increasing.
* Use the convex hull trick.
*/
vector<pair<long double, Line>> convexHull; convexHull.reserve(N);
for (int k = 1; k < K; ++k) {
/* in convex hull we add lines in order of decreasing slope,
* the the convexHull[i].first is the x-coordinate where
* convexHull[i].second and convexHull[i-1].second intersect
*/
convexHull.clear();
int cursor = 0;
for (int n = k; n < N; ++n) { // calculate eV[k][n]
/* add lines l = k,...,n to build convex hull
* loop invariant is that lines l = k,...,n-1 have been added, so just add line l = n
*/
long double slope = -S[n - 1];
long double yIntercept = eV[k - 1][n - 1] - eV[0][n - 1] + S[n - 1]*R[n - 1];
Line l(slope, yIntercept);
while (convexHull.size() > 1 &&
convexHull.back().first >= l.intersect(convexHull[convexHull.size() - 2].second).first) {
convexHull.pop_back(); // get rid of irrelevant lines by checking that line intersection is to the left
}
// check intesection with x = 0 if no lines left
if (convexHull.size() == 1 && l.yIntercept() <= convexHull.back().second.yIntercept()) convexHull.pop_back();
convexHull.emplace_back(convexHull.empty() ? 0 : l.intersect(convexHull.back().second).first, l); // add line
/* binary search for the correct line since they are sorted by x intersections
* lower bound is old cursor since x coordinate of intersection is monotonically decreasing
* and R[n] is increasing, too
*/
if (cursor >= convexHull.size()) cursor = convexHull.size() - 1;
cursor = upper_bound(convexHull.begin() + cursor, convexHull.end(), R[n],
[](long double x, pair<long double, Line> l) { return x < l.first; }) - convexHull.begin();
--cursor; // since we actually want the last index that is less than or equal to
eV[k][n] = eV[0][n] + convexHull[cursor].second.getY(R[n]);
}
}
cout << fixed;
cout << setprecision(10);
cout << eV.back().back() << endl;
return 0;
}
```

By the way the drawing was done with GeoGebra. It's a pretty cool way to do math.

One of the features of Snapstream Searcher is matrix search. One can search for a whole list of terms over a date range and receive a matrix of co-occurrences, where a co-occurrence is defined as two terms being mentioned in the same program within a specified number of characters.

One way to visualize such data is as a graph. Each country is a node. Between each pair of nodes, we place an edge which is weighted according to the strength of their relationship. We'd suspect that countries that frequently co-occur will form clusters.

## Spring Embedding

To accomplish this clustering, I used spring embedding. Suppose we have $N$ nodes, labeled from $1$ to $N$. Between nodes $i$ and $j$, we place a string of length $l_{ij}$ with spring constant $k_{ij}$. Recall that Hooke's law states that the force needed to stretch or compress the spring to a length $d$ is $F(d) = k_{ij}(d - l_{ij})$, which implies that spring has potential energy $$ E(d) = \frac{1}{2}k_{ij}(d-l_{ij})^2. $$ Suppose each node $i$ has position $(x_i,y_i)$ and node $j$ has position $(x_j,y_j)$. Define the distance between two nodes $$ d(i,j) = \sqrt{(x_j-x_i)^2 + (y_j-y_i)^2}. $$ The total energy of the system is then \begin{equation} E = \frac{1}{2}\sum_{i=1}^{N-1}\sum_{j=i+1}^{N}k_{ij}\left(d(i,j) - l_{ij}\right)^2 \end{equation} If we call $l_{ij}$ the ideal length of the spring, deviations from the ideal length lead to higher energy. We want to choose $(x_i, y_i)$ for all $i = 1,2,\ldots,N$ such that the total energy is minimized. We can do this with a steepest gradient descent method. Much to my surprise, I actually used something from my Numerical Analysis class.

## Implementation Issues

To actually implement this algorithm a couple of issues must be addressed:

- computing the $l_{ij}$ as a function of occurrences and co-occurrences, and
- normalizing $l_{ij}$ so the nodes fit on the canvas.

For the first issue, if there are a lot of co-occurrences, we want the nodes to be more closely linked. But nodes that are mentioned frequently like USA would have the most co-occurrences with every other node by chance since it appears most frequently in general. To address this issue some normalization is done. Let $S$ be the total number of occurrences of all search terms. Let $R_i$ be the number of occurrences of term $i$ and $C_j$ the number of occurrences of term $j$. Finally, let $M_{ij}$ be the number of co-occurrences of terms $i$ and $j$. We define \begin{equation} A_{ij} = \frac{\left(M_{ij}/S\right)}{\left(R_i/S\right)\left(C_j/S\right)} = \frac{SM_{ij}}{R_iC_j}, \end{equation} which you can intuitively think of as the proportion of co-occurrences we actaully observed over the number of co-occurrences that we would expect if the occurrences were independent of each other.

Now, since more co-occurrences than expected should lead to a smaller ideal distance, we define \begin{equation} l_{ij} = \frac{1}{c + A_{ij}}, \end{equation} where we chose $c = 0.01$.

Note that $l_{ij} \in (0,1/c)$ which is between $0$ and $10$ in the case that $c = 0.01.$ To plot this we need to translate this distance into pixels. We don't want the minimum distance to be too small because than the nodes will be on top of each other. Nor do we want the max distance to be too big since the nodes will fall off the canvas. We apply an affine transformation from $l_{ij}$ to $[L, U].$ Let $l^* = \max_{i,j}l_{ij}$ and $l_* = \min_{i,j}l_{ij},$ and define \begin{equation} l_{ij}^\prime = L + \frac{l_{ij} - l_*}{l^* - l_*}(U - L). \end{equation} I simply chose to fix $L = 80.$ To choose $U$, I used a technique that I learned from programming contests. We want to vary $U$ until all the nodes fit in the canvas. Running the spring embedding algorithm is very expensive however, so we can't try all possible values of $U$. Thus, we do a binary search and find the largest $U$ such that all the nodes fit. This solves the second issue.

You can this algorithm implemented at Country Relationships. Play around with the graph, and let me know what you think!

## Analysis

If you look at Country Relationships, right away you see a cluster of Middle Eastern countries with Russia, France, Belgium, and Israel as bridge to the United States. China also places a central role and is closely related to Vietnam and Japan.

If you change the time period to December 2015 and click `Spring Embed`

again to re-cluster, inexplicably the Philippines and Colombia are strongly related. Recall that Steve Harvey mixed up the winners of Miss Universe 2015.

In January 2016, North Korea appears, for it claimed to test a hydrogen bomb. Brazil grows larger with as the fear of the Zika virus takes grip.

In February 2016, Cuba becomes larger due to President Obama announcing that he'll visit. Brazil also gets even larger as Zika fears intensify.

In March 2016, Belgium becomes very large due to the terrorist attack. Of course, Ireland makes its debut thanks to St. Patrick's Day.

In April 2016, Ecuador appears due to the earthquake. It so happens that Japan had earthquakes, too. News programs often group earthquake reporting together, so Ecuador and Japan appear to be closely related.

Try making your own graph by doing a matrix search here!

I recently added equation numbering to my blog. To see it in action, here's one of my homework exercises. I like this problem because I actually found it useful to diagonalize to exponentiate the matrix.

Let $\{X_n\}$ be the random walk on the space $\{0,1,2,3,4\},$ with $0$ and $4$ absorbing and with $p(i,i) = 1/2,$ $p(i,i+1) = p(i,i-1)= 1/4$ for $1 \leq i \leq 3.$ Let $\tau$ denote the absorption time. The graph can be seen in the title picture.

## Part A

What is the limit as $n \rightarrow \infty$ of the law $X_n$ conditioned on non-absorption, that is, $$\lim_{n \rightarrow \infty}\mathcal{L}(X_n \mid \tau > n)?$$

### Solution

We have that $\displaystyle \boxed{ \lim_{n \rightarrow \infty}\mathcal{L}\left(X_n \mid \tau > n\right) = \begin{cases} 1 - \frac{1}{\sqrt{2}}, &X_n \in \{1,3\} \\ \sqrt{2} - 1, &X_n = 2. \end{cases}}$

To see this, without loss of generality, let $X_0 = 2.$ We can do this since if $X_0 = 1$ or $X_0 = 3,$ then $\mathbb{P}(X_m = X_0,~\forall m \leq n \mid \tau > n) \rightarrow 0$ as $n \rightarrow \infty.$ Then, we can apply the Markov property.

Now, define $p_n^*(x,y) = \mathbb{P}(X_{n} = y \mid \tau > n,~X_0 = 2).$
We have that

\begin{equation}
p^*_1(2,y) = \begin{cases}
1/4, &y \in \{1,3\} \\
1/2, &y = 2.
\end{cases}
~\text{and, in general,}~
p_n^*(2,y) = \begin{cases}
a_n, &y \in \{1,3\} \\
b_n = 1 - 2a_n, &y = 2
\end{cases}
\label{eqn:3a_a_def}
\end{equation}
by symmetry, where $a_1 = 1/4$ and $b_1 = 1/2 = 1 - 2a_1.$

Now, we have that
\begin{align}
a_{n+1} = \mathbb{P}(X_{n+1} = 1 \mid \tau > n + 1,~X_0 = 2)
&= p_{n+1}^*(2,1) \nonumber\\
&= \frac{p_n^*(2,1)p(1,1) + p_n^*(2,2)p(2,1)}{\mathbb{P}(\tau > n +
1 \mid X_0 = 2, \tau > n)} \nonumber\\
&= \frac{p_n^*(2,1)p(1,1) + p_n^*(2,2)p(2,1)}{1 -
p_n^*(2,1)p(1,0) - p_n^*(2,3)p(3,4)} \nonumber\\
&= \frac{a_n(1/2) + b_n(1/4)}{1 - (2a_n/4)} \nonumber\\
&= \frac{a_n(1/2) + (1-2a_n)(1/4)}{(4 - 2a_n)/4} \nonumber\\
&= \frac{1}{4 - 2a_n}.
\label{eqn:3a_a_recurs}
\end{align}
$a_n$ converges by induction because
$$
|a_{n+1} - a_n| = \left|\frac{1}{4-2a_n} -
\frac{1}{4-a_{n-1}}\right|
= \left|\frac{4 - 2a_{n-1} - 4 +
2a_n}{(4-2a_{n})(4-2a_{n-1})}\right|
\leq \frac{1}{2}\left|a_n-a_{n-1}\right|
$$
since $0 \leq a_n \leq 1$ for all $n.$ Solving for $a$ in
\begin{equation}
a = \frac{1}{4-2a} \Rightarrow 2a^2 - 4a + 1 = 0 \Rightarrow
a = 1 \pm \frac{1}{\sqrt{2}}.
\label{eqn:3a_a}
\end{equation}

Thus, we must have that $a_n \rightarrow 1 - \frac{1}{\sqrt{2}}$
since $a_n \leq 1$ for all $n.$ Then, we have that $b_n
\rightarrow b = 1 - 2a = \sqrt{2} - 1.$

## Part B

What is the limiting distribution conditioned on *never*
getting absorbed, that is,
$$
\lim_{n \rightarrow \infty}\lim_{M \rightarrow \infty}
\mathcal{L}(X_n \mid \tau > M)?
$$

### Solution

We have that $\displaystyle\boxed{\lim_{n \rightarrow \infty}\lim_{M \rightarrow \infty}\mathcal{L}(X_n \mid \tau > M)= \begin{cases}1/4, &X_n \in \{1,3\}\\ 1/2, &X_n = 2.\end{cases}}$

Again, we can assume without loss of generality that $X_0 = 2.$ Fix $n \in \mathbb{N}.$ From Equation \ref{eqn:3a_a_def} in the previous part, we know $\mathbb{P}(X_n = 2\mid \tau > n) = 1-2a_{n},$ where we define $a_0 = 0.$ Now, suppose, we know $\mathbb{P}(X_n = 2 \mid \tau > M)$ for $M \geq n.$ Then, by Bayes' theorem, \begin{align} \mathbb{P}(X_n = 2 \mid \tau > M + 1) &= \mathbb{P}(X_n = 2 \mid \tau > M + 1, \tau > M) \nonumber\\ &= \frac{\mathbb{P}(\tau > M + 1 \mid X_n = 2, \tau > M)\mathbb{P}(X_n = 2 \mid \tau > M)}{\mathbb{P}(\tau > M + 1 \mid \tau > M)}. \label{eqn:3b_bayes} \end{align} Calculating the two unknown factors, we have that in the denominator, \begin{align} \mathbb{P}(\tau > M + 1 \mid \tau > M) &= \sum_{x=1}^3 \mathbb{P}(\tau > M + 1, X_M = x \mid \tau > M) \nonumber\\ &= \frac{3}{4}\left(\mathbb{P}(X_M = 1 \mid \tau > M) + \mathbb{P}(X_M = 3 \mid \tau > M)\right) + \mathbb{P}(X_M = 2 \mid \tau > M)\nonumber\\ &= \frac{3}{4}\left(a_{M} + a_{M}\right) + (1-2a_{M}) \nonumber\\ &= 1 - \frac{1}{2}a_{M}, \label{eqn:3b_tau} \end{align} and in the numerator, \begin{align} \mathbb{P}(\tau > M + 1 \mid X_n = 2, \tau > M) &= \sum_{x=1}^3\mathbb{P}(\tau > M + 1, X_M = x \mid X_n = 2, \tau > M)\nonumber\\ &= \sum_{x=1}^3\mathbb{P}(\tau > M + 1, X_M = x \mid X_n = 2, \tau > M)\nonumber\\ &= \frac{3}{2}\mathbb{P}(X_M = 1 \mid X_n = 2, \tau > M) + \mathbb{P}(X_M = 2 \mid X_n = 2, \tau > M) \nonumber\\ &= \frac{3}{2}a_{M - n} + 1 - 2a_{M - n} \nonumber\\ &= 1 - \frac{1}{2}a_{M - n}, \label{eqn:3b_bayes_num} \end{align} where we use that $\mathbb{P}(\tau > M + 1, X_M = 1 \mid X_n = 2, \tau > M) = \mathbb{P}(\tau > M + 1, X_M = 3 \mid X_n = 2, \tau > M),$ and the fact that $\mathbb{P}(X_M = x \mid X_n = 2, \tau > M)$ is the probability of a chain starting at $2$ that doesn't hit an absorbing state in $M - n$ transistions.

Putting together Equations \ref{eqn:3b_bayes_num}, \ref{eqn:3b_tau}, and \ref{eqn:3b_bayes}, we have that \begin{equation*} \mathbb{P}(X_n = 2 \mid \tau > M + 1) = \frac{1 - \frac{1}{2}a_{M-n}}{1 - \frac{1}{2}a_{M}}\mathbb{P}(X_n = 2 \mid \tau > M), \end{equation*} and by induction, we'll have that \begin{equation*} \mathbb{P}(X_n = 2 \mid \tau > M) = (1 - 2a_{n}) \prod_{k=n}^{M - 1} \frac{1 - \frac{1}{2}a_{k-n}}{1 - \frac{1}{2}a_{k}}, \end{equation*} where the product is just $1 - 2a_{n}$ if $M = n.$ For large enough $M,$ when $k \geq 2n$ the factors in the numerator will cancel, so we will have that for $M \geq 2n$ that \begin{align} \mathbb{P}(X_n = 2 \mid \tau > M) &= (1 - 2a_{n}) \prod_{k=0}^{n - 1}\left(1-\frac{1}{2}a_{k}\right) \prod_{k=M - n}^{M-1}\frac{1}{1-\frac{1}{2}a_{k}} \nonumber\\ &= (1 - 2a_{n}) \prod_{k=0}^{n - 1}\left(4-2a_{k}\right) \prod_{k=M - n}^{M-1}\frac{1}{4-2a_{k}} \nonumber\\ &= (1 - 2a_{n}) \prod_{k=1}^{n}\frac{1}{a_{k}} \prod_{k=M - n + 1}^{M}a_k \label{eqn:3b_p2} \end{align} by the recursive definition of $a_n$ in Equation \ref{eqn:3a_a_recurs}.

Taking the limit as $M \rightarrow \infty,$ we have that \begin{equation} \lim_{M \rightarrow \infty} \mathbb{P}(X_n = 2 \mid \tau > M) = (1 - 2a_{n})a^{n}\prod_{k=1}^{n}\frac{1}{a_{k}}. \label{eqn:3b_p_lim} \end{equation} by Equation \ref{eqn:3a_a}, where $\displaystyle a = 1 - \frac{1}{\sqrt{2}} = \frac{1}{2 + \sqrt{2}}.$

Define $c_{-1} = 0$ and $c_0 = 1/4,$ so $a_0 = 0 = \frac{c_{-1}}{c_0}.$ In general, we can write $a_n = \frac{c_{n-1}}{c_n},$ where $c_n = -2c_{n-2} + 4c_{n-1}$ for $k \geq 1.$ To see this, by definition of $a_n$ in Equation \ref{eqn:3a_a_recurs}, \begin{equation} a_{n + 1} = \frac{1}{4 - 2a_n} = \frac{1}{4 - 2\frac{c_{n-1}}{c_{n}}} = \frac{c_n}{-2c_{n-1} + 4c_n} = \frac{c_n}{c_{n+1}}. \label{eqn:3b_a_c} \end{equation}

Now, with Equation \ref{eqn:3b_a_c}, Equation \ref{eqn:3b_p_lim} becomes \begin{equation} \lim_{M \rightarrow \infty} \mathbb{P}(X_n = 2 \mid \tau > M) = (1 - 2a_{n})a^{n}\prod_{k=1}^{n}\frac{c_k}{c_{k-1}} = (1 - 2a_{n})a^{n}\frac{c_{n}}{c_0} \label{eqn:3b_p_lim_2} \end{equation}

Note, that we can rewrite $c_n$ by exponentiating a matrix: \begin{align} \begin{pmatrix} c_{n} \\ c_{n + 1} \end{pmatrix} &= \begin{pmatrix} 0 & 1 \\ -2 & 4 \end{pmatrix}^n \begin{pmatrix} c_0 \\ c_1 \end{pmatrix} \nonumber\\ &= \begin{pmatrix} 1 & 1 \\ 2 + \sqrt{2} & 2 - \sqrt{2} \end{pmatrix} \begin{pmatrix} 2 + \sqrt{2} & 0 \\ 0 & 2 - \sqrt{2} \end{pmatrix}^n \begin{pmatrix} \frac{1}{2}-\frac{1}{\sqrt{2}} & \frac{1}{2\sqrt{2}} \\ \frac{1}{2}+\frac{1}{\sqrt{2}} & -\frac{1}{2\sqrt{2}} \end{pmatrix} \begin{pmatrix} 1/4 \\ 1 \end{pmatrix} \nonumber\\ &= \begin{pmatrix} \frac{1}{8}\left((2+\sqrt{2})^n(1+\sqrt{2}) + (2-\sqrt{2})^n(1-\sqrt{2})\right) \\ \frac{1}{8}\left((2+\sqrt{2})^{n+1}(1+\sqrt{2}) + (2-\sqrt{2})^{n+1}(1-\sqrt{2})\right) \end{pmatrix} \label{eqn:3b_c_matrix} \end{align} by diagonalization. Using Equation \ref{eqn:3b_c_matrix}, Equation \ref{eqn:3b_p_lim_2} becomes \begin{equation} \lim_{M \rightarrow \infty} \mathbb{P}(X_n = 2 \mid \tau > M) = \frac{1}{2}(1 - 2a_{n})\left((1+\sqrt{2}) + \left(\frac{2-\sqrt{2}}{2+\sqrt{2}}\right)^n(1-\sqrt{2})\right), \label{eqn:3b_p_lim_3} \end{equation} where we recall that $a = \frac{1}{2 + \sqrt{2}}.$ $|2-\sqrt{2}| < 1$ and so, taking the limit as $n \rightarrow \infty$ of Equation \ref{eqn:3b_p_lim_3}, the second term goes to $0$: \begin{equation} \lim_{n\rightarrow 0}\lim_{M \rightarrow \infty} \mathbb{P}(X_n = 2 \mid \tau > M) = \frac{1-2a}{2}(1+\sqrt{2}) = \frac{\sqrt{2} - 1}{2}(1+\sqrt{2}) = \frac{1}{2}. \label{eqn:3b_p_lim_lim} \end{equation} And finally, by symmetry, \begin{align} \lim_{n\rightarrow 0}\lim_{M \rightarrow \infty} \mathbb{P}(X_n = 1 \mid \tau > M) &= \lim_{n\rightarrow 0}\lim_{M \rightarrow \infty} \mathbb{P}(X_n = 3 \mid \tau > M) \nonumber\\ &= \frac{1 - \lim_{n\rightarrow 0}\lim_{M \rightarrow \infty} \mathbb{P}(X_n = 2 \mid \tau > M)}{2} \nonumber\\ &= \frac{1}{4}. \label{eqn:3b_p_1_3} \end{align} Equations \ref{eqn:3b_p_lim_lim} and \ref{eqn:3b_p_1_3} combine to give us the limiting distribution.

## Graphs in $\LaTeX$

Here's the code for the graph. I used a package called `tikz`

.

```
\documentclass[10pt]{article}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{tikz}
\usetikzlibrary{arrows}
\usepackage[active, tightpage]{preview}
\PreviewEnvironment{tikzpicture}
\renewcommand\PreviewBbAdjust {-0bp -25bp 0bp 25bp}
\begin{document}
\begin{tikzpicture}[->,>=stealth',shorten >=1pt,auto,node distance=1.75cm,
thick,main node/.style={circle,draw}]
\node[main node] (0) {$0$};
\node[main node] (1) [right of=0] {$1$};
\node[main node] (2) [right of=1] {$2$};
\node[main node] (3) [right of=2] {$3$};
\node[main node] (4) [right of=3] {$4$};
\path[every node/.style={font=\sffamily}]
(0) edge [loop above] node {$1$} (0)
(0)
(1) edge [bend right] node [below] {$1/4$} (2)
edge [loop above] node {$1/2$} (1)
edge node [below] {$1/4$} (0)
(1)
(2) edge [bend right] node [above] {$1/4$} (1)
edge [loop below] node {$1/2$} (2)
edge [bend left] node [above] {$1/4$} (3)
(2)
(3) edge [bend left] node [below] {$1/4$} (2)
edge [loop above] node {$1/2$} (3)
edge node [below] {$1/4$} (4)
(3)
(4) edge [loop above] node {$1$} (4);
\end{tikzpicture}
\end{document}
```

Then, to convert the graph to a `png`

file with a transparent background, I used ImageMagick.

```
convert -density 300 graph.pdf -format png graph.png
```