Photo URL is broken

Before I left Philly, we took a short road trip down to Baltimore to visit Tim Wu. When we got there, the first thing we did was get some Korean BBQ at 1:30 AM at Honey Pig. We don't do so well with alcohol, so we only had 1 bottle of Soju between the 5 of us. Despite having to sleep on the floor with only substandard air conditioning, I ended up sleeping quite well that night.

The next day, we went fishing and crabbing. Unfortunately, this was not a success. With the exception of Dan Wu's fish, we did not catch anything despite spending nearly 5 hours.

Dan Wu's catch

As Tim and I took a walk around Fort Smallwood Park, we found a beach and an approximately 7-year-old kid that told us we could find clams by digging into the sand with our feet. Desperate to catch anything, I heeded his advice and waded into Chesapeake Bay and attuned myself to the sensations of my feet. At first, I thought that they were nothing more than smooth rocks, but after 5 minutes or so, I took a dive and had my first clam. Within the next half hour, I had about a dozen more.

As you can see in the title picture, I ended up steaming those clams and eating them with butter. They were a bit sandy, but otherwise, they were great. After descaling and cleaning the fish, I steamed him or her, too, for maybe 6 ounces of meat? All told, we got 1 solid meal for a single person for 5 hours of effort from 7 people. It's probably the most Paleo thing that I've ever made since I not only cooked but also caught those clams by hand. It doesn't look like we'd survive in Paleolithic times, though, with our fishing skills.

Steamed fish

Since we couldn't catch enough to eat, we ended our fishing trip with some crab cakes from G & M. Usually, I think of crab cakes as pretty poor value propositions since they tend to be small without much crab meat. Here, I was proven wrong as these crab cakes were huge and full of protein. I don't think any of us left hungry. Finally, we had a nice romantic walk along the harbor. Well, in my brother's case, it was more of a Poké Walk.

Friends at the harbor. Photo Credits: Masato Sugeno

Photo URL is broken

I rarely apply anything that I've learned from competitive programming to an actual project, but I finally got the chance with Snapstream Searcher. While computing daily correlations between countries (see Country Relationships), we noticed a big spike in Austria and the strength of its relationship with France as seen here. It turns out Wendy's ran an ad with this text.

It's gonna be a tough blow. Don't think about Wendy's spicy chicken. Don't do it. Problem is, not thinking about that spicy goodness makes you think about it even more. So think of something else. Like countries in Europe. France, Austria, hung-a-ry. Hungry for spicy chicken. See, there's no escaping it. Pffft. Who falls for this stuff? And don't forget, kids get hun-gar-y too.

Since commercials are played frequently across a variety of non-related programs, we started seeing some weird results.

My professor Robin Pemantle has this idea of looking at the surrounding text and only counting matches that had different surrounding text. I formalized this notion into something we call contexts. Suppose that we're searching for string $S$. Let $L$ be the $K$ characters the left and $R$ the $K$ characters to the right. Thus, a match in a program is a 3-tuple $(S,L,R)$. We define the following equivalence relation: given $(S,L,R)$ and $(S^\prime,L^\prime,R^\prime)$, \begin{equation} (S,L,R) \sim (S^\prime,L^\prime,R^\prime) \Leftrightarrow \left(S = S^\prime\right) \wedge \left(\left(L = L^\prime\right) \vee \left(R = R^\prime\right)\right), \end{equation} so we only count a match as new if and only if both the $K$ characters to the left and the $K$ characters to right of the new match differ from all existing matches.

Now, consider the case when we're searching for a lot of patterns (200+ countries) and $K$ is large. Then, we will have a lot of matches, and for each match, we'll be looking at $K$ characters to the left and right. Suppose we have $M$ matches. Then, we're looking at $O(MK)$ extra computation since to compare each $L$ and $R$ with all the old $L^\prime$ and $R^\prime$, we would need to iterate through $K$ characters.

One solution to this is to compute string hashes and compare integers instead. But what good is this if we need to iterate through $K$ characters to compute this hash? This is where the Rabin-Karp rolling hash comes into play.

Rabin-Karp Rolling Hash

Fix $M$ which will be the number of buckets. Consider a string of length $K$, $S = s_0s_1s_2\cdots s_{K-1}$. Then, for some $A$, relatively prime to $M$, we define our hash function \begin{equation} H(S) = s_0A^{0} + s_1A^{1} + s_2A^2 + \cdots + s_{K-1}A^{K-1} \pmod M, \end{equation} where $s_i$ is converted to an integer according to ASCII.

Now, suppose we have a text $T$ of length $L$. Define \begin{equation} C_j = \sum_{i=0}^j t_iA^{i} \pmod{M}, \end{equation} and let $T_{i:j}$ be the substring $t_it_{i+1}\cdots t_{j}$, so it's inclusive. Then, $C_j = H(T_{0:j})$, and \begin{equation} C_j - C_{i - 1} = t_iA^{i} + t_{i+1}A^{i+1} + \cdots + t_jA^j \pmod M, \end{equation} so we have that \begin{equation} H(T_{i:j}) = t_iA^{0} + A_{i+1}A^{1} + \cdots + t_jA_{j-i} \pmod M = A^{-i}\left(C_j - C_{i-1}\right). \end{equation} In this way, we can compute the hash of any substring by simple arithmetic operations, and the computation time does not depend on the position or length of the substring. Now, there are actually 3 different versions of this algorithm with different running times.

  1. In the first version, $M^2 < 2^{32}$. This allows us to precompute all the modular inverses, so we have a $O(1)$ computation to find the hash of a substring. Also, if $M$ is this small, we never have to worry about overflow with 32-bit integers.
  2. In the second version, an array of size $M$ fits in memory, so we can still precompute all the modular inverses. Thus, we continue to have a $O(1)$ algorithm. Unfortunately, $M$ is large enough that there may be overflow, so we must use 64-bit integers.
  3. Finally, $M$ becomes so large that we cannot fit an array of size $M$ in memory. Then, we have to compute the modular inverse. One way to do this is the extended Euclidean algorithm. If $M$ is prime, we can also use Fermat's little theorem, which gives us that $A^{i}A^{M-i} \equiv A^{M} \equiv 1 \pmod M,$ so we can find $A^{M - i} \pmod{M}$ quickly with some modular exponentiation. Both of these options are $O(\log M).$

Usually, we want to choose $M$ as large as possible to avoid collisions. In our case, if there's a collision, we'll count an extra context, which is not actually a big deal, so we may be willing to compromise on accuracy for faster running time.

Application to Snapstream Reader

Now, every time that we encouter a match, the left and right hash can be quickly computed and compared with existing hashes. However, which version should we choose? We have 4 versions.

  1. No hashing, so this just returns the raw match count
  2. Large modulus, so we cannot cache the modular inverse
  3. Intermediate modulus, so can cache the modular inverse, but we need to use 64-bit integers
  4. Small modulus, so we cache the modular inverse and use 32-bit integers

We run these different versions with 3 different queries.

  • Query A: {austria} from 2015-8-1 to 2015-8-31
  • Query B: ({united kingdom} + {scotland} + {wales} + ({england} !@ {new england})) from 2015-7-1 to 2015-7-31
  • Query C: ({united states} + {united states of america} + {usa}) @ {mexico} from 2015-9-1 to 2015-9-30

First, we check for collisions. Here are the number of contexts found for the various hashing algorithms and search queries for $K = 51$.

Hashing Version A B C
1 181 847 75
2 44 332 30
3 44 331 30
4 44 331 30

In version 1 since there's no hashing, that's the raw match count. As we'd expect, hashing greatly reduces the number of matches. Also, there's no collisions until we have a lot of matches (847, in this case). Thus, we might be okay with using a smaller modulus if we get a big speed-up since missing 1 context out of a 1,000 won't change trends too much.

Here's the benchmark results.

Obviously, all versions of hashing are slower than no hashing. Using a small modulus approximately doubles the time, which makes sense, for we're essentially reading the text twice: once for searching and another time for hashing. Using an intermediate modulus adds another 3 seconds. Having to perform modular exponentiation to compute the modular inverse adds less than a second in the large modulus version. Thus, using 64-bit integers versus 32-bit integers is the major cause of the slowdown.

For this reason, we went with the small modulus version despite the occasional collisions that we encouter. The code can be found on GitHub in the StringHasher class.

Photo URL is broken

After my good friend Han Zhang introduced me Lion's Head meatballs at Yaso Tangbao in Downtown Brooklyn, I decided that I had to figure out how to cook these. It took me nearly a year before I got around to doing so, but I finally got the chance to make them with Liz Liang when visiting the Washington, D.C. area.

We used the recipe How to Make Shanghai Lion's Head Meatballs from Serious Eats. Fortunately, it doesn't take that long, so we didn't miss out on too many Pokémon.

Sizzling Lion's Head meatballs

Sizzling Lion's head meatballs

Overall, I found it to be a very good recipe, but the proportion of meat and noodles could be adjusted. The woman that made it lives in some strange world where 12 ounces of ground pork feeds 4. We ended up using a little over 2 pounds of ground pork, which resulted in 10 huge meatballs, each about 2.5 inches in diameter. You could double the vermicelli, too, but I like a high meat-to-noodle ratio. The water chestnuts were an interesting twist that might not be for everyone, but they add a nice crunchy texture to the meatballs.

For dessert, Liz was kind enough to make us some blueberry bread pudding, too!

I only helped make the custard. The recipe is blueberry bread and butter pudding. I thought it turned out great, and I like how it wasn't it all that sweet.

Anyway, there's only a week left until I move to Seattle. I'll be saying my goodbyes to Philly soon.

Photo URL is broken

Consider the game of Nim. The best way to become acquainted is to play Nim: The Game, which I've coded up here. Win some games!

Solving Nim

Nim falls under a special class of games, in which, we have several independent games (each pile is its own game), perfect information (you and your opponent know everything), and sequentiality (the game always ends). It turns out every game of this type is equivalent, and we can use Nim as a model on how to solve them.

The most general way to find the optimal strategy is exhaustive enumeration, which we can do recursively.

class GameLogic:

    def get_next_states(self, current_state):

    def is_losing_state(self, state):

def can_move_to_win(gameLogic, current_state):
    ## take care of base case
    if gameLogic.is_losing_state(current_state):
        return False
    ## try all adjacent states, these are our opponent's states
    next_states = gameLogic.get_next_states(current_state)
    for next_state in next_states:
        if can_move_to_win(gameLogic, next_state) is False:
            return True # one could possibly return the next move here
    ## otherwise, we always give our opponent a winning state 
    return False

Of course exhaustive enumeration quickly becomes infeasible. However, the fact that there exists this type of recursive algorithm informs us that we should be looking at induction to solve Nim.

One way to get some intuition at the solution is to think of heaps of just 1 object. No heaps mean that we've already lost, and 1 heap means that we'll win. 2 heaps must reduce to 1 heap, which puts our opponent in a winning state, so we lose. 3 heaps must reduce to 2 heaps, so we'll win. Essentially, we'll win if there are an odd number of heaps.

Now, if we think about representing the number of objects in a heap as a binary number, we can imagine each binary digit as its own game. For example, imagine heaps of sizes, $(27,16,8,2,7)$. Represented as binary, this looks like \begin{align*} 27 &= (11011)_2 \\ 16 &= (10000)_2 \\ 8 &= (01000)_2 \\ 2 &= (00010)_2 \\ 7 &= (00111)_2. \end{align*} The columns corresponding to the $2$s place and $4$s place have an odd number of $1$s, so we can put our opponent in a losing state by removing $6$ objects from the last heap. \begin{align*} 27 &= (11011)_2 \\ 16 &= (10000)_2 \\ 8 &= (01000)_2 \\ 2 &= (00010)_2 \\ 7 - 6 = 1 &= (00001)_2. \end{align*}

Now, the XOR of all the heaps is $0$, so there are an even number of $1$s in each column. As we can see, we have a general algorithm for doing this. Let $N_1,N_2,\ldots,N_K$ be the size of our heaps. If $S = N_1 \oplus N_2 \oplus \cdots \oplus N_K \neq 0$, then $S$ has a nonzero digit. Consider the leftmost nonzero digit. Since $2^{k+1} > \sum_{j=0}^k 2^j$, we can remove objects such that this digit changes, and we can chose any digits to the right to be whatever is necessary so that the XOR of the heap sizes is $0$. Here's a more formal proof.

Proof that if $N_1 \oplus N_2 \oplus \cdots \oplus N_K = 0$ we're in a losing state

Suppose there are $K$ heaps. Define $N_k^{(t)}$ to be the number of objects in heap $k$ at turn $t$. Define $S^{(t)} = N_1^{(t)} \oplus N_2^{(t)} \oplus \cdots \oplus N_K^{(t)}$.

We lose when $N_1 = N_2 = \cdots = N_k = 0$, so if $S^{(t)} \neq 0$, the game is still active, and we have not lost. By the above algorithm, we can make it so that $S^{(t+1)} = 0$ for our opponent. Then, any move that our opponent does must necessarily make it so that $S^{(t+2)} \neq 0$. In this manner, our opponent always has $S^{(t + 2s + 1}) = 0$ for all $s$. Thus, either they lose, or they give us a state such that $S^{(t+2s)} \neq 0$, so we never lose. Since the game must end, eventually our opponent loses.

Sprague-Grundy Theorem

Amazingly this same idea can be applied to a variety of games that meets certain conditions through the Sprague-Grundy Theorem. I actually don't quite under the Wikipedia article, but this is how I see it.

We give every indepedent game a nimber, meaning that it is equivalent to a heap of that size in Nim. Games that are over are assigned the nimber $0$. To find the nimber of a non-terminating game position, we look at the nimbers of all the game positions that we can move to. The nimber of this position is smallest nonnegative integer strictly greater than all the nimbers that we can move to. So if we can move to the nimbers $\{0,1,3\}$, the nimber is $2$.

At any point of the game, each of our $K$ independent games has a nimber $N_k$. We're in a losing state if and only if $S = N_1 \oplus N_2 \oplus \cdots \oplus N_K = 0$.

For the $\Leftarrow$ direction, first suppose that $S^{(t)} = N_1^{(t)} \oplus N_2^{(t)} \oplus \cdots \oplus N_K^{(t)} = 0$. Because of the way that the nimber's are defined we any move that we do changes the nimber of exactly one of the independent games, so $N_k^{(t)} \neq N_k^{(t + 1)}$, for the next game positions consist of nimbers $1,2,\ldots, N_k^{(t)} - 1$. If we can move to $N_k^{(t)}$, then actually the nimber of the current game position is $N_k^{(t)} + 1$, a contradiction. Thus, we have ensured that $S^{(t + 1)} \neq 0$. Since we can always move to smaller nimbers, we can think of nimbers as the number of objects in a heap. In this way, the opponent applies the same algorithm to ensure that $S^{(t + 2s)} = 0$, so we can never put the opponent in a terminating position. Since the game must end, we'll eventually be in the terminating position.

For the $\Rightarrow$ direction, suppose that we're in a losing state. We prove the contrapositive. If $S^{(t)} = N_1^{(t)} \oplus N_2^{(t)} \oplus \cdots \oplus N_K^{(t)} \neq 0$, the game has not terminated, and we can make it so that $S^{(t+1)} = 0$ by construction of the nimbers. Thus, our opponent is always in a losing state, and since the game must terminate, we'll win.

Now, this is all very abstract, so let's look at an actual problem.

Floor Division Game

Consider the Floor Division Game from CodeChef. This is the problem that motivated me to learn about all this. Let's take a look at the problem statement.

Henry and Derek are waiting on a room, eager to join the Snackdown 2016 Qualifier Round. They decide to pass the time by playing a game.

In this game's setup, they write $N$ positive integers on a blackboard. Then the players take turns, starting with Henry. In a turn, a player selects one of the integers, divides it by $2$, $3$, $4$, $5$ or $6$, and then takes the floor to make it an integer again. If the integer becomes $0$, it is erased from the board. The player who makes the last move wins.

Henry and Derek are very competitive, so aside from wanting to win Snackdown, they also want to win this game. Assuming they play with the optimal strategy, your task is to predict who wins the game.

The independent games aren't too hard to see. We have $N$ of them in the form of the positive integers that we're given. The numbers are written clearly on the blackboard, so we have perfect information. This game is sequential since the numbers must decrease in value.

A first pass naive solutions uses dynamic programming. Each game position is a nonnegative integer. $0$ is the terminating game position, so its nimber is $0$. Each game position can move to at most $5$ new game positions after dividing by $2$, $3$, $4$, $5$ or $6$ and taking the floor. So if we know the nimbers of these $5$ game positions, it's a simple matter to compute the nimber the current game position. Indeed, here's such a solution.

#include <iostream>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

using namespace std;

long long computeNimber(long long a, unordered_map<long long, long long> &nimbers) {
  if (nimbers.count(a)) return nimbers[a];
  if (a == 0LL) return 0LL;
  unordered_set<long long> moves;
  for (int d = 2; d <= 6; ++d) moves.insert(a/d);
  unordered_set<long long> neighboringNimbers;
  for (long long nextA : moves) {
    neighboringNimbers.insert(computeNimber(nextA, nimbers));
  long long nimber = 0;
  while (neighboringNimbers.count(nimber)) ++nimber;
  return nimbers[a] = nimber;

string findWinner(const vector<long long> &A, unordered_map<long long, long long> &nimbers) {
  long long cumulativeXor = 0;
  for (long long a : A) {
    cumulativeXor ^= computeNimber(a, nimbers);
  return cumulativeXor == 0 ? "Derek" : "Henry";

int main(int argc, char *argv[]) {
  ios::sync_with_stdio(false); cin.tie(NULL);  
  int T; cin >> T;
  unordered_map<long long, long long> nimbers;
  nimbers[0] = 0;
  for (int t = 0; t < T; ++t) {
    int N; cin >> N;
    vector<long long> A; A.reserve(N);
    for (int n = 0; n < N; ++n) {
      long long a; cin >> a;
    cout << findWinner(A, nimbers) << '\n';
  cout << flush;
  return 0;

Unfortunately, in a worst case scenario, we'll have to start computing nimbers from $0$ and to $\max\left(A_1,A_2,\ldots,A_N\right)$. Since $1 \leq A_n \leq 10^{18}$ can be very large, this solution breaks down. We can speed this up with some mathematical induction.

To me, the pattern was not obvious at all. To help discover it, I generated the nimbers for smaller numbers. This is what I found.

Game Positions Nimber Numbers in Interval
$[0,1)$ $0$ $1$
$[1,2)$ $1$ $1$
$[2,4)$ $2$ $2$
$[4,6)$ $3$ $2$
$[6,12)$ $0$ $6$
$[12,24)$ $1$ $12$
$[24,48)$ $2$ $24$
$[48,72)$ $3$ $24$
$[72,144)$ $0$ $72$
$[144,288)$ $1$ $144$
$[288,576)$ $2$ $288$
$[576, 864)$ $3$ $288$
$[864,1728)$ $0$ $864$
$[1728,3456)$ $1$ $1728$
$[3456,6912)$ $2$ $3456$
$[6912, 10368)$ $3$ $3456$
$\vdots$ $\vdots$ $\vdots$

Hopefully, you can start to see some sort of pattern here. We have repeating cycles of $0$s, $1$2, $2$s, and $3$s. Look at the starts of the cycles $0$, $6$, $72$, and $864$. The first cycle is an exception, but the following cycles all start at $6 \cdot 12^k$ for some nonnegative integer $k$.

Also look at the number of $0$s, $1$2, $2$s, and $3$s. Again, with the exception of the transition from the first cycle to the second cycle, the quantity is multiplied by 12. Let $s$ be the cycle start and $L$ be the cycle length. Then, $[s, s + L/11)$ has nimber $0$, $[s + L/11, s + 3L/11)$ has nimber $1$, $[s + 3L/11, s + 7L/11)$ has nimber $2$, and $[s + 7L/11, s + L)$ has nimber $3$.

Now, given that the penalty for wrong submissions on CodeChef is rather light, you could code this up and submit it now, but let's try to rigorously prove it. It's true when $s_0 = 6$ and $L_0 = 66$. So, we have taken care of the base case. Define $s_k = 12^ks_0$ and $L_k = 12^kL_0$. Notice that $L_k$ is always divisible by $11$ since $$L_{k} = s_{k+1} - s_{k} = 12^{k+1}s_0 - 12^{k}s_0 = 12^{k}s_0(12 - 1) = 11 \cdot 12^{k}s_0.$$

Our induction hypothesis is that this pattern holds in the interval $[s_k, s_k + L_k)$. Now, we attack the problem with case analysis.

  • Suppose that $x \in \left[s_{k+1}, s_k + \frac{1}{11}L_{k+1}\right)$. We have that $s_{k+1} = 12s_k$ and $L_{k+1} = 12L_k = 12\cdot 11s_k$, so we have that \begin{align*} 12s_k \leq &~~~x < 2 \cdot 12s_k = 2s_{k+1} \\ \left\lfloor\frac{12}{d}s_k\right\rfloor \leq &\left\lfloor\frac{x}{d}\right\rfloor < \left\lfloor\frac{2}{d}s_{k+1}\right\rfloor \\ 2s_k \leq &\left\lfloor\frac{x}{d}\right\rfloor < s_{k+1} \\ s_k + \frac{1}{11}L_k \leq &\left\lfloor\frac{x}{d}\right\rfloor < s_{k+1} \end{align*} since $2 \leq d \leq 6$, and $L_k = 11 \cdot 12^{k}s_0 = 11s_k$. Thus, $\left\lfloor\frac{x}{d}\right\rfloor$ falls entirely in the previous cycle, but it's large enough so it never has the nimber $0$. Thus, the smallest nonnegative integer among next game positions is $0$, so $x$ has the nimber $0$.
  • Now, suppose that $x \in \left[s_{k+1} + \frac{1}{11}L_{k+1}, s_{k+1} + \frac{3}{11}L_{k+1}\right)$. Then, we have that \begin{align*} s_{k+1} + \frac{1}{11}L_{k+1} \leq &~~~x < s_{k+1} + \frac{3}{11}L_{k+1} \\ 2s_{k+1} \leq &~~~x < 4s_{k+1} \\ 2 \cdot 12s_{k} \leq &~~~x < 4 \cdot 12s_{k} \\ \left\lfloor \frac{2 \cdot 12}{d}s_{k}\right\rfloor \leq &\left\lfloor \frac{x}{d} \right\rfloor < \left\lfloor\frac{4 \cdot 12}{d}s_{k}\right\rfloor \\ s_k + 3s_k = 4s_k \leq &\left\lfloor \frac{x}{d} \right\rfloor < 24s_k = 12s_k + 12s_k = s_{k+1} + s_{k+1} \\ s_k + \frac{3}{11}L_k \leq &\left\lfloor \frac{x}{d} \right\rfloor < s_{k+1} + \frac{1}{11}L_{k+1}. \end{align*} Thus, $\left\lfloor \frac{x}{d} \right\rfloor$ lies in both the previous cycle and current cycle. In the previous cycle it lies in the part with nimbers $2$ and $3$. In the current cycle, we're in the part with nimber $0$. In fact, if we let $d = 2$, we have that \begin{equation*} s_{k+1} \leq \left\lfloor \frac{x}{2} \right\rfloor < s_{k+1} + \frac{1}{11}L_{k+1}, \end{equation*} so we can always reach a game position with nimber $0$. Since we can never reach a game position with nimber $1$, this game position has nimber $1$.
  • Suppose that $x \in \left[s_{k+1} + \frac{3}{11}L_{k+1}, s_{k+1} + \frac{7}{11}L_{k+1}\right)$. We follow the same ideas, here. \begin{align*} s_{k+1} + \frac{3}{11}L_{k+1} \leq &~~~x < s_{k+1} + \frac{7}{11}L_{k+1} \\ 4s_{k+1} \leq &~~~x < 8s_{k+1} \\ 8s_{k} \leq &\left\lfloor \frac{x}{d} \right\rfloor < 4s_{k+1} \\ s_{k} + \frac{7}{11}L_k \leq &\left\lfloor \frac{x}{d} \right\rfloor < s_{k+1} + \frac{3}{11}L_{k+1}, \end{align*} so $\left\lfloor \frac{x}{d} \right\rfloor$ falls in the previous cycle where the nimber is $3$ and in the current cycle where the nimbers are $0$ and $1$. By fixing $d = 2$, we have that \begin{equation} s_{k+1} + \frac{1}{11}L_{k+1} = 2s_{k+1} \leq \left\lfloor \frac{x}{2} \right\rfloor < 4s_{k+1} = s_{k+1} + frac{3}{11}L_{k+1}, \end{equation} so we can always get to a number where the nimber is $1$. By fixing $d = 4$, we \begin{equation} s_{k+1} \leq \left\lfloor \frac{x}{4} \right\rfloor < 2s_{k+1} = s_{k+1} + \frac{1}{11}L_{k+1}, \end{equation} so we can always get to a number where the nimber is $0$. Since a nimber of $2$ is impossible to reach, the current nimber is $2$.
  • Finally, the last case is $x \in \left[s_{k+1} + \frac{7}{11}L_{k+1}, s_{k+1} + L_{k+1}\right)$. This case is actually a little different. \begin{align*} s_{k+1} + \frac{7}{11}L_{k+1} \leq &~~~x < s_{k+1} + L_{k+1} \\ 8s_{k+1} \leq &~~~x < 12s_{k+1} \\ s_{k+1} < \left\lfloor \frac{4}{3}s_{k+1} \right\rfloor \leq &\left\lfloor \frac{x}{d} \right\rfloor < 6s_{k+1}, \end{align*} so we're entirely in the current cycle now. If we use, $d = 6$, then \begin{equation*} s_{k+1} < \left\lfloor \frac{4}{3}s_{k+1} \right\rfloor \leq \left\lfloor \frac{x}{6} \right\rfloor < 2s_{k+1} = s_{k+1} + \frac{1}{11}L_{k+1}, \end{equation*} so we can reach nimber $0$. If we use $d = 4$, we have \begin{equation*} s_{k+1} + \frac{1}{11}L_{k+1} = 2s_{k+1} \leq \left\lfloor \frac{x}{4} \right\rfloor < 3s_{k+1} = s_{k+1} + \frac{2}{11}L_{k+1}, \end{equation*} which gives us a nimber of $1$. Finally, if we use $d = 2$, \begin{equation*} s_{k+1} + \frac{3}{11}L_{k+1} \leq 4s_{k+1} \leq\left\lfloor \frac{x}{4} \right\rfloor < 6s_{k+1} = s_{k+1} + \frac{5}{11}L_{k+1}, \end{equation*} so we can get a nimber of $2$, too. This is the largest nimber that we can get since $d \geq 2$, so $x$ must have nimber $3$.

This covers all the cases, so we're done. Here's the complete code for a $O\left(\max(A_k)\log N\right)$ solution.

#include <cmath>
#include <iostream>
#include <string>>
#include <vector>

using namespace std;

long long computeNimber(long long a) {
  if (a < 6) { // exceptional cases
    if (a < 1) return 0;
    if (a < 2) return 1;
    if (a < 4) return 2;
    return 3;
  unsigned long long cycleStart = 6;
  while (12*cycleStart <= a) cycleStart *= 12;
  if (a < 2*cycleStart) return 0;
  if (a < 4*cycleStart) return 1;
  if (a < 8*cycleStart) return 2;
  return 3;

string findWinner(const vector<long long> &A) {
  long long cumulativeXor = 0;
  for (long long a : A) cumulativeXor ^= computeNimber(a);
  return cumulativeXor == 0 ? "Derek" : "Henry";

int main(int argc, char *argv[]) {
  ios::sync_with_stdio(false); cin.tie(NULL);  
  int T; cin >> T;
  for (int t = 0; t < T; ++t) {
    int N; cin >> N;
    vector<long long> A; A.reserve(N);
    for (int n = 0; n < N; ++n) {
      long long a; cin >> a;
    cout << findWinner(A) << '\n';
  cout << flush;
  return 0;

Photo URL is broken

Long time no post.

I am sharing the recipe I should not be sharing... the ramen recipe. Many Japanese chefs keep their own ramen recipe a secret from society. They guard it well, only sharing it to the inheritance of their resturant. Therefore, it is hard to find a legitimate one floating around the vast world of internet.Try looking ramen recipes up, you will only find about 5 types, which they differ very slightly.

Being in America, decent ramen was so hard to come by. I still remember the flavor, the texture, and the aroma of the ramen I had in Japan. It was around 2006 summer when my mom, sister, and I were walking back from visiting family graves. We were starving. We stumble across a small ramen shop underneath the bridge with about 5 seats. And that is still easliy the best ramen I ever had.

And yes, since then, I am picky with my ramen. Other ramen shops in Philly does not satisfy at all.

My journey began in January 2015, I made my ramen with the help of Phil. With countless trials and mistakes, I am 90% close to the ramen I dreamt of: the real shoyu ramen (soysauce ramen).

In Japan, most ramen shop only specialize in one of the four flavors; shoyu, tonkotsu, miso, and shio (salt). I use shoyu ramen as a basis of how good a ramen shop is. If a shop cannot make shoyu ramen, then they cannot make ramen. Period.

To make ramen, it consists of two types of broth. Dashi and Tare. Dashi is the stock broth you make without any salt content. Tare is the sauce that determines the flavor of the soup.

Let's get to it.


  • 5 old hens
  • 300 g Leek (white part)
  • 1 bunch of scallion (white part)
  • 15 g Katsuobushi*
  • 120 g Ginger with skin
  • 57 g Dried shiitake mushroom
  • 57 g Dried kelp (japanese grade)


  • 1 old hen
  • 2 cups Light soy sauce (NOT LOW SODIUM)
  • 1 cup sake
  • 1/2 mirin
  • 1 pack of bacon
  • 15 g Katsuobushi


  • 2 lbs of pork belly
  • 1 stalk of ginger
  • 1 leek and the green part that you cut off for the Dashi
  • 1 tbs of hondashi**
  • 1/2 of soy sauce dark**
  • Water


  • 1 cup of the chashu broth
  • Eggs


  • 300 g Bread flour
  • 1/2 cup warm water
  • 1 teaspoon and 1/8 teaspoon of Kansui (potassium carbonate and sodium bi carbonate solution)

*Katsuobushi is dried shaved mackerel. I use the dark shaved one which is only found in Japan as of now...

** This measurement is not finalized. I eye them out


  1. Fill about 80% of the 16 quart pot with water and heat it to 170 degrees
  2. Add the dried kelp and shiitake mushroom then turn off heat. Cover the pot
  3. Leave them in for 2 hours
  4. Discard the kelp, then add ginger, hens, leeks, scallions, katsuobushi. Cover the pot
  5. Simmer at the lowest heat your stove has for about 12 to 15 hours


  1. Heat oven to 400 degrees
  2. Roast the old hen in an oven safe sauce pan until it is brown and crisp
  3. Add the soy sauce, bacon, sake, mirin, and katsuobushi and simmer at the lowest heat setting on top of the stove. We are infusing the flavor, not cooking it
  4. Simmer for about 30 minutes then leave the ingredients for about 2 hours


  1. Tie the pork belly into a circle
  2. Heat oven to 350 degrees
  3. Sear the pork belly in a dutch oven until all sides are brown
  4. Add the leak and ginger, cook it for a bit, then add water until it covers the meat
  5. Add soy sauce and hondashi. Cover the pot and cook it in oven for 2 hours minimum, 5 hours max. We are infusing the flavor into the meat. Do not cook this too long or else the pork belly will not maintain it's shape


  1. Boil water until it is hard boiling
  2. Poke a hole on the bottom fat side of the egg. This will let the air out while cooking
  3. Add the egg all at once into the water and cook it for 6 minutes and 15 seconds FLAT
  4. Peal eggs in cold water immediately
  5. Use the broth you made from the chashu. Marinate the eggs, in a seperate bowl, covering the top with a paper towel, in fridge overnight


  1. Mix the kansui and water
  2. Add to flour and knead
  3. Use pasta maker

To combine: I use about 3:1 ratio of Dashi to Tare. I think I pour less Tare to reduce the sodium but that should be about right. I add the Tare first then dilute it with Dashi

Garnish: I use Japanese pickled bamboo, sweet corn, sauteed onions, scallion, and nori (dried seaweed)

Picture below is my Version 3.0. The picture on the top is my version 5.0.


Photo URL is broken

To avoid facing the ticking clock, I've decided to read a textbook and start a new project Machine Learning: a Probabilistic Perspective in Python. I've been meaning to learn more about the math behind machine learning and the Python ecosystem for scientific computing, so I've decided to code up various solutions for the textbook in Jupyter notebooks. Here are a couple of the more interesting exercises.

Cross Validation and Nearest Neighbors

A classic problem in machine learning that is often used to benchmark classification algorithms is digit recognition. The standard dataset is The MNIST database of handwritten digits. A digit is represented as $26 \times 26$ grid of grayscale pixels like the following examples.

Surprisingly, a simple nearest neighbor approach works well despite the curse of dimensionality. When using $1$ neighbor and euclidean distance, we find that the error rate is only 3.8% on 1,000 test cases. Here are some correct predictions.

Of course we get some predictions wrong. Here are some.

The notebook to generate these figures is here, KNN classifier on shuffled MNIST data.

Model Selection

How should we know how many neighbors to use? For a parametric models, we can pick our models using a maximum likelihood estimate (MLE) or a Bayesian approach, using either the posterior mean or maximum a posteriori probability (MAP) estimate. However, nearest neighbors is a nonparametric model since the number of parameters increases with the amount of data.

To measure model performance, we use empirical measures, that is, we directly measure prediction accuracy. There are a few ways of going about this.

  1. We can test the accuracy on the data that we trained on. This is called the training set error. Generally, it's not a very good metric since it's prone to overfitting. For instance, using 1 neighbor leads to 0% error.
  2. We can withhold some of our data and call it the test set. We fit our model on the data not in the test set and calculate the misclassification rate on the test set. This is the test set error. It generally gives a good idea how the model will perform with new data, but it cannot be used when you have a limited number of test cases and cannot afford to throw away data.
  3. A compromise is the cross validation error. Here, we split our data into $k$ folds. For each fold, we fit our model on the remaining $k - 1$ folds and use the chosen fold as our test set. In this way, we compute $k$ different error rates. Then, we take the average of those error rates.

All three approaches were tried and compared in this notebook, CV for KNN.

The test set error and cross validation error are quite close, which indicates that the cross validation error is a good proxy for the test set error when faced with limited data. The training set error will almost always underpredict the true error rate and is often vulnerable to overfitting. In this rare case, all three error rates actually agree that using 1 neighbor is optional, but usually, model that achieves the best training set error is overfitted to the data and predicts future test cases badly.

Discriminant Analysis

Some of the simplest models are the various flavors of discriminant analysis. Suppose we have classes $k = 1,2,\ldots,K$. We make $N$ obervations $(\mathbf{x}_i, y_i)$, where $i \in \{1,2,\ldots, N\}$ and $y_i \in \{1,2,\ldots,K\}$. $\mathbf{x}_i$ is our data and $y_i$ is our class label. Based off some new data $\mathbf{x}^\prime$, we would want to predict its class label $y^\prime$.

Discriminant analysis makes the strong assumption that $\mathbf{x}$ is generated from a multivariate normal distribution. The distribution has different parameters for each class, so if $\mathbf{x}$ belongs to class $y = k$, $\left(\mathbf{x} \mid y = k\right) \sim \mathcal{N}(\boldsymbol\mu_k,\Sigma_k)$. An example would be trying to predict the gender of a person after observing their height and weight.

Using the MLE, we can fit a multivariate normal to males as I do in the notebook Whitening versus standardizing.

We can do a similar thing for our female observations. Thus, given an observation $\mathbf{x}$, we have estimated the distribution of $\mathbf{x} \mid y = k$. Now, the distribution of $y$ is just a multinomial distribution, with parameters $\boldsymbol\pi \in \left\{(\pi_1,\pi_2,\ldots, pi_K) \in \mathbb{R}^K : \sum_{k=1}^Kx_k = 1,~0 < x_k < 1\right\}$. The MLE estimate for $\boldsymbol\pi$ is just \begin{equation} \boldsymbol{\hat{\pi}} = \left(\frac{N_1}{N}, \frac{N_2}{N},\ldots,\frac{N_K}{N}\right), \end{equation} where $N_k$ is the number of our observations of class $k$.

Let $f(\mathbf{x} \mid \boldsymbol\mu,\Sigma)$ be the probability density function of a multivariate normal with mean $boldsymbol\mu$ and covariance $\Sigma$. Now, we apply Bayes' rule which gives us that \begin{align} p(y = k \mid \mathbf{x}) &= \frac{p(\mathbf{x} \mid y = k)p(y=k)}{\sum_{k^\prime = 1}^Kp(\mathbf{x} \mid y = k^\prime)p(y=k^\prime)}\nonumber\\ &= \frac{f(\mathbf{x} \mid \boldsymbol{\hat{\mu}}_k, \hat{\Sigma}_k)\frac{N_k}{N}}{\sum_{k^\prime = 1}^K f(\mathbf{x} \mid \boldsymbol{\hat{\mu}}_{k^\prime}, \hat{\Sigma}_{k^\prime})\frac{N_{k^\prime}}{N}} \nonumber\\ &= \frac{f(\mathbf{x} \mid \boldsymbol{\hat{\mu}}_k, \hat{\Sigma}_k)N_k}{\sum_{k^\prime = 1}^K f(\mathbf{x} \mid \boldsymbol{\hat{\mu}}_{k^\prime}, \hat{\Sigma}_{k^\prime})N_{k^\prime}}. \end{align}

This gives us the quadratic disriminant analysis (QDA) classifier. Now, we can fix $\Sigma_k = \Sigma$ for all $k$, which leads to further cancellations. This gives the linear discriminant analysis (LDA) classifier. The names quadractic and linear come from the shape of boundaries as seen in the notebook LDA/QDA on height/weight data.

In this case, they lead to the same misclassification rate.

See more notebooks as I add them at Machine Learning: a Probabilistic Perspective in Python.