Certain problems in competitive programming call for more advanced data structures than those built into Java's or C++'s standard libraries. Two examples are an order statistic tree and a priority queue that lets you modify priorities. It's questionable whether these implementations are useful outside of competitive programming since you could just use Boost.

## Order Statistic Tree

Consider the problem ORDERSET. An order statistic tree trivially solves this problem, and implementing one is actually not so difficult. You can find the implementation here. Basically, you maintain a node invariant

```
operator()(node_iterator node_it, node_const_iterator end_nd_it) const {
  node_iterator l_it = node_it.get_l_child();
  const size_type l_rank = (l_it == end_nd_it) ? 0 : l_it.get_metadata();
  node_iterator r_it = node_it.get_r_child();
  const size_type r_rank = (r_it == end_nd_it) ? 0 : r_it.get_metadata();
  const_cast<metadata_reference>(node_it.get_metadata()) = 1 + l_rank + r_rank;
}
```

where each node contains a count of nodes in its subtree. Every time you insert a new node or delete a node, you can maintain the invariant in $O(\log N)$ time by bubbling up to the root.

With this extra data in each node, we can implement two new methods: (1) `find_by_order` and (2) `order_of_key`. `find_by_order` takes a nonnegative integer as an argument and returns the node corresponding to that index, where the data are sorted and we use $0$-based indexing.

```
find_by_order(size_type order) {
  node_iterator it = node_begin();
  node_iterator end_it = node_end();
  while (it != end_it) {
    node_iterator l_it = it.get_l_child();
    const size_type o = (l_it == end_it) ? 0 : l_it.get_metadata();
    if (order == o) {
      return *it;
    } else if (order < o) {
      it = l_it;
    } else {
      order -= o + 1;
      it = it.get_r_child();
    }
  }
  return base_type::end_iterator();
}
```

It works recursively like this. Call the index we're trying to find $k$. Let $l$ be the number of nodes in the left subtree.

- $k = l$: If you're trying to find the $k$th-indexed element, then there will be $k$ nodes to your left, so if the left child has $k$ elements in its subtree, you're done.
- $k < l$: The $k$-indexed element is in the left subtree, so replace the root with the left child.
- $k > l$: The $k$-indexed element is in the right subtree. It's equivalent to looking for the $(k - l - 1)$-indexed element in the right subtree: we subtract away all the nodes in the left subtree and the root, and replace the root with the right child.

`order_of_key` takes whatever type is stored in the nodes as an argument. Since these types are comparable, it will return the index of the smallest element that is greater than or equal to the argument, which is also the number of keys strictly less than the argument.

```
order_of_key(key_const_reference r_key) const {
  node_const_iterator it = node_begin();
  node_const_iterator end_it = node_end();
  const cmp_fn& r_cmp_fn = const_cast<PB_DS_CLASS_C_DEC*>(this)->get_cmp_fn();
  size_type ord = 0;
  while (it != end_it) {
    node_const_iterator l_it = it.get_l_child();
    if (r_cmp_fn(r_key, this->extract_key(*(*it)))) {
      it = l_it;
    } else if (r_cmp_fn(this->extract_key(*(*it)), r_key)) {
      ord += (l_it == end_it) ? 1 : 1 + l_it.get_metadata();
      it = it.get_r_child();
    } else {
      ord += (l_it == end_it) ? 0 : l_it.get_metadata();
      it = end_it;
    }
  }
  return ord;
}
```

This is a simple tree traversal, where we keep track of the order as we traverse the tree. Every time we go down the right branch, we add $1$ for every node in the left subtree plus the current node. If we find a node that is equal to our key, we add $1$ for every node in the left subtree.

While not entirely trivial, one could write this code during a contest. But what happens when we need a balanced tree? Both Java's implementation of `TreeSet` and C++'s implementation of `set` use a red-black tree, but their APIs are such that the trees are not easily extensible. Here's where Policy-Based Data Structures come into play. They have a mechanism to create a node update policy, so we can keep track of metadata like the number of nodes in a subtree. Conveniently, `tree_order_statistics_node_update` has already been written for us. Now, our problem can be solved quite easily; I only have to make some adjustments for the $0$-indexing. Here's the code.

```
#include <functional>
#include <iostream>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>

using namespace std;

namespace phillypham {
  template<typename T,
           typename cmp_fn = less<T>>
  using order_statistic_tree =
    __gnu_pbds::tree<T,
                     __gnu_pbds::null_type,
                     cmp_fn,
                     __gnu_pbds::rb_tree_tag,
                     __gnu_pbds::tree_order_statistics_node_update>;
}

int main(int argc, char *argv[]) {
  ios::sync_with_stdio(false); cin.tie(NULL);
  int Q; cin >> Q; // number of queries
  phillypham::order_statistic_tree<int> orderStatisticTree;
  for (int q = 0; q < Q; ++q) {
    char operation;
    int parameter;
    cin >> operation >> parameter;
    switch (operation) {
      case 'I':
        orderStatisticTree.insert(parameter);
        break;
      case 'D':
        orderStatisticTree.erase(parameter);
        break;
      case 'K':
        if (1 <= parameter && parameter <= orderStatisticTree.size()) {
          cout << *orderStatisticTree.find_by_order(parameter - 1) << '\n';
        } else {
          cout << "invalid\n";
        }
        break;
      case 'C':
        cout << orderStatisticTree.order_of_key(parameter) << '\n';
        break;
    }
  }
  cout << flush;
  return 0;
}
```
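
As a quick sanity check of the semantics (this snippet is my own illustration and isn't part of the ORDERSET solution), recall that `find_by_order` is $0$-indexed and `order_of_key` counts the keys strictly less than its argument:

```
#include <functional>
#include <iostream>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>

using namespace std;

int main() {
  // same tree type as in the solution above
  __gnu_pbds::tree<int,
                   __gnu_pbds::null_type,
                   less<int>,
                   __gnu_pbds::rb_tree_tag,
                   __gnu_pbds::tree_order_statistics_node_update> t;
  t.insert(1); t.insert(3); t.insert(5); t.insert(7);
  cout << *t.find_by_order(2) << '\n'; // 5, the element at 0-based index 2
  cout << t.order_of_key(5) << '\n';   // 2, the number of keys strictly less than 5
  cout << t.order_of_key(6) << '\n';   // 3, the index 6 would occupy if inserted
  return 0;
}
```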

## Dijkstra's Algorithm and Priority Queues

Consider the problem SHPATH. Shortest path means Dijkstra's algorithm, of course. Optimal versions of Dijkstra's algorithm call for exotic data structures like Fibonacci heaps, which let us achieve a running time of $O(E + V\log V)$, where $E$ is the number of edges and $V$ is the number of vertices. Even a fairly basic implementation like the one in the classic CLRS needs more than what the standard priority queues in Java and C++ offer, namely, a way to decrease the priority of an element already in the queue. So we either implement our own priority queue or use a slow $O(V^2)$ version of Dijkstra's algorithm.

Thanks to policy-based data structures, it's easy to use a fancy heap for our priority queue.

```
#include <algorithm>
#include <climits>
#include <exception>
#include <functional>
#include <iostream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>
#include <ext/pb_ds/priority_queue.hpp>

using namespace std;

namespace phillypham {
  template<typename T,
           typename cmp_fn = less<T>> // max queue by default
  class priority_queue {
  private:
    struct pq_cmp_fn {
      bool operator()(const pair<size_t, T> &a, const pair<size_t, T> &b) const {
        return cmp_fn()(a.second, b.second);
      }
    };
    typedef typename __gnu_pbds::priority_queue<pair<size_t, T>,
                                                pq_cmp_fn,
                                                __gnu_pbds::pairing_heap_tag> pq_t;
    typedef typename pq_t::point_iterator pq_iterator;
    pq_t pq;
    vector<pq_iterator> map;
  public:
    class entry {
    private:
      size_t _key;
      T _value;
    public:
      entry(size_t key, T value) : _key(key), _value(value) {}
      size_t key() const { return _key; }
      T value() const { return _value; }
    };

    priority_queue() {}
    priority_queue(int N) : map(N, nullptr) {}

    size_t size() const {
      return pq.size();
    }

    size_t capacity() const {
      return map.size();
    }

    bool empty() const {
      return pq.empty();
    }

    /**
     * Usually, in C++, operator[] returns a reference that you can modify.
     * I choose not to allow this because it's dangerous, however.
     */
    T operator[](size_t key) const {
      return map[key]->second;
    }

    T at(size_t key) const {
      if (map.at(key) == nullptr) throw out_of_range("Key does not exist!");
      return map.at(key)->second;
    }

    entry top() const {
      return entry(pq.top().first, pq.top().second);
    }

    int count(size_t key) const {
      if (key < 0 || key >= map.size() || map[key] == nullptr) return 0;
      return 1;
    }

    pq_iterator push(size_t key, T value) {
      // could be really inefficient if there's a lot of resizing going on
      if (key >= map.size()) map.resize(key + 1, nullptr);
      if (key < 0) throw out_of_range("The key must be nonnegative!");
      if (map[key] != nullptr) throw logic_error("There can only be 1 value per key!");
      map[key] = pq.push(make_pair(key, value));
      return map[key];
    }

    void modify(size_t key, T value) {
      pq.modify(map[key], make_pair(key, value));
    }

    void pop() {
      if (empty()) throw logic_error("The priority queue is empty!");
      map[pq.top().first] = nullptr;
      pq.pop();
    }

    void erase(size_t key) {
      if (map[key] == nullptr) throw out_of_range("Key does not exist!");
      pq.erase(map[key]);
      map[key] = nullptr;
    }

    void clear() {
      pq.clear();
      fill(map.begin(), map.end(), nullptr);
    }
  };
}
```
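
Here's a small usage example of the wrapper (my own illustration; it assumes the `phillypham::priority_queue` class above is in scope). The key is an index, and the value is the priority:

```
// Assumes the includes and phillypham::priority_queue definition above.
int main(int argc, char *argv[]) {
  phillypham::priority_queue<int, greater<int>> pq(3); // min-queue on the values
  pq.push(0, 10);
  pq.push(1, 5);
  pq.push(2, 7);
  cout << pq.top().key() << ' ' << pq.top().value() << '\n'; // 1 5
  pq.modify(2, 1);  // change key 2's priority from 7 to 1
  cout << pq.top().key() << ' ' << pq.top().value() << '\n'; // 2 1
  pq.pop();         // removes key 2
  cout << pq.size() << '\n'; // 2
  return 0;
}
```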

By replacing `__gnu_pbds::pairing_heap_tag` with `__gnu_pbds::binomial_heap_tag`, `__gnu_pbds::rc_binomial_heap_tag`, or `__gnu_pbds::thin_heap_tag`, we can easily try different types of heaps. See the `priority_queue` interface. Unfortunately, we cannot try the binary heap because modifying elements invalidates iterators. Conveniently enough, the library allows us to check this condition dynamically.

```
#include <iostream>
#include <functional>
#include <ext/pb_ds/priority_queue.hpp>

using namespace std;

int main(int argc, char *argv[]) {
  __gnu_pbds::priority_queue<int, less<int>, __gnu_pbds::binary_heap_tag> pq;
  cout << (typeid(__gnu_pbds::container_traits<decltype(pq)>::invalidation_guarantee) ==
           typeid(__gnu_pbds::basic_invalidation_guarantee)) << endl;
  // prints 1
  cout << (typeid(__gnu_pbds::container_traits<__gnu_pbds::priority_queue<int, less<int>, __gnu_pbds::binary_heap_tag>>::invalidation_guarantee) ==
           typeid(__gnu_pbds::basic_invalidation_guarantee)) << endl;
  // prints 1
  return 0;
}
```
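
If you'd rather catch a bad choice of heap at compile time, here's a sketch of my own that leans on the guarantees forming an inheritance hierarchy (with `point_invalidation_guarantee` deriving from `basic_invalidation_guarantee`); I believe the pairing heap satisfies this, but the `static_assert` is exactly the check:

```
#include <functional>
#include <type_traits>
#include <ext/pb_ds/priority_queue.hpp>

using pairing_pq = __gnu_pbds::priority_queue<int, std::less<int>,
                                              __gnu_pbds::pairing_heap_tag>;

// Fails to compile if the chosen heap does not keep point iterators valid,
// which is what phillypham::priority_queue relies on.
static_assert(
    std::is_base_of<
        __gnu_pbds::point_invalidation_guarantee,
        __gnu_pbds::container_traits<pairing_pq>::invalidation_guarantee>::value,
    "need at least a point invalidation guarantee");

int main() { return 0; }
```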

See the documentation for `basic_invalidation_guarantee`. We need at least `point_invalidation_guarantee` for the code below to work since we keep a vector of iterators in our `phillypham::priority_queue`.

```
vector<int> findShortestDistance(const vector<vector<pair<int, int>>> &adjacencyList,
                                 int sourceIdx) {
  int N = adjacencyList.size();
  phillypham::priority_queue<int, greater<int>> minDistancePriorityQueue(N);
  for (int i = 0; i < N; ++i) {
    minDistancePriorityQueue.push(i, i == sourceIdx ? 0 : INT_MAX);
  }
  vector<int> distances(N, INT_MAX);
  while (!minDistancePriorityQueue.empty()) {
    phillypham::priority_queue<int, greater<int>>::entry minDistanceVertex =
      minDistancePriorityQueue.top();
    minDistancePriorityQueue.pop();
    distances[minDistanceVertex.key()] = minDistanceVertex.value();
    for (pair<int, int> nextVertex : adjacencyList[minDistanceVertex.key()]) {
      int newDistance = minDistanceVertex.value() + nextVertex.second;
      if (minDistancePriorityQueue.count(nextVertex.first) &&
          minDistancePriorityQueue[nextVertex.first] > newDistance) {
        minDistancePriorityQueue.modify(nextVertex.first, newDistance);
      }
    }
  }
  return distances;
}
```

Fear not, I also ended up using my own binary heap, the one I wrote in Dijkstra, Paths, Hashing, and the Chinese Remainder Theorem. Now, we can benchmark all these different implementations against each other.

```
int main(int argc, char *argv[]) {
  ios::sync_with_stdio(false); cin.tie(NULL);
  int T; cin >> T; // number of tests
  for (int t = 0; t < T; ++t) {
    int N; cin >> N; // number of nodes
    // read input
    unordered_map<string, int> cityIdx;
    vector<vector<pair<int, int>>> adjacencyList; adjacencyList.reserve(N);
    for (int i = 0; i < N; ++i) {
      string city;
      cin >> city;
      cityIdx[city] = i;
      int M; cin >> M;
      adjacencyList.emplace_back();
      for (int j = 0; j < M; ++j) {
        int neighborIdx, cost;
        cin >> neighborIdx >> cost;
        --neighborIdx; // convert to 0-based indexing
        adjacencyList.back().emplace_back(neighborIdx, cost);
      }
    }
    // compute output
    int R; cin >> R; // number of subtests
    for (int r = 0; r < R; ++r) {
      string sourceCity, targetCity;
      cin >> sourceCity >> targetCity;
      int sourceIdx = cityIdx[sourceCity];
      int targetIdx = cityIdx[targetCity];
      vector<int> distances = findShortestDistance(adjacencyList, sourceIdx);
      cout << distances[targetIdx] << '\n';
    }
  }
  cout << flush;
  return 0;
}
```

I find that the policy-based data structures are much faster than my own hand-written priority queue.

Algorithm | Time (seconds) |
---|---|
PBDS Pairing Heap, Lazy Push | 0.41 |
PBDS Pairing Heap | 0.44 |
PBDS Binomial Heap | 0.48 |
PBDS Thin Heap | 0.54 |
PBDS RC Binomial Heap | 0.60 |
Personal Binary Heap | 0.72 |

Lazy push is a small optimization, where we only add vertices to the heap as we encounter them. We save a few hundredths of a second at the expense of increased code complexity.

```
vector<int> findShortestDistance(const vector<vector<pair<int, int>>> &adjacencyList,
                                 int sourceIdx) {
  int N = adjacencyList.size();
  vector<int> distances(N, INT_MAX);
  phillypham::priority_queue<int, greater<int>> minDistancePriorityQueue(N);
  minDistancePriorityQueue.push(sourceIdx, 0);
  while (!minDistancePriorityQueue.empty()) {
    phillypham::priority_queue<int, greater<int>>::entry minDistanceVertex =
      minDistancePriorityQueue.top();
    minDistancePriorityQueue.pop();
    distances[minDistanceVertex.key()] = minDistanceVertex.value();
    for (pair<int, int> nextVertex : adjacencyList[minDistanceVertex.key()]) {
      int newDistance = minDistanceVertex.value() + nextVertex.second;
      if (distances[nextVertex.first] == INT_MAX) {
        minDistancePriorityQueue.push(nextVertex.first, newDistance);
        distances[nextVertex.first] = newDistance;
      } else if (minDistancePriorityQueue.count(nextVertex.first) &&
                 minDistancePriorityQueue[nextVertex.first] > newDistance) {
        minDistancePriorityQueue.modify(nextVertex.first, newDistance);
        distances[nextVertex.first] = newDistance;
      }
    }
  }
  return distances;
}
```

All in all, I found learning to use these data structures quite fun. It's nice to have such easy access to powerful data structures. I also learned a lot about C++ templating on the way.

I just finished reading Jane Austen's *Emma*. Despite it being full of romance, I didn't particularly like this book. Several of the characters practiced various forms of duplicity and were rewarded with happy endings. There seems to be some attempt at a lesson warning against youthful arrogance and the obsession with social class, but such admonishments come across as weak since everyone ends up happily ever after anyway.

For some reason the quote that stuck out most to me was made by Frank Churchill:

It is always the lady's right to decide on the degree of acquaintance.

I suppose that this is true. All the gentleman can do is ask, but the lady has the final say. I just found it strange since it seemed to imply that women have the upper hand in dating even in the 1800s.

I recall the quote from William Thackeray's *Vanity Fair*:

And oh, what a mercy it is that these women do not exercise their powers oftener! We can’t resist them, if they do. Let them show ever so little inclination, and men go down on their knees at once: old or ugly, it is all the same. And this I set down as a positive truth. A woman with fair opportunities, and without an absolute hump, may marry WHOM SHE LIKES. Only let us be thankful that the darlings are like the beasts of the field, and don’t know their own power. They would overcome us entirely if they did.

Another author asserts that women have power over men.

In light of Donald Trump's election, I've been thinking a lot about misogyny and feminism. It's clear that women put up with a lot of misogyny and objectification, from Donald Trump's comments, to the Harvard Men's Soccer Team's "Scouting Report", to the harsh judgment heaped on Hillary Clinton for everything from her looks to the way she speaks. I could offer even more evidence like personal anecdotes and the gender pay gap, but this post would go on forever.

Despite all the overwhelming evidence of the challenges that women face, many men and even some women don't find much to like in feminism, for Jill Filipovic writes in The Men Feminists Left Behind that

...young women are soaring, in large part because we are coming of age in a kind of feminist sweet spot: still exhibiting many traditional feminine behaviors — being polite, cultivating meaningful connections, listening and communicating effectively — and finding that those same qualities work to our benefit in the classroom and workplace, opening up more opportunities for us to excel.

The fact of the matter is that while men like Donald Trump and the Harvard Men's Soccer Team exhibit despicable behavior, most men are not billionaires or star athletes and are not in the position of power to get away with such actions. Even if we do harbor such hateful attitudes, we're not in a position to act on them, which we may feel absolves us of our guilt. Thus, it can feel that feminists are falsely accusing us of wrongdoing.

On the contrary, a typical young man's interaction with women often puts him in a losing position:

- if the woman is a classmate, she probably has a higher grade as women do much better in school on average.
- in dating, it's rejection after rejection for men. You might go on a few dates, pay for a couple of dinners, and never receive an answer to a text, and
- while female body image problems receive the most attention, male attractiveness is actually judged more harshly according to *Dataclysm*, where Christian Rudder writes,
When you consider the supermodels, the porn, the cover girls, the Lara Croft–style fembots, the Bud Light ads, and, most devious of all, the Photoshop jobs that surely these men see every day, the fact that male opinion of female attractiveness is still where it’s supposed to be is, by my lights, a small miracle. It’s practically common sense that men should have unrealistic expectations of women’s looks, and yet here we see it’s just not true.

Now, I know many women will tell me that a lot of those men are jerks and deserved to be rejected. All they really wanted was sex. They'll certainly have personal anecdotes of being used for sex or being ghosted themselves. There seems to be some type of selection bias, however, where women focus on men in positions of power. According to Robin W. Simon in Teaching Men to Be Emotionally Honest, boys are "more invested in ongoing romantic relationships," so the typical male actually takes heartbreak more severely but is left with no outlet for his emotions because men don't feel safe being emotionally vulnerable.

So, maybe Jane Austen and William Thackeray have a point. Many men can feel rather oppressed and might even see misogyny as somewhat justified on that account. Objectification of women becomes a sort of defense mechanism to make rejection more bearable, for it's easier to take rejection from someone you don't respect. I know that at times, I've had these thoughts.

Now, to be perfectly clear, I no longer feel this way. The issues that women face are very real, and while I will probably never fully understand them, I know there are very real forces of oppression that women fight against, and that feminism is necessary. I wrote this so that women might understand where some men are coming from. I just thought that there could be better understanding between the two sides because it's important for us men to fight for gender equality, too.

Men actually have a lot to gain from gender equality. While women might feel forced to spend a lot of time on childcare, many men feel forced into careers they might not have chosen otherwise, but for the pressure to provide for a family. In the United States, men commit suicide much more often, and underuse of mental health resources may be a contributor, for men don't feel comfortable seeking help.

I guess it just occurred to me that a lot of the animosity between the two sexes can be attributed to outmoded dating rituals, so maybe we can start there?

Enough about my abs. Back to more important stuff, you know, like math. I've been slowly working my way through *Machine Learning: a Probabilistic Perspective*.

Since my last post on this topic, Nearest Neighbors and Discriminant Analysis, I've gotten to do some cool problems on spam classification: Spam classification using logistic regression and Spam classification with naive Bayes. It's somewhat surprising how logistic regression performs much better than naive Bayes with fewer parameters (5.8% versus 11% misclassification rate). Of course, logistic regression is a discriminative model, while naive Bayes is generative. Estimating the conditional distribution is in some sense a smaller, and hence, easier problem. Generative models do have some advantages, though, especially when there is missing data.
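
To make the discriminative versus generative distinction concrete (these are the standard definitions, not anything specific to the book's exercises), logistic regression models the conditional directly, \begin{equation} p(y = 1 \mid \mathbf{x}, \mathbf{w}) = \frac{1}{1 + \exp\left(-\mathbf{w}^\intercal\mathbf{x}\right)}, \end{equation} while naive Bayes models the joint distribution and recovers the conditional through Bayes' rule, \begin{equation} p(y \mid \mathbf{x}) \propto p(y)\prod_{j=1}^D p(x_j \mid y), \end{equation} which requires estimating a class-conditional distribution for every one of the $D$ features.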

Other problems have been a bit of a drag. Now that I'm reading about Latent linear models and Sparse linear models, I've been getting killed by matrix calculus. I've decided to write down some of the more useful identities as a reference for myself. As evidence of how tedious these exercises are, the derivations in the textbook and solutions manual are riddled with errors.

Some resources:

- The Matrix Cookbook
- Appendix C of Bishop's Pattern Recognition and Machine Learning
- Chapter 4 and Equations 4.10 in particular of Murphy's Machine Learning: a Probabilistic Perspective
- Steven W. Nydick's slides, A Different(ial) Way

## Factor Analysis

The basic idea of the factor analysis model actually seemed quite intuitive when I first saw it. The idea is that the underlying data are just vectors of independent standard normals, that is, \begin{equation} \mathbf{z}_i \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_L). \end{equation} However, we actually observe \begin{equation} \mathbf{x}_i \mid \mathbf{z}_i \sim \mathcal{N}(\boldsymbol\mu + \mathbf{W}\mathbf{z}_i, \boldsymbol\Psi), \end{equation} where $\boldsymbol\Psi$ is diagonal.

In general, I'll denote observed values with $\mathbf{x}_i$ and hidden variables with $\mathbf{z}_i$. Intuitively, I think of $\mathbf{z}_i$ as the "genetics." A single gene may affect many different traits: eye color, height, hair color, bicep size, and intelligence. So if we had 2 genes that affect those 5 traits, the entry $\mathbf{W}_{ij}$ is the effect that gene $j$ has on trait $i$. We'll denote the number of latent factors, the length of $\mathbf{z}_i$, as $L$; the number of observed factors, the length of $\mathbf{x}_i$, as $D$; and finally, the number of observations as $N$. Thus, $\mathbf{W}$ is a $D \times L$ matrix.
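
One consequence worth noting (a standard property of this model, and the same marginal that shows up later when we compute $p(\mathbf{x}_i \mid q_i = k)$) comes from integrating out $\mathbf{z}_i$: \begin{equation} \mathbf{x}_i \sim \mathcal{N}\left(\boldsymbol\mu, \mathbf{W}\mathbf{W}^\intercal + \boldsymbol\Psi\right), \end{equation} so factor analysis approximates a full $D \times D$ covariance matrix using only $DL + D$ free parameters.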

But why stop there? To generalize this model, we can consider a mixture of factor analysis models. Now, the underlying data is $(\mathbf{z}_i, q_i),$ where \begin{equation} q_i \sim \operatorname{Cat}(\pi_1,\pi_2,\ldots,\pi_K), \end{equation} and for each category $k \in \{1,2,\ldots,K\}$, we have a separate $\mathbf{W}_k$ and $\boldsymbol\mu_k$, so that \begin{equation} \mathbf{x}_i \mid (\mathbf{z}_i, q_i = k) \sim \mathcal{N}(\boldsymbol\mu_k + \mathbf{W}_k\mathbf{z}_i, \boldsymbol\Psi). \end{equation}

One helpful way to view this is as a graphical model, which I've included in the title picture. One can easily sort out the dependencies. The observed variables are shaded. The deterministic parameters are in diamonds. The latent factors are the unshaded circles with thin borders. The parameters that we are trying to estimate are given thick borders, so we want to estimate $\boldsymbol\theta = \left(\mathbf{W}_k, \boldsymbol\mu_k,\boldsymbol\Psi,\pi_k\right)$ in this case.

### Fitting the model with the EM Algorithm

From the graphical model, one can write down the probability, or likelihood, of the data, \begin{equation} L(\boldsymbol\theta) = p\left(\mathcal{D} \mid \boldsymbol\theta\right) = \prod_{i=1}^N \mathcal{N}\left(\mathbf{z}_i \mid \mathbf{0}, \mathbf{I}_L\right) \prod_{k^\prime=1}^K\left(\pi_{k^\prime} \mathcal{N}\left(\mathbf{x}_i \mid \mathbf{W}_{k^\prime}\mathbf{z}_i + \boldsymbol\mu_{k^\prime}, \boldsymbol\Psi\right) \right)^{I(q_i = k^\prime)}. \end{equation}

Typically, one fits models by maximizing this likelihood, or equivalently, the log-likelihood. The problem is that we can't evaluate the function if we don't know $\mathbf{z}_i$ and $q_i$. This is where the expectation–maximization (EM) algorithm comes in. We replace the unknown values with their expectation and then maximize. We do this iteratively until we achieve convergence.
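
Abstractly, each iteration computes the expected complete-data log-likelihood and then maximizes it; this is just the generic EM recipe that the rest of this section instantiates: \begin{align} \text{E-step:}\quad & Q_{\boldsymbol\theta_l}(\boldsymbol\theta) = \mathbb{E}\left[l(\boldsymbol\theta) \mid \mathcal{D}, \boldsymbol\theta_l\right] \\ \text{M-step:}\quad & \boldsymbol\theta_{l+1} = \operatorname*{arg\,max}_{\boldsymbol\theta} Q_{\boldsymbol\theta_l}(\boldsymbol\theta). \end{align}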

Using the graphical model as a reference, we plug in the values that are in diamonds or shaded, we take the expectation of the terms that involve variables in circles with a thin border, and we choose the values for the variables with thick borders such that the likelihood is maximized.

I won't go into the mathematical and convergence properties of this algorithm, but I'll show how it's performed. To take the expectation, we need some value for $\boldsymbol\theta$, so we choose some initial $\boldsymbol\theta_0$. At each iteration, we use $\boldsymbol\theta_l = \left(\pi_k^{(l)},\boldsymbol\mu_k^{(l)},\mathbf{W}_k^{(l)}, \boldsymbol\Psi^{(l)}\right)$ to create a better estimate $\boldsymbol\theta_{l + 1} = \left(\pi_k^{(l+1)},\boldsymbol\mu_k^{(l+1)},\mathbf{W}_k^{(l+1)}, \boldsymbol\Psi^{(l+1)}\right).$

Let's write out the log-likelihood \begin{align} l(\boldsymbol\theta) &= \sum_{i=1}^N\log\mathcal{N}\left(\mathbf{z}_i \mid \mathbf{0}, \mathbf{I}_L\right) + \sum_{i=1}^N\sum_{k = 1}^K I(q_i = k)\left[\log\pi_{k} + \log\mathcal{N}\left(\mathbf{x}_i \mid \mathbf{W}_k\mathbf{z}_i + \boldsymbol\mu_k, \boldsymbol\Psi\right)\right] \label{eqn:loglikelihood}\\ &= \sum_{i=1}^N \left[-\frac{L}{2}\log 2\pi -\frac{1}{2}\mathbf{z}_i^\intercal\mathbf{z}_i \right] + \nonumber\\ &\sum_{i=1}^N\sum_{k = 1}^K I(q_i = k)\left[ \log\pi_{k} -\frac{D}{2}\log 2\pi -\frac{1}{2}\log |\boldsymbol\Psi| -\frac{1}{2}\left(\mathbf{x}_i - \mathbf{W}_k\mathbf{z}_i - \boldsymbol\mu_k\right)^\intercal{\boldsymbol\Psi}^{-1}\left(\mathbf{x}_i - \mathbf{W}_k\mathbf{z}_i- \boldsymbol\mu_k\right) \right]. \nonumber \end{align}

We need to eliminate anything with $q_i$ and $\mathbf{z}_i$ by taking expectation. Let's first start with $I(q_i = k).$

\begin{align} \mathbb{E}\left[I(q_i = k) \mid \mathbf{x}_i, \boldsymbol\theta_l\right] &= p(q_i = k \mid \mathbf{x}_i, \boldsymbol\theta_l) \nonumber\\ &= \frac{p(\mathbf{x}_i \mid q_i = k, \boldsymbol\theta_l)p(q_i = k \mid \boldsymbol\theta_l)}{\sum_{k^\prime = 1}^K p(\mathbf{x}_i \mid q_i = k^\prime, \boldsymbol\theta_l)p(q_i = k^\prime \mid \boldsymbol\theta_l)} \label{eqn:Iq_initial} \end{align} by Bayes' rule.

Recall that $p(q_i = k \mid \boldsymbol\theta_l) = \pi_k^{(l)}$, and \begin{align} p(\mathbf{x}_i \mid q_i = k, \boldsymbol\theta_l) &= \int p(\mathbf{x}_i,\mathbf{z}_i \mid q_i = k, \boldsymbol\theta_l)d\mathbf{z}_i \nonumber\\ &= \int \mathcal{N}\left(\mathbf{x}_i \mid \mathbf{W}_k^{(l)}\mathbf{z}_i + \boldsymbol\mu_k^{(l)}, \boldsymbol\Psi^{(l)}\right)\mathcal{N}\left(\mathbf{z}_i \mid \mathbf{0}, \mathbf{I}_L\right)d\mathbf{z}_i \nonumber\\ &= \mathcal{N}\left( \mathbf{x}_i \mid \boldsymbol\mu_k^{(l)}, \boldsymbol\Psi^{(l)} + \mathbf{W}_k^{(l)}\left(\mathbf{W}_k^{(l)}\right)^\intercal\right) \label{eqn:px} \end{align} by Equation 4.126 of Murphy's textbook. Plugging in Equation \ref{eqn:px} into Equation \ref{eqn:Iq_initial}, we find that \begin{equation} r_{ik}^{(l)} = \mathbb{E}\left[I(q_i = k) \mid \mathbf{x}_i, \boldsymbol\theta_l\right] = \frac{\pi_k^{(l)}\mathcal{N}\left( \mathbf{x}_i \mid \boldsymbol\mu_k^{(l)}, \boldsymbol\Psi^{(l)} + \mathbf{W}_k^{(l)}\left(\mathbf{W}_k^{(l)}\right)^\intercal\right)} {\sum_{k^\prime=1}^K\pi_{k^\prime}^{(l)}\mathcal{N}\left( \mathbf{x}_i \mid \boldsymbol\mu_{k^\prime}^{(l)}, \boldsymbol\Psi^{(l)} + \mathbf{W}_{k^\prime}^{(l)}\left(\mathbf{W}_{k^\prime}^{(l)}\right)^\intercal\right)}. \label{eqn:Iq} \end{equation}

Next, we need to take care of the terms with $\mathbf{z}_i$. To do this, we find the conditional distribution for $\mathbf{z}_i$. Note that we can condition on both $\mathbf{x}_i$ and $q_i$ since we only care about the $\mathbf{z}_i$ terms multiplied by $I(q_i = k)$, for the other $\mathbf{z}_i$ terms disappear when we take the derivative with respect to $\pi_k$, $\mathbf{W}_k$, $\boldsymbol\mu_k$, or $\boldsymbol\Psi$.

If we note that $\mathbf{x}_i \mid (\mathbf{z}_i, q_i = k, \boldsymbol\theta_l) \sim \mathcal{N}\left(\mathbf{W}_k^{(l)}\mathbf{z}_i + \boldsymbol\mu_k^{(l)}, \boldsymbol\Psi^{(l)}\right)$, and $\mathbf{z}_i \mid (q_i = k, \boldsymbol\theta_l) \sim \mathcal{N}\left(\mathbf{0}, \mathbf{I}_L\right),$ \begin{align} p(\mathbf{z}_i \mid \mathbf{x}_i, q_i = k,\boldsymbol\theta_l) &= \mathcal{N}\left(\mathbf{z}_i \mid \mathbf{m}_{ik}^{(l)}, \boldsymbol\Sigma_{ik}^{(l)}\right), \\ \text{where}~\boldsymbol\Sigma_{ik}^{(l)} &= \left(\mathbf{I}_L + \left(\mathbf{W}_k^{(l)}\right)^\intercal\left(\boldsymbol\Psi^{(l)}\right)^{-1}\mathbf{W}_k^{(l)}\right)^{-1} \nonumber\\ \text{and}~\mathbf{m}_{ik}^{(l)} &= \boldsymbol\Sigma_{ik}^{(l)} \left(\mathbf{W}_k^{(l)}\right)^\intercal\left(\boldsymbol\Psi^{(l)}\right)^{-1}\left(\mathbf{x}_i - \boldsymbol\mu_k^{(l)}\right)\nonumber \end{align} by Equation 4.125 in Murphy's textbook.

Now, let's do a couple of things to clean up notation. First, to simplify Equation \ref{eqn:loglikelihood}, we'll drop all terms that aren't functions of $\boldsymbol\theta$, so we have \begin{equation} \tilde{l}(\boldsymbol\theta) = \sum_{i=1}^N\sum_{k = 1}^K I(q_i = k)\left[ \log\pi_{k} -\frac{1}{2}\log |\boldsymbol\Psi| -\frac{1}{2}\left(\mathbf{x}_i - \mathbf{W}_k\mathbf{z}_i - \boldsymbol\mu_k\right)^\intercal{\boldsymbol\Psi}^{-1}\left(\mathbf{x}_i - \mathbf{W}_k\mathbf{z}_i- \boldsymbol\mu_k\right) \right]. \label{eqn:loglikelihood1} \end{equation}

In the next step, we define \begin{equation} \tilde{\mathbf{z}}_i = \begin{pmatrix} \mathbf{z}_i \\ 1 \end{pmatrix},~\text{and}~ \tilde{\mathbf{W}}_k = \begin{pmatrix} \mathbf{W}_k & \boldsymbol\mu_k \end{pmatrix}. \end{equation} Now $\boldsymbol\theta = (\tilde{\mathbf{W}}_k, \boldsymbol\Psi, \pi_k)$, and we can rewrite Equation \ref{eqn:loglikelihood1} as \begin{align} \tilde{l}(\boldsymbol\theta) &= \sum_{i=1}^N\sum_{k = 1}^K I(q_i = k)\left[ \log\pi_{k} -\frac{1}{2}\log |\boldsymbol\Psi| -\frac{1}{2}\left(\mathbf{x}_i - \tilde{\mathbf{W}}_k\tilde{\mathbf{z}}_i\right)^\intercal{\boldsymbol\Psi}^{-1}\left(\mathbf{x}_i - \tilde{\mathbf{W}}_k\tilde{\mathbf{z}}_i\right) \right] \nonumber\\ &= \sum_{i=1}^N\sum_{k = 1}^K I(q_i = k)\left[ \log\pi_{k} -\frac{1}{2}\log |\boldsymbol\Psi| -\frac{1}{2}\left( \mathbf{x}_i^\intercal\boldsymbol\Psi^{-1}\mathbf{x}_i - 2\mathbf{x}_i^\intercal\boldsymbol\Psi^{-1}\tilde{\mathbf{W}}_k\tilde{\mathbf{z}}_i + \tilde{\mathbf{z}}_i^\intercal\tilde{\mathbf{W}}_k^\intercal\boldsymbol\Psi^{-1}\tilde{\mathbf{W}}_k\tilde{\mathbf{z}}_i \right) \right] \nonumber\\ &= \sum_{i=1}^N\sum_{k = 1}^K I(q_i = k)\left[ \log\pi_{k} -\frac{1}{2}\log |\boldsymbol\Psi| -\frac{1}{2}\left( \mathbf{x}_i^\intercal\boldsymbol\Psi^{-1}\mathbf{x}_i - 2\mathbf{x}_i^\intercal\boldsymbol\Psi^{-1}\tilde{\mathbf{W}}_k\tilde{\mathbf{z}}_i + \operatorname{tr}\left(\tilde{\mathbf{W}}_k^\intercal\boldsymbol\Psi^{-1}\tilde{\mathbf{W}}_k\tilde{\mathbf{z}}_i\tilde{\mathbf{z}}_i^\intercal \right)\right) \right] \end{align} by the cyclic property of the trace.

Now, we note that \begin{equation} \tilde{\mathbf{z}}_i \mid \left(\mathbf{x}_i, q_i = k,\boldsymbol\theta_l\right) \sim \mathcal{N}\left( \begin{pmatrix} \mathbf{m}_{ik}^{(l)} \\ 1 \end{pmatrix}, \begin{pmatrix} \boldsymbol\Sigma_{ik}^{(l)} & \mathbf{0} \\ \mathbf{0}^\intercal & 0 \end{pmatrix} \right). \end{equation} Using this, we'll have that \begin{align} \mathbf{b}_{ik}^{(l)} &= \mathbb{E}\left[\tilde{\mathbf{z}}_i \mid \mathbf{x}_i ,q_i = k, \boldsymbol\theta_l\right] =\begin{pmatrix} \mathbf{m}_{ik}^{(l)} \\ 1 \end{pmatrix} \\ \mathbf{C}_{ik}^{(l)} &= \mathbb{E}\left[\tilde{\mathbf{z}}_i\tilde{\mathbf{z}}_i^\intercal \mid \mathbf{x}_i ,q_i = k, \boldsymbol\theta_l\right] = \begin{pmatrix} \boldsymbol\Sigma_{ik}^{(l)} & \mathbf{0} \\ \mathbf{0}^\intercal & 0 \end{pmatrix} + \mathbf{b}_{ik}^{(l)}\left(\mathbf{b}_{ik}^{(l)}\right)^\intercal, \end{align} where the second moment picks up both the covariance and the outer product of the mean.

Thus, the E-step amounts to writing our objective function as \begin{equation} Q_{\boldsymbol\theta_l}(\boldsymbol\theta) = \sum_{i=1}^N\sum_{k = 1}^K r_{ik}^{(l)}\left[ \log\pi_{k} -\frac{1}{2}\log |\boldsymbol\Psi| -\frac{1}{2}\left( \mathbf{x}_i^\intercal\boldsymbol\Psi^{-1}\mathbf{x}_i - 2\mathbf{x}_i^\intercal\boldsymbol\Psi^{-1}\tilde{\mathbf{W}}_k\mathbf{b}_{ik}^{(l)} + \operatorname{tr}\left(\tilde{\mathbf{W}}_k^\intercal\boldsymbol\Psi^{-1}\tilde{\mathbf{W}}_k\mathbf{C}_{ik}^{(l)} \right)\right) \right]. \label{eqn:estep} \end{equation}

We want to maximize $Q$ to obtain \begin{equation} \boldsymbol\theta_{l+1} = \operatorname*{arg\,max}_{\boldsymbol\theta} Q_{\boldsymbol\theta_l}(\boldsymbol\theta) \end{equation} for the M-step. This can be done by taking derivatives.

For $\pi_k$, this is not that hard. Let $\pi_K = 1 - \sum_{k=1}^{K-1}\pi_k$, which gives us that \begin{equation} \frac{\partial}{\partial\pi_k}Q_{\boldsymbol\theta_l}(\boldsymbol\theta) = \sum_{i=1}^N \left(\frac{r_{ik}^{(l)}}{\pi_k} - \frac{r_{iK}^{(l)}}{\pi_K}\right) \end{equation} for $k < K$. Setting this equal to $0$, we can solve for $\hat{\boldsymbol\pi}$. \begin{align*} \hat\pi_K\sum_{i=1}^N r_{ik}^{(l)} &= \hat\pi_k\sum_{i=1}^N r_{iK}^{(l)} \\ \hat\pi_K\sum_{i=1}^N\left(r_{i1}^{(l)} + r_{i2}^{(l)} + \cdots + r_{i,K-1}^{(l)}\right) &= \left(\hat\pi_1 + \hat\pi_2 + \cdots + \hat\pi_{K-1}\right)\sum_{i=1}^N r_{iK}^{(l)} \\ \hat\pi_K\sum_{i=1}^N\left(1 - r_{iK}^{(l)}\right) &= \left(1 - \hat\pi_K\right)\sum_{i=1}^N r_{iK}^{(l)} \\ N\hat\pi_K - \hat\pi_K\sum_{i=1}^N r_{iK}^{(l)} &= \sum_{i=1}^N r_{iK}^{(l)} - \hat\pi_K\sum_{i=1}^N r_{iK}^{(l)} \\ \hat\pi_K &= \frac{1}{N}\sum_{i=1}^N r_{iK}^{(l)}. \end{align*}

By symmetry, we have that \begin{equation} \pi_k^{(l + 1)} = \hat\pi_k = \frac{1}{N}\sum_{i=1}^N r_{ik}^{(l)} \label{eqn:mpi} \end{equation} for any $k$.

For $\boldsymbol\Psi$, we need several matrix identities. The first one is \begin{equation} \frac{\partial}{\partial \mathbf{A}} \log |\mathbf{A}| = \left(\mathbf{A}^{-1}\right)^\intercal. \label{eqn:mat_det} \end{equation}

To see this, note that we can write $|\mathbf{A}| = \sum_{i=1}^N \left(-1\right)^{i+j}\mathbf{A}_{ij}\left|\mathbf{A}_{-i,-j}\right|$ for any $j$, so we have that \begin{align*} \frac{\partial}{\partial \mathbf{A}_{ij}} \log |\mathbf{A}| &= \frac{1}{|\mathbf{A}|}(-1)^{i+j}\left|\mathbf{A}_{-i,-j}\right| \\ &= \frac{1}{|\mathbf{A}|}\mathbf{C}_{ij} \\ &= \left(\mathbf{A}^{-1}\right)^\intercal_{ij}, \end{align*} where we have used the definition of the matrix inverse in terms of the adjugate matrix, and $\mathbf{C}$ is the cofactor matrix.
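
As a quick $2 \times 2$ sanity check of Equation \ref{eqn:mat_det} (my own check, not from the text), take $\mathbf{A} = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ with $|\mathbf{A}| = ad - bc$. Then $\frac{\partial}{\partial a}\log|\mathbf{A}| = \frac{d}{ad - bc}$, which is exactly the $(1,1)$ entry of \begin{equation*} \left(\mathbf{A}^{-1}\right)^\intercal = \frac{1}{ad - bc}\begin{pmatrix} d & -c \\ -b & a \end{pmatrix}. \end{equation*}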

Next, we prove that \begin{equation} \frac{\partial}{\partial \mathbf{A}}\mathbf{x}^\intercal \mathbf{A}\mathbf{y} = \mathbf{x}\mathbf{y}^\intercal. \label{eqn:mat_quad} \end{equation} To see this we rewrite \begin{equation*} \mathbf{x}^\intercal \mathbf{A}\mathbf{y} = \operatorname{tr}\left(\mathbf{x}^\intercal \mathbf{A}\mathbf{y}\right) = \operatorname{tr}\left(\mathbf{A}\mathbf{y}\mathbf{x}^\intercal\right) = \sum_{i=1}^N\sum_{k=1}^N \mathbf{A}_{ik}y_kx_i, \end{equation*} which implies that \begin{equation*} \frac{\partial}{\partial \mathbf{A}_{ij}}\mathbf{x}^\intercal \mathbf{A}\mathbf{y} = x_iy_j. \end{equation*}

Now, the last trick is to rewrite $\log|\boldsymbol\Psi| = -\log\left|\boldsymbol\Psi^{-1}\right|$, and note that the MLE is invariant under reparametrization. So, using Equations \ref{eqn:mat_det} and \ref{eqn:mat_quad}, we can take the derivative of Equation \ref{eqn:estep} with respect to $\boldsymbol\Psi^{-1}$ to get \begin{align*} \frac{\partial}{\partial\boldsymbol\Psi^{-1}}Q_{\boldsymbol\theta_l}(\boldsymbol\theta) &= \sum_{i=1}^N\sum_{k = 1}^K r_{ik}^{(l)}\left[ \frac{1}{2}\boldsymbol\Psi -\frac{1}{2}\left(\mathbf{x}_i\mathbf{x}_i^\intercal - 2\tilde{\mathbf{W}}_k\mathbf{b}_{ik}^{(l)}\mathbf{x}_i^\intercal + \tilde{\mathbf{W}}_k\mathbf{C}_{ik}^{(l)}\tilde{\mathbf{W}}_k^\intercal\right) \right]. \end{align*} If we set this equal to $0$, we find that \begin{align*} \hat{\boldsymbol\Psi} = \frac{1}{N}\sum_{i=1}^N\sum_{k = 1}^K r_{ik}^{(l)}\left( \mathbf{x}_i\mathbf{x}_i^\intercal - 2\tilde{\mathbf{W}}_k\mathbf{b}_{ik}^{(l)}\mathbf{x}_i^\intercal + \tilde{\mathbf{W}}_k\mathbf{C}_{ik}^{(l)}\tilde{\mathbf{W}}_k^\intercal \right), \end{align*} where we have used the symmetry of $\boldsymbol\Psi$.

We have two issues. $\boldsymbol\Psi$ is supposed to be diagonal, and we need to choose a value for $\tilde{\mathbf{W}}_k$. To enforce the diagonal constraint, we just take the diagonal of $\hat{\boldsymbol\Psi}$ since we could have just taken derivatives component-wise. For $\tilde{\mathbf{W}}_k$, we use $\tilde{\mathbf{W}}_k^{(l+1)}$, which turns out to not depend on $\boldsymbol\Psi,$ so we have \begin{equation} \boldsymbol\Psi^{(l+1)} = \frac{1}{N}\sum_{i=1}^N\sum_{k = 1}^K r_{ik}^{(l)}\operatorname{diag}\left( \mathbf{x}_i\mathbf{x}_i^\intercal - 2\tilde{\mathbf{W}}_k^{(l+1)}\mathbf{b}_{ik}^{(l)}\mathbf{x}_i^\intercal + \tilde{\mathbf{W}}^{(l+1)}_k\mathbf{C}_{ik}^{(l)}\left(\tilde{\mathbf{W}}_k^{(l+1)}\right)^\intercal \right). \label{eqn:mpsi} \end{equation}

Now, we need to deal with the various $\tilde{\mathbf{W}}_k$. I'm not going to prove this identity, but we have that \begin{equation} \frac{\partial}{\partial \mathbf{X}}\operatorname{tr}\left(\mathbf{X}^\intercal \mathbf{B} \mathbf{X}\mathbf{C}\right) = \mathbf{B}\mathbf{X}\mathbf{C} + \mathbf{B}^\intercal \mathbf{X} \mathbf{C}^\intercal, \label{eqn:trw} \end{equation} which is Equation 117 in The Matrix Cookbook. Taking the derivative of Equation \ref{eqn:estep} with respect to $\tilde{\mathbf{W}}_k$, we get \begin{align} \frac{\partial}{\partial\tilde{\mathbf{W}}_k}Q_{\boldsymbol\theta_l}(\boldsymbol\theta) &= -\frac{1}{2}\sum_{i=1}^N r_{ik}^{(l)} \left( -2\boldsymbol\Psi^{-1}\mathbf{x}_i\left(\mathbf{b}_{ik}^{(l)}\right)^\intercal + \boldsymbol\Psi^{-1}\tilde{\mathbf{W}}_k\mathbf{C}_{ik}^{(l)} + \boldsymbol\Psi^{-1}\tilde{\mathbf{W}}_k\mathbf{C}_{ik}^{(l)} \right) \nonumber\\ &= \sum_{i=1}^N r_{ik}^{(l)} \left(\boldsymbol\Psi^{-1}\mathbf{x}_i\left(\mathbf{b}_{ik}^{(l)}\right)^\intercal - \boldsymbol\Psi^{-1}\tilde{\mathbf{W}}_k\mathbf{C}_{ik}^{(l)} \right), \end{align} where we have used the symmetry of $\boldsymbol\Psi$ and $\mathbf{C}_{ik}^{(l)}$. Setting this equal to $0$, we find that \begin{equation} \tilde{\mathbf{W}}_k^{(l+1)} = \left(\sum_{i=1}^N r_{ik}^{(l)}\mathbf{x}_i\left(\mathbf{b}_{ik}^{(l)}\right)^\intercal\right) \left(\sum_{i=1}^N r_{ik}^{(l)}\mathbf{C}_{ik}^{(l)} \right)^{-1}. \label{eqn:mw} \end{equation}

All in all, we combine Equations \ref{eqn:mpi}, \ref{eqn:mpsi}, and \ref{eqn:mw} to get $\boldsymbol\theta_{l+1}$, so our M-step is \begin{align} \pi_k^{(l + 1)} &= \hat\pi_k = \frac{1}{N}\sum_{i=1}^N r_{ik}^{(l)} \\ \tilde{\mathbf{W}}_k^{(l+1)} &= \left(\sum_{i=1}^N r_{ik}^{(l)}\mathbf{x}_i\left(\mathbf{b}_{ik}^{(l)}\right)^\intercal\right) \left(\sum_{i=1}^N r_{ik}^{(l)}\mathbf{C}_{ik}^{(l)} \right)^{-1} \\ \boldsymbol\Psi^{(l+1)} &= \frac{1}{N}\sum_{i=1}^N\sum_{k = 1}^K r_{ik}^{(l)}\operatorname{diag}\left( \mathbf{x}_i\mathbf{x}_i^\intercal - 2\tilde{\mathbf{W}}_k^{(l+1)}\mathbf{b}_{ik}^{(l)}\mathbf{x}_i^\intercal + \tilde{\mathbf{W}}^{(l+1)}_k\mathbf{C}_{ik}^{(l)}\left(\tilde{\mathbf{W}}_k^{(l+1)}\right)^\intercal \right) . \end{align}

## Principal Component Analysis

Now, $\boldsymbol\Psi$ being diagonal is already a pretty strong assumption. Principal Component Analysis (PCA) makes the even stronger assumption that $\boldsymbol\Psi = \sigma^2 \mathbf{I}_D$. For this reason, it's much easier to compute. All it takes is a singular value decomposition (SVD).
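
To see why an SVD suffices (a standard linear algebra fact rather than a derivation from the book), let $\mathbf{X}$ be the $N \times D$ matrix of centered observations with singular value decomposition $\mathbf{X} = \mathbf{U}\mathbf{S}\mathbf{V}^\intercal$. The sample covariance is then \begin{equation} \frac{1}{N}\mathbf{X}^\intercal\mathbf{X} = \mathbf{V}\frac{\mathbf{S}^2}{N}\mathbf{V}^\intercal, \end{equation} so the columns of $\mathbf{V}$ associated with the $L$ largest singular values give the principal directions, and the projections $\mathbf{Z} = \mathbf{X}\mathbf{V}_L$ give the latent factors.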

A nice application of these methods is Latent Semantic Indexing. Here, we've represented 9 documents as word count vectors. There are nearly 500 words. Using only 2 latent factors, we can cluster the documents.

Documents 1, 2, and 3 are about alien abductions.

Unfortunately, the assumption that the variance does not differ between the dimensions can be a bad one. In Probabilistic Principal Component Analysis versus Factor Analysis, we see how PCA fails to capture the relationship between $x_1$ and $x_2$ since it reduces the variance by focusing on $x_3$.

Back in 2011, I started eating Paleo and briefly became famous with my story Quitting Rice. While I was pretty hardcore at first, I've been more 80/20 now. Perhaps the first thing you'll notice is that I've hardly made any gains in 5 years ☹. Such is the sad life of a natty lifter. I'm actually significantly stronger, but it doesn't show. I'm actually about 15 lbs heavier in that July 2016 picture.

Now, on the plus side, I have achieved long-term weight loss. According to this article from The American Journal of Clinical Nutrition, only about 20% of people that lose weight keep it off. It's not clear whether that's because the diets stop working or people can't adhere to them long-term. Despite significant life changes, I've found this diet rather easy to maintain, however.

May 2013: doing some yard work

Since I've started Paleo, I've had 3 different jobs, earned a graduate degree, and lived in 3 different cities: Boston, Philadelphia, and Seattle. I've taken plenty of vacations, which include Vietnam, Thailand, and Australia. Even when I barely lifted one year, I managed to stay lean.

2014: at a beach in Australia

Thus, in my $N = 1$ experiment, I've decided that Paleo works long-term. Cooking is not so hard, and I find that I can pretty much eat as much as I want and not even think about it. Despite availing myself of Google's limitless food, I haven't really gained weight here in Seattle.

Of course, there's the other problem of eating all that cholesterol and saturated fat. It would seem that I'm headed towards coronary heart disease. And depending on how you interpret my lipid panel, you may be right.

From August 2016

While my triglycerides and HDL are great, that red LDL certainly doesn't look good. I'm not a doctor, so I don't really know what to make of it. There are some studies saying that the LDL number may not even be correct because my triglycerides are so low (see: The impact of low serum triglyceride on LDL-cholesterol estimation). It's strange that eating lots of triglycerides leads to very low triglycerides. I probably get 60% of my calories from fat, with most of that fat being saturated.

Moreover, in Comparison of serum lipid values in patients with coronary artery disease at 70 years of age, they say that:

Triglycerides and ratio of triglycerides to HDL cholesterol were the most powerful, independent variables related to precocity of CAD.

According to their model, my risk of coronary artery disease should be very low, so who knows? Is it not common knowledge that lots of LDL is bad?

I guess my insulin levels are fine, too? Blood pressure was 110/72 for what it's worth.

Anyway, if you've never given Paleo a try, I highly recommend it. Besides the weight loss, there are plenty of other benefits like not being hungry as often and more stable energy levels. I'm not a medical professional or dietician, but I'd be happy to answer any questions about the lifestyle. Looking at all these pictures of myself, I've realized that the light plays cruel tricks on the eye.

Go West, young man, go West and grow up with the country.

- Horace Greeley

Despite being said in 1865, there have never been so many people heeding Mr. Greeley's advice as there are now. Every week, I find that one of my college classmates has decided to go West. And why not? There are riches to be made thanks to a booming tech industry, and the weather is good. Unlike the olden days, when you had to suffer the Oregon Trail, there's not much risk at all.

As it turns out, when your hobbies consist of math, reading, and picking heavy things up and putting them down, every city is virtually the same. Despite Seattle being colder and cloudier and having nice mountains like Mount Shuksan pictured above, I don't feel markedly different than I did in Philadelphia.

The biggest change in my life has been going from living in a house with 6 guys to having my own bedroom with only 1 roommate. With the quiet, I definitely find myself getting more work done, but I'll miss having the guys around for sure.

Some mint chocolate chip ice cream that I made with Masato before I left. See The Only Ice Cream Recipe You’ll Ever Need for the recipe.

Now that I've moved several times in my life (Hatfield, PA $\rightarrow$ Durham, NC $\rightarrow$ Cambridge, MA $\rightarrow$ Philadelphia, PA $\rightarrow$ Seattle, WA), I've started to reflect on any regrets and what I miss. Certainly, there are all those restaurants and eats that I forgot to try. I never did eat at Craigie on Main or try a cheesesteak from Pat's or Gino's. There are missed opportunities like never having gone to the top of the Chapel or taking a certain class. However, what always haunts me the most are the people that I wish that I had gotten to know better. It always seemed that so many connections were missed. People were just busy, feelings were misinterpreted, or the timing was just bad, and as a result, nothing ever happened. Of course, there's the very real possibility that the people that I wanted to get to know better had no interest in getting to know me, so maybe, I'm just talking nonsense.

There was some part of me that did want to stay in Philadelphia, but no opportunity ever came up to do so. It's probably best that I left, though. While I was comfortable, my life did seem to be stagnating in some respects. In the month leading up to my going away, I found myself mostly just watching Korean dramas with my brother. That is to say, I wasn't accomplishing much of anything useful with my time, and with the exception of the guys in the house, I didn't have much other community. Essentially, I was a ghost. Perhaps, moving to a new city will reinvigorate me.

A couple of weeks ago, Michael Vo needed to cook for a Renewal College Fellowship (RCF) potluck, and I decided to help out a bit. Mike is mostly known for his famous chicken alfredo, so it was no surprise at all when his spaghetti and meatballs turned out to be *the best spaghetti and meatballs ever* according to my brother.

We based it off of the recipe Spaghetti and Drop Meatballs With Tomato Sauce. Now, there are a couple of modifications needed for this recipe. First, to serve 4, you'll need at least 24 ounces of meat, not 12, and therefore, an extra egg. We also thought that it would be a good idea to use the scrapings from the bottom of the skillet and the oil from searing the meatballs and mix it into the sauce. Mike always goes big, and we ended up quadrupling the recipe and using 7 pounds of ground beef. Now, this massive quantity required special techniques to preserve the oil and scrapings at the bottom of the skillet. After every batch of meatballs, we needed to scrape the skillets and run the oil through a sieve.

Here is a picture of the second part, where we cooked the meatballs in the sauce.

The only complaint that some people had about the recipe was the mixing in of the cheese into the meatballs. It was a matter of opinion whether this tasted good or bad. I personally like it.

Anyway, if you ever find yourself in the deserts of West Philly, you should do yourself a favor and stop by the BAD house to get some of Michael Vo's Spaghetti and Meatballs for nourishment.