Random Walk

Team Members: Tanmay Shah, Alan Liu, Scott Ni, Benjamin Zhuang

What is random walk?

Random Walk is the process of randomly-moving objects wandering away from their starting points in an n-dimensional space. Formally, a random walk is a discrete-time stochastic process: a sequence of random variables $X_0, X_1, ..., X_n$. It is also a special case of the more general Markov chain, so at any time step, the probability of the walker ending up at a given state depends only on its current state.

The term 'random walk' was first suggested by Karl Pearson in 1905 to model mosquito infestation in a forest, and further work on the topic was done by Lord Rayleigh and Louis Bachelier. The big breakthrough came in the same year (1905), when Einstein modeled Brownian motion as a random walk.

Random walks are important to understand because they appear ubiquitously in real-life models. For example, they underpin the Brownian motion that lets air particles move randomly around us or inside containers. They are especially useful for modeling games (e.g. baseball games, chess, betting) and the stock market. A famous application of random walks is the Gambler's Ruin problem: a gambler repeatedly tosses a biased coin, winning \$$1$ with probability $p$ and losing \$1 with probability $1-p$. They start with \$1 and, as the game runs forever, either grow infinitely rich or go bankrupt (ruined). Their fate is decided by the value of $p$: when $p > 0.5$ they have a positive chance of accumulating unbounded wealth, while $p \le 0.5$ guarantees eventual ruin. The path to their fortune or ruin is exactly modeled by a random walk, a sequence of wins and losses from tossing the coin.

How do we describe random walk in mathematical terms?

An elementary example of a random walk is the random walk on the 1-dimensional integer axis $\mathbb{Z}$. This random walk starts at 0 and moves +1/-1 at each time step with equal probability. Consider a random variable $X$ such that $$ P(X) = \begin{cases} \frac{1}{2}, & X = +1 \\ \frac{1}{2}, & X = -1 \\ \end{cases} $$ As a real-life example, we can model a fair coin tossed $5$ times as a random walk: we move $+1$ for a head and $-1$ for a tail. The sum of flips can land on $-5, -3, -1, 1, 3, 5$. With $3$ heads and $2$ tails, in any order, the sum of flips ends on $1$. There are $10$ ways of landing on $1$ ($3$ heads and $2$ tails, $C(5, 2)$), $10$ ways of landing on $-1$ ($2$ heads and $3$ tails), $5$ ways of landing on $3$ ($4$ heads and $1$ tail, $C(5, 4)$), $5$ ways of landing on $-3$ ($1$ head and $4$ tails), $1$ way of landing on $5$, and $1$ way of landing on $-5$.
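This counting argument can be checked by brute force. The sketch below (not part of the visualization code) enumerates all $2^5$ flip sequences and tallies their endpoints:

```python
from itertools import product
from collections import Counter

# Enumerate every sequence of 5 steps, each +1 (head) or -1 (tail),
# and count how many sequences end at each possible sum.
endpoint_counts = Counter(sum(steps) for steps in product((+1, -1), repeat=5))

print(sorted(endpoint_counts.items()))
# [(-5, 1), (-3, 5), (-1, 10), (1, 10), (3, 5), (5, 1)]
```

The tallies match the binomial coefficients above: $C(5,2) = 10$ ways to end at $\pm 1$, $C(5,1) = 5$ ways to end at $\pm 3$, and a single all-heads or all-tails sequence for $\pm 5$.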
We define this simple random walk as $n$ independent random variables $X_1, ..., X_n$ and set $S_0 = 0$ and $S_n = \sum_{i=1}^{n} X_i$. The series $\{S_n\}$ gives the net distance walked, and it's called a simple random walk on $\mathbb{Z}$. By linearity of expectation, $$ \begin{align}\mathbb{E}(S_n) &= \sum\limits_{i = 1}^{n} \mathbb{E}(X_i) \\ &= \sum\limits_{i = 1}^{n} \sum\limits_{x} x \cdot P(X_i = x) \\ &= n \cdot (0.5 \cdot 1 + 0.5 \cdot (-1)) \\ &= 0\end{align} $$ In other words, we expect the random walk to end where it started, on average, after $n$ steps.
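This expectation can also be verified empirically. A minimal sketch (the helper name `simple_random_walk` is our own) simulating many walks:

```python
import random

def simple_random_walk(n, rng=random):
    """Return S_n, the sum of n independent +1/-1 steps."""
    return sum(rng.choice((+1, -1)) for _ in range(n))

random.seed(0)
trials = [simple_random_walk(100) for _ in range(10_000)]
sample_mean = sum(trials) / len(trials)  # close to E[S_n] = 0
```

Individual walks wander far from the origin, but averaged over many trials the mean net distance stays near zero, as the derivation predicts.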

Random Walk 1D

The following visualization simulates a single random walk spanning $n$ steps. The circle represents the running sum of coin flips, and the bars record the number of times the circle has visited each position. You can also observe a smoother probability distribution by toggling the kernel density estimation (kde) button and adjusting the bandwidth. The number on each bar represents the empirical frequency of reaching the corresponding position during the currently running random walk.

Stationary Distribution

In the following visualization, we simulate $1000$ trials of a 1D random walk, each given $n$ steps to walk, and record each net distance walked (end location). By the Central Limit Theorem, if we sum up independent random variables $X_1, ..., X_n$, their normalized sum tends toward a normal distribution as $n$ grows. Here, we can observe that the probability distribution of the $1000$ end locations is approximately normal.
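As a rough numerical check of this claim (a sketch, separate from the visualization code), we can normalize the end locations by $\sqrt{n}$ and confirm that the sample mean and standard deviation match a standard normal:

```python
import random
import statistics

random.seed(1)
n, trials = 100, 1000

# End location of each of 1000 independent n-step walks
ends = [sum(random.choice((+1, -1)) for _ in range(n)) for _ in range(trials)]

# CLT: S_n / sqrt(n) should be approximately standard normal
z = [s / n ** 0.5 for s in ends]
z_mean, z_std = statistics.mean(z), statistics.pstdev(z)
# z_mean is near 0 and z_std is near 1
```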

In the below visualization, provide the number of steps for each random walk and press submit. By default, we have $n=1$. Random walks indexed $0$ to $999$ ($1000$ total) will be plotted. The number on each bar represents the fraction of the plotted random walks that ended at the corresponding location. By varying the values, you can observe that the histogram becomes more refined for larger values of $n$, and it increasingly resembles a normal distribution.

In the previous example, we expect a simple random walk to return to where it started after $n$ steps. If we repeat this experiment many times and denote the mean net distance traveled by $\overline{S_n} = \overline{X_1} + ... + \overline{X_n}$, we'd have $\overline{S_n} = 0 + ... + 0 = 0$, because each step of a random walk is equally likely to move +1 or -1. But how far would it actually travel after $n$ steps in most scenarios? Instead of modeling the mean net distance traveled $\overline{S_n}$, we model the mean squared net distance traveled, $$ \begin{align} \overline{S_n^2} &= \overline{(X_1 + ... + X_n)^2} \\ &= \overline{(X_1 + ... + X_n) \cdot (X_1 + ... + X_n)} \\ &= (\overline{X_1^2} + ... + \overline{X_n^2}) + 2(\overline{X_1 X_2} + ... + \overline{X_1 X_n} + \overline{X_2 X_3} + ... ) \end{align} $$ Consider what $\overline{X_1^2}$ evaluates to: because $X_1$ is either +1 or -1, $X_1^2 = 1$ and so $\overline{X_1^2} = 1$. Therefore, every squared term has an average value of 1. Since $X_1$ and $X_2$ are independent and each moves +1 or -1 with equal probability, $\overline{X_1 X_2} = 0$, and the same goes for every cross term of two independent random variables. Therefore, $$ \overline{S_n^2} = (1 + ... + 1) + 2 (0 + ... + 0) = n $$ If we take the square root, $\sqrt{\overline{S_n^2}} = \sqrt{n}$. In other words, we expect the random walk to reach a distance of roughly $\sqrt{n}$ from the origin after $n$ steps, on average.
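The $\sqrt{n}$ scaling can likewise be checked numerically. The sketch below estimates $\overline{S_n^2}$ from repeated trials:

```python
import random

random.seed(2)
n, trials = 400, 5000

# Average of S_n^2 over many independent walks; the derivation says
# this should equal n
mean_sq = sum(
    sum(random.choice((+1, -1)) for _ in range(n)) ** 2
    for _ in range(trials)
) / trials

rms = mean_sq ** 0.5  # close to sqrt(n) = 20
```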

As seen above, the curve of net displacement versus the number of steps taken follows a square-root curve, while that of the squared net displacement is a straight line. Thus, the typical net displacement across a large number of random walks is proportional to the square root of the number of steps taken. In our case, we start from location 0, meaning we expect the object to reach roughly $-\sqrt{n}$ or $\sqrt{n}$ after $n$ steps, on average.

Random Walk 2D

Random walk can also take place in 2-dimensional space, or even higher dimensions. A general 2D random walk can travel in any direction with any step length at each time step (the angle of movement from the x-axis is sampled from a distribution over $[0, 2\pi]$, and the step length can be sampled from any other distribution). But for simplicity, we visualize a 2D random walk in a Cartesian coordinate system, with each movement being a unit step in one of the four directions.
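A lattice walk like the one visualized below can be sketched as follows (the function name and default probabilities are our own choices, not the visualization's code):

```python
import random

def walk_2d(steps, p_north=0.25, p_south=0.25, p_east=0.25, p_west=0.25,
            rng=random):
    """Simulate a 2D lattice random walk with unit steps in the four
    cardinal directions; the four probabilities should sum to 1."""
    moves = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}
    x = y = 0
    path = [(0, 0)]
    for _ in range(steps):
        direction = rng.choices("NSEW",
                                weights=(p_north, p_south, p_east, p_west))[0]
        dx, dy = moves[direction]
        x, y = x + dx, y + dy
        path.append((x, y))
    return path

random.seed(3)
path = walk_2d(50)
```

Biasing the weights (say, a larger `p_north`) produces a drifting walk instead of a symmetric one, which is exactly what the direction-probability controls in the visualization let you explore.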

The below visualization shows a 2D random walk for 50 steps. You can assign probabilities of moving in any of the four directions (North is up), submit them and simulate the random walk.

Application: Image Segmentation

Random walks can be helpful in segmenting images, whereby we perform a biased random walk on the pixels of an image to separate its objects from the background. The pixels are modeled as a weighted graph in which only neighboring pixels are connected to each other. The edge weights represent the feature similarity (e.g. intensity, color, etc.) between neighboring pixels.
At the beginning, the user interactively selects a small set of markers distinguishing the objects and the background of the image. Then, for each unlabeled pixel, the algorithm considers random walkers moving between it and the known labels; the label a walker is most likely to reach determines that pixel's label. The more initial markers we provide, the better the segmentation. Depending on the image, one set of labels corresponds to the foreground and the remaining set to the background (e.g. labels 4 and 5 for the leaf image cover the majority of the foreground/leaf, and the other labels can be considered background).
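The edge weights in such a graph are commonly a Gaussian function of the intensity difference between neighboring pixels, as in Grady's random walker algorithm. A minimal sketch, with our own helper names and a `beta` value chosen purely for illustration:

```python
import math

def edge_weight(g_i, g_j, beta=90.0):
    """Gaussian weight on the intensity difference between two pixels:
    similar pixels get weights near 1, dissimilar pixels near 0."""
    return math.exp(-beta * (g_i - g_j) ** 2)

def grid_edges(image):
    """Yield ((r, c), (r2, c2), weight) for every 4-connected neighbor
    pair in a 2D grayscale image (list of lists of floats in [0, 1])."""
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            if r + 1 < rows:
                yield (r, c), (r + 1, c), edge_weight(image[r][c], image[r + 1][c])
            if c + 1 < cols:
                yield (r, c), (r, c + 1), edge_weight(image[r][c], image[r][c + 1])
```

A random walker moves along edges with probability proportional to these weights, so it tends to stay inside regions of similar intensity and rarely crosses sharp boundaries, which is what makes the label probabilities segment the image.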
Because the image segmentation algorithm is implemented in Python and this project has no backend, we are unable to let the user select their own labels or show the intermediate results of the random walk. Instead, we defined our own set of labels for each image at three levels of granularity. A lower level of granularity corresponds to a smaller set of initial markers, so the segmentation result is expected to be worse.

Below, you can choose the image and the segmentation granularity. Results are generated using the random walk algorithm described above. The first image from the left is the original image, followed by the assigned markers, and finally the segmented image. From the dropdown, you can choose to view just the portions of the segmented image corresponding to a particular label. Note that to view the portion of a label, you have to clear the previously visualized label using the 'hide labels' button and then press 'show labels' again.

Appendix:

Credits:

This project was done by team members Benjamin Zhuang, Tanmay Shah, Alan Liu and Scott Ni. The introduction was drafted by Scott and Tanmay. Ben formulated the mathematical explanation and the 1D random walk visualization. The Stationary Distribution visualizations were created by Ben and Alan, with additions and documentation by Tanmay. The 2D random walk was worked on by Alan. The image segmentation was worked on by Ben, Alan and Tanmay. Tanmay added the rest of the documentation and fixes, and Scott coordinated the development process across branches.

Sources:

Here are the sources used in this project: