
Section 11.3 Stochastic Matrices and Markov Chains

Subsection 11.3.1 Graphs and Probability

Dynamical systems are good at modelling certain probability situations. Like all dynamical systems, there is a vector \(v \in \RR^n\) of the states of the system. In a probability model, the states represent possibilities of the system, and in each time step, the system can transition from one state to another. (This is different from previous examples, where the states represented different but interacting quantities.) The action of the matrix \(A\) gives the probability of shifting from one state to another in a time step. This is best illustrated by a graph with labelled edges. Figure 11.3.1 is an example with three vertices.
Figure 11.3.1. Probability Graph 1
In Figure 11.3.1, the arrows are labelled with the transition probabilities. This means that if the system is in vertex \(1\text{,}\) there is a \(20\%\) probability that it stays in vertex \(1\) (the arrow pointing from the vertex back to itself); there is a \(40\%\) probability that it transitions to vertex \(2\) (the arrow pointing from vertex \(1\) to vertex \(2\)); and a \(40\%\) probability that it transitions to vertex \(3\) (the arrow pointing from vertex \(1\) to vertex \(3\)). The total probability of going out of vertex \(1\) is \(100\%\text{.}\) You can check the same is true for the other vertices: the outgoing probabilities always add to \(1\text{.}\) The incoming probabilities do not need to satisfy this.
Now let me write this as a matrix action. The vector \(v_k = \begin{pmatrix} x_k \\ y_k \\ z_k \end{pmatrix}\) is a probability vector: \(x_k\) is the probability that we are currently in vertex \(1\text{,}\) \(y_k\) for vertex \(2\) and \(z_k\) for vertex \(3\text{.}\) Since the total probability must be \(1\text{,}\) the vector entries must satisfy \(x_k + y_k + z_k = 1\text{.}\) Then the matrix \(A\) acts on the vector, taking \(v_k\) to \(v_{k+1}\) in the next timestep. The entry \(A_{ij}\) in the matrix is the probability of transition from vertex \(j\) to vertex \(i\text{:}\) that is, the number on the arrow from \(j\) to \(i\text{.}\)
\begin{equation*} \begin{pmatrix} 0.2 \amp 0.5 \amp 0.2 \\ 0.4 \amp 0.2 \amp 0.4 \\ 0.4 \amp 0.3 \amp 0.4 \end{pmatrix} \begin{pmatrix} x_k \\ y_k \\ z_k \end{pmatrix} = \begin{pmatrix} x_{k+1} \\ y_{k+1} \\ z_{k+1} \end{pmatrix} \end{equation*}
The columns of this matrix add to \(1\text{.}\) That’s the same fact that I mentioned in the previous paragraph, that the outgoing probabilities from any vertex must add to one. The columns are all the outgoing probabilities from a vertex. The rows are all the incoming probabilities for a vertex.
Let me be even more explicit with this example. Say that we start on vertex \(2\) with probability \(1\text{.}\) This means the starting vector is \(v_0 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}\text{.}\) What happens to this as the system changes over a series of timesteps? Calculating the matrix action gives the following sequence of states. (I haven’t written out the matrix calculations, which I did by computer.)
\begin{align*} \amp v_0 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \amp \amp v_1 = Av_0 = \begin{pmatrix} 0.5 \\ 0.2 \\ 0.3 \end{pmatrix} \\ \amp v_2 = A^2v_0 = \begin{pmatrix} 0.26 \\ 0.36 \\ 0.38 \end{pmatrix} \amp \amp v_3 = A^3v_0 = \begin{pmatrix} 0.308 \\ 0.328 \\ 0.364 \end{pmatrix} \\ \amp v_4 = A^4v_0 = \begin{pmatrix} 0.2984 \\ 0.3344 \\ 0.3672 \end{pmatrix} \amp \amp v_5 = A^5v_0 = \begin{pmatrix} 0.30032 \\ 0.33312 \\ 0.36656 \end{pmatrix}, \ldots \end{align*}
It looks like this might be stabilizing into a fixed list of probabilities, with a roughly \(30\%\) chance that we end up at vertex \(1\text{;}\) a roughly \(33\%\) chance that we end up back at vertex \(2\text{;}\) and a roughly \(37\%\) chance that we end up at vertex \(3\text{.}\)
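If you want to reproduce these iterates, here is a minimal sketch (numpy assumed) using the stochastic matrix above:

```python
# Iterate v_{k+1} = A v_k, starting from vertex 2 with probability 1.
import numpy as np

A = np.array([[0.2, 0.5, 0.2],
              [0.4, 0.2, 0.4],
              [0.4, 0.3, 0.4]])

v = np.array([0.0, 1.0, 0.0])   # v_0
for k in range(1, 6):
    v = A @ v                   # one timestep of the chain
    print(k, v)                 # v_5 comes out as [0.30032, 0.33312, 0.36656]
```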
Hopefully this example has helped to illustrate the idea of modelling probabilistic transitions on a graph. I’m going to take this example and turn it into a series of formal definitions.

Subsection 11.3.2 Definitions

I am going to start with formalizing the type of vector and the type of matrix that I used in the example.

Definition 11.3.2.

A vector \(v \in \RR^n\) is a probability vector or a stochastic vector if all its entries \(v_i\) are non-negative and these entries sum to \(1\text{.}\)

Definition 11.3.3.

A non-negative \(n \times n\) matrix where all the columns sum to one is called a (left) stochastic matrix. (There is a similar definition of a right stochastic matrix where all the rows sum to one; it is, in fact, the more common definition. However, it doesn’t mesh that well with our conventions of column vectors and matrix actions, so we will work with left stochastic matrices and suppress the word ‘left’ in what follows.)
These vectors and matrices are the appropriate objects for probability models for the following reason (which I present without proof for now).

Proposition 11.3.4.

If \(A\) is a stochastic matrix and \(v\) is a stochastic vector, then \(Av\) is also a stochastic vector.

Stochastic vectors are those which make sense for probabilities. This proposition tells us that the matrix action of a stochastic matrix preserves this property, so the vectors can still be interpreted as probabilities after the matrix action.
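As a quick numerical sanity check (a minimal sketch, numpy assumed), applying the opening example’s stochastic matrix to an arbitrary stochastic vector gives back a stochastic vector:

```python
# Check: a stochastic matrix sends stochastic vectors to stochastic vectors.
import numpy as np

A = np.array([[0.2, 0.5, 0.2],
              [0.4, 0.2, 0.4],
              [0.4, 0.3, 0.4]])
print(A.sum(axis=0))            # each column sums to 1, so A is stochastic

v = np.array([0.1, 0.6, 0.3])   # an arbitrary stochastic vector
w = A @ v
print(w, w.sum())               # entries are non-negative and sum to 1
```

With this proposition, I can define the particular type of dynamical system I am presenting in this section.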

Definition 11.3.5.

A Markov chain is a linear dynamical system where the matrix \(A\) is a stochastic matrix and the starting state \(v\) is a stochastic vector.
Now that they are defined, I want to understand the dynamics and the long term behaviour of Markov chains.

Subsection 11.3.3 Markov Long-Term Behaviour

In the example that I did above, it looked like the probabilities were approaching a stable configuration. Can I know that they are doing so? Can I expect that they are always doing so? The answers to this depend on the long-term behaviour of the Markov chain, which I analyze using eigenvalues and eigenvectors.
A stochastic matrix has all non-negative entries, so it satisfies the weak form of Perron-Frobenius. It is not guaranteed to be irreducible. However, for the purposes of this course, I can mostly restrict to irreducible stochastic matrices. Thinking in terms of probability graphs, like the starting example, the stochastic matrix will be irreducible as long as the graph is connected and has no dead ends: that is, as long as there is a path from any vertex to any other vertex. If the graph is disconnected, then the various disconnected pieces are essentially independent probability problems, and I could treat them individually. I will have an example later with dead ends, which I’ll deal with when I get there. Otherwise, I will assume irreducible matrices.
The Perron-Frobenius theorem tells us that an irreducible stochastic matrix will have a unique largest positive eigenvalue with a matching eigenvector with all positive entries. However, since stochastic matrices are a very special matrix form, there is an even stronger result, which I present here without proof.

Proposition 11.3.6.

If \(A\) is an irreducible stochastic matrix, then its dominant eigenvalue is \(\lambda = 1\) and all its other eigenvalues satisfy \(|\lambda| \lt 1\text{.}\)

Since I can scale the eigenvector \(v\) matching \(\lambda = 1\) as I wish, I can choose \(v\) so that it is a stochastic vector. Since all other eigenvalues have \(|\lambda| \lt 1\text{,}\) their contributions to the system will decay over time. The long-term probability behaviour is given by the coefficients of this vector \(v\text{,}\) independent of the starting situation. Markov chains (at least with connected probability graphs) will stabilize to a fixed probability distribution regardless of the starting state (though it might take longer to converge depending on where we start).
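Here is a minimal numpy sketch checking these claims on the opening example, whose eigenvalues work out to \(1\text{,}\) \(0\) and \(-0.2\text{:}\)

```python
# Eigenvalues and long-term distribution of the opening example's matrix.
import numpy as np

A = np.array([[0.2, 0.5, 0.2],
              [0.4, 0.2, 0.4],
              [0.4, 0.3, 0.4]])

vals, vecs = np.linalg.eig(A)
print(vals)                     # 1, 0, -0.2: all but lambda = 1 have |lambda| < 1

v = vecs[:, np.argmax(vals.real)].real  # eigenvector for lambda = 1
print(v / v.sum())              # scaled to stochastic: [0.3, 0.3333, 0.3667]
```

The scaled eigenvector matches the roughly \(30\%\text{,}\) \(33\%\text{,}\) \(37\%\) distribution that the iteration in Subsection 11.3.1 was approaching.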
This is a pretty good description of the long term behaviour of a Markov chain. I just need to calculate and interpret the dominant eigenvector.

Subsection 11.3.4 Markov Chain Examples

Example 11.3.7.

Figure 11.3.8. Probability Graph Example 2
Figure 11.3.8 shows a probability graph. Calculate the long term probability of ending up in each vertex.
Solution.
First, I need to write down the stochastic matrix.
\begin{equation*} \begin{pmatrix} 0 \amp 0.9 \amp 0.8 \\ 0.3 \amp 0 \amp 0 \\ 0.7 \amp 0.1 \amp 0.2 \end{pmatrix} \end{equation*}
Then I calculate (by computer) the dominant eigenvector. I scale the result the computer gave me to give a stochastic vector.
\begin{equation*} \begin{pmatrix} 1.09589 \\ 0.328767 \\ 1 \end{pmatrix} \mapsto \begin{pmatrix} 0.452 \\ 0.136 \\ 0.412 \end{pmatrix} \end{equation*}
This gives me the long-term probability distribution. There is an approximately \(45.2\%\) chance of ending up on vertex \(1\text{,}\) an approximately \(13.6\%\) chance of ending up on vertex \(2\text{,}\) and an approximately \(41.2\%\) chance of ending up on vertex \(3\text{.}\)
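For the curious, here is a sketch of that computer calculation (numpy assumed):

```python
# Dominant eigenvector for Example 11.3.7, scaled to be stochastic.
import numpy as np

A = np.array([[0.0, 0.9, 0.8],
              [0.3, 0.0, 0.0],
              [0.7, 0.1, 0.2]])

vals, vecs = np.linalg.eig(A)
v = vecs[:, np.argmax(vals.real)].real  # eigenvector for the dominant eigenvalue 1
print(v / v.sum())              # approximately [0.452, 0.136, 0.412]
```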

Example 11.3.9.

Figure 11.3.10. Probability Graph Example 3
Figure 11.3.10 shows a probability graph. (Any transitions without arrows drawn between vertices have probability zero.) Calculate the long-term probability of ending up in each vertex.
Solution.
First, I need to write down the stochastic matrix.
\begin{equation*} \begin{pmatrix} 0.1 \amp 0.3 \amp 0 \amp 0 \amp 0.4 \\ 0.6 \amp 0.3 \amp 0.8 \amp 0 \amp 0.5 \\ 0 \amp 0 \amp 0 \amp 0.3 \amp 0 \\ 0.1 \amp 0 \amp 0.2 \amp 0.2 \amp 0 \\ 0.2 \amp 0.4 \amp 0 \amp 0.5 \amp 0.1 \end{pmatrix} \end{equation*}
Then I calculate (by computer) the dominant eigenvector. (I scale the result the computer gave me to get a stochastic vector.)
\begin{equation*} \begin{pmatrix} 0.976673 \\ 1.59669 \\ 0.0395948 \\ 0.131983 \\ 1 \end{pmatrix} \mapsto \begin{pmatrix} 0.261 \\ 0.426 \\ 0.011 \\ 0.035 \\ 0.267 \end{pmatrix} \end{equation*}
This gives me the long-term probability distribution. The approximate percentage chance of ending at each vertex is, in order, \(26.1\%\text{,}\) \(42.6\%\text{,}\) \(1.1\%\text{,}\) \(3.5\%\) and \(26.7\%\text{.}\)
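The same short computation handles this larger matrix:

```python
# Dominant eigenvector for Example 11.3.9, scaled to be stochastic.
import numpy as np

A = np.array([[0.1, 0.3, 0.0, 0.0, 0.4],
              [0.6, 0.3, 0.8, 0.0, 0.5],
              [0.0, 0.0, 0.0, 0.3, 0.0],
              [0.1, 0.0, 0.2, 0.2, 0.0],
              [0.2, 0.4, 0.0, 0.5, 0.1]])

vals, vecs = np.linalg.eig(A)
v = vecs[:, np.argmax(vals.real)].real
print(v / v.sum())              # approximately [0.261, 0.426, 0.011, 0.035, 0.267]
```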

Subsection 11.3.5 Games of Chance

Probabilistic models like Markov chains are very common in game theory. In this section, I want to look at very simple games of chance (though the theory extends well to more complicated games). I want to build to a useful (and possibly morally interesting) example called the Gambler’s Ruin.
I am going to model a pretty simple game of chance. In this game, there is a set stake that can be won or lost in each round (each time step, in the model). In each round, there is a winning probability and a losing probability, which don’t depend on anything else. The player starts with some multiple of the stake, and the number of stakes the player currently holds gives the vertices of the graph. Then there is a stopping condition: the player stops either when they get to zero stakes or when they get to some winning number of stakes.
I’ll start with a winning condition of \(4\) stakes. That gives me a five-state model: the player can be at \(0\text{,}\) \(1\text{,}\) \(2\text{,}\) \(3\) or \(4\) stakes. At \(0\) and \(4\) stakes, the game is finished; in the Markov chain, this is represented by a probability of \(1\) for staying at that vertex. For the middle three vertices, there is a probability of winning (going up a stake) and losing (going down a stake). Figure 11.3.11 shows the probability graph for this game, with \(W\) being the win probability and \(L\) being the loss probability.
Figure 11.3.11. Probability Graph for a Game of Chance
Here is the accompanying stochastic matrix.
\begin{equation*} \begin{pmatrix} 1 \amp L \amp 0 \amp 0 \amp 0 \\ 0 \amp 0 \amp L \amp 0 \amp 0 \\ 0 \amp W \amp 0 \amp L \amp 0 \\ 0 \amp 0 \amp W \amp 0 \amp 0 \\ 0 \amp 0 \amp 0 \amp W \amp 1 \\ \end{pmatrix} \end{equation*}
Now I need to work out the model with actual winning and losing percentages. First, I’ll start with a completely fair game, so that \(W = 0.5\) and \(L = 0.5\text{.}\) Here is that stochastic matrix.
\begin{equation*} \begin{pmatrix} 1 \amp 0.5 \amp 0 \amp 0 \amp 0 \\ 0 \amp 0 \amp 0.5 \amp 0 \amp 0 \\ 0 \amp 0.5 \amp 0 \amp 0.5 \amp 0 \\ 0 \amp 0 \amp 0.5 \amp 0 \amp 0 \\ 0 \amp 0 \amp 0 \amp 0.5 \amp 1 \\ \end{pmatrix} \end{equation*}
I asked a computer for the dominant eigenvectors. This is a little tricky, since there are actually two linearly independent eigenvectors for the eigenvalue \(1\text{.}\) (The problem is that this matrix is not actually irreducible: there is no path out of vertex \(0\) or out of vertex \(4\text{,}\) so these are dead ends.) Here are the two eigenvectors.
\begin{align*} \amp \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} \amp \amp \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 1 \end{pmatrix} \end{align*}
These make sense. If the player is at the final losing or winning state, they stay there. However, what if they start in the middle? The final state will be some linear combination of these eigenvectors: some probability of the final winning and the final losing state. What linear combination? For this Markov model, I’ll have to rely on actually calculating the matrix action. What I’ll do is calculate the state after twenty iterations. That won’t be a perfect final result but will give a pretty good idea of what is going on.
For this fair game, the player can start with \(1\text{,}\) \(2\) or \(3\) stakes. I asked a computer to calculate the probabilities after twenty iterations for each of these starting points.
\begin{align*} \amp \begin{pmatrix} 1 \amp 0.5 \amp 0 \amp 0 \amp 0 \\ 0 \amp 0 \amp 0.5 \amp 0 \amp 0 \\ 0 \amp 0.5 \amp 0 \amp 0.5 \amp 0 \\ 0 \amp 0 \amp 0.5 \amp 0 \amp 0 \\ 0 \amp 0 \amp 0 \amp 0.5 \amp 1 \\ \end{pmatrix}^{20} \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 0.7495 \\ 0.0005 \\ 0 \\ 0.0005 \\ 0.2495 \end{pmatrix} \\ \amp \begin{pmatrix} 1 \amp 0.5 \amp 0 \amp 0 \amp 0 \\ 0 \amp 0 \amp 0.5 \amp 0 \amp 0 \\ 0 \amp 0.5 \amp 0 \amp 0.5 \amp 0 \\ 0 \amp 0 \amp 0.5 \amp 0 \amp 0 \\ 0 \amp 0 \amp 0 \amp 0.5 \amp 1 \\ \end{pmatrix}^{20} \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 0.4995 \\ 0 \\ 0.0010 \\ 0 \\ 0.4995 \end{pmatrix} \\ \amp \begin{pmatrix} 1 \amp 0.5 \amp 0 \amp 0 \amp 0 \\ 0 \amp 0 \amp 0.5 \amp 0 \amp 0 \\ 0 \amp 0.5 \amp 0 \amp 0.5 \amp 0 \\ 0 \amp 0 \amp 0.5 \amp 0 \amp 0 \\ 0 \amp 0 \amp 0 \amp 0.5 \amp 1 \\ \end{pmatrix}^{20} \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0.2495 \\ 0.0005 \\ 0 \\ 0.0005 \\ 0.7495 \end{pmatrix} \end{align*}
This is something I can interpret. If I round to whole percentages (or even tenths), the situation is pretty clear. Starting with one stake, there is a \(25\%\) chance of winning and a \(75\%\) chance of losing. Starting with two stakes, there is an equal chance of winning and losing. Starting with three stakes, there is a \(75\%\) chance of winning and a \(25\%\) chance of losing. This seems pretty reasonable.
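Here is a minimal sketch of these runs (numpy assumed; the helper name game_matrix is my own invention):

```python
# Build the stochastic matrix for the game of chance with win probability W
# and stopping condition N, then run twenty rounds from each starting state.
import numpy as np

def game_matrix(W, N):
    L = 1 - W
    A = np.zeros((N + 1, N + 1))
    A[0, 0] = A[N, N] = 1       # the game stops at 0 or N stakes
    for j in range(1, N):
        A[j - 1, j] = L         # lose a stake with probability L
        A[j + 1, j] = W         # win a stake with probability W
    return A

A = game_matrix(0.5, 4)         # the fair game
for start in (1, 2, 3):
    v = np.zeros(5)
    v[start] = 1                # start with `start` stakes
    print(start, np.linalg.matrix_power(A, 20) @ v)
```

Changing \(W\) reproduces the unfair games below, and changing \(N\) gives the longer games of the next subsection.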
Now I can ask the interesting question: what is the effect of changing the individual round winning/losing probabilities on the final winning/losing percentages? If I make the game slightly unfair, say \(W = 0.48\) and \(L = 0.52\text{,}\) how much will the other percentages change? I can run the model again.
\begin{align*} \amp \begin{pmatrix} 1 \amp 0.52 \amp 0 \amp 0 \amp 0 \\ 0 \amp 0 \amp 0.52 \amp 0 \amp 0 \\ 0 \amp 0.48 \amp 0 \amp 0.52 \amp 0 \\ 0 \amp 0 \amp 0.48 \amp 0 \amp 0 \\ 0 \amp 0 \amp 0 \amp 0.48 \amp 1 \\ \end{pmatrix}^{20} \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 0.7787 \\ 0.0005 \\ 0 \\ 0.0004 \\ 0.2204 \end{pmatrix} \\ \amp \begin{pmatrix} 1 \amp 0.52 \amp 0 \amp 0 \amp 0 \\ 0 \amp 0 \amp 0.52 \amp 0 \amp 0 \\ 0 \amp 0.48 \amp 0 \amp 0.52 \amp 0 \\ 0 \amp 0 \amp 0.48 \amp 0 \amp 0 \\ 0 \amp 0 \amp 0 \amp 0.48 \amp 1 \\ \end{pmatrix}^{20} \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 0.5394 \\ 0 \\ 0.0010 \\ 0 \\ 0.4596 \end{pmatrix} \\ \amp \begin{pmatrix} 1 \amp 0.52 \amp 0 \amp 0 \amp 0 \\ 0 \amp 0 \amp 0.52 \amp 0 \amp 0 \\ 0 \amp 0.48 \amp 0 \amp 0.52 \amp 0 \\ 0 \amp 0 \amp 0.48 \amp 0 \amp 0 \\ 0 \amp 0 \amp 0 \amp 0.48 \amp 1 \\ \end{pmatrix}^{20} \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0.2802 \\ 0.0005 \\ 0 \\ 0.0005 \\ 0.7188 \end{pmatrix} \end{align*}
I changed the fairness of the game by \(2\%\text{.}\) The effect on the final winning percentage is about \(3\%\) when starting with one or three stakes, and about \(4\%\) when starting with two stakes. The effect is slightly exaggerated.
What happens if I make the game very unfair? Let me set \(W = 0.35\) and \(L = 0.65\) and run the system again.
\begin{align*} \amp \begin{pmatrix} 1 \amp 0.65 \amp 0 \amp 0 \amp 0 \\ 0 \amp 0 \amp 0.65 \amp 0 \amp 0 \\ 0 \amp 0.35 \amp 0 \amp 0.65 \amp 0 \\ 0 \amp 0 \amp 0.35 \amp 0 \amp 0 \\ 0 \amp 0 \amp 0 \amp 0.35 \amp 1 \\ \end{pmatrix}^{20} \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 0.9211 \\ 0.0002 \\ 0 \\ 0.0001 \\ 0.0786 \end{pmatrix} \\ \amp \begin{pmatrix} 1 \amp 0.65 \amp 0 \amp 0 \amp 0 \\ 0 \amp 0 \amp 0.65 \amp 0 \amp 0 \\ 0 \amp 0.35 \amp 0 \amp 0.65 \amp 0 \\ 0 \amp 0 \amp 0.35 \amp 0 \amp 0 \\ 0 \amp 0 \amp 0 \amp 0.35 \amp 1 \\ \end{pmatrix}^{20} \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 0.7750 \\ 0 \\ 0.0004 \\ 0 \\ 0.2247 \end{pmatrix} \\ \amp \begin{pmatrix} 1 \amp 0.65 \amp 0 \amp 0 \amp 0 \\ 0 \amp 0 \amp 0.65 \amp 0 \amp 0 \\ 0 \amp 0.35 \amp 0 \amp 0.65 \amp 0 \\ 0 \amp 0 \amp 0.35 \amp 0 \amp 0 \\ 0 \amp 0 \amp 0 \amp 0.35 \amp 1 \\ \end{pmatrix}^{20} \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0.5035 \\ 0.0003 \\ 0 \\ 0.0002 \\ 0.4960 \end{pmatrix} \end{align*}
Again, the results are exaggerated. A game with a \(65\%\) chance of losing each round leads to a \(78\%\) chance of losing overall when starting in the middle.

Subsection 11.3.6 The Gambler’s Ruin

Now I am going to return to the fair game, with \(W = L = 0.5\text{.}\) Instead of changing the fairness of the game, I’m going to change the stopping condition. Let’s say the player always starts with \(2\) stakes, but I’ll change the number of stakes at which the player stops playing. In the original game, starting with \(2\) stakes led to a winning percentage of \(50\%\text{.}\) Let me increase the stopping condition to five stakes. Here is the new matrix, with its action (after twenty iterations) on the starting vector with \(2\) stakes.
\begin{equation*} \begin{pmatrix} 1 \amp 0.5 \amp 0 \amp 0 \amp 0 \amp 0 \\ 0 \amp 0 \amp 0.5 \amp 0 \amp 0 \amp 0 \\ 0 \amp 0.5 \amp 0 \amp 0.5 \amp 0 \amp 0 \\ 0 \amp 0 \amp 0.5 \amp 0 \amp 0.5 \amp 0 \\ 0 \amp 0 \amp 0 \amp 0.5 \amp 0 \amp 0 \\ 0 \amp 0 \amp 0 \amp 0 \amp 0.5 \amp 1 \end{pmatrix}^{20} \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 0.5924 \\ 0 \\ 0.0104 \\ 0 \\ 0.0065 \\ 0.3907 \end{pmatrix} \end{equation*}
This hasn’t entirely stabilized after twenty iterations, but I can see that the winning percentage has dropped from \(50\%\) to somewhere around \(39\%\text{.}\) Now I will increase the stopping condition to six stakes and rerun the system (I’ll need to go to thirty iterations to get it to stabilize to a tenth of a percent).
\begin{equation*} \begin{pmatrix} 1 \amp 0.5 \amp 0 \amp 0 \amp 0 \amp 0 \amp 0 \\ 0 \amp 0 \amp 0.5 \amp 0 \amp 0 \amp 0 \amp 0 \\ 0 \amp 0.5 \amp 0 \amp 0.5 \amp 0 \amp 0 \amp 0 \\ 0 \amp 0 \amp 0.5 \amp 0 \amp 0.5 \amp 0 \amp 0 \\ 0 \amp 0 \amp 0 \amp 0.5 \amp 0 \amp 0.5 \amp 0 \\ 0 \amp 0 \amp 0 \amp 0 \amp 0.5 \amp 0 \amp 0 \\ 0 \amp 0 \amp 0 \amp 0 \amp 0 \amp 0.5 \amp 1 \end{pmatrix}^{30} \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 0.6600 \\ 0 \\ 0.007 \\ 0 \\ 0.007 \\ 0 \\ 0.3267 \end{pmatrix} \end{equation*}
Now the winning percentage has dropped again to about \(33\%\text{.}\)
I can continue this process (though doing the calculations, even by computer, becomes pretty tedious; I hope this is enough to make the point). If I play a fair game with a fixed number of starting stakes, then as the stopping point increases, the winning chance from that fixed start drops. (For a fair game, the chance of reaching \(N\) stakes from a start of \(k\) stakes works out to exactly \(\frac{k}{N}\text{.}\)) In the limit (as the stopping condition goes to infinity), the winning chance drops to zero. And if I take this limit, it actually doesn’t matter what the starting number of stakes is -- the winning percentage will still eventually drop to zero.
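The following sketch (numpy assumed, rebuilding the same game matrix as before) makes the pattern concrete by iterating the fair game long enough to absorb:

```python
# Winning chance starting from 2 stakes as the stopping condition N grows.
import numpy as np

for N in (4, 5, 6, 8, 10, 20):
    A = np.zeros((N + 1, N + 1))
    A[0, 0] = A[N, N] = 1       # absorbing states: broke, or reached N stakes
    for j in range(1, N):
        A[j - 1, j] = A[j + 1, j] = 0.5
    v = np.zeros(N + 1)
    v[2] = 1                    # always start with 2 stakes
    w = np.linalg.matrix_power(A, 2000) @ v
    print(N, round(w[N], 4))    # 0.5, 0.4, 0.3333, 0.25, 0.2, 0.1, i.e. 2/N
```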
This is called the Gambler’s Ruin. It says that even if you play a perfectly fair game indefinitely, you will eventually lose with \(100\%\) probability.