Saturday, September 17, 2016

EDX 6.008.1x Computational Probability and Inference Notes (Week 1)

Recently, I started the EDX 6.008.1x Computational Probability and Inference online course. This course is offered as part of MITx and is similar to the MIT 6.008 Introduction to Inference course taught at MIT by Prof. Polina Golland. As a way to motivate myself to keep on track with the course, I decided to save my notes on the course content and publish them online. I hope they can be useful for other students who are taking the course now or in later sessions, and for anyone looking for information about topics related to the course subjects.

Course topics:

The course provides an introduction to probability theory and probabilistic graphical models, and how to use them to perform inference. Probabilistic models are used in many application domains, including information retrieval, data mining, computer vision, natural language processing, financial analysis, robotics, and medical diagnosis. The list goes on and on.

In addition to the theoretical topics covered by the course, it also includes 3 mini projects and one final project in the Python programming language. The programming assignments and projects were a strong reason for me to decide to take this course :)

Course Requirements

The course assumes good knowledge of calculus and comfort with the Python programming language. No prior knowledge of probability is required.

Week 1


What is probability?


Probability is the science of representing and computing with uncertainty. It appears in our everyday life, especially when making decisions. For example, in the morning you may decide to take an umbrella because there is a high chance that it will rain today.

The goal of the course is to learn how to build computer programs that can perform reasoning under uncertainty using probabilities.

The simplest example of probability is flipping a fair coin. A fair coin is a coin that has an equal chance of landing heads or tails when tossed: the probability of heads is ½ and the probability of tails is also ½. But what does this mean?

There are two interpretations of probability:

Frequentist interpretation: If you repeat the process of tossing the coin N times (where N is a very large number, say 10,000), then approximately N/2 times the result will be heads and approximately N/2 times the result will be tails.

Bayesian interpretation: It is hard to use the frequentist approach to explain the meaning of probability in scenarios where the experiment cannot be repeated. For example, consider the probability that a patient will die after being given a certain drug. This experiment cannot be repeated for the same patient, so the frequentist approach does not help much here. The Bayesian interpretation is that a probability value expresses a degree of belief in the experiment's outcome.
For example, the probability of heads when tossing a coin being ½ means that, before tossing the coin once, your belief that the result will be heads is equal to ½.

Luckily, it does not make any difference which interpretation you choose to follow, because all probability laws are the same under the two interpretations.
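A quick simulation, my own sketch rather than course material, illustrates the frequentist interpretation: toss a fair coin N times and check that the fraction of heads is close to ½.

import numpy as np

N = 10000  # number of tosses; the larger N, the closer the fraction gets to 1/2
tosses = np.random.choice(['heads', 'tails'], size=N)
print(np.mean(tosses == 'heads'))  # prints a value close to 0.5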

Two ingredients to model uncertainty

When we represent an uncertain world, we call the process whose result we observe an experiment. To model the uncertainty of the experiment we need to define two things:

  • Sample space: The set of all possible outcomes of the experiment, usually denoted Ω. The sample space can be either a finite or an infinite set.
  • Probability of each outcome: A probability function that assigns a probability value P(ω) to each possible outcome ω in the sample space.
For example, in the experiment of tossing a fair coin:
  • Sample space: Ω = {heads, tails}
  • Probability values: P(heads) = ½, P(tails) = ½


Any subset A of the sample space Ω is called an event.

The probability of an event A is the sum of the probabilities of the outcomes that belong to A: P(A) = Σ_{ω ∈ A} P(ω).

Programming Note: representing a probability model


The course recommends using the Python dictionary to represent a probability model, where the keys of the dictionary are the elements of the sample space and the values assigned to them are their probability values.
Example: representing the probability model of a fair coin.
model = {'heads' : 0.5, 'tails' : 0.5}

The following can be used to sample 10 outcomes from the preceding model:

import numpy as np

# Unpack the outcomes and their probabilities into two parallel tuples
items, p_values = zip(*model.items())
# Draw 10 independent samples according to the model's probabilities
samples = np.random.choice(a=items, size=10, p=p_values)
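Since an event is just a subset of the sample space, the dictionary representation also makes computing event probabilities straightforward. A minimal sketch of my own (the helper name prob_of_event is not from the course):

def prob_of_event(event, model):
    # P(A) is the sum of the probabilities of the outcomes in A
    return sum(model[outcome] for outcome in event)

print(prob_of_event({'heads'}, model))           # 0.5
print(prob_of_event({'heads', 'tails'}, model))  # 1.0 (the whole sample space)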

Three axioms of probability

The probability value assignment has to comply with the following three rules (known as the three axioms of probability):
  • P(A) ≥ 0 for every event A.
  • P(Ω) = 1, i.e., the probability of the entire sample space is 1.
  • If A and B are disjoint events (A ∩ B = ∅), then P(A ∪ B) = P(A) + P(B).
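A dictionary-based model can be checked against these axioms: P(Ω) = 1 corresponds to the values summing to 1, and nonnegativity of events follows from each value being nonnegative. A sketch of my own (the helper name is_valid_model is mine, not from the course):

def is_valid_model(model, tol=1e-9):
    # First axiom: every probability value must be nonnegative
    all_nonnegative = all(p >= 0 for p in model.values())
    # Second axiom: P(Omega) = 1, i.e. the values must sum to 1
    sums_to_one = abs(sum(model.values()) - 1) < tol
    # The additivity axiom holds automatically when event probabilities
    # are defined as sums over outcome probabilities
    return all_nonnegative and sums_to_one

print(is_valid_model({'heads': 0.5, 'tails': 0.5}))  # True
print(is_valid_model({'heads': 0.6, 'tails': 0.5}))  # False (sums to 1.1)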

Random Variables


Random variables, strangely enough despite their name, are functions that map the experiment outcomes to another set of values.
The course uses the Python dictionary to represent a random variable that maps a finite sample space to another finite set of values.

weather_model = {'sunny': 0.7, 'rainy': 0.25, 'snowy': 0.05}  # probabilities of outcomes
random_variable_mapping = {'sunny': 1, 'rainy': 1, 'snowy': 0}  # maps each outcome to a value
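To illustrate how the two dictionaries work together (my own sketch, not from the course), the distribution of the random variable can be computed by summing the probabilities of all outcomes that map to the same value; here the variable equals 1 whenever it is not snowy:

from collections import defaultdict

def random_variable_distribution(model, mapping):
    # Sum the probabilities of all outcomes that map to the same value
    distribution = defaultdict(float)
    for outcome, prob in model.items():
        distribution[mapping[outcome]] += prob
    return dict(distribution)

print(random_variable_distribution(weather_model, random_variable_mapping))
# {1: 0.95, 0: 0.05} (up to floating-point rounding)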