The normal (or Gaussian) distribution

A continuous random variable X follows a normal distribution N(μ,σ), where μ is its mean and σ its standard deviation, if:

  • It can take any real value: $(-\infty, +\infty)$
  • Its probability density function (pdf) is a Gaussian curve:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$$

Example

Find the pdf of a continuous variable with mean 1,75 and standard deviation 0,2 and plot it.

What could this distribution represent?

$$f(x) = \frac{1}{0{,}2\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-1{,}75}{0{,}2}\right)^2}$$

[Figure: pdf of the normal distribution with mean 1,75 and standard deviation 0,2]

With this mean and standard deviation, the variable could be a plausible model for the heights, in metres, of men in Barcelona.
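As a rough numerical check of the formula, the density of this example can be evaluated directly. The following is a minimal Python sketch; the function name `normal_pdf` is an illustrative choice and not part of the original text, and Python writes decimals with a point (1.75) rather than a comma.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma) at x, written directly from the formula."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -0.5 * ((x - mu) / sigma) ** 2
    return coeff * math.exp(exponent)

# Parameters from the example: mean 1.75, standard deviation 0.2
mu, sigma = 1.75, 0.2

# The curve peaks at the mean and is lower one standard deviation away
peak = normal_pdf(mu, mu, sigma)
one_sigma = normal_pdf(mu + sigma, mu, sigma)
print(round(peak, 4), round(one_sigma, 4))
```

The peak value is greater than 1, which is not a contradiction: values of the pdf are densities, and only areas under the curve are probabilities.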

To interpret the graph, note that the probability that the variable takes a value in a given interval is the area under the pdf curve over that interval.

  • The total area under the pdf is 1: $\int_{-\infty}^{+\infty} f(x)\,dx = 1$
  • The pdf is symmetric about μ, that is to say, the area on each side of μ is 0,5: $\int_{-\infty}^{\mu} f(x)\,dx = \int_{\mu}^{+\infty} f(x)\,dx = \frac{1}{2}$. In the previous example, the number of people taller than 1,75 m equals the number of people shorter than 1,75 m.
  • Likewise, the number of people taller than 1,75+a equals the number of people shorter than 1,75−a:

$$\int_{-\infty}^{\mu-a} f(x)\,dx = \int_{\mu+a}^{+\infty} f(x)\,dx$$
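These area properties can be checked numerically. The sketch below approximates the integrals with the trapezoidal rule; the helper names `normal_pdf` and `area`, the choice a = 0.1, and the use of 10 standard deviations as a stand-in for infinity are all assumptions made for illustration.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def area(lo, hi, mu, sigma, steps=20000):
    """Trapezoidal approximation of the area under the pdf on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (normal_pdf(lo, mu, sigma) + normal_pdf(hi, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(lo + i * h, mu, sigma)
    return total * h

mu, sigma, a = 1.75, 0.2, 0.1
far = 10 * sigma  # assumption: 10 standard deviations stands in for infinity

total_area = area(mu - far, mu + far, mu, sigma)   # should be close to 1
left_half = area(mu - far, mu, mu, sigma)          # should be close to 0.5
left_tail = area(mu - far, mu - a, mu, sigma)      # area below mu - a
right_tail = area(mu + a, mu + far, mu, sigma)     # area above mu + a
print(total_area, left_half, left_tail, right_tail)
```

The two tail areas agree up to rounding error, which is the symmetry stated in the last property.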

The standard normal distribution

The standard normal distribution is the one that has mean μ=0 and standard deviation σ=1:

N(0,1)

Its PDF is:

$$f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}$$

In the following graph we see its representation:

[Figure: pdf of the standard normal distribution N(0,1)]

For the standard normal distribution it is possible to state:

$$\int_{-\infty}^{0} f(x)\,dx = \int_{0}^{+\infty} f(x)\,dx = \frac{1}{2}$$

$$\int_{-\infty}^{-a} f(x)\,dx = \int_{a}^{+\infty} f(x)\,dx$$

Its pdf is also an even function, $f(x) = f(-x)$. Since the integral of the pdf cannot be expressed in terms of elementary functions, we use tables to evaluate it.
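Outside of printed tables, the same probabilities are usually obtained from the error function, using the identity p(Z ≤ z) = ½(1 + erf(z/√2)). The sketch below relies on `math.erf` from the Python standard library; the names `phi` and `pdf` and the test value a = 1.3 are illustrative assumptions.

```python
import math

def phi(z):
    """p(Z <= z) for Z ~ N(0, 1), computed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def pdf(z):
    """Density of the standard normal distribution at z."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)

a = 1.3  # arbitrary positive value chosen for the check

print(pdf(a) == pdf(-a))        # even function: f(x) = f(-x)
print(phi(0.0))                 # half of the area lies below the mean
print(phi(-a), 1.0 - phi(a))    # area below -a equals area above a
```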

Next, we can see the table of values of the cumulative distribution function, that is to say:

$$p(Z \le z)$$

The first entry of the table gives the probability of obtaining a value lower than zero (the mean), which is 0,5. The table shows that the probability of a result below a given value z grows as z grows.

[Table: values of p(Z ≤ z) for the standard normal distribution]

To read the table, note that the leftmost column gives the units and first decimal of z, while the top row gives the second decimal. For instance, the first cell of the first row gives

$$p(Z \le 0{,}00) = 0{,}5000$$

while the last cell of the first row gives

$$p(Z \le 0{,}09) = 0{,}5359$$
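To make the layout concrete, the first rows of such a table can be regenerated with a few lines of Python. This is only a sketch: it uses `statistics.NormalDist` from the standard library (Python 3.8 or later), and the helper name `table_row` is a hypothetical choice.

```python
from statistics import NormalDist

Z = NormalDist(mu=0.0, sigma=1.0)  # standard normal N(0, 1)

def table_row(first_digits):
    """Values of p(Z <= z) for z = first_digits + 0.00, 0.01, ..., 0.09."""
    return [round(Z.cdf(first_digits + j / 100), 4) for j in range(10)]

# Each printed line is one table row: units and first decimal on the left,
# then one cell per second decimal (0 to 9)
for i in range(4):
    first = i / 10
    cells = "  ".join(f"{p:.4f}" for p in table_row(first))
    print(f"{first:.1f}  {cells}")
```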

The table only gives the probabilities for positive values of Z. For values of Z<0 we use the symmetry of the curve, with the same kind of area arguments as in the following examples.

Note that beyond z = 3 (3 standard deviations above the mean) the probability is very close to one (0,9987). Symmetrically, for values below −3 the probability is almost zero.
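The symmetry argument for negative values amounts to p(Z ≤ −z) = 1 − p(Z ≤ z). A minimal sketch follows, in which the function name `cdf_from_positive_table` is hypothetical and `statistics.NormalDist` plays the role of the printed table:

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal N(0, 1)

def cdf_from_positive_table(z):
    """p(Z <= z) using only lookups at non-negative z, as a table reader would."""
    if z >= 0:
        return Z.cdf(z)
    # Symmetry: the area below z equals the area above -z
    return 1.0 - Z.cdf(-z)

p_at_3 = round(cdf_from_positive_table(3.0), 4)
p_at_minus_3 = round(cdf_from_positive_table(-3.0), 4)
p_at_minus_094 = round(cdf_from_positive_table(-0.94), 4)
print(p_at_3, p_at_minus_3, p_at_minus_094)
```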

Example

Find the probability of a random variable Z modelled as N(0,1) being lower than 0,94.

We have to look at the row of 0,9 and the column of 0,04:

$$p(Z \le 0{,}94) = 0{,}8264$$

Example

Find the probability of a random variable Z modelled as $N(0,1)$ being greater than 0,94.

$$p(Z \ge 0{,}94) = \text{total area} - p(Z \le 0{,}94)$$

$$p(Z \ge 0{,}94) = 1 - 0{,}8264 = 0{,}1736$$

Example

Find the probability of Z being between 0,94 and 1,14

$$p(0{,}94 \le Z \le 1{,}14) = p(Z \le 1{,}14) - p(Z \le 0{,}94)$$

$$p(0{,}94 \le Z \le 1{,}14) = 0{,}8729 - 0{,}8264 = 0{,}0465$$
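The three table lookups in these examples (below a value, above a value, between two values) can be reproduced as a quick cross-check. The sketch assumes `statistics.NormalDist` as a substitute for the printed table; the variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal N(0, 1)

# Below a value: read the table directly
p_below = Z.cdf(0.94)

# Above a value: total area (1) minus the area below
p_above = 1.0 - Z.cdf(0.94)

# Between two values: difference of the two areas below
p_between = Z.cdf(1.14) - Z.cdf(0.94)

print(round(p_below, 4), round(p_above, 4), round(p_between, 4))
```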

Converting the standard normal distribution to any other normal distribution

What must we do to work with a normal distribution different from N(0,1)?

If Z is a random variable N(0,1) and X is N(μ,σ), they are related by the following expression:

$$Z = \frac{X - \mu}{\sigma} \quad\Longleftrightarrow\quad X = \sigma Z + \mu$$

Example

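We have a normal random variable X with mean 4 and standard deviation 2.

What is the probability that it is greater than 6,21?

$$p(X \ge 6{,}21) = p(\sigma Z + \mu \ge 6{,}21)$$

$$p(X \ge 6{,}21) = p\left(Z \ge \frac{6{,}21 - 4}{2}\right) = p(Z \ge 1{,}105)$$

$$p(Z \ge 1{,}105) = 1 - p(Z \le 1{,}105) = 1 - 0{,}8654 = 0{,}1346$$

Here 0,8654 lies midway between the table entries for 1,10 (0,8643) and 1,11 (0,8665).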
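The standardization step can be checked by computing the probability in two ways: through Z with the standard normal, and directly with N(4, 2). This is a sketch using `statistics.NormalDist`; the variable names are assumptions made for illustration.

```python
from statistics import NormalDist

mu, sigma = 4.0, 2.0   # parameters of X from the example
x = 6.21               # threshold from the example

# Standardize: Z = (X - mu) / sigma
z = (x - mu) / sigma

# Route 1: use only the standard normal N(0, 1), as with the table
p_via_z = 1.0 - NormalDist().cdf(z)

# Route 2: work with N(mu, sigma) directly
p_direct = 1.0 - NormalDist(mu, sigma).cdf(x)

print(round(z, 3), round(p_via_z, 4), round(p_direct, 4))
```

Both routes give the same probability, about 0.1346, which is why a single table for N(0,1) is enough.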

This way, only the tables for the standard normal N(0,1) will be necessary.

Approximation of the binomial distribution from the normal distribution

For large n, computing the probabilities of a binomial B(n,p) directly can be laborious. In that case a normal distribution is used as an approximation, with q = 1 − p:

$$B(n,p) \approx N(\mu = np,\ \sigma = \sqrt{npq})$$

So, to deal with the binomial distribution corresponding to 100 tosses of a coin, we can use:

$$N\left(100 \cdot 0{,}5,\ \sqrt{100 \cdot 0{,}5 \cdot 0{,}5}\right) = N(50, 5)$$

Thus, we can avoid the calculation with high exponents that the binomial distribution would require and we will be able to use the tables for the normal N(0,1).
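To get a feeling for the quality of the approximation, the exact binomial probability can be compared with the normal one for the coin example. The sketch below is illustrative only: the threshold k = 55 and the helper `binomial_cdf` are assumptions, and the half-unit shift (continuity correction) is a common refinement that the text above does not cover.

```python
import math
from statistics import NormalDist

n, p = 100, 0.5          # 100 tosses of a fair coin
q = 1.0 - p
mu = n * p               # mean of the approximating normal
sigma = math.sqrt(n * p * q)  # its standard deviation

def binomial_cdf(k, n, p):
    """Exact p(X <= k) for X ~ B(n, p), summing the binomial terms."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

k = 55  # hypothetical threshold: at most 55 heads
exact = binomial_cdf(k, n, p)
approx = NormalDist(mu, sigma).cdf(k)

# Continuity correction: not part of the text above, shown only for comparison
approx_cc = NormalDist(mu, sigma).cdf(k + 0.5)

print(mu, sigma)
print(round(exact, 4), round(approx, 4), round(approx_cc, 4))
```

In this case the plain approximation differs from the exact value in the second decimal, while the half-unit shift brings it much closer.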