
The Poisson distribution is one of the most important probability distributions in statistics and data science. It has wide-ranging applications, from biology to business, whenever we want to model count data and rare events. In this guide, we'll take a comprehensive look at the Poisson distribution formula and how to use it to calculate probabilities. We'll also dive into some of the deeper mathematical intuition and provide Python code examples.

What is the Poisson Distribution?

The Poisson distribution models the probability of a given number of events occurring in a fixed interval of time or space, if these events occur with a known constant mean rate and independently of the time since the last event. Some key characteristics of the Poisson distribution:

  • It is a discrete probability distribution, meaning it defines probabilities for integer values only
  • The random variable is the number of occurrences (count) of an event in a given time or space interval
  • The expected number of occurrences in the given interval is designated lambda (λ)
  • Each occurrence is independent of the other occurrences

The classic textbook example of the Poisson distribution is the number of soldiers in the Prussian army killed by horse kicks each year. But there are many modern applications as well, such as:

  • Number of cars arriving at a toll booth in an hour
  • Number of patients arriving at an emergency room between 10 pm and midnight
  • Number of no-shows for airline flights
  • Number of typos per page in a book
  • Number of emails you receive in a day

In general, the Poisson distribution is applied to situations where we are counting the occurrences of rare events in a large population.

Poisson Distribution Formula and Assumptions

The probability mass function (PMF) for the Poisson distribution is:

P(X = k) = (λ^k * e^-λ) / k!

where:

  • λ is the expected number of occurrences (the event rate)
  • e is the number 2.71828… (Euler's number)
  • k is the number of occurrences
  • k! is the factorial of k
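
As a minimal sketch, the PMF can be written directly from the formula using only the standard library. The function name poisson_pmf and the value λ = 3 are illustrative choices, not part of any library.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson random variable with mean lam."""
    if k < 0:
        return 0.0
    return lam**k * math.exp(-lam) / math.factorial(k)

lam = 3.0  # illustrative value only
probs = [poisson_pmf(k, lam) for k in range(50)]

print(f"P(X = 0) = {probs[0]:.4f}")   # equals e^(-lam)
print(f"P(X = 2) = {probs[2]:.4f}")
print(f"Sum over k = 0..49: {sum(probs):.6f}")  # should be very close to 1
```

Summing the probabilities over a wide enough range of k is a quick sanity check that the function is a valid PMF.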

The key assumptions behind the Poisson distribution:

  1. The occurrence of one event does not affect the probability that a second event will occur. That is, events occur independently.
  2. The average rate at which events occur is constant and does not depend on any occurrences.
  3. Two events cannot occur at exactly the same instant; in each very small sub-interval, either exactly one event occurs or none does.
  4. The probability of an event in a small sub-interval is proportional to the length of the sub-interval.

If these assumptions hold, then the count data can be modeled with the Poisson distribution. Note that the Poisson distribution only has one parameter, λ, which is the expected number of occurrences in the given interval.

Relationship to Other Distributions

The Poisson distribution is closely related to other probability distributions:

  • It is a limiting case of the binomial distribution where the number of trials goes to infinity while the expected number of successes remains fixed.
  • The sum of independent Poisson-distributed random variables is also a Poisson-distributed random variable. So if X ~ Poisson(λ1) and Y ~ Poisson(λ2) then X+Y ~ Poisson(λ1+λ2).
  • The time between events in a Poisson process follows the exponential distribution with the same rate parameter λ (so the mean time between events is 1/λ).

These relationships allow us to derive certain properties of the Poisson distribution from these other distributions.
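
The three relationships above can be checked numerically. The sketch below assumes NumPy and SciPy are installed; the values λ = 3, λ1 = 1, λ2 = 2, the sample sizes, and the seed are arbitrary choices for illustration.

```python
import numpy as np
from scipy.stats import binom, poisson

lam = 3.0                 # illustrative mean
k = np.arange(0, 15)

# 1) Binomial(n, lam/n) approaches Poisson(lam) as n grows.
for n in (10, 100, 10_000):
    max_gap = np.max(np.abs(binom.pmf(k, n, lam / n) - poisson.pmf(k, lam)))
    print(f"n = {n:>6}: largest PMF difference = {max_gap:.6f}")

# 2) Sum of independent Poissons: convolving the PMFs of Poisson(1) and
#    Poisson(2) should reproduce the PMF of Poisson(3).
support = np.arange(0, 60)
conv = np.convolve(poisson.pmf(support, 1.0), poisson.pmf(support, 2.0))[:60]
additive = np.allclose(conv, poisson.pmf(support, 3.0))
print("Additivity holds:", additive)

# 3) Exponential gaps between events give Poisson counts per unit interval.
rng = np.random.default_rng(0)
gaps = rng.exponential(scale=1 / lam, size=200_000)
event_times = np.cumsum(gaps)
horizon = int(event_times[-1])  # number of complete unit intervals covered
in_range = event_times[event_times < horizon]
counts = np.bincount(in_range.astype(int), minlength=horizon)
print(f"Mean count per unit interval: {counts.mean():.3f} (expected about {lam})")
```

The first loop should print a shrinking difference as n increases, and the simulated mean count in the last step should land close to λ, up to sampling noise.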

Applications and Examples

Let's look at a concrete example of applying the Poisson distribution. Suppose a certain fast food restaurant gets an average of 3 customers every 10 minutes between 2 pm and 4 pm. Let's model the number of customers arriving in any given 10-minute window as a Poisson random variable X with λ = 3.

We can use the Poisson PMF to calculate probabilities. For example, what is the probability that exactly 5 customers arrive in a 10 minute period?

P(X = 5) = (3^5 * e^-3) / 5!
= (243 * 0.0498) / 120
= 0.101

So there is about a 10% chance that 5 customers arrive in a 10 minute window. We could also ask what is the probability of 10 or more customers arriving:

P(X ≥ 10) = 1 - P(X ≤ 9)
= 1 - ppois(9, 3) # ppois in R; the Python equivalent is scipy.stats.poisson.cdf(9, 3)
= 0.0011

There is only about a 0.1% chance of getting 10 or more customers in 10 minutes.

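Here's how you could simulate this example in Python:

from scipy.stats import poisson

lambda_ = 3  # average number of customers per 10 minutes

# Simulate the number of customers in 1,000 ten-minute windows
num_customers = poisson.rvs(mu=lambda_, size=1000)

# Calculate probability of 5 customers in 10 mins
prob_5 = poisson.pmf(5, lambda_)
print(f'Probability of 5 customers in 10 mins: {prob_5:.3f}')

# Calculate probability of 10 or more customers in 10 mins
prob_10_or_more = 1 - poisson.cdf(9, lambda_)
print(f'Probability of 10+ customers in 10 mins: {prob_10_or_more:.4f}')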

Result:
Probability of 5 customers in 10 mins: 0.101
Probability of 10+ customers in 10 mins: 0.0011
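
The exact probabilities can also be compared with frequencies from simulated draws. This is a rough sketch: the sample size and the seed are arbitrary, and poisson.sf(9, lam) is the survival function P(X > 9), which equals P(X ≥ 10).

```python
import numpy as np
from scipy.stats import poisson

lam = 3                          # mean customers per 10-minute window
rng = np.random.default_rng(42)  # fixed seed so the run is repeatable
draws = poisson.rvs(mu=lam, size=100_000, random_state=rng)

# Share of simulated windows matching each event
empirical_5 = np.mean(draws == 5)
empirical_10_plus = np.mean(draws >= 10)

print(f"P(X = 5):   simulated {empirical_5:.4f}, exact {poisson.pmf(5, lam):.4f}")
print(f"P(X >= 10): simulated {empirical_10_plus:.4f}, exact {poisson.sf(9, lam):.4f}")
```

The simulated and exact values should agree to within sampling noise, and the agreement tightens as the number of draws grows.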

Some other examples where the Poisson distribution can be applied:

  • Modeling the number of arrivals per hour at a grocery store, where λ is estimated from historical data
  • Estimating the probability of a certain number of buses arriving at a bus stop in a 30 minute period
  • Predicting the number of network failures per day, based on past system performance
  • Calculating the probability of observing a specific number of radioactive decay events in a fixed time period

Estimating the Poisson Parameter

In most real-world applications, the Poisson parameter λ is not known and must be estimated from empirical data. Given a sample of count data, the maximum likelihood estimate (MLE) for λ is simply the sample mean.
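
For readers who want the reasoning behind this, here is a short derivation sketch. It assumes n independent observations x_1, …, x_n from the same Poisson distribution; the symbols L, ℓ and the hat notation are introduced here for illustration.

```latex
\begin{align*}
L(\lambda) &= \prod_{i=1}^{n} \frac{\lambda^{x_i} e^{-\lambda}}{x_i!} \\
\ell(\lambda) = \ln L(\lambda)
  &= \Big(\sum_{i=1}^{n} x_i\Big) \ln \lambda - n\lambda - \sum_{i=1}^{n} \ln(x_i!) \\
\frac{d\ell}{d\lambda} &= \frac{1}{\lambda} \sum_{i=1}^{n} x_i - n = 0
  \quad\Longrightarrow\quad
  \hat{\lambda} = \frac{1}{n} \sum_{i=1}^{n} x_i = \bar{x}
\end{align*}
```

The second derivative, -(Σ x_i)/λ², is negative whenever at least one count is positive, so the stationary point is a maximum.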

For example, let's say we observe the number of people arriving at a store in 20 one-hour periods:
12, 15, 8, 10, 11, 13, 10, 15, 17, 9, 12, 7, 11, 12, 13, 17, 16, 18, 11, 14

import numpy as np

arrivals = np.array([12, 15, 8, 10, 11, 13, 10, 15, 17, 9, 12, 7, 11, 12, 13, 17, 16, 18, 11, 14])
lambda_mle = arrivals.mean()

print(f'MLE estimate of lambda: {lambda_mle:.2f}')

Result:
MLE estimate of lambda: 12.55

So the MLE estimate for the average number of arrivals per hour is 12.55 (the 20 observations sum to 251). We can then use this estimate in the Poisson PMF to model probabilities going forward.
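
As one possible way to use the estimate going forward, the sketch below plugs the sample mean into scipy.stats.poisson. The specific questions asked (exactly 10 arrivals, more than 20 arrivals, a central 95% range) are hypothetical and chosen only for illustration.

```python
import numpy as np
from scipy.stats import poisson

arrivals = np.array([12, 15, 8, 10, 11, 13, 10, 15, 17, 9,
                     12, 7, 11, 12, 13, 17, 16, 18, 11, 14])
lambda_hat = arrivals.mean()  # sample mean, the MLE for lambda

# Hypothetical questions chosen for illustration
p_exactly_10 = poisson.pmf(10, lambda_hat)      # exactly 10 arrivals in an hour
p_more_than_20 = poisson.sf(20, lambda_hat)     # more than 20 arrivals in an hour
low, high = poisson.interval(0.95, lambda_hat)  # central range with about 95% coverage

print(f"P(X = 10) = {p_exactly_10:.3f}")
print(f"P(X > 20) = {p_more_than_20:.3f}")
print(f"Roughly 95% of hours should see between {low:.0f} and {high:.0f} arrivals")
```

These figures are only as good as the Poisson assumptions and the 20-hour sample, so they are best read as rough planning numbers.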

Deeper Mathematical Details

Finally, let's dive a bit into the mathematical underpinnings of the Poisson distribution and derive the PMF. Here λ denotes the rate of events per unit time, so the expected count over an interval of length t is λt. Imagine we divide that interval into a large number n of small sub-intervals, each of length Δt. We assume the probability of an event in each sub-interval is λΔt, and the probability of more than one event is negligible.

The probability of k events in n intervals of length Δt can be modeled by the binomial distribution:

P(X = k) = C(n, k) * (λΔt)^k * (1 - λΔt)^(n-k)

where C(n, k) is the binomial coefficient. Now let Δt → 0 and n → ∞, while keeping the expected number of events nλΔt = λt constant. Using the limit definition of e^x, we can show the binomial PMF converges to the Poisson PMF:

P(X = k) = ((λt)^k * e^(-λt)) / k!
Intuitively, we are saying an event is unlikely in any given small sub-interval, but if we have enough sub-intervals we will see a random number of events that follow the Poisson distribution.
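
This intuition can be tried out with a small simulation: split each interval into many sub-intervals, let each one contain an event with small probability, and count the events. The rate, interval length, number of sub-intervals, number of repetitions, and seed below are all arbitrary assumptions for illustration.

```python
import numpy as np
from scipy.stats import poisson

rate, t = 3.0, 1.0   # illustrative rate and interval length, so rate * t = 3
n = 1_000            # number of sub-intervals, each of length dt = t / n
p = rate * t / n     # probability of one event in each sub-interval

rng = np.random.default_rng(1)
# Each row is one interval split into n yes/no sub-intervals; row sums are counts.
counts = (rng.random((5_000, n)) < p).sum(axis=1)

for k in range(6):
    simulated = np.mean(counts == k)
    print(f"k = {k}: simulated {simulated:.4f}, Poisson {poisson.pmf(k, rate * t):.4f}")
```

The simulated frequencies should track the Poisson PMF closely, with the remaining gap coming from the finite number of sub-intervals and repetitions.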

Conclusion

To summarize, the Poisson distribution is a probability model for count data that meets certain assumptions. It has a single parameter λ representing the expected number of events in an interval, which can be estimated from historical data. The Poisson distribution has wide-ranging applications and a close connection to other statistical distributions.

Hopefully this guide provided you a comprehensive overview, along with the mathematical intuition and some example Python code. The Poisson distribution is a powerful tool to have in your data science toolkit for modeling count data and rare events.
