An important part of probability theory concerns sums of independent random variables. In many situations, the number of terms in the independent sum is itself a random variable.
Introduction
Let $X_1, X_2, X_3, \ldots$ be a sequence of independent and identically distributed random variables with common cumulative distribution function (CDF) $F_X(x)$. We consider the sum

$$Y = X_1 + X_2 + \cdots + X_N \ \ \ \ \ \ \ \ \ \ (1)$$

where the number of terms $N$ is a random variable independent of the $X_i$. The random variable $Y$ is said to be a compound random variable and its distribution is a compound distribution. To make the compounding more mathematically tractable, we assume that the distribution of $N$ does not depend in any way on the values of $X_1, X_2, X_3, \ldots$.
Here, $N$ is a count variable counting the occurrences of a certain type of event. When $N = n$, the variable $X_i$, $i = 1, 2, \ldots, n$, represents some measurement we wish to record for the $i$th occurrence of the event. A concrete example comes from insurance. Let $N$ be the number of insurance claims occurring in a fixed time period (e.g., a year) on a group of insurance contracts or even a single contract. Let $X_i$ be the payment made by the insurer on the $i$th claim. Then the compound random variable $Y$ as defined in (1) is a model for the total payments made by an insurer. In this insurance application, we assume that the claim size variables $X_1, X_2, X_3, \ldots$ are independent and identically distributed and that the claim count $N$ and the claim sizes $X_i$ are independent. The random variable $Y$ in the context of total insurance payments is often referred to as the aggregate claim random variable or aggregate loss random variable. See here for a more concise introduction to the notion of compound distributions.
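As a quick numerical illustration of the sum in (1), the following Python sketch simulates a compound random variable by first drawing the count $N$ and then summing that many independent claim sizes. The Poisson count with mean 2 and the exponential claim size with mean 100 are hypothetical choices made only for this illustration.

import numpy as np

rng = np.random.default_rng(seed=0)

def simulate_compound(n_sims, draw_count, draw_claims):
    # Simulate Y = X_1 + ... + X_N: draw the count N first, then sum N iid claim sizes.
    totals = np.empty(n_sims)
    for i in range(n_sims):
        n = draw_count(rng)
        totals[i] = draw_claims(rng, n).sum() if n > 0 else 0.0
    return totals

# Hypothetical choices for illustration: N ~ Poisson(2), X ~ exponential with mean 100.
y = simulate_compound(
    n_sims=100_000,
    draw_count=lambda rng: rng.poisson(2.0),
    draw_claims=lambda rng, n: rng.exponential(scale=100.0, size=n),
)
print(y.mean())   # close to E[N] E[X] = 2 * 100 = 200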
We focus on some of the mathematical properties of the compound random variable $Y$. We first examine an example before proceeding further.
Example 1
Consider the count random variable $N$ with probabilities $P(N = 0) = (1-p)^2$, $P(N = 1) = 2p(1-p)$ and $P(N = 2) = p^2$. Note that $N$ has a binomial distribution with 2 trials and probability of success $p$. The independent random variables $X_1$ and $X_2$ are uniformly distributed on the interval $(0, 1)$. When we define $Y$ according to (1), $Y$ would be an insurance payment model with the claim count modeled by a binomial distribution and the claim size modeled by the uniform distribution on the unit interval $(0, 1)$.
Because there can be only 0, 1 or 2 terms, it is quite easy to visualize the compound variable $Y$. When $N = 0$, $Y = 0$ (a single point mass). When $N = 1$, $Y = X_1$ has a uniform distribution. When $N = 2$, $Y = X_1 + X_2$ is simply an independent sum of two uniform random variables. The CDF of $Y$ in this case is the weighted average of the following three CDFs.

$$F_0(y) = \begin{cases} 0 & y < 0 \\ 1 & y \ge 0 \end{cases}$$

$$F_1(y) = \begin{cases} 0 & y < 0 \\ y & 0 \le y < 1 \\ 1 & y \ge 1 \end{cases}$$

$$F_2(y) = \begin{cases} 0 & y < 0 \\ \dfrac{y^2}{2} & 0 \le y < 1 \\ 1 - \dfrac{(2-y)^2}{2} & 1 \le y < 2 \\ 1 & y \ge 2 \end{cases}$$

The CDF $F_0$ is for the point mass at 0, the case of probability 1 being assigned to $Y = 0$ conditional on $N = 0$. The CDF $F_1$ is simply the CDF of the uniform distribution, reflecting the case of $N = 1$, in which case $Y = X_1$ has the uniform distribution. The CDF $F_2$ is that of the independent sum of two uniform distributions on the interval $(0, 1)$. The CDF of the compound $Y$ is the weighted average of these CDFs.

$$F_Y(y) = (1-p)^2 \, F_0(y) + 2p(1-p) \, F_1(y) + p^2 \, F_2(y) \ \ \ \ \ \ \ \ \ \ (*)$$

The explicit description of the CDF is

$$F_Y(y) = \begin{cases} 0 & y < 0 \\ (1-p)^2 + 2p(1-p)\, y + p^2 \, \dfrac{y^2}{2} & 0 \le y < 1 \\ (1-p)^2 + 2p(1-p) + p^2 \left[ 1 - \dfrac{(2-y)^2}{2} \right] & 1 \le y < 2 \\ 1 & y \ge 2 \end{cases}$$
Note that the support of $Y$ is the interval $[0, 2]$. In the interval $0 \le y < 1$, we take the weighted average of 1 (with probability $(1-p)^2$), $y$ (with probability $2p(1-p)$) and $\dfrac{y^2}{2}$ (with probability $p^2$). In the interval $1 \le y < 2$, we take the weighted average of 1 (with probability $(1-p)^2$), 1 (with probability $2p(1-p)$) and $1 - \dfrac{(2-y)^2}{2}$ (with probability $p^2$). Also note that the distribution of $Y$ is neither a continuous nor a discrete distribution. It is a mixed distribution. It has a point mass at $y = 0$ with probability $(1-p)^2$ (note the jump of size $(1-p)^2$ in the CDF). This point mass corresponds to the case of $N = 0$. It also has a positive density curve in the interval $(0, 2)$, with no single point in this interval having positive probability. The probability density function of $Y$ on $(0, 2)$ is

$$f_Y(y) = \begin{cases} 2p(1-p) + p^2\, y & 0 < y < 1 \\ p^2\, (2 - y) & 1 \le y < 2 \end{cases}$$

Once the CDF and the density function are known, many distributional quantities of $Y$ can be obtained. For example, the mean and variance of $Y$ are $E[Y] = p$ and $Var[Y] = \dfrac{p}{6} + \dfrac{p(1-p)}{2}$.
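Since the success probability $p$ is left as a parameter above, the following sketch checks the derived CDF and moments by simulation under the assumed value $p = 0.5$, a hypothetical choice used only for this check.

import numpy as np

rng = np.random.default_rng(seed=0)
p = 0.5                                          # assumed value of the success probability
n_sims = 200_000

N = rng.binomial(n=2, p=p, size=n_sims)          # claim counts
X = rng.uniform(0.0, 1.0, size=(n_sims, 2))      # two potential uniform(0, 1) claim sizes
Y = np.where(N >= 1, X[:, 0], 0.0) + np.where(N == 2, X[:, 1], 0.0)

def F_Y(y):
    # The explicit CDF derived above, for 0 <= y < 2.
    if y < 1:
        return (1 - p)**2 + 2*p*(1 - p)*y + p**2 * y**2 / 2
    return (1 - p)**2 + 2*p*(1 - p) + p**2 * (1 - (2 - y)**2 / 2)

for y0 in [0.25, 0.75, 1.5]:
    print(y0, (Y <= y0).mean(), F_Y(y0))         # empirical versus derived CDF

print(Y.mean(), p)                               # E[Y] = p
print(Y.var(), p/6 + p*(1 - p)/2)                # Var[Y] = p/6 + p(1-p)/2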
CDF of the Compound Random Variable
According to the idea outlined in (*) in Example 1, the CDF of a compound random variable is the weighted average of the CDFs of the independent sums of the $X_i$ variables. The CDF of the compound random variable $Y$ as defined in (1) is

$$F_Y(y) = \sum_{n=0}^{\infty} P(N = n)\, F^{*n}(y) \ \ \ \ \ \ \ \ \ \ (2)$$

where $F^{*n}$ is the CDF of the independent sum $X_1 + X_2 + \cdots + X_n$ for all $n \ge 1$ and $F^{*0}$ is the CDF of a single point mass at $y = 0$ defined below.

$$F^{*0}(y) = \begin{cases} 0 & y < 0 \\ 1 & y \ge 0 \end{cases}$$

The CDF $F^{*n}$ is called the $n$-fold convolution and is defined recursively. In the next section, we discuss briefly the notion of convolution.
Convolution – Continuous Case
In probability theory, convolution is a mathematical operation that allows us to derive the distribution of a sum of two independent random variables from the two individual distributions. We consider the continuous case in this section and the discrete case in the next section.
Suppose $U$ and $V$ are independent continuous random variables with probability density functions (PDFs) $f_U$ and $f_V$ and cumulative distribution functions (CDFs) $F_U$ and $F_V$, respectively. The PDF and the CDF of the independent sum $U + V$ are

$$f_{U+V}(z) = \int_{-\infty}^{\infty} f_U(z - y)\, f_V(y)\, dy = \int_{-\infty}^{\infty} f_V(z - x)\, f_U(x)\, dx \ \ \ \ \ \ \ \ \ \ (3)$$

$$F_{U+V}(z) = \int_{-\infty}^{\infty} F_U(z - y)\, f_V(y)\, dy = \int_{-\infty}^{\infty} F_V(z - x)\, f_U(x)\, dx \ \ \ \ \ \ \ \ \ \ (4)$$

The following gives the derivation of the first form of the CDF in (4). The second form is derived analogously.

$$F_{U+V}(z) = P(U + V \le z) = \int_{-\infty}^{\infty} P(U + V \le z \mid V = y)\, f_V(y)\, dy = \int_{-\infty}^{\infty} P(U \le z - y)\, f_V(y)\, dy = \int_{-\infty}^{\infty} F_U(z - y)\, f_V(y)\, dy$$

The third equality uses the independence of $U$ and $V$. Even though the integral in (4) is stated to be from $-\infty$ to $\infty$, in practice it only needs to be taken over the support of $V$, which could be some subinterval of $(-\infty, \infty)$. The PDF in (3) is obtained by differentiating the CDF in (4).

$$f_{U+V}(z) = \frac{d}{dz} F_{U+V}(z) = \int_{-\infty}^{\infty} f_U(z - y)\, f_V(y)\, dy$$
The convolution of more than two independent random variables is derived recursively. For example, the distribution of the sum of three independent random variables is the convolution of the distribution of the first random variable and the distribution of the sum of the second and third variables. Analogously, the distribution of the sum of $n$ independent random variables is the convolution of the distribution of the first random variable and the distribution of the independent sum of the $n - 1$ random variables from the second to the $n$th one. Suppose $X_1, X_2, X_3, \ldots$ are independent and identically distributed with common CDF $F$ and PDF $f$. We denote the PDF and CDF of the independent sum $X_1 + X_2 + \cdots + X_n$ by $f^{*n}$ and $F^{*n}$, respectively, with $f^{*1} = f$ and $F^{*1} = F$. For $n \ge 2$, they are

$$f^{*n}(z) = \int_{-\infty}^{\infty} f^{*(n-1)}(z - x)\, f(x)\, dx \ \ \ \ \ \ \ \ \ \ (5)$$

$$F^{*n}(z) = \int_{-\infty}^{\infty} F^{*(n-1)}(z - x)\, f(x)\, dx \ \ \ \ \ \ \ \ \ \ (6)$$

The results in (5) and (6) follow from (3) and (4). If the random variables only take on positive values, then the convolution results in (5) and (6) can be simplified as follows.

$$f^{*n}(z) = \int_{0}^{z} f^{*(n-1)}(z - x)\, f(x)\, dx \ \ \ \ \ \ \ \ \ \ (7)$$

$$F^{*n}(z) = \int_{0}^{z} F^{*(n-1)}(z - x)\, f(x)\, dx \ \ \ \ \ \ \ \ \ \ (8)$$

The convolution results in (7) and (8) are the appropriate ones for the insurance context of total payments since the claim sizes only take on positive values.
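When no closed form is available, the recursions (7) and (8) can be carried out numerically on a grid. The sketch below approximates the $n$-fold convolution of an exponential density by repeatedly applying the density recursion (7) with a Riemann-sum approximation; the exponential mean of 2 is a hypothetical choice, picked so the result can be compared with the known gamma density.

import numpy as np
from math import factorial

dz = 0.01                          # grid step; smaller steps give a better approximation
z = np.arange(dz, 30.0, dz)        # grid on the positive axis

theta = 2.0                        # assumed exponential mean, for illustration only
f = np.exp(-z / theta) / theta     # common density f of the positive claim sizes

def convolve_once(f_prev, f, dz):
    # One step of (7): f^{*n}(z) = integral from 0 to z of f^{*(n-1)}(z - x) f(x) dx,
    # approximated by a Riemann sum on the grid.
    return np.convolve(f_prev, f)[: len(f)] * dz

f_star = f.copy()                  # f^{*1} = f
for _ in range(2):                 # build f^{*2}, then f^{*3}
    f_star = convolve_once(f_star, f, dz)

# For exponential claims, f^{*3} is the gamma density with shape 3 and scale theta.
gamma3 = z**2 * np.exp(-z / theta) / (factorial(2) * theta**3)
print(np.max(np.abs(f_star - gamma3)))   # small discretization error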
Convolution – Discrete Case
In the discrete case, we derive the convolution of two probability functions. Suppose that $U$ and $V$ are two independent random variables taking on the integers (or some appropriate subset of the integers) with probability functions $p_U$ and $p_V$, respectively. The probability function of the independent sum $U + V$ is

$$p_{U+V}(z) = \sum_{y} p_U(z - y)\, p_V(y) = \sum_{x} p_V(z - x)\, p_U(x) \ \ \ \ \ \ \ \ \ \ (9)$$

The following gives the derivation of the first form of (9). The second form is derived analogously.

$$p_{U+V}(z) = P(U + V = z) = \sum_{y} P(U + V = z \mid V = y)\, P(V = y) = \sum_{y} P(U = z - y)\, P(V = y) = \sum_{y} p_U(z - y)\, p_V(y)$$

Once the probability function $p_{U+V}$ is obtained, the CDF of $U + V$ is obtained by definition, i.e., $F_{U+V}(z) = \sum_{k \le z} p_{U+V}(k)$. In the discrete case (as in the continuous case), the convolution of more than two random variables is also derived recursively. Suppose $X_1, X_2, X_3, \ldots$ are independent and identically distributed random variables with common probability function $p$. We denote the probability function of the independent sum $X_1 + X_2 + \cdots + X_n$ by $p^{*n}$. For $n \ge 2$, it is

$$p^{*n}(z) = \sum_{k} p^{*(n-1)}(z - k)\, p(k) \ \ \ \ \ \ \ \ \ \ (10)$$

Note that in (10) the summation ranges over all integer values $k$. If the random variables only take on positive integers, then (10) can be simplified as

$$p^{*n}(z) = \sum_{k=1}^{z} p^{*(n-1)}(z - k)\, p(k) \ \ \ \ \ \ \ \ \ \ (11)$$

The convolution result in (11) is the appropriate one in the insurance context of total claim payments. If the random variables only take on non-negative integers, then the summation in (11) starts at $k = 0$. Once the convolution $p^{*n}$ of the probability function is derived, the CDF is defined accordingly,

$$F^{*n}(z) = \sum_{k \le z} p^{*n}(k) \ \ \ \ \ \ \ \ \ \ (12)$$
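Because (10) and (11) are ordinary discrete convolutions, the $n$-fold probability function can be built directly with numpy. The claim-size probability function below, placing mass on the values 1, 2 and 3, is a hypothetical one used only to illustrate the mechanics.

import numpy as np

# Hypothetical claim-size probability function: index k holds P(X = k),
# so P(X = 1) = 0.5, P(X = 2) = 0.3, P(X = 3) = 0.2 and P(X = 0) = 0.
p = np.array([0.0, 0.5, 0.3, 0.2])

def n_fold(p, n):
    # Build p^{*n} recursively as in (10)/(11); p^{*0} is a point mass at 0.
    out = np.array([1.0])
    for _ in range(n):
        out = np.convolve(out, p)   # p^{*k}(z) = sum over j of p^{*(k-1)}(z - j) p(j)
    return out

p3 = n_fold(p, 3)          # probability function of X_1 + X_2 + X_3
F3 = np.cumsum(p3)         # its CDF, as in (12)
print(p3)                  # P(X_1 + X_2 + X_3 = 0), ..., P(X_1 + X_2 + X_3 = 9)
print(F3[-1])              # equals 1 up to floating-point error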
Back to Distribution of the Compound Random Variable
The result (2) above shows that the CDF of the compound random variable $Y$ is the weighted average of the convolution CDFs of the independent sums of the $X_i$ variables. The weighting is the probability function of the count random variable $N$. The CDF in (2) is applicable both when the claim size random variable $X$ is discrete and when $X$ is continuous. In the remainder of this discussion, we assume that $X$ only takes on positive values if it is continuous and only takes on positive integers if it is discrete. To repeat, the CDF of $Y$ is

$$F_Y(y) = \sum_{n=0}^{\infty} P(N = n)\, F^{*n}(y) \ \ \ \ \ \ \ \ \ \ (13)$$

Now that $X$ only takes on positive values or positive integers, note that when $X$ is continuous, the convolution CDF $F^{*n}$ is based on (8). When $X$ is discrete, $F^{*n}$ is based on (12). The PDF of $Y$ (when $X$ is continuous) and the probability function of $Y$ (when $X$ is discrete) are

$$f_Y(y) = \sum_{n=1}^{\infty} P(N = n)\, f^{*n}(y), \ \ \ \ y > 0 \ \ \ \ \ \ \ \ \ \ (14)$$

$$P(Y = y) = \sum_{n=1}^{\infty} P(N = n)\, p^{*n}(y), \ \ \ \ y = 1, 2, 3, \ldots \ \ \ \ \ \ \ \ \ \ (15)$$

Note that the distribution described by (13), (14) and (15) has a point mass at $y = 0$, modeling the case where there is no occurrence of the event of interest ($N = 0$). Thus the size of the jump at $y = 0$ in the CDF (13) is $P(Y = 0) = P(N = 0)$.
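In the discrete case, (13) and (15) can be evaluated by summing over $n$ up to the largest count with non-negligible probability. The sketch below does this for a hypothetical count distribution on the values 0, 1, 2, 3 and the same hypothetical claim-size probability function as before.

import numpy as np

count_probs = np.array([0.3, 0.4, 0.2, 0.1])   # assumed P(N = 0), ..., P(N = 3)
p = np.array([0.0, 0.5, 0.3, 0.2])             # assumed P(X = k) at index k

size = (len(count_probs) - 1) * (len(p) - 1) + 1   # largest attainable value of Y, plus one
pY = np.zeros(size)
conv = np.zeros(size); conv[0] = 1.0               # p^{*0}: point mass at 0

for n, prob_n in enumerate(count_probs):
    pY += prob_n * conv                # add P(N = n) * p^{*n}; the n = 0 term is the point mass at 0
    conv = np.convolve(conv, p)[:size] # advance to p^{*(n+1)}

FY = np.cumsum(pY)                     # the CDF in (13)
print(pY)                              # P(Y = 0), P(Y = 1), ..., P(Y = 9)
print(pY.sum())                        # equals 1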
The mean and variance of the compound random variable $Y$ are

$$E[Y] = E[N]\, E[X] \ \ \ \ \ \ \ \ \ \ (16)$$

$$Var[Y] = E[N]\, Var[X] + Var[N]\, \left( E[X] \right)^2 \ \ \ \ \ \ \ \ \ \ (17)$$

Because the CDF of $Y$ is a weighted average of the CDFs of the convolution distributions, many distributional quantities such as the mean and higher moments can be obtained by performing the same weighted averaging. Here is the derivation of $E[Y]$.

$$E[Y] = E\left[\, E[Y \mid N] \,\right] = E\left[\, E[X_1 + X_2 + \cdots + X_N \mid N] \,\right] = E\left[\, N\, E[X] \,\right] = E[N]\, E[X]$$

The above derivation makes use of two levels of expectation, one conditional on the count $N$ (averaging over the claim sizes $X_i$) and one over $N$ itself. It also makes use of the assumption that $N$ and the $X_i$ are independent. The expected value of $Y$ has a natural interpretation. It is the product of the expected number of events (e.g., occurrences of claims) and the expected individual measurement (e.g., claim size).

The variance of $Y$ also has a natural interpretation. It is the sum of two components such that the first component stems from the variability of the claim sizes $X_i$ and the second component stems from the variability of the claim count $N$. See here for the derivation of (17). The moment generating function of $Y$ is

$$M_Y(t) = M_N\left( \log M_X(t) \right) \ \ \ \ \ \ \ \ \ \ (18)$$

where $M_X(t)$ is the moment generating function of $X$ and $M_N(t)$ is the moment generating function of $N$. See here for a derivation.
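The moment formulas (16) and (17) and the moment generating function relationship (18) can be checked by simulation. In the sketch below, a Poisson count with mean 2 and an exponential claim size with mean 100 are hypothetical choices for which $M_N(t) = \exp(\lambda(e^t - 1))$ and $M_X(t) = 1/(1 - \theta t)$ have simple closed forms.

import numpy as np

rng = np.random.default_rng(seed=0)
lam, theta = 2.0, 100.0            # assumed Poisson mean and exponential claim mean
n_sims = 200_000

N = rng.poisson(lam, size=n_sims)
Y = np.array([rng.exponential(theta, size=n).sum() for n in N])

# (16) and (17): E[Y] = E[N] E[X] and Var[Y] = E[N] Var[X] + Var[N] (E[X])^2
print(Y.mean(), lam * theta)
print(Y.var(), lam * theta**2 + lam * theta**2)

# (18): M_Y(t) = M_N(log M_X(t)), checked at a single t < 1/theta
def M_N(s):
    return np.exp(lam * (np.exp(s) - 1.0))

t = 0.002
M_X = 1.0 / (1.0 - theta * t)
print(np.exp(t * Y).mean(), M_N(np.log(M_X)))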
More Examples
Example 2
We use the insurance context discussed above. The number of claims in a policy period for an insurance policy is modeled by the binomial distribution with 2 trials and probability of success $p$. The individual claim size $X$ is modeled by the exponential distribution with mean $\theta$. Derive the CDF and the PDF of the aggregate payment $Y$ as well as the mean, the variance and the moment generating function.

Since $N$ can only be 0, 1 or 2, the CDF of $Y$ is

$$F_Y(y) = (1-p)^2 + 2p(1-p)\left(1 - e^{-y/\theta}\right) + p^2 \left(1 - e^{-y/\theta} - \frac{y}{\theta}\, e^{-y/\theta}\right), \ \ \ \ y \ge 0$$

The CDF is the weighted average of three quantities: 1 (with probability $(1-p)^2$), the CDF of the exponential distribution (with probability $2p(1-p)$), and the CDF of a gamma distribution that is the independent sum of two exponential distributions (with probability $p^2$). The PDF of $Y$ is

$$f_Y(y) = 2p(1-p)\, \frac{1}{\theta}\, e^{-y/\theta} + p^2\, \frac{y}{\theta^2}\, e^{-y/\theta}, \ \ \ \ y > 0$$

Note that there is a point mass at $y = 0$ with probability $(1-p)^2$. In the continuous part, the density function is a weighted average of the exponential and gamma density functions. The mean and variance are calculated according to (16) and (17).

$$E[Y] = E[N]\, E[X] = 2p\, \theta$$

$$Var[Y] = E[N]\, Var[X] + Var[N]\, \left(E[X]\right)^2 = 2p\, \theta^2 + 2p(1-p)\, \theta^2 = 2p(2-p)\, \theta^2$$

The moment generating functions of $X$, $N$ and $Y$ are

$$M_X(t) = \frac{1}{1 - \theta t}, \ \ \ \ t < \frac{1}{\theta}$$

$$M_N(t) = \left(1 - p + p\, e^{t}\right)^2$$

$$M_Y(t) = M_N\left(\log M_X(t)\right) = \left(1 - p + \frac{p}{1 - \theta t}\right)^2 = (1-p)^2 + 2p(1-p)\, \frac{1}{1 - \theta t} + p^2\, \frac{1}{(1 - \theta t)^2}$$

As expected, the moment generating function of the compound random variable is the weighted average of three appropriate moment generating functions. For a more detailed discussion of this example, see here.
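Since the success probability and the exponential mean are left as the parameters $p$ and $\theta$ in this example, the following sketch verifies the derived formulas by simulation under the assumed values $p = 0.3$ and $\theta = 500$, chosen only for this check.

import numpy as np

rng = np.random.default_rng(seed=0)
p, theta = 0.3, 500.0                       # assumed parameter values for illustration
n_sims = 200_000

N = rng.binomial(2, p, size=n_sims)
X = rng.exponential(theta, size=(n_sims, 2))
Y = np.where(N >= 1, X[:, 0], 0.0) + np.where(N == 2, X[:, 1], 0.0)

print(Y.mean(), 2 * p * theta)              # E[Y] = 2 p theta
print(Y.var(), 2 * p * (2 - p) * theta**2)  # Var[Y] = 2 p (2 - p) theta^2

# The CDF at one point versus the weighted average of the three component CDFs
y0 = 800.0
F_exp = 1 - np.exp(-y0 / theta)                        # exponential CDF
F_gamma = 1 - np.exp(-y0 / theta) * (1 + y0 / theta)   # gamma (shape 2, scale theta) CDF
print((Y <= y0).mean(), (1 - p)**2 + 2*p*(1 - p)*F_exp + p**2 * F_gamma)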
Example 3
The compound random variable $Y$ in both Example 1 and Example 2 can be completely described since the count variable $N$ can only take on the values 0, 1 or 2 and since the claim size variable $X$ has a familiar distribution that is mathematically tractable. We now present a compound distribution that cannot be completely described analytically. Let $N$ be a Poisson random variable with mean $\lambda$. Suppose that the individual claim size $X$ takes on only two positive integer values, with the probabilities of those two values specified. Let $Y$ be the total insurance payment variable $Y = X_1 + X_2 + \cdots + X_N$. The distribution of $Y$ would be a discrete distribution taking on the non-negative integers. Determine $P(Y = y)$ for the first several non-negative integer values of $y$.

Since $N$ has a Poisson distribution, the random variable $Y$ is said to have a compound Poisson distribution. See here for a discussion of this example.
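The probabilities $P(Y = y)$ for a compound Poisson distribution can be computed numerically by truncating the sum in (15), just as in the earlier discrete sketch. The Poisson mean of 1 and the claim size taking the values 1 and 2 with probabilities 0.6 and 0.4 are assumed values used only to show the computation.

import numpy as np
from math import exp, factorial

lam = 1.0                            # assumed Poisson mean
p = np.array([0.0, 0.6, 0.4])        # assumed P(X = 1) = 0.6 and P(X = 2) = 0.4

n_max = 40                           # truncation point; P(N > 40) is negligible here
size = 2 * n_max + 1                 # largest attainable total at the truncation point, plus one
pY = np.zeros(size)
conv = np.zeros(size); conv[0] = 1.0 # p^{*0}: point mass at 0

for n in range(n_max + 1):
    pY += exp(-lam) * lam**n / factorial(n) * conv  # add P(N = n) * p^{*n}; n = 0 is the point mass at 0
    conv = np.convolve(conv, p)[:size]              # advance to p^{*(n+1)}

print(pY[:6])                        # P(Y = 0), P(Y = 1), ..., P(Y = 5)
print(pY.sum())                      # very close to 1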
Remarks
Compound distributions have many natural applications. Compound distributions also represent a versatile modeling tool. For further information on the topic of compound distributions, see these articles in a companion site: here (a more concise introduction), here (examples), here (compound Poisson distribution), here (example of compound Poisson), here (compound negative binomial distribution), here (examples of mixed distributions), here (compound Poisson mixed distribution), and here (Bayesian prediction).
© 2025 – Dan Ma
Posted: March 21, 2025