If a wholesaler sells to 500 stores and one store shows a 50% uptick in sales, how can the wholesaler determine if this uptick is significant or if it is expected for a few stores to randomly see an uptick of 50%?

1 Answer
Mar 20, 2016

There is no single simple answer. It depends on additional parameters that are not given.
See explanation below.

Explanation:

Two important parameters are not given in this problem: the distribution of goods among the stores and the number of customers buying in these stores.

Let's address the problem in general first, and then make certain reasonable assumptions.

The distribution of goods among stores is related to the probability that a customer buys goods in each specific store.
Assume that the probability that a single item is bought at store S_1 is p_1, at store S_2 is p_2, ... at store S_i is p_i, ... at store S_500 is p_500. Since every item is bought at exactly one store, p_1+p_2+...+p_500=1.

Assume further that the total number of items purchased is n.

Consider now a store S_i. Introduce a random variable xi_i that is equal to 1 when an item is bought at store S_i (with probability p_i) and is equal to 0 otherwise (with probability 1-p_i).
This is a Bernoulli random variable.
Its mathematical expectation is
E(xi_i)=1*p_i+0*(1-p_i)=p_i,
its variance is
Var(xi_i)=(1-p_i)^2*p_i+(0-p_i)^2*(1-p_i)=p_i(1-p_i),
its standard deviation is
sigma(xi_i)=sqrt(p_i(1-p_i))

The wholesaler has a certain number n of items that he distributes among 500 stores. It's reasonable to assume that n is large enough to cover all stores and is significantly higher than the number of stores.
For instance, if we are talking about bottles of soda, it must be thousands per store.

Consider now n random variables independent of each other and each distributed identically with xi_i:
xi_(i1), xi_(i2),...xi_(i n)
Here random variable xi_(ij) indicates whether jth item was bought at ith store.
Obviously, the sum of the above random variables is a random variable equal to the number of items bought at ith store:
eta_i=xi_(i1)+xi_(i2)+...+xi_(i n)

Let's analyse the distribution of probabilities of eta_i.
First of all, according to the Central Limit Theorem, this distribution should be very close to Normal.
Since it's a sum of independent identically distributed random variables, its expectation is a sum of expectations of its components and its variance is a sum of variances:
E(eta_i)=p_i*n
Var(eta_i)=p_i*(1-p_i)*n
sigma(eta_i)=sqrt(p_i*(1-p_i)*n)

It's time to make an additional assumption. To simplify the problem, let's assume that all stores have approximately the same number of customers. Then the probability that a single item is bought in store S_i is the same for every store and, therefore, equals 1/500=0.002.
Then all eta_i have the same distribution of probabilities - approximately Normal with expectation E(eta_i)=0.002*n and standard deviation sigma(eta_i)=sqrt(0.002*0.998*n)~=0.0447*sqrt(n)

Let's say we want to determine the probability that purchases in store S_1 (or any other fixed store, for that matter) stay within reasonable limits around the average when the total number of items distributed among all stores is n=10,000.
In this case
E(eta_1)=0.002*10000=20,
sigma(eta_1)~=0.0447*sqrt(10000)=4.47
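
The two numbers above can be reproduced with a few lines of Python. This is only a sketch of the binomial model used in this answer; the function name `store_sales_stats` is made up for illustration.

```python
import math

def store_sales_stats(n, p):
    """Mean and standard deviation of the number of items (out of n)
    bought at a store where each item lands with probability p."""
    mean = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return mean, sigma

# Values assumed in the text: 10,000 items, 500 equally popular stores.
mean, sigma = store_sales_stats(n=10_000, p=1 / 500)
print(f"E(eta_1) = {mean:.2f}, sigma(eta_1) = {sigma:.2f}")
```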

According to the "rule of 2sigma", with 95% certainty we can say that the deviation of our random variable eta_1 from its mathematical expectation E(eta_1) should not exceed 2*sigma(eta_1)~=9, which is slightly less than 50% of its average value 20.
So, under the condition of equal probabilities of purchase in different stores p_i=1/500 and about 10,000 items purchased in all stores combined, the probability that the number of items purchased in store S_1 (or any other fixed store) deviates from its average by no more than 50% is greater than 95%.
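
As a rough check of the "greater than 95%" claim, the probability for one fixed store can be estimated with the Normal approximation from the standard library. This is a sketch under the same assumptions as above (p=1/500, n=10,000); the variable names are illustrative.

```python
import math
from statistics import NormalDist

n = 10_000          # total items, as assumed in the text
p = 1 / 500         # equal chance for each of the 500 stores
mean = n * p
sigma = math.sqrt(n * p * (1 - p))

# How many standard deviations a 50% deviation from the mean represents.
z = 0.5 * mean / sigma

# Two-sided probability that a Normal variable stays within z sigmas.
prob_within = 2 * NormalDist().cdf(z) - 1
print(f"z = {z:.2f}, P(within 50% of average) = {prob_within:.4f}")
```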

The second part of this problem concerns the probability that NO store deviates from its average by more than 50%. Treating the stores as approximately independent, it can be calculated as the product of the corresponding probabilities for EACH store.
To achieve 95% certainty that the number of purchases in every store stays within 50% of its average, we need the probability for each store to be
0.95^(1/500)~=0.9999=99.99%

To achieve this probability for each store we need the number of purchases to be very high. The "rule of 3sigma" states that a Normal random variable takes values not further than 3sigma from its average with probability 99.7%, which is not enough. An interval of about 4sigma around the average already gives roughly 99.99%; to stay on the safe side, we increase the interval to 6sigma.
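
The per-store target and the number of standard deviations it corresponds to can be computed directly. The sketch below assumes, as the text does, that the 500 stores can be treated as independent and that each store's sales are approximately Normal; the variable names are illustrative.

```python
from statistics import NormalDist

stores = 500
overall_target = 0.95   # desired certainty that no store deviates too far

# If all stores behave independently, each one must stay in range
# with this probability for the product to reach the overall target.
per_store = overall_target ** (1 / stores)

# Half-width (in sigmas) of the symmetric interval that a standard
# Normal variable stays inside with probability per_store.
z_needed = NormalDist().inv_cdf((1 + per_store) / 2)
print(f"per-store probability = {per_store:.6f}, z needed = {z_needed:.2f}")
```

Any interval at least `z_needed` sigmas wide meets the per-store target, so the wider interval used in the text is a conservative choice.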

Thus, with n=100,000 we have
E(eta_1)=0.002*100000=200
sigma(eta_1)~=0.0447*sqrt(100000)=14.14
6sigma(eta_1)~=85,
which is about 43% of the average, so it's sufficient to have 100,000 items to distribute to make sure, with certainty of at least 95%, that none of the stores would have more than 50% extra purchases.
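
The choice n=100,000 can be compared with the smallest n for which a given number of standard deviations fits inside 50% of the average. The helper `min_items` below is a hypothetical name; it simply solves z*sqrt(n*p*(1-p)) <= 0.5*n*p for n under the same binomial model.

```python
import math

def min_items(z, p, rel_dev=0.5):
    """Smallest n for which z standard deviations of a binomial count
    (success probability p) do not exceed rel_dev times its mean."""
    # z * sqrt(n p (1-p)) <= rel_dev * n p   =>   n >= z^2 (1-p) / (rel_dev^2 p)
    return math.ceil(z ** 2 * (1 - p) / (rel_dev ** 2 * p))

# 6 sigma interval, 500 equally popular stores, as in the text.
n_6sigma = min_items(z=6.0, p=1 / 500)
print(f"items needed for a 6 sigma margin: {n_6sigma}")
```

The result is below 100,000, which agrees with the statement that 100,000 items are sufficient.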

If 100,000 items are distributed evenly among 500 relatively equivalent (in average number of purchases) stores and at least one store exceeds its average sales by more than 50%, something abnormal and unexpected has happened.
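
A small simulation gives a feel for how often a 50% uptick appears by pure chance. This is only a sketch under the simplifying assumption of this answer (every purchase goes to one of 500 stores with equal probability); the function name and the fixed seed are illustrative choices.

```python
import random
from collections import Counter

def stores_above_threshold(n_items, n_stores=500, uptick=0.5, seed=0):
    """Assign each of n_items purchases to a uniformly random store and
    count the stores whose sales exceed the average by more than `uptick`."""
    rng = random.Random(seed)
    counts = Counter(rng.choices(range(n_stores), k=n_items))
    threshold = (1 + uptick) * n_items / n_stores
    return sum(1 for store in range(n_stores) if counts[store] > threshold)

for n_items in (10_000, 100_000):
    print(n_items, "items:", stores_above_threshold(n_items), "stores above +50%")
```

With only 10,000 items a handful of stores can be expected to clear the 50% mark by chance alone, while with 100,000 items such a store should be extremely rare.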

Please refer to Unizor for details on probabilities and statistics.