Sonia Vieira: July 2024

Friday, July 12, 2024

Hypothesis Tests on a Single Standard Deviation

Chi-square distribution

If Z is a random variable with a standard normal distribution (mean 0 and variance 1), then Z² has a chi-square distribution with 1 degree of freedom. If Z₁, Z₂, …, Z_k are independent and identically distributed standard normal variables, then the sum of their squares has a chi-square distribution with k degrees of freedom.

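NOTE: Let X₁, X₂, …, X_k be independent, normally distributed random variables with mean µ and variance 1. Then each (X_i - µ) is a standard normal variable, and the sum of squares Σ (X_i - µ)² has a chi-square distribution with k degrees of freedom. More generally, if the variance is σ², then Σ (X_i - µ)²/σ² has a chi-square distribution with k degrees of freedom.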

EXAMPLES

1) Car speed is measured using a radar unit. In an urban area, the radar readings X_i are normally distributed with a mean of 25 mph and a standard deviation of 3 mph. A recent source claims that the standard deviation of its new radar is 1 mph. Imagine your manager wishes to test the hypothesis H₀: σ = 1 versus the alternative hypothesis H₁: σ ≠ 1. To carry out this test, your manager has asked you to take five speed measurements using a test car that was programmed to travel at 25 mph. This was done, and the speed measurements you recorded were 25, 24, 27, 25, 26. Should the hypothesis H₀: σ = 1 be rejected at the significance level α = 10%?

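Let's calculate the variance of X about the known mean µ = 25. The sum of squared deviations is

Σ (X_i - µ)² = (25 - 25)² + (24 - 25)² + (27 - 25)² + (25 - 25)² + (26 - 25)² = 0 + 1 + 4 + 0 + 1 = 6

The variance is Σ (X_i - µ)²/n = 6/5 = 1.2

and the standard deviation is √1.2 ≈ 1.095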

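You've found that the sum of squared deviations, Σ (X_i - µ)², equals 6. Under H₀: σ = 1, this sum follows a chi-square distribution with 5 degrees of freedom. For a two-sided test at α = 10% with 5 degrees of freedom, the chi-square distribution table gives the critical values 1.145 (lower 5% tail) and 11.07 (upper 5% tail).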

The calculated value of 6 is inside the acceptance region. H₀: σ = 1 cannot be rejected.
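The calculation in Example 1 can be sketched in Python as follows. This is only an illustration: the function name `sum_sq_about_mean` is hypothetical, and the critical values 1.145 and 11.07 are copied from the chi-square table (5 degrees of freedom, 5% in each tail) rather than computed.

```python
# Chi-square test of H0: sigma = 1 when the true mean is known (mu = 25).

def sum_sq_about_mean(sample, mu):
    """Sum of squared deviations of the readings from the known mean mu."""
    return sum((x - mu) ** 2 for x in sample)

sample_1 = [25, 24, 27, 25, 26]   # readings from Example 1
mu = 25                           # speed programmed into the test car
sigma0 = 1                        # standard deviation under H0

# Test statistic: sum of squares divided by the hypothesized variance.
statistic_1 = sum_sq_about_mean(sample_1, mu) / sigma0 ** 2

# Table values for 5 degrees of freedom, alpha = 10% (two-sided).
lower_crit, upper_crit = 1.145, 11.07
reject_1 = statistic_1 < lower_crit or statistic_1 > upper_crit

print(statistic_1, reject_1)
```

The same function can be applied to any other sample of readings by changing the list.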

2) Your manager was not pleased, since he doubts that σ can be less than 3. As a result, he insisted that you take a new sample to test the null hypothesis H₀: σ = 1 against the alternative hypothesis H₁: σ ≠ 1. What should you do? Obtain five new speed measurements using the same test car set to 25 mph, in the same conditions. Suppose the recorded speeds were 28, 27, 30, 28, 29. Should the hypothesis H₀: σ = 1 be rejected at the significance level α = 10%?

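Let's calculate the sum of squared deviations about µ = 25:

Σ (X_i - µ)² = (28 - 25)² + (27 - 25)² + (30 - 25)² + (28 - 25)² + (29 - 25)² = 9 + 4 + 25 + 9 + 16 = 63

You found that the sum of squared deviations, Σ (X_i - µ)², equals 63. For a two-sided test at α = 10% with 5 degrees of freedom, the chi-square distribution table gives the critical values 1.145 and 11.07.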

Since Σ (X_i - µ)² = 63 > 11.07, the hypothesis that the radar has standard deviation σ = 1 is rejected at the 10% significance level. You have two samples and two different conclusions.

In fact, these two examples illustrate how random sampling variability can lead to different conclusions when testing a hypothesis with samples of the same size and the same significance level, from the same population in the same conditions. This occurs because sample statistics are not perfect estimates of their corresponding population parameters.
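How often can this happen? A small simulation, offered as a sketch rather than as part of the original examples, draws many samples of five readings and applies the same rule (reject when the sum of squares about 25 falls outside 1.145 to 11.07). The seed, the number of trials and the function name `rejection_rate` are arbitrary choices made for illustration.

```python
import random

def rejection_rate(mu, sigma, mu0=25, n=5, trials=20000, seed=12345):
    """Fraction of simulated samples for which H0: sigma = 1 is rejected."""
    rng = random.Random(seed)
    lower_crit, upper_crit = 1.145, 11.07  # chi-square table, 5 df, alpha = 10%
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        # Deviations are taken from the hypothesized mean mu0 (25 mph).
        stat = sum((x - mu0) ** 2 for x in sample)
        if stat < lower_crit or stat > upper_crit:
            rejections += 1
    return rejections / trials

rate_h0_true = rejection_rate(mu=25, sigma=1)   # H0 is true
rate_sigma_3 = rejection_rate(mu=25, sigma=3)   # readings are more variable
print(rate_h0_true, rate_sigma_3)
```

When H₀ is true, the rule should still reject in roughly 10% of samples, which is what the significance level means. With σ = 3, rejection should be much more frequent.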

3) But your manager argued, pointing out that radar readings in urban areas usually follow a normal distribution with a mean speed of 25 mph and a standard deviation of 3 mph. He suggests taking another five measurements and testing the hypothesis H₀: σ = 1 against H₁: σ ≠ 1 again. Following this suggestion, you record five additional speeds: 29, 27, 29, 26, and 29. Should the hypothesis H₀: σ = 1 be rejected at a significance level α = 10%?

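The test statistic is the fraction Σ (X_i - µ)²/σ², with σ² = 1 under H₀. Its numerator is Σ (X_i - µ)² = 16 + 4 + 16 + 1 + 16 = 53. The numerator should fall between 1.145 and 11.07 to support the hypothesis H₀: σ = 1, but 53 is unquestionably outside this range. Therefore, you must reject the null hypothesis.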

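Are you satisfied with your manager's suggestions? Perhaps you should look at the given examples from a different perspective. To test H₀: σ = 1, what if you use the sample mean instead of the hypothesized mean µ = 25 mph? Let's calculate the sample mean and the sample variance for the last sample (29, 27, 29, 26, 29): the sample mean is x̄ = 140/5 = 28, the sum of squared deviations is Σ (X_i - x̄)² = 1 + 1 + 1 + 4 + 1 = 8, and the sample variance is s² = 8/4 = 2.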

By using the sample mean, the degrees of freedom are reduced by 1. Consequently, under H₀: σ = 1, the sum of squared deviations about the sample mean follows a chi-square distribution with 4 degrees of freedom. Refer to the chi-square table to find the critical values for α = 10% with 4 degrees of freedom: 0.711 and 9.48.

Given that the sum of squared deviations, 8, is less than 9.48 (and greater than the lower critical value, 0.711), H₀ cannot be rejected at the 10% significance level. It is reasonable to conclude that σ = 1, as claimed.
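The sample-mean version of the test can be sketched with Python's `statistics` module. The variable names are illustrative, and only the upper critical value 9.48 quoted in the text is used; it is taken from the table, not computed.

```python
from statistics import mean, variance

sample_3 = [29, 27, 29, 26, 29]                     # third set of readings

x_bar = mean(sample_3)                              # sample mean
sum_sq = sum((x - x_bar) ** 2 for x in sample_3)    # deviations from x_bar
s2 = variance(sample_3)                             # sum_sq / (n - 1)

sigma0 = 1                                          # standard deviation under H0
statistic = sum_sq / sigma0 ** 2                    # chi-square with n - 1 = 4 df

upper_crit = 9.48                                   # table value quoted in the text
reject = statistic > upper_crit

print(x_bar, sum_sq, s2, reject)
```

Note that `statistics.variance` divides by n - 1, which matches the 4 degrees of freedom used here.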

If you accept H₀: σ = 1, you have assumed that the radar readings follow a normal distribution with a mean μ = 25 and a standard deviation σ = 1. Then the mean of a sample of n = 5 readings follows a normal distribution with a mean μ = 25 and a standard deviation σ/√n = 1/√5 ≈ 0.447.

The random sample of 5 observations you’ve collected has mean 28. Hence, you are able to test the hypothesis that the mean of the radar readings is 25 against the alternative hypothesis that it is greater than 25 using a z-test.

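To calculate the z-score, we use the formula z = (x̄ - µ)/(σ/√n) = (28 - 25)/(1/√5) ≈ 6.71.

Given that the z-score is greater than 1.645, the one-sided critical value at the 5% level (and therefore also greater than the 10% critical value, 1.282), you must reject the null hypothesis that the mean is 25 at the 10% level of significance.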
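As a check on the arithmetic, the z-test can be sketched with `statistics.NormalDist` from the Python standard library. The variable names are illustrative, and σ = 1 is the value assumed under H₀.

```python
from math import sqrt
from statistics import NormalDist

sample = [29, 27, 29, 26, 29]
mu0, sigma, n = 25, 1, len(sample)

x_bar = sum(sample) / n                 # sample mean, 28.0
std_error = sigma / sqrt(n)             # standard deviation of the sample mean
z = (x_bar - mu0) / std_error

# One-sided p-value: P(Z >= z) under the standard normal distribution.
p_value = 1 - NormalDist().cdf(z)
crit_5pct = NormalDist().inv_cdf(0.95)  # one-sided 5% critical value

print(round(z, 3), p_value, round(crit_5pct, 3))
```

The p-value is far below any usual significance level, so the conclusion does not depend on whether 5% or 10% is used.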

Therefore, based on this sample of radar readings (29, 27, 29, 26, 29), it is reasonable to conclude that the new radar is not calibrated to a mean μ = 25 with a standard deviation σ = 1.

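Now, you could think: would a radar with σ = 3 produce measurements like those above? If σ = 3, then σ² = 9. How will the numerator of Σ (X_i - µ)²/σ² behave? Remember: the chi-square distribution only describes the sum of squares when E(X - µ)² = 1. But here E(X - µ)² = 9, so the sum of squares must be divided by σ² = 9. What to do?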

The new test statistic is:

χ² = (n - 1)s²/σ²

where s² comes from the sample and σ² comes from H₀. The number of degrees of freedom associated with the test statistic (for finding the critical values) is (n - 1). For this test to be valid, the population must be normally distributed. So, calculate χ² = (5 - 1)(2)/9 = 8/9 ≈ 0.89.

The chi-square test statistic equals 0.89, which is in the 95% acceptance region for 4 degrees of freedom: [0.4844; 11.1433]. H₀: σ = 3 cannot be rejected.
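A sketch of this last test in Python follows. It assumes the sample is the third one (29, 27, 29, 26, 29) and takes the acceptance limits from the text. The helper `chi2_sf_4df` is a hypothetical name for the closed-form upper-tail probability of the chi-square distribution, a formula that holds for 4 degrees of freedom only.

```python
from math import exp
from statistics import variance

def chi2_sf_4df(x):
    """P(chi-square with 4 df > x); closed form valid only for 4 df."""
    return exp(-x / 2) * (1 + x / 2)

sample = [29, 27, 29, 26, 29]
n = len(sample)
sigma0 = 3                                   # standard deviation under H0

s2 = variance(sample)                        # sample variance, 8 / 4 = 2
statistic = (n - 1) * s2 / sigma0 ** 2       # (n - 1) s^2 / sigma0^2

lower_crit, upper_crit = 0.4844, 11.1433     # limits quoted in the text
reject = statistic < lower_crit or statistic > upper_crit

# Tail areas cut off by the quoted limits (each should be close to 2.5%).
lower_tail = 1 - chi2_sf_4df(lower_crit)
upper_tail = chi2_sf_4df(upper_crit)

print(round(statistic, 4), reject, round(lower_tail, 4), round(upper_tail, 4))
```

For other degrees of freedom the closed form changes, so a table or a statistics library would be needed instead.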

IMPORTANT: This is a statistics exercise; it does not describe radar calibration procedures. It is therefore useful to have a set of accepted standards and protocols for maintaining the quality of radars.

Thursday, July 04, 2024

Normal distribution: exercises

The graphical representation of a normal distribution is a bell-shaped curve that is symmetrical about the mean. Thus, half of the values of the random variable X are equal to or greater than the mean and half are equal to or less than the mean. The curve covers 100% of the population. All possible values that the random variable can assume lie under the curve. A normal distribution is defined by two parameters: the mean, denoted by µ and the standard deviation, denoted by σ.

Graphical representation of the normal distribution

EXERCISES

1. On a certain road, the speed limit is 40 km/h with a tolerance of 7 km/h. The speed at which the driver travels on this road is normally distributed, with an average speed μ = 40 km/h and a standard deviation σ = 4 km/h. What is the probability that the driver will exceed 47 km/h and get a ticket? To determine this, we must standardize the variable (µ = 0, σ = 1). Then we can consult a standard normal distribution table, typically located at the end of statistics textbooks. Calculate: z = (47 - 40)/4 = 1.75, so P(X > 47) = P(Z > 1.75) ≈ 0.0401, or about 4%.

Tip: You can use software to determine probabilities.
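For instance, exercise 1 can be checked with `statistics.NormalDist` from the Python standard library. This is one possible tool among many, and the variable names are illustrative.

```python
from statistics import NormalDist

speed = NormalDist(mu=40, sigma=4)     # driver's speed in km/h
z = (47 - 40) / 4                      # standardized value of 47 km/h

p_ticket = 1 - speed.cdf(47)           # P(X > 47) directly
p_from_z = 1 - NormalDist().cdf(z)     # same probability from the standard normal

print(z, round(p_ticket, 4), round(p_from_z, 4))
```

Both routes give the same probability, which is why standardizing and reading a table works.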

2. Under regular operating conditions, a radar reading is a normally distributed random variable with mean μ = 25 mph and standard deviation σ = 3 mph. To test the calibration of the radar, a test car traveling at 25 mph is used. Assuming the radar is correctly calibrated, i.e., with μ = 25 mph and σ = 3 mph, what is the probability that the radar detects the test car's speed to be: a) 28 mph or higher? b) 27½ mph or higher? c) What speed is exceeded by the radar reading with a probability of only 5%?

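a) Probability of detecting 28 mph or more: z = (28 - 25)/3 = 1, so P(X ≥ 28) = P(Z ≥ 1) ≈ 0.1587.

b) Probability of detecting 27½ mph or more: z = (27.5 - 25)/3 ≈ 0.83, so P(X ≥ 27.5) ≈ 0.20.

c) To have a 5% probability of the radar measuring a higher speed: z = 1.645, so the speed is 25 + 1.645 × 3 ≈ 29.9 mph.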
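The three parts of exercise 2 can be checked with a few lines of Python. This is a sketch using the standard library's `NormalDist`, with illustrative variable names.

```python
from statistics import NormalDist

radar = NormalDist(mu=25, sigma=3)   # calibrated radar, readings in mph

p_a = 1 - radar.cdf(28)              # a) P(X >= 28)
p_b = 1 - radar.cdf(27.5)            # b) P(X >= 27.5)
speed_c = radar.inv_cdf(0.95)        # c) speed exceeded with probability 5%

print(round(p_a, 4), round(p_b, 4), round(speed_c, 2))
```

`inv_cdf(0.95)` returns the 95th percentile, that is, the reading that is exceeded only 5% of the time.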

Remember that the mean of n independent readings, each with standard deviation σ, has standard deviation σ/√n.

More control means fewer mistakes. You need to keep this in mind when you want more quality. Compare the probabilities obtained in exercises 2b, 3 and 4.

3. A calibration test will be carried out on 4 radars that operate together. A test car with a speed set at 25 mph will be used. Under optimal conditions, the reading of each radar is a random variable following a normal distribution with mean µ = 25 mph and standard deviation σ = 3 mph. If the radars are calibrated, what is the probability that the collected measurements will give an average speed of 27½ mph or higher?

4. The idea of using 4 radars is good; however, having 9 is advantageous. Increased control (in this case, through a larger number of radars) decreases the risk of error. There are 9 calibrated radars in operation simultaneously. After calibration, the reading of each radar unit is a normally distributed random variable with mean µ = 25 mph and standard deviation σ = 3 mph. Using a test car traveling at a constant speed of 25 mph, what is the probability that the collected measurements will yield an average speed of 27½ mph or higher?
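Exercises 2b, 3 and 4 differ only in the number of radars, so they can be compared with one small function. This sketch assumes the readings are independent, so that the mean of n readings has standard deviation σ/√n. The function name `prob_mean_at_least` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def prob_mean_at_least(threshold, mu, sigma, n):
    """P(mean of n independent N(mu, sigma) readings >= threshold)."""
    mean_dist = NormalDist(mu, sigma / sqrt(n))
    return 1 - mean_dist.cdf(threshold)

p_one = prob_mean_at_least(27.5, 25, 3, 1)    # exercise 2b: a single radar
p_four = prob_mean_at_least(27.5, 25, 3, 4)   # exercise 3: mean of 4 radars
p_nine = prob_mean_at_least(27.5, 25, 3, 9)   # exercise 4: mean of 9 radars

print(round(p_one, 4), round(p_four, 4), round(p_nine, 4))
```

The probability of a misleading average of 27½ mph or more shrinks as the number of radars grows.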

OPERATIONS OF RANDOM VARIABLES

If X and Y are independent random variables with means µ₁ and µ₂ and variances σ₁² and σ₂² respectively, then:

Z = X + Y is a random variable with mean µ₁ + µ₂ and variance σ₁² + σ₂².

W = X – Y is a random variable with mean µ₁ - µ₂ and variance σ₁² + σ₂².

For any constant a:

Var (a Z) = a² Var (Z)

σ (a Z) = |a| σ (Z).

EXERCISE

5. Suppose the subway arrival time is a normally distributed random variable with mean µ₁ = 8h10m and standard deviation σ₁ = 40 s, and your arrival at the station is an independent, normally distributed random variable with mean µ₂ = 8h08m and standard deviation σ₂ = 30 s. How likely is it that you will miss the train? Let S represent the subway arrival time and Y represent your arrival time at the station. Therefore, W = S - Y, your waiting time, is a random variable with a normal distribution, having a mean of 8:10 - 8:08 = 2 min = 120 s, a variance of 40² + 30² = 1600 + 900 = 2500 s², and a standard deviation of 50 s. Missing the train would occur if W ≤ 0. To calculate the probability of W ≤ 0: z = (0 - 120)/50 = -2.4, so P(W ≤ 0) = P(Z ≤ -2.4) ≈ 0.0082.
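The subway exercise can be sketched in Python. `NormalDist` objects can be subtracted when the two variables are independent, which is an assumption here. Times are expressed in seconds after 8:00, a convention chosen only for this illustration.

```python
from statistics import NormalDist

subway = NormalDist(mu=600, sigma=40)   # 8h10m = 600 s after 8:00
you = NormalDist(mu=480, sigma=30)      # 8h08m = 480 s after 8:00

# Waiting time W = S - Y: means subtract, variances add.
wait = subway - you

p_miss = wait.cdf(0)                    # P(W <= 0): the train leaves first

print(wait.mean, wait.stdev, round(p_miss, 4))
```

The difference has mean 120 s and standard deviation 50 s, matching the rule for W = X - Y given earlier.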

Decision rule

Set critical values to reject specific results. This is crucial for quality control.

EXERCISE

6. In the canning industry, the pH (the acidity) of the product in each can is a random variable that is normally distributed with mean 7 and standard deviation of 0.5. If the pH of the product falls below 6.0 or exceeds 8.0, the can is rejected. What is the probability of this happening?
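The rejection probability in exercise 6 is the sum of the two tail areas outside the limits 6.0 and 8.0. A minimal sketch with the Python standard library, using illustrative variable names:

```python
from statistics import NormalDist

ph = NormalDist(mu=7, sigma=0.5)   # pH of the product in a can

p_low = ph.cdf(6.0)                # P(pH < 6.0), two standard deviations below
p_high = 1 - ph.cdf(8.0)           # P(pH > 8.0), two standard deviations above
p_reject = p_low + p_high

print(round(p_reject, 4))
```

Because both limits lie two standard deviations from the mean, the two tails are equal and the total is about 4.6%.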

PhD in Statistics from USP

Livre-Docente in Statistics from Unicamp

Postdoctoral researcher at the University of California, Berkeley, and Yale University.

In addition to several articles in Brazilian and international journals, she has published the following books:

1. With Editora Elsevier: Introdução à Bioestatística (5th ed.), Bioestatística: tópicos avançados (3rd ed.), Estatística para a Qualidade (3rd ed.), Metodologia Científica para a Área de Saúde (3rd ed.), the last co-authored with William Saad Hossne.

2. With Editora Atlas: Elementos de Estatística (6th ed.), Como elaborar um questionário.

3. With Editora Cengage Learning: Estatística Básica.

4. With Editora Brasiliense: O que é Estatística (3rd ed.).

Out of print: Experimentação com seres humanos (Moderna), Como escrever uma tese (Atlas), Análise de variância (Atlas), Primeiro a gente chora (Cultura).

She maintains a website offering some elementary statistics lessons: https://profasoniavieira.wixsite.com/estatistica