Hypothesis testing:

Mayuri Karne
Feb 6, 2021


Most data scientists come across this term in their journey but struggle when it comes to applying it in practice. In this blog, I will cover the basics of hypothesis testing, the types of hypotheses and tests, and their use cases.

So, the first basic question: what is a hypothesis?

"It is a supposition or proposed explanation made on the basis of limited evidence as a starting point for further investigation."

What is hypothesis testing, and why do we need it?

Hypothesis testing is a statistical method for making decisions from experimental data. It starts from a hypothesis, which is an assumption we make about a population parameter. In simple words, we use a sample of the population data to make a Yes (significant) or No (not significant) decision, for example about whether a difference between features is significant. The decision is whether to reject the null hypothesis or to fail to reject (informally, "accept") it. Every hypothesis test produces a p-value, the observed significance value of that test. If the p-value is greater than the predetermined significance level, we fail to reject the null hypothesis. If the p-value is less than the predetermined significance level, we reject the null hypothesis.
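The decision rule described above fits in a few lines of Python. This is only a sketch: the function name `decide` and the example p-values are hypothetical, and in practice the p-value would come from whichever test (z, t, chi-squared, ...) you run.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Return the hypothesis-test decision for a p-value and a significance level."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    # Reject H0 only when the p-value falls below the chosen significance level.
    if p_value < alpha:
        return "reject H0"
    # Otherwise the evidence is insufficient; this does not prove H0 is true.
    return "fail to reject H0"


print(decide(0.03))  # reject H0
print(decide(0.20))  # fail to reject H0
```

Returning "fail to reject H0" rather than "accept H0" is a deliberate wording choice: a large p-value only says the sample is consistent with H0.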

There are two types of hypothesis:

  1. Composite: when the hypothesis specifies a range of values of the parameter, it is a composite hypothesis.
  2. Simple: when the hypothesis specifies an exact value of the parameter, it is a simple hypothesis.

Some basic terms you should know before jumping into hypothesis testing:

Null hypothesis (H0): A null hypothesis is a statement that assumes there is no statistically significant difference or relationship between the two variables in the hypothesis.

$H_0: \mu_1 - \mu_2 = 0$ (there is no difference between the two population means)

Alternative hypothesis (H1): The alternative hypothesis complements the null hypothesis. It is the opposite of the null hypothesis, such that the alternative and null hypotheses together cover all possible values of the population parameter.
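For the null hypothesis of no difference between two population means, μ1 − μ2 = 0, with the alternative μ1 − μ2 ≠ 0, a minimal sketch of the corresponding two-sample z-test is shown below. It assumes the population standard deviations are known (or the samples are large), and the summary numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_test(mean1, mean2, sd1, sd2, n1, n2):
    """Two-sided z-test of H0: mu1 - mu2 = 0 against H1: mu1 - mu2 != 0.

    Assumes the population standard deviations sd1 and sd2 are known.
    """
    # Standard error of the difference between two independent sample means.
    se = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = (mean1 - mean2) / se
    # Two-sided p-value: probability of |Z| at least this large if H0 is true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical summary statistics for two groups.
z, p = two_sample_z_test(mean1=52, mean2=50, sd1=5, sd2=5, n1=50, n2=50)
print(f"z = {z:.2f}, p = {p:.4f}")  # z = 2.00, p = 0.0455
```

With smaller samples and unknown standard deviations a t-test would be the usual choice instead; the z version is used here only because the Python standard library provides the normal distribution.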

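Level of significance (α) and p-value:

The level of significance is the threshold at which we decide to reject the null hypothesis: it is the probability of rejecting H0 when H0 is actually true that we are willing to tolerate. Since 100% accuracy is not possible when accepting or rejecting a hypothesis, we select a level of significance, usually 5%. The p-value is the probability, assuming H0 is true, of obtaining a test statistic at least as extreme as the one observed; we reject H0 when the p-value is less than α.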
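To see how the choice of α changes the decision, the sketch below converts a z statistic into a two-sided p-value and compares it with both common levels. The observed statistic z = 2.2 is hypothetical, and a standard normal test statistic is assumed.

```python
from statistics import NormalDist


def two_sided_p_value(z: float) -> float:
    """P(|Z| >= |z|) for a standard normal Z, i.e. the two-sided p-value."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


z = 2.2  # hypothetical observed test statistic
p = two_sided_p_value(z)
for alpha in (0.05, 0.01):
    decision = "reject H0" if p < alpha else "fail to reject H0"
    print(f"alpha = {alpha:.2f}: p = {p:.4f} -> {decision}")
```

Here p ≈ 0.028, so the same data lead to rejecting H0 at the 5% level but not at the 1% level, which is why α should be fixed before looking at the data.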

Type I error:

A Type I error occurs when we reject the null hypothesis even though it is true. The probability of a Type I error is denoted by α (alpha). In hypothesis testing, the area of the normal curve that forms the critical region is called the alpha region.
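A quick way to check that α really is the Type I error rate is to simulate it. This hypothetical Monte Carlo sketch assumes normally distributed data with a known standard deviation (so a two-sided z-test applies); the values μ0 = 100, σ = 15 and n = 20 are made up. Every sample is drawn with H0 true, so every rejection is a Type I error, and the rejection rate should land close to α.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def type_i_error_rate(alpha=0.05, n=20, trials=20_000, seed=0):
    """Estimate how often a two-sided z-test rejects H0 when H0 is actually true."""
    rng = random.Random(seed)
    mu0, sigma = 100.0, 15.0  # H0: mu = 100, and the data really come from mu = 100
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / sqrt(n))
        if abs(z) > z_crit:  # statistic falls in the critical (alpha) region
            rejections += 1  # ...so this rejection is a Type I error
    return rejections / trials


print(type_i_error_rate())  # close to 0.05
```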

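Type II error:

A Type II error occurs when we fail to reject (accept) the null hypothesis even though it is false. The probability of a Type II error is denoted by β (beta). In hypothesis testing, the acceptance region, measured under the curve of the true (alternative) distribution, is called the beta region.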
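β can be computed directly for a simple case. The sketch below assumes a right-tailed z-test of H0: μ = 100 against H1: μ > 100 with known σ; the true mean of 106, σ = 15 and n = 25 are hypothetical. It finds the edge of the acceptance region under H0, then measures how much of the true sampling distribution falls inside it.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(mu0, mu_true, sigma, n, alpha=0.05):
    """beta for a right-tailed z-test of H0: mu = mu0 against H1: mu > mu0.

    beta is the probability that the sample mean lands in the acceptance
    region when the true mean is mu_true (> mu0).
    """
    se = sigma / sqrt(n)
    # Upper boundary of the acceptance region, on the sample-mean scale.
    x_crit = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    # Area of the acceptance region under the *true* sampling distribution.
    return NormalDist(mu_true, se).cdf(x_crit)


beta = type_ii_error(mu0=100, mu_true=106, sigma=15, n=25)
print(f"beta = {beta:.3f}, power = {1 - beta:.3f}")
```

In this setup β is roughly 0.36, so the test misses a true shift of 6 units about a third of the time; increasing n shrinks the standard error and therefore β.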

One-tailed and two-tailed tests:

If the alternative hypothesis allows values in both directions (less than and greater than) of the parameter value specified in the null hypothesis, the test is called a two-tailed test.

If the alternative hypothesis allows values in only one direction (either less than or greater than) of the parameter value specified in the null hypothesis, the test is called a one-tailed test.

For example, if $H_0: \mu = 100$ and $H_1: \mu \neq 100$,

then according to H1 the mean can be greater than or less than 100. This is an example of a two-tailed test.

Similarly, if $H_0: \mu \geq 100$, then $H_1: \mu < 100$.

Here, H1 only allows the mean to be less than 100, so this is called a one-tailed test.
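The practical difference between the two examples shows up in the p-value. This sketch assumes a z-test with known σ and uses hypothetical data (sample mean 95 from n = 36 observations, σ = 15) against the hypothesized mean of 100.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()

# Hypothetical data: sample mean 95 from n = 36 observations, known sigma = 15.
mu0, sample_mean, sigma, n = 100, 95, 15, 36
z = (sample_mean - mu0) / (sigma / sqrt(n))  # (95 - 100) / 2.5 = -2.0

# Two-tailed: H0: mu = 100 vs H1: mu != 100, so extremes in BOTH tails count.
p_two_tailed = 2 * std_normal.cdf(-abs(z))
# Left-tailed: H0: mu >= 100 vs H1: mu < 100, so only the lower tail counts.
p_left_tailed = std_normal.cdf(z)

print(f"z = {z:.2f}")                          # z = -2.00
print(f"two-tailed p = {p_two_tailed:.4f}")    # 0.0455
print(f"left-tailed p = {p_left_tailed:.4f}")  # 0.0228
```

For a symmetric statistic, the two-tailed p-value is exactly twice the one-tailed p-value in the direction of the observed effect.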

Critical Region:

The critical region is the region of the sample space such that, if the calculated test statistic lies in it, we reject the null hypothesis.

Let's understand this with an example:

Suppose you are looking to rent an apartment. You list all the available apartments from different real estate websites. You have a budget of Rs. 15,000/month and cannot spend more than that. The apartments on your list range in price from Rs. 7,000/month to Rs. 30,000/month. You select a random apartment from the list and assume the following hypotheses:

H0: You will rent the apartment.

H1: You won’t rent the apartment.

Now, since your budget is Rs. 15,000, you have to reject all the apartments above that price.

Here, all prices greater than Rs. 15,000 form your critical region. If the random apartment's price lies in this region, you reject your null hypothesis; if it does not lie in this region, you do not reject your null hypothesis. In a statistical test, the critical region lies in one tail or in both tails of the probability distribution curve, depending on the alternative hypothesis. It is a predefined area corresponding to a cut-off value on the probability distribution curve, and its size (area) is denoted by α.
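The apartment analogy maps onto a one-sided cut-off rule. The sketch below is only an illustration of that analogy, not a statistical test; the constant `BUDGET`, the function `decide_apartment` and the sample prices are hypothetical.

```python
BUDGET = 15_000  # Rs. per month; prices above this form the critical region


def decide_apartment(price: int) -> str:
    """H0: you will rent the apartment; H1: you won't."""
    if price > BUDGET:  # price falls in the critical region
        return "reject H0 (won't rent)"
    return "do not reject H0 (can rent)"


# Hypothetical prices from the Rs. 7,000-30,000 range in the example.
for price in (9_000, 15_000, 22_500):
    print(price, "->", decide_apartment(price))
```

A price of exactly Rs. 15,000 is still affordable, so it sits on the non-rejection side of the cut-off.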

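Case 1) If H1 allows the parameter to differ from the hypothesized value μ0 in either direction ($H_1: \mu \neq \mu_0$), the critical region is split between both tails. This is a two-tailed (double-tailed) test.

Case 2) If $H_1: \mu < \mu_0$, the whole critical region lies in the left tail. This scenario is called a left-tailed test.

Case 3) If $H_1: \mu > \mu_0$, the whole critical region lies in the right tail. This scenario is called a right-tailed test.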
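For a standard normal (z) statistic, the three cases differ only in where the area α is placed. The sketch below assumes a z-test; the helper name `critical_region` and the observed value z = 1.8 are hypothetical.

```python
from statistics import NormalDist


def critical_region(alpha: float, tail: str):
    """Return a predicate telling whether a z statistic falls in the critical region."""
    std_normal = NormalDist()
    if tail == "two":
        c = std_normal.inv_cdf(1 - alpha / 2)  # alpha/2 in each tail (about +/-1.96 at 5%)
        return lambda z: abs(z) > c
    if tail == "left":
        c = std_normal.inv_cdf(alpha)  # all of alpha in the lower tail (about -1.645 at 5%)
        return lambda z: z < c
    if tail == "right":
        c = std_normal.inv_cdf(1 - alpha)  # all of alpha in the upper tail (about 1.645 at 5%)
        return lambda z: z > c
    raise ValueError("tail must be 'two', 'left' or 'right'")


z = 1.8  # hypothetical observed statistic
for tail in ("two", "left", "right"):
    print(tail, critical_region(0.05, tail)(z))
```

The same statistic z = 1.8 is significant in a right-tailed test but not in a two-tailed one, because splitting α across both tails pushes each cut-off further out.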

Steps involved in Hypothesis testing

1) Set up the null hypothesis and the alternative hypothesis.

2) Decide on a level of significance, e.g. alpha = 5% or 1%.

3) Choose the type of test to perform based on the sample data (z-test, t-test, chi-squared test, etc.).

4) Calculate the test statistic (z-score, t-score, etc.) using the formula for the chosen test.

5) Obtain the critical value from the sampling distribution to construct a rejection region of size alpha, using a z-table, t-table, chi-squared table, etc.

6) Compare the test statistic with the critical value and locate the calculated test statistic, i.e. determine whether it lies in the rejection region or the non-rejection region.

7) I) If the test statistic lies in the rejection region, we reject the null hypothesis, i.e. the sample data provide sufficient evidence against the null hypothesis, and there is a significant difference between the hypothesized value and the observed value of the parameter.

II) If the test statistic lies in the non-rejection region, we do not reject the null hypothesis, i.e. the sample data do not provide sufficient evidence against the null hypothesis, and the difference between the hypothesized value and the observed value of the parameter can be attributed to sampling fluctuation.
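The seven steps can be sketched end to end for one concrete case: a one-sample z-test, which assumes the population standard deviation is known. The function `one_sample_z_test`, the measurements and σ = 4 are hypothetical; a real analysis might choose a t-test or another test in step 3.

```python
from math import sqrt
from statistics import NormalDist, fmean


def one_sample_z_test(sample, mu0, sigma, alpha=0.05, tail="two"):
    """Walk through the seven steps for a one-sample z-test with known sigma."""
    # Steps 1-3: H0: mu = mu0, H1 chosen by `tail`, level alpha,
    # and a z-test because sigma is assumed known.
    n = len(sample)
    # Step 4: compute the test statistic.
    z = (fmean(sample) - mu0) / (sigma / sqrt(n))
    # Step 5: critical value bounding a rejection region of size alpha.
    std_normal = NormalDist()
    if tail == "two":
        crit = std_normal.inv_cdf(1 - alpha / 2)
        in_rejection_region = abs(z) > crit
    elif tail == "right":
        crit = std_normal.inv_cdf(1 - alpha)
        in_rejection_region = z > crit
    elif tail == "left":
        crit = std_normal.inv_cdf(alpha)
        in_rejection_region = z < crit
    else:
        raise ValueError("tail must be 'two', 'left' or 'right'")
    # Steps 6-7: locate the statistic and make the decision.
    decision = "reject H0" if in_rejection_region else "fail to reject H0"
    return z, crit, decision


# Hypothetical measurements; H0: mu = 50 with known sigma = 4.
data = [52.1, 49.8, 53.4, 51.0, 54.2, 50.7, 52.9, 51.6, 53.8, 50.3]
z, crit, decision = one_sample_z_test(data, mu0=50, sigma=4)
print(f"z = {z:.3f}, critical value = {crit:.3f}, decision: {decision}")
```

With σ = 4 the statistic (about 1.57) stays inside the non-rejection region; if σ were 2 the same sample mean would give z of about 3.13 and H0 would be rejected, which illustrates how the spread of the data drives the decision.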
