Tails in Hypothesis Testing
Hypothesis Testing Overview
I learned hypothesis testing in R from Datacamp this afternoon. To be frank, I struggled to grasp the idea of hypothesis testing, but I will try to digest what I have learned in this post.
As Datacamp illustrates it, hypothesis testing is like a criminal trial: the defendant is assumed NOT GUILTY by default, and is pronounced GUILTY only if the evidence proves the crime beyond reasonable doubt. That illustration did not stick in my head, so I tried to come up with another way of understanding it.
Hypothesis testing, as I understand it, is a way of deciding whether something is ACCEPTED or REJECTED. Think of it as inquiring. Imagine you forgot where you put your glasses. You start to guess where your glasses are. The moment you are guessing, that is hypothesis testing. You might guess you left your glasses on the table. To see if this is true, you walk to the table and search there. Unfortunately, your glasses are nowhere to be found on the table. Let’s pause the search for a moment.
You asked yourself where you put your glasses; that is the equivalent of the RESEARCH QUESTION. Then you started to guess where your glasses might be; this is the equivalent of a HYPOTHESIS. When you are hypothesizing, there are only two possible outcomes: they are either there or not there. Since you are hypothesizing that “my glasses must be on the table”, this is the NULL HYPOTHESIS (\(H_0\)), and the inverse of your hypothesis, “my glasses are not on the table”, is called the ALTERNATIVE HYPOTHESIS (\(H_a\)). So you tried to prove it by walking to the table and searching for your glasses. To no avail: you found nothing. This means that your null hypothesis (\(H_0\)) is REJECTED, so you started to look somewhere else. This whole process is called HYPOTHESIS TESTING.
In hypothesis testing, the way you compose the research question determines how the null and alternative hypotheses appear. If you believe something to be “there”, then the null hypothesis (\(H_0\)) would be “there is”, and since you already have your null hypothesis (\(H_0\)), your alternative hypothesis (\(H_a\)) would be “there is no”. Conversely, if you suspect something to be absent, then the null hypothesis (\(H_0\)) would be “there is no”, and the alternative hypothesis (\(H_a\)) would be “there is”.
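To make the null-versus-alternative pairing concrete, here is a small sketch I wrote myself (it is not from the Datacamp course): testing whether a coin is fair using base R’s `prop.test()`. The 58 heads out of 100 flips are made-up numbers for illustration.

```r
# Hypothetical example: is a coin fair?
# H0: P(heads) = 0.5   (the "default" claim we try to reject)
# Ha: P(heads) != 0.5  (the inverse of H0)
# Suppose we observed 58 heads in 100 flips (made-up data).
test <- prop.test(x = 58, n = 100, p = 0.5)

# If the p-value falls below our chosen significance level,
# we reject H0 in favor of Ha; otherwise we fail to reject H0.
test$p.value
```

The function and field names (`prop.test`, `$p.value`) are standard base R; only the scenario and numbers are invented.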
The \(z\)-score and \(p\)-value
The term \(z\)-score may haunt anyone who is not accustomed to statistics. In fact, it also frightened me. Actually, as I write this, I am still frightened by this so-called \(z\)-score. But I do know how to get it. Allow me to elaborate.
Bootstrap Distribution
First, we need to sample the data from our table/dataset. How many samples do we need? Good question. We need as many samples as there are observations in our dataset. So, if we have 1500 observations, we draw 1500 samples. Now, you might be wondering: “why do we have to sample 1500 times when the dataset has 1500 observations? Isn’t that just taking the whole population?” The trick is that the 1500 samples are drawn with replacement, so there will be cases where we pick the same observation more than once during the sampling. Note that we tell the computer to sample the data; we do not pick the observations ourselves. An example of this “sampling with replacement” would be picking observation number 5 three times out of 6 possible outcomes:
pool: 1, 2, 3, 4, 5, 6
picks: 6
sample: 1, 5, 5, 3, 5, 6
This is what “with replacement” looks like: instead of getting each of 1, 2, 3, 4, 5, 6 exactly once, we can get 5 three times. On the other hand, if each value could appear only once, as in drawing 1, 2, 3, 4, 5, 6, that would be sampling “without replacement”.
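In R, the difference is just the `replace` argument of the base `sample()` function. A minimal sketch (`set.seed()` makes the random draw reproducible):

```r
set.seed(42)                              # reproducible randomness
pool <- c(1, 2, 3, 4, 5, 6)

sample(pool, size = 6, replace = TRUE)    # with replacement: duplicates can appear
sample(pool, size = 6, replace = FALSE)   # without replacement: every value exactly once
```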
Now, the sampling result 1, 5, 5, 3, 5, 6 from the previous example is aggregated using the mean: we sum the values and divide by the number of data points we picked.
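In R, that aggregation is a one-liner:

```r
draws <- c(1, 5, 5, 3, 5, 6)
mean(draws)                   # same as sum(draws) / length(draws) = 25 / 6
```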
The result of the mean aggregation is what we actually need. Notice that from 6 data points we ended up with only a single data point: the mean. This is only one round of sampling; we need more. After sampling once, we repeat the sampling process as many times as we need, usually 1000 or 5000 times.
If you are familiar with R programming (as I am), below is a code snippet for creating a bootstrap distribution. I wrapped “the mean of sampling with replacement” inside a replicate() function to repeat the sampling process 5000 times (specific to this example; other cases may differ).
# requires dplyr (for slice_sample, summarize, pull);
# `table` is a placeholder for your data frame
bootstrap_distribution <- replicate(
  n = 5000,
  expr = {
    table %>%
      slice_sample(prop = 1, replace = TRUE) %>%
      summarize(mean_x = mean(x)) %>%  # name the column so pull() can find it
      pull(mean_x)
  }
)
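Once the bootstrap distribution exists, the \(z\)-score finally becomes reachable: as I understand it, the standard deviation of the bootstrap distribution serves as the standard error of the sample mean. Below is a hedged, self-contained sketch with made-up data (`sample_data` and the hypothesized mean of 10 are my own placeholders, not from the course), using plain base R instead of the dplyr pipeline above:

```r
set.seed(1)
# Made-up stand-in for a real sample (e.g. the x column of a data frame)
sample_data <- rnorm(100, mean = 10.4, sd = 2)

# Bootstrap distribution: 5000 means of resamples drawn with replacement
bootstrap_distribution <- replicate(5000, mean(sample(sample_data, replace = TRUE)))

std_error <- sd(bootstrap_distribution)          # bootstrap standard error
z_score  <- (mean(sample_data) - 10) / std_error # (observed - hypothesized) / SE
p_value  <- 2 * pnorm(abs(z_score), lower.tail = FALSE)  # two-tailed p-value
```

The \(p\)-value is then compared against the chosen significance level to decide whether \(H_0\) is rejected.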