Version 0.2
This is an edited transcript of the data analyses performed during the lectures of the Empirical Methods in Software Engineering (EMSE) course.
This file has been generated from an R Markdown source file: AnalysisExample.Rmd.
The analysis is performed on the data collected through a questionnaire¹ filled in by the students at the beginning of the first lecture.
library(knitr)
data = read.csv("EmpiricalMethodQuest.csv",stringsAsFactors=F)
data$Q3 = factor(data$Q3,levels=c("Never heard of","Basic","Good","Expert"))
data$Q4 = factor(data$Q4,levels=c("Never heard of","Basic","Good","Expert"))
# recode the interval answer "20/30" as its midpoint, then convert Q5 to numeric
twenty.thirty = data$Q5 == "20/30"
data$Q5[twenty.thirty] = 25
data$Q5 = as.numeric(as.character(data$Q5))
data$Q6 = factor(data$Q6,levels=
c("It is the way to go","It is interesting","It is complex","It is useless"))
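As a side note on the Q5 recoding above: converting a factor directly with as.numeric returns the internal level codes, not the original values, which is why the as.numeric(as.character(...)) idiom is used. A minimal illustration (toy values, not the questionnaire data):

```r
# Toy factor: as.numeric() alone yields level codes, not the values
f <- factor(c("10", "25", "50"))
as.numeric(f)                 # 1 2 3 (level codes)
as.numeric(as.character(f))   # 10 25 50 (actual values)
```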
The question was: Did you know what the scientific method was?
The distribution of answers is
table(data$Q1)
##
## No Yes
## 3 12
A better representation in R Markdown is a table:
Response | Freq |
---|---|
No | 3 |
Yes | 12 |
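Such a table can also be generated programmatically with knitr's kable function (the counts below reproduce the distribution shown above):

```r
library(knitr)
# Rebuild the frequency table and render it as a Markdown table
Q1 <- c(rep("No", 3), rep("Yes", 12))
freq <- as.data.frame(table(Response = Q1))
kable(freq)
```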
It can also be represented using a bar chart:
Q1.t <- table(data$Q1)
barplot(Q1.t)
text(c(.7,1.9),Q1.t/2,Q1.t)
The distribution of Q5 can be visualized with a box plot:
boxplot(data$Q5)
Let’s start with a simple (one sample) null hypothesis:
\(H_{0}\): \(\mu_{Q5} = 50\)
and the relative alternative hypothesis
\(H_{a}\): \(\mu_{Q5} \neq 50\)
The test works in this way:
we assume an \(\alpha=5\%\) (the probability of a type I error), which corresponds to a confidence level \(1-\alpha=95\%\).
alpha <- .05
we compute the \(t\) statistic as:
\[ t = \frac{\bar{Q5}-\mu}{s_{Q5}/\sqrt{n}} = \frac{49.6153846-50}{21.6469338/\sqrt{13}} = -0.0640622\]
we assume the null hypothesis is true (\(\mu = 50\)), then
\(t\) will be distributed according to a Student's t distribution with \(df = n-1 = 12\) degrees of freedom;
as a comparison, the values having a probability smaller than or equal to \(\alpha\) correspond to a \(t_{critical}\) computed from the t distribution:
n <- length(data$Q5)
t.crit <- qt(1-alpha/2,df=n-1)
t.crit
## [1] 2.178813
the probability of having such an (absolute) extreme value or larger is:
n <- length(data$Q5)
t.Q5 <- (mean(data$Q5)-50)/(sd(data$Q5)/sqrt(n))
p.value <- (1-pt(abs(t.Q5),df=n-1))*2
p.value
## [1] 0.9499755
the decision about rejecting or not the null hypothesis can be taken:
on the basis of the critical value: reject if \(|t_{Q5}| > t_{critical}\); here \(0.0640622 < 2.1788128\), so we cannot reject \(H_0\);
on the basis of the p-value: reject if \(p.value < \alpha\); here \(0.9499755 > 5\%\), so we cannot reject \(H_0\).
The procedure above is performed by the function t.test:
t.test(data$Q5,mu=50)
##
## One Sample t-test
##
## data: data$Q5
## t = -0.064062, df = 12, p-value = 0.95
## alternative hypothesis: true mean is not equal to 50
## 95 percent confidence interval:
## 36.53427 62.69650
## sample estimates:
## mean of x
## 49.61538
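As a sanity check, the manual computation of \(t\) gives exactly the statistic reported by t.test; a sketch on a toy sample of the same size (simulated data, since the questionnaire values are not reproduced here):

```r
set.seed(1)
x <- rnorm(13, mean = 48, sd = 20)  # toy sample, n = 13 as in the lecture data
mu0 <- 50
t.manual <- (mean(x) - mu0) / (sd(x) / sqrt(length(x)))
t.builtin <- unname(t.test(x, mu = mu0)$statistic)
all.equal(t.manual, t.builtin)  # TRUE
```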
If \(H_a\) simply says the mean will differ from the reference value, but doesn't predict a direction for the difference, then you would use the default form of the t-test (two-tailed).
If \(H_a\) predicts a difference in a particular direction (e.g. the mean being larger than a reference value), then you would use a one-tailed t-test.
For instance we could perform the test for the following hypotheses:
\(H_{0}\): \(\mu_{Q5} = 30\)
\(H_{a}\): \(\mu_{Q5} > 30\)
t.test(data$Q5,mu=30,alternative="greater")
##
## One Sample t-test
##
## data: data$Q5
## t = 3.2672, df = 12, p-value = 0.003369
## alternative hypothesis: true mean is greater than 30
## 95 percent confidence interval:
## 38.91492 Inf
## sample estimates:
## mean of x
## 49.61538
The p-value of a one-tailed test is typically half that of the equivalent two-tailed test.
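This halving can be verified directly; it holds when the sample mean lies on the side predicted by \(H_a\) (toy data):

```r
set.seed(2)
x <- rnorm(13, mean = 55, sd = 20)   # toy sample with mean well above 30
p.two <- t.test(x, mu = 30)$p.value                            # two-tailed
p.one <- t.test(x, mu = 30, alternative = "greater")$p.value   # one-tailed
all.equal(p.one, p.two / 2)  # TRUE
```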
We want to compare the responses to Q5 based on the response given to question Q2.
The two samples of responses can be visualized using a box plot:
boxplot( Q5 ~ Q2, data=data)
The two sample t-test is performed similarly to the one sample version:
t.test( Q5 ~ Q2, data=data)
##
## Welch Two Sample t-test
##
## data: Q5 by Q2
## t = 1.2118, df = 8.3143, p-value = 0.2589
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -13.14504 42.66885
## sample estimates:
## mean in group No mean in group Yes
## 56.42857 41.66667
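Note that t.test applies the Welch variant by default, which does not assume equal variances and estimates (generally non-integer) degrees of freedom; the classic Student version with a pooled variance can be requested with var.equal = TRUE. A sketch on toy groups (not the questionnaire data):

```r
set.seed(3)
g1 <- rnorm(7, mean = 56, sd = 20)   # toy "No" group
g2 <- rnorm(6, mean = 42, sd = 20)   # toy "Yes" group
welch  <- t.test(g1, g2)                     # Welch: estimated, non-integer df
pooled <- t.test(g1, g2, var.equal = TRUE)   # Student: df = n1 + n2 - 2
unname(pooled$parameter)  # 11
```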
The Wilcoxon signed rank test is the non-parametric equivalent to the t-test. It can be performed using the function wilcox.test:
wilcox.test( data$Q5 ,mu=50)
##
## Wilcoxon signed rank test with continuity correction
##
## data: data$Q5
## V = 24.5, p-value = 0.7942
## alternative hypothesis: true location is not equal to 50
The two sample extension is the Mann-Whitney U test, which can be performed using the same function:
wilcox.test( Q5 ~ Q2, data=data)
##
## Wilcoxon rank sum test with continuity correction
##
## data: Q5 by Q2
## W = 30, p-value = 0.2169
## alternative hypothesis: true location shift is not equal to 0
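Since the rank-based tests use only the ordering of the observations, they are invariant under monotone transformations of the data, a property the t-test does not share. A quick sketch with toy data:

```r
set.seed(4)
x <- rexp(10)               # toy positive-valued samples
y <- rexp(12, rate = 0.5)
p.raw <- wilcox.test(x, y)$p.value
p.log <- wilcox.test(log(x), log(y))$p.value  # log preserves the ranks
all.equal(p.raw, p.log)  # TRUE
```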
It is possible to test the independence between two categorical variables using the \(\chi^2\) test.
The test operates on a contingency table that reports the combined frequencies²:
 | Never heard of | Basic | Good |
---|---|---|---|
Never heard of | 0 | 2 | 0 |
Basic | 1 | 7 | 0 |
Good | 0 | 3 | 2 |
The \(\chi^2\) test compares the observed frequencies \(O\) in the contingency table to a table of expected frequencies \(E\), which can be computed on the basis of the marginals using the following formula:
\[E_{i,j} = \frac{O_{i,*} \cdot O_{*,j}}{N}\]
where \(O_{i,*}\) and \(O_{*,j}\) are the row and column marginal totals and \(N\) is the grand total. In R:
E = (margin.table(O,1) %*% t(margin.table(O,2)))/sum(O)
 | Never heard of | Basic | Good |
---|---|---|---|
Never heard of | 0.1333333 | 1.6 | 0.2666667 |
Basic | 0.5333333 | 6.4 | 1.0666667 |
Good | 0.3333333 | 4.0 | 0.6666667 |
The \(\chi^2\) statistic is computed as:
\[ \chi^2 = \sum_{i,j} \frac{(O_{i,j} - E_{i,j})^2}{E_{i,j}} = 5.28125\]
The statistic is distributed according to the \(\chi^2\) distribution with \(df = (r-1)\cdot(c-1)\) degrees of freedom, where \(r\) and \(c\) are the number of rows and columns of the contingency table; here \(df = 2 \cdot 2 = 4\).
The observed \(\chi^2\) statistic value (5.28125) has to be compared to the critical value for the predefined \(\alpha\) level (5%), \(\chi^2_{critical} = 9.4877\); since \(5.28125 < 9.4877\), we cannot reject the null hypothesis of independence.
Alternatively, we can directly compare the p-value, 0.2596373, to the reference \(\alpha\) (5%): since it is larger, we cannot reject the null hypothesis.
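The whole procedure corresponds to the chisq.test function, whose result also exposes the expected frequencies; using the observed contingency table above:

```r
# Observed contingency table from the text (Q3 rows, Q4 columns)
O <- matrix(c(0, 2, 0,
              1, 7, 0,
              0, 3, 2), nrow = 3, byrow = TRUE)
# Expected frequencies from the marginals, as in the formula above
E <- (margin.table(O, 1) %*% t(margin.table(O, 2))) / sum(O)
res <- suppressWarnings(chisq.test(O))  # warns: some expected counts are small
c(res$statistic, res$parameter)         # X-squared = 5.28125, df = 4
all.equal(unname(res$expected), unname(E))  # TRUE
```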
With 2 × 2 contingency tables it is possible to use the Fisher exact test. The test is based on the number of possible tables (keeping the observed marginals) that are more extreme (in terms of odds ratio) than the observed table.
We start with a 2x2 table, e.g. the one reporting the frequency of the observed combinations of Q1 and Q2:
 | No | Yes |
---|---|---|
No | 3 | 0 |
Yes | 4 | 8 |
The Fisher exact test checks the null hypothesis that the Odds Ratio is equal to 1.
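For reference, the sample odds ratio of a 2 × 2 table with cells \(a, b, c, d\) is \(ad/bc\); for the table above the zero cell makes it infinite, which matches the infinite estimate reported by fisher.test below:

```r
# 2x2 table of Q1 (rows) vs Q2 (columns) from the text
O <- matrix(c(3, 0,
              4, 8), nrow = 2, byrow = TRUE)
odds.ratio <- (O[1, 1] * O[2, 2]) / (O[1, 2] * O[2, 1])
odds.ratio  # Inf, because of the zero cell
```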
The test can be performed using the fisher.test function:
fisher.test(table(data$Q1,data$Q2))
##
## Fisher's Exact Test for Count Data
##
## data: table(data$Q1, data$Q2)
## p-value = 0.07692
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 0.5349074 Inf
## sample estimates:
## odds ratio
## Inf
The questionnaire consisted of the following items and the related possible responses.
Considering your knowledge before the previous lecture on the experimental method:
Q1. Did you know what the scientific method was?
Yes
No
Q2. Did you know the key role of falsification in the scientific method?
Yes
No
Q3. What was your knowledge of logic argumentation?
Q4. What was your knowledge of statistical hypothesis testing?
In general, thinking about the experimental method:
Q5. In the articles you will write for your PhD work, how often do you plan to use hypothesis testing?
Q6. What is your opinion about the empirical method?
In this case we excluded the level Expert from the contingency table because it never occurred.↩