Validity and reliability
This is not ready yet
There are several ways to increase and measure the validity of your survey. Recruiting the right participants improves its validity. You can also add extra elements to your survey, like a control question or an "Other" answer option. Finally, statistical analysis of the survey results allows you to measure its validity.
Survey results can be reliable but invalid.
For example, if a set of weighing scales consistently measured the weight of an object as 500 grams above its true weight, the scale would be very reliable, but it would not be valid (as the returned weight is not the true weight). For the scale to be valid, it should return the true weight of an object. This example demonstrates that a perfectly reliable measure is not necessarily valid, but that a valid measure necessarily must be reliable. (source: )
Reliable results mean that the results are consistent. They are reproducible and have a low rate of errors.
Validity is the degree to which a survey measures what it is intended to measure. Even if the results are reliable, your survey can still be invalid if its results are not an answer to the question you set out to answer.
Your first concern must be: will the survey result tell you what you want to know? To make sure, you must recruit the right participants and ensure the questions are clear to the participant. You can only know for certain if you test the survey with a small group beforehand.
Think about what you want to know and who you want to know it from. If you want to know how experienced users feel about certain features, recruit experienced users. If you want to know from accounting experts how they feel about your new accounting app's features, recruit accounting experts.
It sounds obvious, but I've seen Kano surveys that were open to everyone and his dog. I don't want to be the manager who makes decisions based on the outcome of those surveys.
One way of assuring yourself of the validity of the results is by adding a control question to your survey. If there's a feature you're absolutely certain is a Must-Be feature, add it to the survey. Be absolutely sure it is a Must-Be feature, however. Don't use it as a control question if you can think of any reason why someone might give a pair of answers that categorizes the feature differently.
If, after you've run the survey, your control question indeed comes out as a Must-Be feature, you can be more confident about the validity of the other features' categories.
Are they really? Show examples of statistical significance where numbers seem all over the place
Three major reasons:
Results will be invalid if they are useless to you to begin with.
TODO: look up the z-test and t-test.
Lee and Newcomb (1997) state that there needs to be a minimum difference of 6% between the top two categories for the survey results to be statistically significant. Statistical significance means that the result is not the same as a random distribution, but a usable indication of customer attitudes towards the feature.
Looking back at our example survey, this means that for feature 1, the category strength is 4%. 10 out of 25 participants attributed the One-Dimensional category to the feature (40%), while 9 out of 25 (36%) attributed the Must-Be category to it. The difference is 4%, and that's too little to confidently conclude that the feature is One-Dimensional rather than Must-Be.
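As a sketch, the category strength can be computed directly from the raw category counts. The numbers below mirror the feature 1 example; the function name is my own.

```python
from collections import Counter

def category_strength(categories):
    """Return the top Kano category and the category strength:
    the difference in percentage points between the two most
    frequent categories (Lee and Newcomb's 6% threshold applies)."""
    (top, n1), (_, n2) = Counter(categories).most_common(2)
    return top, (n1 - n2) / len(categories) * 100

# The 25 responses for feature 1 from the example survey:
responses = (["One-Dimensional"] * 10 + ["Must-Be"] * 9
             + ["Attractive"] * 4 + ["Indifferent"] * 2)
top, strength = category_strength(responses)
# top is "One-Dimensional" with a strength of 4.0, below the 6% threshold
```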
Reliability refers to the scoring consistency among groups of participants. Many studies use Cronbach's alpha to measure item (question) consistency.
You can, however, apply the test to the customer satisfaction coefficients, since those are numerical values.
Also check this answer by Claude.AI
Several statistical tests could be applied to Kano survey data:
Chi-square test:
The chi-square test for goodness of fit can be used to determine if the observed distribution of responses across Kano categories differs significantly from what would be expected by chance.
How it works: It compares the observed frequencies in each category to the expected frequencies if there were no preference.
When to use: When you want to test if certain Kano categories are more prevalent than others beyond random chance.
Example scenario: You might hypothesize that for a given feature, the responses are not evenly distributed across all Kano categories.
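A minimal sketch of this test with SciPy (assuming it is installed); the response counts are hypothetical. `scipy.stats.chisquare` compares the observed category counts against a uniform expectation.

```python
from scipy.stats import chisquare

# Hypothetical counts for one feature across four Kano categories:
# Must-Be, One-Dimensional, Attractive, Indifferent (25 participants).
observed = [9, 10, 4, 2]
stat, p = chisquare(observed)  # expected frequencies default to uniform
print(f"chi2 = {stat:.2f}, p = {p:.3f}")
# A small p (e.g. below 0.05) suggests the responses are not
# evenly spread across the categories by chance.
```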
Confidence Intervals:
Confidence intervals provide a range of plausible values for the true proportion of responses in each Kano category.
How it works: It estimates a range within which the true population proportion is likely to fall, based on your sample data.
When to use: When you want to estimate the precision of your results and account for sampling error.
Example scenario: You might report that 30% of respondents classified a feature as "Attractive," with a 95% confidence interval of 25% to 35%.
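A sketch of the normal-approximation interval for the 30% "Attractive" example, assuming a hypothetical sample of 100 respondents; for small samples a Wilson interval would be more accurate.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Normal-approximation confidence interval (default 95%)
    for the share of respondents in one Kano category."""
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)

# 30 of 100 respondents classified the feature as "Attractive":
low, high = proportion_ci(30, 100)
# roughly (0.21, 0.39) -- with n = 100 the interval is wider than
# the 25%-35% in the example, which implies a larger sample.
```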
Fisher's Exact Test:
This test is used to determine if there are nonrandom associations between two categorical variables.
How it works: It calculates the probability of getting the observed (or more extreme) results if there were no real association between the variables.
When to use: When comparing Kano results between two groups, especially with smaller sample sizes.
Example scenario: You might use this to compare if there's a significant difference in how two customer segments (e.g., young vs. old) categorize a particular feature.
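A sketch with `scipy.stats.fisher_exact` on a hypothetical 2x2 table (the segment sizes and counts are made up):

```python
from scipy.stats import fisher_exact

# Rows: young vs. old customers; columns: classified the feature
# as Must-Be vs. any other category.
table = [[12, 3],
         [4, 11]]
odds_ratio, p = fisher_exact(table)
# A small p suggests the two segments categorize the feature differently.
```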
Chi-square test for independence:
Similar to Fisher's exact test, but typically used for larger sample sizes.
How it works: It compares the observed frequencies in a contingency table to what would be expected if there were no association between the variables.
When to use: When comparing Kano results across multiple groups or features.
Example scenario: You could use this to test if there's a significant relationship between demographic variables (e.g., age groups, gender) and Kano classifications for a set of features.
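A sketch with `scipy.stats.chi2_contingency` on a hypothetical age-group-by-category table:

```python
from scipy.stats import chi2_contingency

# Rows: two age groups; columns: Must-Be, One-Dimensional, Attractive.
observed = [[14, 6, 5],
            [7, 12, 6]]
chi2, p, dof, expected = chi2_contingency(observed)
# dof = (rows - 1) * (columns - 1) = 2; `expected` holds the counts
# you would see if age group and Kano category were independent.
```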
It's important to note that while these statistical tests can add rigor to your analysis, they should be used thoughtfully in the context of Kano model results. The Kano model is primarily a qualitative tool, and overreliance on statistical significance could potentially overshadow important qualitative insights.
If you have added "Other" as an answer option, you can use the number of "Other" responses as a measure of confidence. Kano (2001) states that "if the number of "Other" responses does not exceed 1% for every survey item, it can be certified that the survey results are extremely confident".
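Kano's 1% rule is easy to check mechanically. A sketch with made-up counts; the function name is mine.

```python
def other_share_ok(other_counts, totals, threshold=0.01):
    """Kano (2001): the share of "Other" responses must not
    exceed 1% for any survey item."""
    return all(o / n <= threshold for o, n in zip(other_counts, totals))

# Three survey items with 500 responses each; "Other" was
# chosen 3, 4, and 2 times respectively.
confident = other_share_ok([3, 4, 2], [500, 500, 500])  # True
```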
There is a fundamental problem with this approach. One of the parameters in the calculation of Cronbach's alpha is the variance of the answers. But the answers on a Kano survey are categorical: you cannot assign a numerical value to an answer, and one answer is not higher or lower than another. Using Cronbach's alpha to determine consistency and reliability is therefore wrong.