Validity and reliability


There are several ways to increase and measure the validity of your survey. Careful preparation improves validity before you run the survey. You can also build safeguards into the survey itself, such as a control question or an "Other" answer option. Finally, statistical analysis of the results allows you to measure validity after the fact.

The difference between validity and reliability

Survey results can be reliable but invalid.

For example, if a set of weighing scales consistently measured the weight of an object as 500 grams over the true weight, then the scale would be very reliable, but it would not be valid (as the returned weight is not the true weight). For the scale to be valid, it should return the true weight of an object. This example demonstrates that a perfectly reliable measure is not necessarily valid, but that a valid measure necessarily must be reliable. (source: Wikipedia)

Reliable results mean that the results are consistent. They are reproducible and have a low rate of errors.

Validity is "the degree to which a test measures what it claims, or purports, to be measuring". Even if the results are reliable, your survey can still be invalid if its results are not an answer to the real questions you want to see answered.

Improving validity before you administer your survey

Survey preparation

Your first concern must be: will the survey result tell you what you want to know? To make sure, you must formulate your questions well and ensure the possible answers make sense to the participant. You can only know for certain if you test your survey beforehand.

Selecting the right audience

Think about what you want to know and who you want to know it from. If you want to know how experienced users feel about certain features, recruit experienced users. If you want to know from accounting experts how they feel about your new accounting app's features, recruit accounting experts.

It sounds obvious, but I've seen Kano surveys that were open to everyone and his dog. I don't want to be the manager who makes decisions based on the outcome of those surveys.

Introducing a control question

One way of assuring yourself of the validity of the results is to add a control question to your survey. If there's a feature you're absolutely certain is a Must-Be feature, add it to the survey. Be absolutely sure it is a Must-Be feature, however: don't use it as a control question if you can think of any reason why someone might give a pair of answers that categorizes the feature differently.

If, after you've run the survey, your control question indeed comes out as a Must-Be feature, you can have more confidence in the validity of the other features' categories.

Using "Other" as a measure of confidence

If you have added "Other" as a sixth choice for your questions, you can use the number of "Other" responses as a measure of confidence. Kano (2001) states that "if the number of 'Other' responses does not exceed 1% for every survey item, it can be certified that the survey results are extremely confident".
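As a sketch of this check (assuming your responses live in a pandas DataFrame with one column per survey item; the column and answer names below are hypothetical), you could flag every item whose share of "Other" answers exceeds the 1% threshold:

```python
import pandas as pd

# Hypothetical survey export: one row per participant, one column per item.
responses = pd.DataFrame({
    "feature1_functional": ["like", "expect", "other", "like"],
    "feature1_dysfunctional": ["dislike", "dislike", "tolerate", "dislike"],
})

# Share of "Other" answers per survey item.
other_share = responses.apply(lambda column: (column == "other").mean())

# Kano (2001): every item should stay at or below 1% "Other" responses.
suspect_items = other_share[other_share > 0.01]
print(suspect_items)
```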

Judging the validity of the results

Statistical tests help you judge whether the results you got are meaningful rather than noise. A good starting point is the chi-square test; https://www.mathsisfun.com/data/chi-square-test.html offers an accessible introduction.

Help, the answers are all over the place!

Are they really? A distribution of answers that looks noisy can still differ significantly from random chance, so test it before you discard it (see the chi-square sketch under Category statistical significance below).

If the answers genuinely are scattered, there are three major reasons:

  • Your audience contains distinct segments that honestly disagree about the feature. A clustering technique such as k-means can help you find them (see the sketch after this list).

  • The questions were ambiguous, so participants interpreted them differently.

  • You surveyed the wrong audience.
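A minimal sketch of segment hunting with k-means, assuming a pandas DataFrame in which each row is a participant and each column holds one categorical answer (all column and answer names below are hypothetical):

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical raw answers: one row per participant, one column per question.
answers = pd.DataFrame({
    "feature1_functional": ["like", "expect", "like", "neutral", "like", "expect"],
    "feature1_dysfunctional": ["dislike", "tolerate", "dislike", "dislike", "dislike", "tolerate"],
})

# One-hot encode the categorical answers so k-means can measure distances.
encoded = pd.get_dummies(answers)

# Try a small number of clusters and inspect what falls out.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
answers["segment"] = kmeans.fit_predict(encoded)

# The most common answer per segment shows whether the segments disagree.
print(answers.groupby("segment").agg(lambda s: s.mode().iloc[0]))
```

If the clusters answer systematically differently, analyze each segment separately instead of pooling everyone into one result.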

Determining validity

Results are invalid if they are useless to you to begin with: if they don't answer the real questions you wanted answered, no amount of statistical analysis will save them.

Category reliability

How dependable is the category your survey assigns to a feature? Two checks help you decide: statistical significance and category strength.

Category statistical significance

A chi-square goodness-of-fit test, described in more detail in the notes at the end of this chapter, tells you whether the observed distribution of answers over the Kano categories differs significantly from random answering. For derived numeric measures, such as the customer satisfaction coefficients discussed below, a z-test or t-test is the more natural choice.
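A minimal sketch of the goodness-of-fit check with scipy, using hypothetical counts for one feature (9 Must-Be, 10 One-Dimensional, 3 Attractive, 2 Indifferent, 1 Reverse out of 25 participants):

```python
from scipy.stats import chisquare

# Observed answers per Kano category for one feature (hypothetical counts).
observed = [9, 10, 3, 2, 1]  # Must-Be, One-Dimensional, Attractive, Indifferent, Reverse

# Null hypothesis: answers are spread evenly over the five categories.
result = chisquare(observed)  # expected frequencies default to uniform

print(f"chi2 = {result.statistic:.2f}, p = {result.pvalue:.4f}")
# A p-value below 0.05 suggests the answers are not randomly distributed.
```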

Category strength

Lee and Newcomb (1997) state that there needs to be a minimum difference of 6% between the top two categories for the survey results to be statistically significant. Statistical significance means that the result is not the same as a random distribution, but a usable indication of customer attitudes towards the feature.

Looking back at our example survey, this means that for feature 1, the category strength is 4%. 10 out of 25 participants (40%) attributed the One-Dimensional category to the feature, while 9 out of 25 (36%) attributed the Must-Be category. The difference is 4%, and that's too little to confidently decide between the two categories.
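A minimal sketch of the 6% rule, using the counts from the example above (the six remaining Attractive votes are hypothetical filler):

```python
from collections import Counter

def category_strength(votes: list[str]) -> tuple[str, float]:
    """Return the top category and its strength: the percentage-point
    gap between the top two categories."""
    counts = Counter(votes)
    total = sum(counts.values())
    (top_cat, top_n), (_, second_n) = counts.most_common(2)
    return top_cat, (top_n - second_n) / total * 100

votes = ["One-Dimensional"] * 10 + ["Must-Be"] * 9 + ["Attractive"] * 6
top, strength = category_strength(votes)
print(f"{top}: strength {strength:.0f}%")  # One-Dimensional: strength 4%
# Lee and Newcomb (1997): below a 6% gap, don't trust the top category.
```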

Answer reliability

Reliability refers to the scoring consistency among groups of participants. Many studies use Cronbach's alpha to measure item (question) consistency.

There is a fundamental problem with this approach. Cronbach's alpha is calculated from the variances of the answers per item. But the answers on a Kano survey are not points on a scale: you cannot assign a meaningful numerical value to an answer, and one answer is not higher or lower than another. Using Cronbach's alpha to determine the consistency and reliability of raw Kano answers is therefore wrong.

You can, however, apply the test to customer satisfaction coefficients, because those are numeric values.
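A minimal sketch of Cronbach's alpha for numeric data. The matrix below is hypothetical: one row per group of respondents, one column per feature, each cell holding that group's satisfaction coefficient for that feature:

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for a matrix with observations as rows, items as columns."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                              # number of items
    item_variances = scores.var(axis=0, ddof=1)      # variance per item
    total_variance = scores.sum(axis=1).var(ddof=1)  # variance of the row sums
    return k / (k - 1) * (1 - item_variances.sum() / total_variance)

# Hypothetical satisfaction coefficients (rows: respondent groups, columns: features).
coefficients = np.array([
    [0.72, 0.65, 0.41],
    [0.68, 0.60, 0.45],
    [0.75, 0.70, 0.38],
    [0.64, 0.58, 0.47],
])
print(f"alpha = {cronbach_alpha(coefficients):.2f}")
```

Values close to 1 indicate consistent scoring across the groups; values below roughly 0.7 are usually taken as a warning sign.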

The notes below, adapted from an answer by Claude.AI, describe statistical tests that can be applied to Kano survey data:

  1. Chi-square test:

The chi-square test for goodness of fit can be used to determine if the observed distribution of responses across Kano categories differs significantly from what would be expected by chance.

  • How it works: It compares the observed frequencies in each category to the expected frequencies if there were no preference.

  • When to use: When you want to test if certain Kano categories are more prevalent than others beyond random chance.

  • Example scenario: You might hypothesize that for a given feature, the responses are not evenly distributed across all Kano categories.

  2. Confidence intervals:

Confidence intervals provide a range of plausible values for the true proportion of responses in each Kano category.

  • How it works: It estimates a range within which the true population proportion is likely to fall, based on your sample data.

  • When to use: When you want to estimate the precision of your results and account for sampling error.

  • Example scenario: You might report that 30% of respondents classified a feature as "Attractive," with a 95% confidence interval of 25% to 35%.

  3. Fisher's exact test:

This test is used to determine if there are nonrandom associations between two categorical variables.

  • How it works: It calculates the probability of getting the observed (or more extreme) results if there were no real association between the variables.

  • When to use: When comparing Kano results between two groups, especially with smaller sample sizes.

  • Example scenario: You might use this to compare if there's a significant difference in how two customer segments (e.g., young vs. old) categorize a particular feature.

  4. Chi-square test for independence:

Similar to Fisher's exact test, but typically used for larger sample sizes.

  • How it works: It compares the observed frequencies in a contingency table to what would be expected if there were no association between the variables.

  • When to use: When comparing Kano results across multiple groups or features.

  • Example scenario: You could use this to test if there's a significant relationship between demographic variables (e.g., age groups, gender) and Kano classifications for a set of features.
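A minimal sketch of the last three ideas combined, using a hypothetical 2x2 table (two customer segments, and whether each respondent classified a feature as Attractive or not):

```python
import numpy as np
from scipy.stats import fisher_exact, chi2_contingency

# Hypothetical contingency table: rows are segments, columns are
# "classified the feature as Attractive" vs. "classified it as something else".
table = np.array([
    [18, 32],  # segment A: 18 Attractive, 32 other
    [ 7, 43],  # segment B:  7 Attractive, 43 other
])

# Fisher's exact test: preferred for small samples.
_, p_fisher = fisher_exact(table)

# Chi-square test for independence: the large-sample counterpart.
chi2, p_chi2, dof, expected = chi2_contingency(table)

# 95% confidence interval (normal approximation) for segment A's proportion.
p = table[0, 0] / table[0].sum()
half_width = 1.96 * np.sqrt(p * (1 - p) / table[0].sum())

print(f"Fisher p = {p_fisher:.4f}, chi-square p = {p_chi2:.4f}")
print(f"Segment A: {p:.0%} Attractive, 95% CI {p - half_width:.0%} to {p + half_width:.0%}")
```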

It's important to note that while these statistical tests can add rigor to your analysis, they should be used thoughtfully in the context of Kano model results. The Kano model is primarily a qualitative tool, and overreliance on statistical significance could potentially overshadow important qualitative insights.

