Whether you’re new or experienced with concept testing, it’s always nice to be familiar with some of the “tricks of the trade” or best practices. Below, in Q&A format, you’ll find insight into some of the frequently asked questions around concept testing.
1. Should brand name be used in a concept test?
In general, yes. We know that the presence or absence of a brand name DOES impact people’s expectations for the product/concept, and it is therefore a critical element in determining the overall appeal of a concept. Another aspect of a strong concept is “congruity.” Does the whole concept “hang together?” The brand name is critical in establishing what the concept stands for.
2. Is there an ideal concept testing length? Meaning – is there a certain number of concept “elements” you recommend testing at once? Or is it simply “less is better?”
There is a “concept architecture” that people expect to see when they read about a new concept/product:
- Brand name
- Positioning line (what makes this concept unique)
- Unique product attribute(s) – i.e. “reason to believe”
- Consumer benefit – what does the product attribute really deliver in terms of experience
- Call to action: where to go to find it/promotional offer
These elements represent a complete “commercial concept” and probably should be included in a concept statement. If we’re doing a conjoint exercise, we usually limit the number of elements to 4 or 5 at a time.
3. What are some of the common biases that can come from different concept testing methods? And how do we help eliminate these biases?
Order bias – the first concept in a sequence is evaluated independently of the other concepts, while the rest are evaluated in reference to the first one. To help offset order bias, we randomize the order of presentation so that each concept can be shown in any position.
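As a rough illustration of randomizing presentation order, here is a minimal Python sketch. The function name `presentation_order`, the concept labels, and the seeding scheme are all assumptions made for the example, not part of any particular survey platform.

```python
import random

def presentation_order(concepts, respondent_id, seed=0):
    """Return the concepts in a random order for one respondent.

    Seeding with the respondent ID (an assumption of this sketch) makes the
    order reproducible, so the same respondent always gets the same sequence.
    """
    rng = random.Random(f"{seed}-{respondent_id}")
    order = list(concepts)   # copy so the master list is never mutated
    rng.shuffle(order)
    return order

concepts = ["Concept A", "Concept B", "Concept C"]  # hypothetical labels
for respondent_id in range(1, 6):
    print(respondent_id, presentation_order(concepts, respondent_id))
```

Simple shuffling only balances positions on average. A fully balanced rotation plan would assign each concept to each position an equal number of times.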
Overstatement bias – respondents tend to overstate their interest in trying a new concept, especially when there is no price listed on the concept. Even when a price is shown, they aren’t REALLY paying real money in a concept test, so people tend to minimize the purchase risk and overstate their interest. We can correct for this overstatement bias by applying a weighting algorithm to the stated purchase intent.
4. Is there a standard set of concept testing approaches?
You can categorize concept testing approaches along several different dimensions. If you’re looking at it from the perspective of how many concepts are shown to an individual respondent, then monadic, sequential monadic, and paired comparison are common approaches.
5. Generally, in what scenario is it best to use the 3 concept testing methods mentioned above?
Monadic gives you the purest read of “how strong is this concept?” If sample sizes are sufficient, you will be able to measure slight differences between concepts, but it’s not very cost efficient to do so.
Sequential monadic is not as “pure” a read. It is a bit messy because you do have the possibility of interactions between the concepts (e.g. exposure to concept A affects the reaction to concept B). However, it is good for finding if there are a few concepts that are stronger than the rest, and often provides the best bang for the research dollar.
Paired comparison and other techniques like “MaxDiff” are useful in developing a rank order based on relative concept appeal. This is good when you want to prioritize development dollars against a few concepts with the strongest chance of being successful. These rank-order approaches are not as solid for making “go/no-go” decisions on a specific concept. For “go/no-go” decisions, we would recommend a monadic design.
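To make the rank-order idea concrete, here is a small Python sketch that turns paired-comparison choices into a ranking by simple win rate. This is plain counting, not a MaxDiff or choice-model estimate, and the function name `rank_by_wins` and the sample choices are invented for illustration.

```python
from collections import Counter

def rank_by_wins(choices):
    """Rank concepts by the share of their pairings they won.

    choices: list of (winner, loser) tuples, one per paired comparison.
    """
    wins, shown = Counter(), Counter()
    for winner, loser in choices:
        wins[winner] += 1
        shown[winner] += 1
        shown[loser] += 1
    rates = {concept: wins[concept] / shown[concept] for concept in shown}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)

# Hypothetical responses: each tuple is (preferred concept, rejected concept)
choices = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "B"),
           ("C", "B"), ("A", "C"), ("B", "C")]
print(rank_by_wins(choices))  # [('A', 1.0), ('B', 0.4), ('C', 0.2)]
```

Note that the output says A beats B and C, but nothing about whether A is strong enough to launch, which is why a monadic read is preferred for “go/no-go.”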
6. How do sample specs change with each concept testing strategy (monadic, sequential monadic, paired comparison)? For example, does a monadic test require a larger sample size because respondents only see one concept at a time?
‘Who’ should be included in a concept test depends entirely on the strategy. Is the new concept/product tasked with bringing in new category users? Or stealing share from competitors? Or increasing volume from existing customers?
A monadic design requires a larger total sample size because each respondent sees only one concept. So if you have 3 concepts you are evaluating, you need 3x the desired per-concept sample. If you use a sequential monadic design, each respondent sees multiple concepts, so you are “re-using” respondents for greater cost-efficiency. There are also “partial” designs – for example, if you have 15 concepts, each respondent might only see 6 of the 15. This is usually done to help reduce respondent fatigue.
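The sample-size arithmetic can be sketched in a few lines of Python. The target of 200 evaluations per concept is a hypothetical figure chosen for the example, and the sequential calculation assumes a balanced rotation so that every concept is seen equally often.

```python
import math

def monadic_total(n_per_concept, n_concepts):
    """Each respondent sees one concept, so cells simply add up."""
    return n_per_concept * n_concepts

def sequential_total(n_per_concept, n_concepts, shown_per_respondent):
    """Respondents needed when each one evaluates several concepts.

    Assumes a balanced rotation (every concept shown equally often).
    """
    return math.ceil(n_per_concept * n_concepts / shown_per_respondent)

target = 200  # hypothetical evaluations wanted per concept

print(monadic_total(target, 3))         # 600 respondents
print(sequential_total(target, 3, 3))   # 200 respondents, each sees all 3
print(sequential_total(target, 15, 6))  # 500 respondents, each sees 6 of 15
```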
7. How is potential cannibalization measured in a concept test? (i.e. how much of the parent brand’s share could be cannibalized by this new product?)
This becomes complicated quickly. A common approach is to have 2 cells. Cell 1 is a set of products that includes the parent but not the new item. This serves as a benchmark or “control.” The second cell has both the parent and the new item in the choice set. What you analyze is the source of volume for the new item – how much of the share that the new item takes comes from each of the competitors – including the parent. This source of volume analysis establishes how much cannibalization could exist. The sum of the parent’s share plus the new item’s share shows if there is a significant incremental share gain for the brand.
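Here is a simplified Python sketch of the two-cell source of volume calculation. All share figures, product names, and function names are made up for illustration, and a real study would also test whether the differences between cells are statistically significant.

```python
def source_of_volume(control, test, new_item):
    """Fraction of the new item's share drawn from each existing product.

    control: shares (in points) without the new item
    test:    shares (in points) with the new item in the choice set
    """
    new_share = test[new_item]
    return {product: (control[product] - test[product]) / new_share
            for product in control}

# Hypothetical share of choices, in percentage points
control = {"Parent": 30, "Competitor X": 45, "Competitor Y": 25}
test = {"Parent": 24, "New item": 12, "Competitor X": 41, "Competitor Y": 23}

sov = source_of_volume(control, test, "New item")
print(round(sov["Parent"], 2))  # 0.5 -> half of the new item's share is cannibalized

# Net gain for the brand: parent + new item in the test cell vs. parent alone
incremental = test["Parent"] + test["New item"] - control["Parent"]
print(incremental)              # 6 share points
```

In this made-up example the new item takes 12 points of share, 6 of which come from the parent, so the brand as a whole gains 6 points.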
8. What are the different scales used to evaluate concepts and when is each scale best used?
The 11-point scale, originally created by Dr. Thomas Juster at the U.S. Department of Commerce, combines written descriptions (e.g. certain will purchase, almost certain will purchase, very probably will purchase, etc.) with probability estimates to produce a scale that has been shown (primarily in the work of Kevin J. Clancy and his associates) to be very predictive of actual purchase behavior. The response to the 11-point scale is weighted to create a corrected “potential trial percentage” that takes into account the tendency to over-estimate interest/purchase intention in most concept evaluation research.
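To show the general idea of weighting scale responses, here is a minimal Python sketch. The probabilities attached to each scale point follow the commonly cited wording of the Juster scale (e.g. 10 = “99 in 100,” 0 = “1 in 100”) and should be treated as an assumption. The actual weights used to produce a corrected “potential trial percentage” are not given above and are typically calibrated against real purchase data, so the `weights` argument is a placeholder.

```python
# Assumed probability attached to each scale point (0-10)
JUSTER_PROB = {10: 0.99, 9: 0.9, 8: 0.8, 7: 0.7, 6: 0.6, 5: 0.5,
               4: 0.4, 3: 0.3, 2: 0.2, 1: 0.1, 0: 0.01}

def potential_trial(responses, weights=JUSTER_PROB):
    """Average the weight attached to each respondent's scale answer."""
    return sum(weights[answer] for answer in responses) / len(responses)

# Hypothetical answers from ten respondents
responses = [10, 8, 5, 5, 2, 0, 0, 0, 3, 7]
print(round(potential_trial(responses), 3))  # 0.402
```

Swapping in a different `weights` dictionary is how a calibrated correction would be applied in this sketch.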
Other common scales include the 5-point “Def/Prob” scale (e.g. definitely would buy, probably would buy, might/might not, probably would not, definitely would not). This was very common when most of the data was being collected via telephone, because it is short, simple and easy to understand when communicated verbally. It is still used “by tradition” and when you want to compare the results of a concept test to a database of concepts that all use the 5-point scale. (It should be noted that there are slight differences in the natural distribution of a 5-point scale collected by phone vs. by internet, but adjustments can be made to improve comparability.)
If you have any other questions about concept testing, please let us know in the comments and we will answer them.