Is there a standard approach to converting a Likert Scale to a CSAT%?

David Newman
David Newman Member
edited February 2023 in Services Delivery & Operations

Depending on the source, I get strong opinions favoring one of three options that seem to be commonly used.

In our case, we have survey options that are roughly worded in this manner:

"The service that you received was excellent"

With the following options:

Strongly Agree, Agree, Neither Agree nor Disagree, Disagree, and Strongly Disagree.

We assign a 1-5 value to those options.

Since our executive team likes to see things in percentage terms, we use those values to convert to a percentage, and I currently disagree with our internal approach.

Effectively, I have found the following three calculations used in the industry (a quick sketch of all three follows the list):

1.) Take the sum of the selected values and divide by the total available. In other words, if there were three survey responses and a 3 and two 5s were chosen, you'd have the sum of the responses (13) divided by the total possible (15) to give you 86.7%.

2.) Since the lowest possible percentage with the conversion outlined in 1 is 20%, some prefer to convert the 1-5 Likert scale to a 0-100% scale with steps at 0, 25, 50, 75, and 100. With the same response set illustrated above, you'd take the sum of the % values (50% + 100% + 100% = 250%) and divide by the total available percentage (3 × 100% = 300%), giving you 83.3%.

3.) The other method that I've seen employed often is to take the number of positive responses (in this case, choosing Agree (4) or Strongly Agree (5)) and divide it by the total number of responses, in which case you'd have 2/3, or 66.7%.
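To make the arithmetic concrete, here is a minimal Python sketch of the three conversions applied to the example responses above (the function names are my own labels, not industry terms):

```python
responses = [3, 5, 5]  # Likert values on a 1-5 scale

def percent_of_max(scores, max_value=5):
    """Method 1: sum of responses divided by the maximum possible sum."""
    return 100 * sum(scores) / (max_value * len(scores))

def rescaled_percent(scores, min_value=1, max_value=5):
    """Method 2: map 1-5 onto 0-100% (steps of 25%), then average."""
    return sum(100 * (s - min_value) / (max_value - min_value) for s in scores) / len(scores)

def top_two_box(scores, threshold=4):
    """Method 3: share of responses at or above Agree (4)."""
    return 100 * sum(1 for s in scores if s >= threshold) / len(scores)

print(round(percent_of_max(responses), 1))    # 86.7
print(round(rescaled_percent(responses), 1))  # 83.3
print(round(top_two_box(responses), 1))       # 66.7
```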

As a bit of a math and statistics nerd, I find 1 very unappealing, as I believe it artificially biases scores higher than they should be. I'm not a huge fan of option 2, but I feel that if you are going to convert straight from values to percentages, it's the more accurate representation of the data.
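The floor effect is easy to demonstrate: under method 1 the worst possible survey still scores 20%, while method 2 uses the full 0-100% range. A minimal sketch:

```python
worst = [1, 1, 1]  # every respondent chose Strongly Disagree
method_1 = 100 * sum(worst) / (5 * len(worst))                 # never below 20%
method_2 = 100 * sum((s - 1) / 4 for s in worst) / len(worst)  # can reach 0%
print(method_1, method_2)  # 20.0 0.0
```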

I'm a fan of option three in this situation because of the way the survey is worded and the options that are available. If a customer is asked to evaluate whether or not they experienced excellence, then an Agree or Strongly Agree should both be considered a "make", whereas anything evaluated as less than Agree should be a miss.

Is there a standard that TSIA advocates for as one of the benchmarkers of the tech services industry?

Best Answer

  • Dave Baca
    Dave Baca Member | Guru ✭✭✭✭✭
    Answer ✓

    Hi David - TSIA's benchmarks normalize member CSAT survey data using a CSAT conversion calculator. It takes either a Likert scale other than 1 to 5 or a percentage value and converts it to a 1 to 5 scale, both for ASSISTED SUPPORT SERVICE transactional CSAT surveys and for Self-Service CSAT surveys; the most common CSAT scale for these types of surveys is 1 to 5. So while I have followed all of the great discussion in your Exchange thread, when a Likert CSAT survey scale needs to be converted to a %, the Top-Box method is what I would favor, and this same Top-Box method is also what many of our Members use to report their CSAT results to the C-Suite.
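    For illustration only, a generic linear rescaling like the one below could normalize other scales onto 1-5; this is my own sketch of the idea, not TSIA's actual conversion calculator:

    ```python
    def to_five_point(value, scale_min, scale_max):
        """Linearly map a score from an arbitrary scale onto 1-5.
        A generic rescaling for illustration, not TSIA's calculator."""
        return 1 + 4 * (value - scale_min) / (scale_max - scale_min)

    print(to_five_point(8, 1, 10))      # a 1-10 Likert response -> ~4.11
    print(to_five_point(86.7, 0, 100))  # a percentage -> ~4.47
    ```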

Answers

  • steve tennant
    steve tennant Member | Scholar ✭✭

    In practice, I’ve seen #3 used most and labeled “Top 2 %” or “Top 2 boxes %” at Fortune 50 tech companies.

    SurveyMonkey has a page here https://www.surveymonkey.com/mp/top-2-box-scores/

  • Alexander Ziegler
    Alexander Ziegler Founding Member | Expert ✭✭✭

    I like your thoughts. 1) and 2) are correct mathematical conversions of the numbers, as you explained, and in my experience which one you use comes down to a personal decision. I have seen both used, and I personally always insist on being consistent within a company rather than using both in parallel. 3) looks to me as if somebody heard about NPS and added an NPS flavour to the numbers; I would not do this.

    My big question to you is: what are you doing with the numbers? What are the actions? What happens if the result/percentage is low? What do your executives ask you when the number is low?

    I am asking those questions because I have spent a lot of time over the last few years working with NPS in services as a replacement for (or add-on to) old KPIs measured the way you do. In the industry there are statements that NPS works or does not work, and some people get quite emotional about it. I did a big review of the existing research, and my conclusion is that the key is to have NPS plus one additional question (usually "What is the reason for your evaluation?" or "What do you suggest we improve?").

    Based on this research, I introduced NPS in my area (Education Services) five years ago. Prior to NPS our CSAT was constantly above 95%, and we were satisfied with our quality. My motivation to use NPS was that IBM overall had excellent experiences with it in other areas. And the surprise for you: within the first 6 months we were able to improve our service, and we discovered challenges that CSAT did not show. Since then it has been a success story, and I would never go back to judging a business without NPS. If you want more details, please have a look at https://www.springerprofessional.de/en/the-value-of-a-net-promoter-score-in-driving-a-company-s-bottom-/18804634. I documented the case there and have already had interesting discussions around it. If you can't access the paper at that link, please ping me and I'll send it to you.
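    For readers unfamiliar with the metric, the standard NPS calculation on a 0-10 "likelihood to recommend" scale looks roughly like this (a minimal sketch; the sample scores are invented):

    ```python
    def nps(scores):
        """Standard NPS: % promoters (9-10) minus % detractors (0-6)."""
        promoters = sum(1 for s in scores if s >= 9)
        detractors = sum(1 for s in scores if s <= 6)
        return 100 * (promoters - detractors) / len(scores)

    print(nps([10, 9, 8, 7, 6, 3]))  # (2 promoters - 2 detractors) / 6 -> 0.0
    ```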

  • Thanks Steve, this is what I've seen most often, but I've never seen anything from TSIA or similar that recommended a particular practice, particularly when the question is phrased the way ours is. We had a technical support rep who never received less than an Agree for delivering excellent service, but he almost never received a Strongly Agree. Consequently, he failed to come close to the target we set with option 1 as our benchmark, but he would have wildly exceeded it with 3. Thanks for your input; I like the wording for the categorization, too.

  • Alexander, first of all, thanks for taking the time to share your thoughts. I was in charge of our NPS program for a few years and will share some thoughts in a bit. As far as 1 and 2 go, I suppose in the end it's just a matter of scale and consistency; a scale that doesn't go lower than 20% just didn't sit right with me. I ran the numbers with all three methods over three years' worth of case responses. With our dataset, 1 and 3 were similar on the whole, but for some individuals the choice would have made the difference between being put on a performance plan and being considered a model employee. When 3 is an outlier, it's almost always an outlier on the good side, and the verbatims and subjective performance evaluations of their case work seemed to always point in that direction. I'd hate to see a quarter's worth of data evaluated as below acceptable performance when there wasn't a single case rated less than excellent. The differences between 1 and 2 were consistent, which I think supports your point that they are interchangeable at a corporate level when evaluating performance, but when TSIA benchmarks your company against peers, how is this reconciled?

    We use the numbers in a number of different ways. I'd run data based on aging, modules in our software, module groups, teams, etc., and we'd try to identify the areas that needed the most improvement and tackle those, whether by adding team members, looking for training opportunities, working with engineering, etc.
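    As a rough illustration of that kind of segmentation, a top-2-box CSAT% per segment can be computed along these lines (the column names and data are hypothetical, not the author's actual schema):

    ```python
    import pandas as pd

    # Hypothetical case-survey data; columns invented for illustration.
    df = pd.DataFrame({
        "module": ["billing", "billing", "reporting", "reporting"],
        "team":   ["A", "A", "B", "B"],
        "score":  [5, 4, 3, 5],
    })

    # Top-2-box CSAT% per module: share of responses scoring 4 or 5.
    csat = df.assign(top2=df["score"] >= 4).groupby("module")["top2"].mean() * 100
    print(csat)  # billing 100.0, reporting 50.0
    ```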

    As far as NPS goes, it sounds like we've got some similar experience in our past. I've read an awful lot of material on it and some of its flaws, and my personal conclusion has been that its effectiveness as a "predictor of future revenue" is a better fit in some industries than in others. Our NPS approach very much led to something similar to what you are describing, but with a few questions. After the NPS question, we ask for the primary reason from a list of the most common responses from historical analysis, then we drop down into some categorization, where appropriate, to drill down to the area of the software that their comment most relates to. We then give them an opportunity to provide a verbatim comment, which is great for additional context but difficult to analyze.

    Since I work in the world of enterprise software, a trend I often see is that, unlike with a consumer app/product, scores can tend to run lower. We have people in our customer community who are, without any doubt, strong supporters of our product but who would routinely give us 5-6 scores because, to them, that was a perfectly acceptable score. This situation led me to something called the Word of Mouth Index (WoMI). I found it particularly effective because it allowed me to filter out the situation where a strong advocate was labeled a detractor, it correctly identified the type of detractor that can cause harm in the market, and it also had a strong stabilizing effect on the score. Our NPS could vary wildly depending on who took the survey in a given half of the year, swinging +/- 30 points; individual scores typically remained consistent, but who responded did not. In essence, WoMI adds one more question to the NPS approach: when someone selects a 0-6 score, a second question asks how likely it is that they would actively detract from the brand/product, to establish who the true detractors are, because simply assuming that a 5-6 is detracting from your word-of-mouth standing in the market can overstate things significantly. I think it was ForeSee who introduced WoMI, and I've found it to be a far better approach. Unfortunately, convincing others in the organization of its value vs. NPS seems to be a battle I'm not likely to win. They humor my reporting, though.
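    A minimal sketch of how that adjustment might work, based on my reading of the description above (the 9-10 "would discourage" threshold and the data are my assumptions, not ForeSee's published formula):

    ```python
    def womi(responses):
        """WoMI-style score sketch: % promoters (9-10 recommend) minus
        % 'true' detractors (0-6 recommend AND 9-10 on the follow-up
        discourage question). Threshold choices are assumptions.
        `responses` is a list of (recommend, discourage_or_None) tuples;
        the discourage question is only asked of 0-6 scorers."""
        n = len(responses)
        promoters = sum(1 for rec, _ in responses if rec >= 9)
        true_detractors = sum(
            1 for rec, dis in responses
            if rec <= 6 and dis is not None and dis >= 9
        )
        return 100 * (promoters - true_detractors) / n

    # A 6 who would not discourage (dis=1) no longer counts against you:
    data = [(10, None), (9, None), (6, 1), (5, 9), (8, None)]
    print(womi(data))  # (2 - 1) / 5 -> 20.0; classic NPS here would be 0.0
    ```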

  • I guess going back to my original post, how does TSIA do benchmarking when it seems all three approaches are accepted but result in wildly different percentages?