Predicting Speech Intelligibility for Individual Cochlear Im... : The Hearing Journal

The topic of remote and customizable programming for cochlear implants (CIs) is enormously broad. From a service perspective, it can include remote provision of audiological testing or device programming services. At an engineering level, technologies include those required for reliable communication with remote devices, programming of new software, remote diagnosis, and error-free configuration of sound processing parameters. Each of these aspects presents challenges and opportunities. Customizable programming is a more complex topic. Programming of CIs has always included customizable aspects; setting or adjusting the threshold and comfort levels has resulted in excellent hearing outcomes for many recipients. Further customization of device programming, at an individual level, might well result in better hearing outcomes, or achieve similar hearing outcomes more quickly. Two important questions are which parameters should be customized, and how can we have a level of confidence that the customization might improve the hearing outcomes of a recipient?

Figure 1:

Data set 3 (Khing et al., 2013) shows mean subject sentence scores mapped to ISNR (A), OSNR (B), STOI (C) and VSTOI (D) with psychometric functions fitted for each test condition (solid lines), and the aggregated test conditions (dashed line). The Khing study evaluated intelligibility for two different AGC types, and two SNRs, as the presentation level was varied. SNR and STOI, calculated at the CI input, were unable to predict intelligibility. OSNR was shown to be an excellent predictor for this data set. Reproduced with permission from Watkins et al. (2018).

Figure 2:

For two of three data sets of clinical speech scores, OSNR was found to be as accurate a predictor as SNR, and more accurate than VSTOI. OSNR provided accurate predictions for data set 3, when SNR and STOI were unable to provide a prediction. This finding highlighted the importance of including the CI non-linear processing in the calculation of a prediction metric. For D and RMSE*, low values indicate the most accurate predictions; for PSIG, higher values indicate the most accurate predictions (*p < 0.05, **p < 0.01, ***p < 0.001). Reproduced with permission from Watkins et al. (2018).

Figure 3:

Predicted speech intelligibility (red line) for Subject 4 for two conditions in data set 2 (Dawson et al., 2011): Noise Reduction 1 with SWN, and Noise Reduction 2 with SWN. The RMSE of the predicted scores with respect to the clinical scores was 14 percentage points for both conditions. There is a small but observable change in the SRT (vertical dashed line). Mean clinical scores are shown as circles, with the number of tests at each ISNR given inside each circle. Error bars indicate the 95% confidence interval of the mean of the clinical scores. Reproduced with permission from Watkins et al. (2020).

Table 1:

Median prediction error and reference variability for each data set. The median prediction error exceeded the reference variability by no more than 6 percentage points. The reference variability was indicative of the variability of the underlying reference clinical scores. (Data from Watkins et al., 2020).

In this article, I present two studies that investigated the ability of a metric, the Output Signal to Noise Ratio (OSNR), to predict speech intelligibility for CI recipients. If individual recipient speech understanding could be predicted, this would provide opportunity to compare the performance of different sound processing algorithms, and potentially customize CI configurations to perform best for individuals.


Before presenting the studies, I would first like to share a little of my background. I started to develop hearing loss more than 15 years ago. At first, I found this a little confronting. I was relatively young, and this was something that I thought mostly happened at a more advanced age. Hearing aids were quite effective; however, my hearing continued to decline. Although I had a family history of hearing loss, including cochlear implants, my loss was described as idiopathic. I had trained as an electrical engineer and was working in the development and delivery of telecommunications technology. I wondered if my technical experience and my disability experience might combine to provide some new perspective on some of the challenges faced by people with hearing loss. This led to me commencing a PhD in Cochlear Implant Sound Processing in 2014, and the investigation of OSNR.

Many studies have shown a correlation between the Signal-to-Noise Ratio (SNR) and speech intelligibility for people without hearing impairment. As the power of target speech (the speech a person is trying to understand) declines compared to other sounds, it becomes more difficult for the target speech to be understood. I investigated whether the intelligibility of speech to a CI recipient was correlated with the SNR that they perceived at the output of the CI.
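As a small illustration of the quantity being discussed, SNR can be expressed in decibels as the ratio of mean speech power to mean noise power. The sketch below is a generic calculation, not code from the studies; the signals are synthetic stand-ins (a tone for the target speech, white noise for the competing sound), and the function name `snr_db` is made up for this example.

```python
import numpy as np

def snr_db(speech, noise):
    """Return the SNR in dB: mean speech power relative to mean noise power."""
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    return 10.0 * np.log10(p_speech / p_noise)

# Illustrative signals only: a tone stands in for speech, white noise for the masker.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
speech = np.sin(2 * np.pi * 440 * t)
noise = rng.standard_normal(t.size)

# Halving the speech amplitude quarters its power, so the SNR falls by
# 10*log10(4), roughly 6 dB, while the noise is unchanged.
drop = snr_db(speech, noise) - snr_db(0.5 * speech, noise)
print(f"SNR drop: {drop:.2f} dB")
```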

Modern hearing aids and CIs implement sophisticated, non-linear sound processing algorithms to enhance intelligibility for those with hearing disability. These algorithms perform many functions, such as reducing noise, and emphasizing important speech frequencies. An example of non-linear processing is the Automatic Gain Control (AGC) used to protect CI recipients from excessively loud sounds. This means that the volume of the sound presented to the recipient changes in a non-linear manner depending on the volume of sounds presented to the CI microphones. A consequence of this processing is that the SNR presented at the output of a hearing aid or CI, the OSNR, can be changed from the input SNR (ISNR). The OSNR might be better or worse than the ISNR, but is rarely the same.

The definition of a non-linear system is that the output of the system when processing the sum of two input signals is not the same as the sum of the output of the system when the two signals are processed separately. If G() is the transfer function, which describes the way in which the CI sound processor transforms the input speech (s) and noise (n), then

G(s + n) ≠ G(s) + G(n)

This means that OSNR cannot be determined by simply passing speech and noise separately through the sound processor and comparing their processed values. Building on earlier work by Khing et al., 1 a technique was developed where the combined speech and noise were passed through the CI sound processor, and the time varying gains of the system captured. The captured gains were then applied to the separate speech and noise signals, to generate the Output Speech, and Output Noise signals. The OSNR could be determined from these two output signals. As a CI does not have an audio output, OSNR was calculated from the magnitude of the stimulation that was applied to the CI electrodes.
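The gain-capture idea can be sketched with a toy example. Everything below is an assumption made for illustration: `toy_agc_gains` is a hypothetical frame-based level limiter standing in for a real CI sound processor, the signals are synthetic, and the OSNR is computed on audio samples rather than on electrode stimulation magnitudes as in the studies.

```python
import numpy as np

def toy_agc_gains(mixture, frame=160, threshold=0.5):
    """Hypothetical stand-in for a CI sound processor: one gain per frame,
    reducing the level whenever the frame RMS exceeds a threshold."""
    n_frames = len(mixture) // frame
    frames = mixture[: n_frames * frame].reshape(n_frames, frame)
    rms = np.sqrt(np.mean(frames ** 2, axis=1)) + 1e-12
    gains = np.minimum(1.0, threshold / rms)
    return np.repeat(gains, frame)  # time-varying gain, one value per sample

def power_ratio_db(a, b):
    """Ratio of the total power of a to the total power of b, in dB."""
    return 10.0 * np.log10(np.sum(a ** 2) / np.sum(b ** 2))

fs = 16000
t = np.arange(fs) / fs
rng = np.random.default_rng(0)
bursts = (np.arange(fs) // 1600) % 2 == 0          # speech-like on/off pattern
speech = 2.0 * np.sin(2 * np.pi * 440 * t) * bursts
noise = 0.3 * rng.standard_normal(fs)

isnr = power_ratio_db(speech, noise)

# Capture the gains from the combined signal, then apply those same gains
# to the separate speech and noise signals.
gains = toy_agc_gains(speech + noise)
osnr = power_ratio_db(gains * speech, gains * noise)

# Processing speech and noise separately gives a different figure, because
# the gains depend on the mixture: G(s + n) != G(s) + G(n).
naive = power_ratio_db(toy_agc_gains(speech) * speech,
                       toy_agc_gains(noise) * noise)

print(f"ISNR {isnr:.1f} dB, OSNR {osnr:.1f} dB, separate-pass estimate {naive:.1f} dB")
```

In this toy case the limiter turns the level down only while the speech bursts are present, so the speech is attenuated more than the noise and the OSNR is lower than the ISNR. A different processing algorithm could equally raise it.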


An initial retrospective study 2 was conducted to evaluate the accuracy of OSNR as a predictor of CI speech intelligibility. In this study, the predictive accuracy of four different prediction metrics was evaluated against data sets of clinical speech scores from three previous studies. The data sets included speech scores for a range of noise types, signal processing algorithms, ISNRs, and presentation levels. Mean speech scores were plotted against each of the metrics, and a psychometric function was fitted to the scores. Three different figures of merit were then calculated to assess the predictive accuracy of the metrics.

  • Metrics – Four metrics were evaluated. The first three were ISNR, OSNR, and the Short-Time Objective Intelligibility (STOI) measure. 3 STOI was selected, as it had been shown to be a good predictor of CI speech intelligibility in a study by Falk et al. 4 STOI implements “CI like” processing; however, it does not implement actual CI algorithms. For the fourth metric, the output of the CI was reconstructed as an audio signal using vocoder techniques, and STOI was calculated with the reconstructed signal. This metric was named Vocoder STOI (VSTOI).
  • Figures of merit – Three figures of merit were calculated to evaluate the accuracy of the prediction metrics. Two figures of merit were selected from the Falk et al. study: 4 the Pearson sigmoidal correlation (PSIG) and the epsilon-insensitive root mean square error (RMSE*). These measures assess the goodness of fit of the psychometric function to the clinical scores. RMSE* acknowledges that a metric cannot be less variable than the data it predicts, and only allocates a prediction error when the psychometric function does not lie within the 95% confidence interval of the mean speech score. The third goodness-of-fit figure of merit, statistical deviance (D), was taken from a study by Khing et al. (2013). 1
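Two of these figures of merit can be sketched in a few lines. This is a simplified reading of the descriptions above, not the exact formulas used in the study: RMSE* is computed here as a plain mean over test points (published definitions may apply a degrees-of-freedom correction), PSIG is taken as the Pearson correlation between the mean scores and the fitted psychometric function values, and all numbers are invented for illustration.

```python
import numpy as np

def rmse_star(observed, predicted, ci95):
    """Epsilon-insensitive RMSE: an error counts only by the amount it falls
    outside the 95% confidence interval of the corresponding mean score."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ci95 = np.asarray(ci95, dtype=float)
    excess = np.maximum(np.abs(observed - predicted) - ci95, 0.0)
    return float(np.sqrt(np.mean(excess ** 2)))

def pearson_sigmoidal(observed, predicted):
    """Pearson correlation between scores and the sigmoid-mapped predictions."""
    return float(np.corrcoef(observed, predicted)[0, 1])

# Invented example values, in percent correct.
observed = [10.0, 35.0, 62.0, 88.0]    # mean clinical scores
predicted = [12.0, 30.0, 70.0, 90.0]   # psychometric function at the same points
ci95 = [5.0, 5.0, 5.0, 5.0]            # half-width of each 95% confidence interval

# Only the third point lies outside its interval (by 3 points), so it alone
# contributes to RMSE*.
print(rmse_star(observed, predicted, ci95))
print(pearson_sigmoidal(observed, predicted))
```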

Full details of the study method and the reference studies are available in Watkins et al. (2018). 2

An example of the fit of the psychometric function to the clinical scores for one study (data set 3) 1 is shown in Figure 1. Mean speech scores are shown for two different AGC types, and two SNRs, as the presentation level was varied. SNR and STOI, calculated at the CI input, were insensitive to changes in the presentation level. These metrics, along with many others, are designed to ignore the effect of presentation level on speech intelligibility. However, changes in level can affect intelligibility for CI recipients. OSNR, which included the full CI signal processing path, was shown to be an excellent predictor for this data set.

The figures of merit calculated for each of the prediction metrics are shown in Figure 2. SNR and OSNR were equally accurate predictors for data sets 1 and 2, where ISNR was varied with a range of processing algorithms. They were significantly more accurate than STOI for data set 1. OSNR provided accurate predictions for data set 3, where the presentation level was varied, for two ISNRs and a number of different AGC algorithms. STOI and SNR were unable to provide intelligibility predictions for data set 3. The predictions provided by VSTOI were inaccurate, perhaps due to distortions introduced by the vocoding process.

In Study 1, OSNR was shown to be at least as accurate as other prediction metrics evaluated, and was able to provide predictions of intelligibility in conditions where other metrics could not.


Hearing outcomes achieved by CI recipients vary significantly. The variation has been linked to a range of factors, including duration of deafness before implantation, age at onset of hearing loss, survival of target neurons, cognitive function, and underlying etiology. As a result of this individual variability, CI studies, including Study 1 above, typically investigate the mean performance of a group of recipients. This approach provides valuable information on the average benefit of new sound processor algorithms or configurations. However, the benefit to individual recipients will vary.

In a follow-on study, the ability of OSNR to predict individual speech scores in a range of conditions was investigated. 5 Data sets of speech scores in a range of processing conditions, ISNR, and presentation levels were obtained from three previous clinical studies. Each of the data sets included speech scores in multiple sound processing conditions. Noise types included multi-talker babble, party noise, cocktail noise, and speech weighted noise (SWN).

To predict individual recipient speech scores, the condition closest to a recipient’s “every day” condition was selected as a reference condition. OSNR was calculated for each of the test points in the reference condition, e.g., for each ISNR tested, and a recipient’s speech scores plotted against those OSNR values. The OSNR calculations used the same speech and noise material, and sound processing models (Cochlear Ltd., Sydney) as the original studies. A psychometric function was fitted and used as a map between individual speech scores and the OSNR values in the reference condition. To predict speech scores in another condition, e.g., with noise reduction enabled, the OSNR value was calculated for that condition, and the reference psychometric function was used to determine a corresponding speech score. Although the OSNR value for a particular condition was the same for all recipients, the reference psychometric function contained the speech scores specific to each recipient.
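The prediction procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation from the study: the psychometric function is assumed to be a two-parameter logistic, it is fitted with a simple least-squares grid search, and the OSNR values and scores are invented. The names `logistic` and `fit_reference` are hypothetical.

```python
import numpy as np

def logistic(osnr_db, midpoint, slope):
    """Assumed psychometric function: percent correct versus OSNR in dB."""
    return 100.0 / (1.0 + np.exp(-slope * (osnr_db - midpoint)))

def fit_reference(osnr_db, scores):
    """Least-squares grid search for the midpoint and slope of the logistic."""
    best_sse, best_mid, best_slope = np.inf, None, None
    for midpoint in np.arange(-10.0, 20.0, 0.1):
        for slope in np.arange(0.05, 2.0, 0.05):
            sse = np.sum((logistic(osnr_db, midpoint, slope) - scores) ** 2)
            if sse < best_sse:
                best_sse, best_mid, best_slope = sse, midpoint, slope
    return best_mid, best_slope

# Reference condition: OSNR computed at each tested ISNR, paired with one
# recipient's speech scores (invented values).
ref_osnr = np.array([-2.0, 1.0, 4.0, 7.0, 10.0])
ref_scores = np.array([8.0, 27.0, 62.0, 88.0, 97.0])
midpoint, slope = fit_reference(ref_osnr, ref_scores)

# Another condition (for example, noise reduction enabled): the OSNR values
# change, but the recipient's reference function is reused as the map.
new_osnr = np.array([0.0, 3.0, 6.0, 9.0])
predicted = logistic(new_osnr, midpoint, slope)

print(f"midpoint {midpoint:.1f} dB, slope {slope:.2f} per dB")
print(np.round(predicted, 1))
```

With this logistic form, the midpoint is the OSNR at which the recipient scores 50%. The OSNR values are shared across recipients, while `midpoint` and `slope` carry the individual information.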

The accuracy of predictions was measured by calculating the Root Mean Square Error (RMSE) between the clinical speech scores and the predicted psychometric function. To provide some relative measure of accuracy, psychometric functions were also predicted for each reference condition. That is, the reference was used to predict itself. The prediction error in this scenario was described as the reference variability. It was reasoned that predictions in other conditions should not be expected to be more accurate than the reference variability.
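A short sketch of the two error measures follows, again with invented numbers. It assumes a logistic psychometric function whose parameters have already been fitted to the reference condition; the parameter values, scores, and OSNR values are hypothetical.

```python
import numpy as np

def logistic(osnr_db, midpoint, slope):
    """Assumed psychometric function: percent correct versus OSNR in dB."""
    return 100.0 / (1.0 + np.exp(-slope * (osnr_db - midpoint)))

def rmse(clinical, predicted):
    """Root mean square error, in percentage points."""
    diff = np.asarray(clinical, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

# Hypothetical parameters, assumed to have been fitted to the reference condition.
midpoint, slope = 3.0, 0.5

# Reference variability: the reference function predicting its own scores.
ref_osnr = np.array([-2.0, 1.0, 4.0, 7.0, 10.0])
ref_scores = np.array([12.0, 22.0, 66.0, 84.0, 99.0])
reference_variability = rmse(ref_scores, logistic(ref_osnr, midpoint, slope))

# Prediction error: the same function evaluated at the OSNR values of another
# condition, compared with the clinical scores measured in that condition.
test_osnr = np.array([0.0, 3.0, 6.0, 9.0])
test_scores = np.array([25.0, 42.0, 90.0, 93.0])
prediction_error = rmse(test_scores, logistic(test_osnr, midpoint, slope))

print(f"reference variability: {reference_variability:.1f} percentage points")
print(f"prediction error:      {prediction_error:.1f} percentage points")
```

Comparing the two numbers gives a sense of how much of the prediction error is attributable to scatter in the clinical scores themselves.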

Full details of the study method, and the reference studies are available in Watkins et al. (2020). 5

Using the method described above, psychometric functions were predicted for all conditions and each recipient in the three clinical data sets. An example of predictions for an individual recipient in SWN and two different noise reduction algorithms (NR1, NR2) is shown in Figure 3. The vertical dashed line indicates the speech recognition threshold (SRT). For this recipient, a small improvement in SRT with the NR1 algorithm enabled, compared to that with NR2 enabled, can be observed.

The median prediction error and reference variability for each data set in the study are shown in Table 1. The prediction errors were within a few percentage points of the reference variability for data sets 1 and 3, and within 6 percentage points for data set 2. For the most aggressive noise reduction condition in data set 1, the predicted scores were quite inaccurate. Investigation found that the noise reduction algorithm had reduced the target speech by up to 30 dB, likely causing audibility challenges. This highlighted a limitation in the OSNR predictor, and also identified the opportunity for a hybrid metric that considered both speech power and the OSNR. 8

Overall, Study 2 demonstrated that an OSNR-based model was able to predict individual scores, relatively accurately, for a wide range of conditions.


The motivation for the two studies presented was to investigate the feasibility of predicting speech intelligibility for individual recipients using the OSNR metric. Study 1 demonstrated the accuracy of OSNR as a predictor, and its effectiveness in including actual CI processing algorithms. Study 2, for the first time, used clinical scores in a reference condition as a basis for predicting complete psychometric functions for individual CI recipients in a wide range of other sound processing conditions. Further investigation is ongoing to understand more about OSNR, its limitations, possible improvements, and possible applications.

One potential application of OSNR is as a bench-test tool to evaluate new CI sound processing ideas. The traditional approach of evaluating new processing strategies in clinical studies has been effective, but is time consuming and limits the number of ideas and the range of algorithm parameters that can be tested. An accurate metric would provide a means of screening a range of ideas and selecting those thought to be most effective for testing with recipients.

Another possible application is the testing and selection of sound processor algorithms and configuration parameter sets that work best for an individual recipient. A set of hearing-in-noise scores is available for many recipients, or is relatively easily acquired. These scores, combined with OSNR, have been shown to provide accurate predictions of intelligibility in quite different listening conditions. There is no such thing as a perfect prediction metric (yet!); each metric has its strengths and weaknesses, and few metrics have been developed specifically for use with CIs. Nevertheless, OSNR has been demonstrated to have promise as a predictor of individual speech intelligibility for CI recipients. This might be one small step on the journey to individually optimized CI configurations, and better hearing outcomes.

Much important technical work remains as we continue to improve outcomes achieved by CI recipients. However, these critical technical aspects are only one part of the CI story. As an engineer, I can describe the operation of CI systems in terms of maths and physics. As a recipient, CIs have been a life-changing experience that has transformed how I am able to interact with the world. The support of family, friends, clinicians, and colleagues has been a key part of my journey. I was welcomed by, and learned from, the HI-ARO network of deaf and hard-of-hearing engineers and scientists whose work was recently described in Frontiers. 9 I am comfortable asking for help when I have hearing challenges, and almost everyone I meet is keen to help. However, some people with hearing loss choose not to discuss their challenges. This is where we, as a community, can offer information and support as we normalize discussions around hearing disability. CIs are not for everyone, but it is important that every candidate has access to information so that they can make an informed choice.

Note: Views expressed are those of the author and not of Cochlear Limited.

The Creative Commons license does not apply to this content. Use of the material in any format is prohibited without written permission from the publisher, Wolters Kluwer Health, Inc. Please contact [email protected] for further information.


1. Khing P P, Ambikairajah E, Swanson B A. 2013. Predicting the effect of AGC on speech intelligibility of cochlear implant recipients in noise. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on (pp. 8061-8065).

2. Watkins G D, Swanson B A, Suaning G J. 2018. An Evaluation of Output Signal to Noise Ratio as a Predictor of Cochlear Implant Speech Intelligibility. Ear Hear 39:958-968.

3. Taal C H, Hendriks R C, Heusdens R, et al. 2011. An Algorithm for Intelligibility Prediction of Time-Frequency Weighted Noisy Speech. IEEE Transactions on Audio, Speech, and Language Processing 19:2125-2136.

4. Falk T H, Parsa V, Santos J F, et al. 2015. Objective Quality and Intelligibility Prediction for Users of Assistive Listening Devices. IEEE Signal Process Mag 32:114-124.

5. Watkins G D, Swanson B A, Suaning G J. 2020. Prediction of Individual Cochlear Implant Recipient Speech Perception With the Output Signal to Noise Ratio Metric. Ear Hear 41:1270-1281.

6. Mauger S J, Dawson P W, Hersbach A A. 2012. Perceptually optimized gain function for cochlear implant signal-to-noise ratio based noise reduction. J Acoust Soc Am 131:327-336.

7. Dawson P W, Mauger S J, Hersbach A A. 2011. Clinical evaluation of signal-to-noise ratio-based noise reduction in Nucleus(R) cochlear implant recipients. Ear Hear 32:382-390.

8. Watkins G D, Swanson B A, Suaning G J. 2019. An Investigation of Audibility Effects on Cochlear Implant Speech Perception Prediction. In Proceedings of the 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) (pp. 1801-1804). Berlin.

9. Huyck J J, Anbuhl K L, Buran B N, et al. 2021. Supporting Equity and Inclusion of Deaf and Hard-of-Hearing Individuals in Professional Organizations. Frontiers in Education 6.
