Reports & Publications
8x8 CX Platform AI Transcription Accuracy vs. Dialpad & RingCentral
Abstract
Tolly evaluated the speech-to-text accuracy of 8x8’s built-in transcription engine against comparable services from Dialpad and RingCentral. Fifteen English-language audio files (3–7 minutes each) covering typical customer-support topics were played into each platform four times using a controlled loop-back setup. Word error rate (WER), computed with the open-source jiwer library, was the primary metric; lower values indicate higher accuracy.
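For readers unfamiliar with the metric, WER is the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and a hypothesis transcript, divided by the number of reference words. The report used the jiwer library; the sketch below is a minimal stdlib-only illustration of the same calculation, not the report's actual test harness.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution in a five-word reference gives a WER of 0.20 (i.e. 20 %).
print(wer("please reset my account password", "please reset my count password"))
```

A platform-level score like the 3.43 % best-case figure above is then an average of such per-file WER values across the test corpus.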
In the best-case analysis (lowest WER per sample, then averaged), 8x8 produced a WER of 3.43 %, less than half the error of Dialpad (≈8.03 %) and RingCentral (≈8.08 %). Transcripts from 8x8 become available about 50 seconds after a call ends, whereas the other two services stream captions almost immediately, though with markedly lower accuracy.
Averaged across all four runs, 8x8 still led with 4.54 %, while Dialpad and RingCentral averaged 8.53 % and 9.20 %, respectively. Speaker accent influenced every system (Scottish and Welsh accents proved most challenging), but 8x8 consistently handled accent variability better than its rivals.
Key metrics
| Metric | 8x8 | Dialpad | RingCentral |
|---|---|---|---|
| Best-case average WER | 3.43 % | 8.03 % | 8.08 % |
| Average WER across all runs | 4.54 % | 8.53 % | 9.20 % |
| Transcript availability | ~50 s post-call | Near-real-time | Near-real-time |