
8x8 CX Platform AI Transcription Accuracy vs. Dialpad & RingCentral

Sponsor: 8x8

Abstract

Tolly evaluated the speech-to-text accuracy of 8x8's built-in transcription engine against comparable services from Dialpad and RingCentral. Fifteen English-language audio files (3–7 min each) covering typical customer-support topics were played into every platform four times, using a controlled loop-back setup. Word error rate (WER) was the primary metric, calculated with the open-source jiwer library; lower values indicate higher accuracy.
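WER is the word-level edit distance (substitutions, insertions, and deletions) between a hypothesis transcript and the reference, divided by the reference length. A minimal sketch of the metric follows; jiwer computes the same quantity, and the function name here is illustrative, not the study's code:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Levenshtein distance over words, via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                                # all deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                                # all insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + sub)   # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)

# One substituted word in a four-word reference gives a 25 % WER.
print(wer("the quick brown fox", "the quick brown dog"))  # 0.25
```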


In the best-case analysis (lowest WER per sample, then averaged), 8x8 produced a WER of 3.43 %, less than half the error of Dialpad (8.03 %) and RingCentral (8.08 %). Transcripts from 8x8 become available about 50 seconds after a call ends, whereas the other two services stream captions almost immediately, though with markedly lower accuracy.


Looking at the average of all four runs, 8x8 still led with 4.54 %, while Dialpad and RingCentral averaged 8.53 % and 9.20 %, respectively. Speaker accent influenced every system—Scottish and Welsh proved most challenging—but 8x8 consistently handled accent variability better than its rivals.
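The two figures reported above differ only in when the averaging happens: best-case keeps the lowest WER of the four runs per sample before averaging, while the all-runs figure averages every run. A small sketch with purely hypothetical numbers (not the study's raw data):

```python
# Hypothetical per-sample WERs for one platform: rows are audio samples,
# columns are the four playback runs. Values are illustrative only.
runs = [
    [0.031, 0.045, 0.038, 0.052],   # sample 1
    [0.060, 0.041, 0.055, 0.049],   # sample 2
    # ... in the study, one row per each of the 15 audio files ...
]

# Best-case: take the lowest WER per sample, then average across samples.
best_case = sum(min(sample) for sample in runs) / len(runs)

# All-runs average: mean over every individual run of every sample.
all_runs = sum(w for sample in runs for w in sample) / sum(len(s) for s in runs)

print(f"best-case {best_case:.2%}, all-runs {all_runs:.2%}")
```

Because best-case discards each sample's worst runs, it is always less than or equal to the all-runs average, which explains why every platform's best-case figure is the lower of its two numbers.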


Key metrics

Metric                         8x8               Dialpad          RingCentral
Best-case average WER          3.43 %            8.03 %           8.08 %
Average WER across all runs    4.54 %            8.53 %           9.20 %
Transcript availability        ~50 s post-call   Near-real-time   Near-real-time