    How good are ultra-low bitrate speech codecs?

    Courtesy: Rohde & Schwarz

    Quality Evaluation of Speech Coding Technologies

    A comprehensive quality test was conducted to evaluate the perceived quality of various speech coding technologies under realistic conditions. The study compared current mobile network codecs with traditional low-bitrate codecs and emerging AI-based ultra-low bitrate speech coding solutions.

    In the test, a set of German speech samples from several speakers was processed through each codec type. A controlled listening experiment was used to assess overall speech quality, covering both the naturalness of the reproduced speech and the effect of typical transmission impairments such as packet loss and bandwidth constraints. The evaluation aimed to reflect real-world usage scenarios, including mobile calls, popular IP-based voice services, and speech transmission over satellite links.

    To achieve statistically meaningful results, a formal listening test was conducted in a standardised acoustic environment following the ITU-T P.800 methodology using the Absolute Category Rating (ACR) approach. A total of 32 participants – men and women from various age groups – were invited to rate the speech samples. The test ensured balanced demographic representation and controlled conditions to obtain reliable subjective quality scores. Participants evaluated multiple samples per codec type, and the results were statistically analysed to identify significant differences in perceived quality.
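    As an illustration of how ACR ratings are typically condensed into a Mean Opinion Score, the following minimal Python sketch averages the 1-to-5 ratings for one condition and attaches an approximate 95% confidence interval. It is not the analysis used in the study: the function name, the normal approximation (z = 1.96) and all rating values are assumptions made for this example, and a t-distribution would give slightly wider intervals for small panels.

```python
import math
import statistics

# ACR category labels from ITU-T P.800 (5 = best, 1 = worst).
ACR_LABELS = {5: "excellent", 4: "good", 3: "fair", 2: "poor", 1: "bad"}


def mos_with_ci(ratings, z=1.96):
    """Return (MOS, half-width of an approximate 95% confidence interval).

    Assumption: a normal approximation is used for the interval.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r not in ACR_LABELS for r in ratings):
        raise ValueError("ACR ratings must be integers from 1 to 5")
    mos = statistics.mean(ratings)
    # Sample standard deviation divided by sqrt(n) gives the standard error.
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical ratings for two unnamed conditions (not data from the study).
hypothetical_ratings = {
    "codec_a": [4, 4, 5, 3, 4, 4, 5, 4],
    "codec_b": [2, 3, 2, 3, 3, 2, 4, 3],
}

for name, ratings in hypothetical_ratings.items():
    mos, hw = mos_with_ci(ratings)
    print(f"{name}: MOS = {mos:.2f} +/- {hw:.2f}")
```

    When the intervals of two conditions do not overlap, the difference in perceived quality is commonly treated as significant; a formal analysis would normally back this up with a dedicated statistical test.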

    Key categories included:

    • Modern Mobile Codecs: Including EVS and AMR-WB, which are widely deployed in LTE and 5G networks. In addition, Opus (used in WhatsApp) and Satin (used in Microsoft Teams) were tested under real transmission conditions. These codecs offer high fidelity and robustness, especially under variable network conditions.
    • Legacy Low-Bitrate Codecs: Such as MELP and LPC-10, and the amateur radio codec Codec2, representing earlier generations of aggressive speech compression. These codecs were originally designed for extremely bandwidth-constrained environments and are still used in specialised applications.
    • Ultra-Low Bitrate AI-Based Codecs: Leveraging deep learning models for end-to-end speech representation and reconstruction. The tested codecs operate in the bitrate range of approximately 600 bit/s to 3 kbit/s. For comparison, 600 bit/s is roughly one hundredth of the well-known ISDN transmission rate (64 kbit/s) and one fortieth of the bitrate typically used in VoLTE (24 kbit/s).
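
    The bitrate comparison in the last bullet can be checked with a few lines of Python. The sketch below computes how many times smaller the tested bitrates are than the ISDN and VoLTE reference rates quoted above, and how many bits are available per speech frame. The 20 ms frame length is an assumption chosen for illustration (it is a common frame size in speech coding, but the article does not state the framing of the tested codecs), and the function names are hypothetical.

```python
ISDN_BPS = 64_000   # ISDN transmission rate quoted in the article
VOLTE_BPS = 24_000  # typical VoLTE bitrate quoted in the article


def reduction_factor(reference_bps, codec_bps):
    """How many times smaller the codec bitrate is than the reference."""
    return reference_bps / codec_bps


def bits_per_frame(codec_bps, frame_ms=20):
    """Bits available per frame; frame_ms = 20 is an assumed frame length."""
    return codec_bps * frame_ms / 1000


# Lower and upper end of the bitrate range stated for the AI-based codecs.
for codec_bps in (600, 3_000):
    print(
        f"{codec_bps} bit/s: "
        f"ISDN / codec = {reduction_factor(ISDN_BPS, codec_bps):.1f}, "
        f"VoLTE / codec = {reduction_factor(VOLTE_BPS, codec_bps):.1f}, "
        f"bits per 20 ms frame = {bits_per_frame(codec_bps):.0f}"
    )
```

    At 600 bit/s the ISDN ratio works out to about 107 and the VoLTE ratio to exactly 40. Under the assumed 20 ms framing, a 600 bit/s codec has only 12 bits to describe each frame of speech.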

    Ultra-low bitrate codecs are of particular interest for satellite-based communication systems (e.g., Non-Terrestrial Networks, NTN) operating in Direct-to-Cell or Direct-to-Device mode, in which smartphones receive signals directly from satellites, bandwidth is highly constrained, and latency is critical. They are also relevant in military and tactical communication scenarios, where efficient spectrum usage and resilience to transmission errors are essential.

    Performance of AI-Based Codecs

    The new AI-based codecs support 8 kHz wideband and 12 kHz super-wideband audio and demonstrate a significant leap in perceived speech quality and naturalness compared to classical low-bitrate codecs. Some AI-based solutions approached the performance level of high-quality codecs such as AMR-WB and EVS, making them promising candidates for future communication systems operating under tight bitrate constraints or high network load. The computational complexity of these codecs was not investigated in this study; however, some implementations introduce only a short delay that is acceptable for real-time communication.

    There is no question that these codecs deliver speech that sounds natural and pleasant to the listener. However, they do not always reproduce all speaker-specific characteristics with full accuracy. For example, pitch and intonation may be slightly altered, and in some cases, initial phonemes or consonants may be replaced or smoothed. While this may be acceptable for everyday conversation, it can limit their applicability in scenarios requiring speaker identification, authentication, or mission-critical communication.

    The following table shows some representative results of the listening experiment; the Mean Opinion Score (MOS) rates the subjectively perceived quality on a scale from 1 (bad) to 5 (excellent):

    The detailed results of this evaluation, including statistical analysis, codec performance rankings, and listener feedback, are presented at the ITU-T SG12 meeting in September 2025. These insights are expected to contribute to ongoing discussions around codec standardisation, the definition of “quality,” and its automated prediction, particularly in the context of future mobile and satellite communication systems.

    ELE Times Research Desk