Video conference audio mixing algorithm and its implementation

With the rapid advancement of networking technologies, the volume of data processed across networked systems keeps growing. Video conferencing systems have become a critical communication tool, particularly where real-time audio and visual interaction is essential, and the transmission of clear voice signals is one of their most important performance indicators. The mixing algorithm plays a crucial role in ensuring that multi-channel audio is combined and played back without distortion.

One of the main challenges terminals face when processing audio is how to mix and play back multi-channel audio locally while maintaining synchronization with video and other effects. In practice, overflow in the sound card buffer after mixing is a significant issue that can lead to audio distortion or loss. To address this, an improved mixing algorithm is introduced and assessed in terms of mix quality, stagnation rate, delay, and scalability. Compared to existing algorithms, the proposed method performs better in small-scale applications: it reduces buffer overflow, avoids harsh-sounding mixes, and lowers latency. This makes it a promising solution for real-world video conferencing environments.

**1. Analysis of Mixing Algorithms**

Sound is generated by vibrations, and its key characteristics are loudness, pitch, and timbre. Human hearing naturally perceives sound as a combination of multiple frequencies. In video conferencing systems, audio from different sources must be mixed in the time domain. Speech signals are typically sampled and quantized by 16-bit sound cards, so each sample lies in the range -32768 to 32767. When multiple channels are mixed, the resulting amplitude may exceed this range, leading to clipping and distortion.

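
The overflow problem can be made concrete with a small sketch. It assumes two time-aligned channels of signed 16-bit samples; the sample values and the helper names `mix_sum` and `count_overflows` are hypothetical and chosen only for illustration.

```python
# Limits of a signed 16-bit sample, as stated in the text.
INT16_MIN, INT16_MAX = -32768, 32767


def mix_sum(channels):
    """Add time-aligned samples across channels with no range limiting."""
    return [sum(samples) for samples in zip(*channels)]


def count_overflows(mixed):
    """Count mixed samples that no longer fit in 16 bits."""
    return sum(1 for s in mixed if s < INT16_MIN or s > INT16_MAX)


# Illustrative sample values (not measured data).
channel_a = [12000, 30000, -25000, 100]
channel_b = [15000, 20000, -20000, -50]

mixed = mix_sum([channel_a, channel_b])
overflows = count_overflows(mixed)
print(mixed, overflows)
```

Each input channel is valid on its own, yet two of the four summed samples fall outside the 16-bit range and would be distorted if written to the sound card unchanged.
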
Several common solutions exist to handle this issue:

- **Direct Clamping Method**: After mixing, any sample that exceeds the 16-bit range is clamped to the nearest limit (maximum or minimum). However, this can introduce artificial waveform distortion and noise.
- **Normalization and Mixing**: Signals are normalized before being mixed, which helps reduce overflow. However, as the number of channels increases, this method can result in lower volume and reduced clarity.
- **Alignment Mixing**: The mixing weights are adjusted based on the intensity of the incoming signals. While this can enhance louder sounds, weaker sounds are often masked.

Although these methods are simple, each has limitations in preserving sound quality. To overcome these issues, a new and improved mixing algorithm is proposed.

**2. Improved Mixing Algorithm**

In SIP-based video conferencing systems, there are various approaches to media stream mixing, including centralized and terminal-based mixing. Here, a distributed mixing model is designed. Unlike centralized models, where the server handles all media processing, this model lets each terminal receive and decode audio data locally before mixing. This reduces echo and minimizes server load, making it more suitable for real-time applications.

The algorithm is tailored for video conferencing systems used in SMEs and schools, where the number of participants is limited (typically fewer than five). Since participants rarely speak simultaneously, the likelihood of strong signal overlap is low. The algorithm takes both overflow and smoothing into account, using frame-based processing to maintain audio quality.

Key steps in the algorithm include:

- Initialization of the attenuation factor.
- Analysis of peak values and zero-crossing rates within each frame.
- Dynamic adjustment of the attenuation factor based on signal characteristics.
- Outputting the processed audio to the sound card buffer.

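
The four steps listed above can be sketched as follows. This is a minimal illustration, not the article's exact rule set: the text does not give the adjustment formula, so the frame length, the recovery step, the zero-crossing threshold, and the choice to recover more slowly on low zero-crossing (likely voiced) frames are all assumptions. The names `zero_crossing_rate` and `mix_frames` are hypothetical.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for x, y in zip(frame, frame[1:]) if (x >= 0) != (y >= 0))
    return crossings / (len(frame) - 1)


def mix_frames(channels, frame_len=160, recover_step=0.05, zcr_threshold=0.3):
    """Mix channels frame by frame with an adaptive attenuation factor.

    All default parameter values and the recovery rule are assumptions
    made for this sketch; the source does not specify them.
    """
    factor = 1.0  # Step 1: initialise the attenuation factor.
    out = []
    n = min(len(ch) for ch in channels)
    for start in range(0, n, frame_len):
        end = min(start + frame_len, n)
        # Sum the channels in wide precision (Python ints do not overflow).
        frame = [sum(ch[i] for ch in channels) for i in range(start, end)]

        # Step 2: analyse the frame's peak value and zero-crossing rate.
        peak = max(abs(s) for s in frame)
        zcr = zero_crossing_rate(frame)

        # Step 3: adjust the factor once per frame.
        if factor < 1.0:
            # Assumed rule: recover gently on low-ZCR frames, faster otherwise.
            step = recover_step if zcr < zcr_threshold else 2 * recover_step
            factor = min(1.0, factor + step)
        if peak * factor > INT16_MAX:
            # Lower the factor just enough for the frame peak to fit.
            factor = INT16_MAX / peak

        # Step 4: scale the frame and append it to the output buffer.
        out.extend(int(s * factor) for s in frame)
    return out


# Illustrative input: a loud overlapping frame followed by a quiet one.
channel_a = [20000] * 4 + [1000] * 4
channel_b = [20000] * 4 + [1000] * 4
mixed = mix_frames([channel_a, channel_b], frame_len=4)
print(mixed)
```

Because the factor changes once per frame rather than once per sample, every sample within a frame is scaled by the same gain, which keeps the waveform shape intact inside the frame; the gradual recovery toward 1.0 is what keeps the quiet frame after a loud one from jumping abruptly in volume.
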
This approach aims for smooth audio output, minimal distortion, and efficient use of system resources.

**3. Embedded Implementation and Results Analysis**

To evaluate the improved algorithm, it was tested in an embedded environment on the TI DaVinci DM6446-594 processor, with the algorithm running on the ARM core at 297 MHz. Three audio samples of varying intensity were collected, covering background noise and human speech.

The test results showed that the improved algorithm outperformed traditional methods in audio quality, latency, and resource usage. Compared to fixed-step attenuation and sample-by-sample processing, the frame-based approach reduced computational overhead and improved overall performance. The algorithm produced smooth mixing with minimal noise and no overflow, making it well suited to real-time video conferencing applications.

**4. Conclusion**

By exploiting the characteristics of speech signals and processing audio frame by frame, the proposed algorithm effectively addresses mixing overflow. It uses short-term energy and zero-crossing rate analysis to adjust the attenuation factor dynamically, improving overall mixing quality. Users can choose between different algorithms based on their network conditions. The implementation on the ARM9 processor and the test results confirm the algorithm's effectiveness and its practical potential for real-world deployment.
