

Main themes in the 2024 edition
13th October
14th October

















Description:
Our best-selling workshop from RTC.ON 2023 is coming back! In this workshop, we’re gathering all the Multimedia basics to fully prepare you for the upcoming audio & video talks.
Level: Beginner
Description:
An end-to-end workshop on building your own Speech-to-Text and LLM service in C++ using ExecuTorch. Learn how to export models from the HuggingFace Transformers library and run them locally in C++. By the end, you'll be able to create your own fully offline AI solution, powered entirely on-device.
Level: Intermediate
Description:
Running real-time media applications in the cloud is notoriously hard – especially when Kubernetes is involved. This full-day, hands-on workshop is designed to demystify the complexities of deploying WebRTC-based services in Kubernetes environments. We'll start from the ground up with a practical introduction to Kubernetes fundamentals and build toward a deep understanding of the networking challenges unique to real-time communication systems.
Level: Intermediate
This talk will provide an overview of CyanView’s architecture and how Elixir is used throughout the system. We’ll explain how the codebase is organized and how builds are tailored for different device targets. The session will detail the number and variety of camera integrations we support, as well as the specific controls we are able to implement on those cameras. We'll walk through examples of how these integrations are handled in practice. Connectivity examples will also be covered, including setups such as RCP to camera, RCP to VP4, and RCP to RCP. Our use of MQTT as the main message broker will be discussed, along with custom extensions and the scale of event processing observed during the Olympic Games. We’ll then focus on our LiveView-based applications, including how we achieved reactive performance on devices with dual 650 MHz processors and how this approach is being used in other areas. The talk will conclude with a look at future plans, including streaming capabilities and cloud integration.
This talk explores how real-time calling technologies – specifically WhatsApp and WebRTC – are being used to support impactful social programmes in health across the Global South. It will cover why scalability, both technical and financial, is essential for meaningful impact, and examine the importance of security and privacy, especially in light of regional policy and governance considerations when deploying audio and video calling for health. The session will also touch on how AI is shaping conversational systems in this context, from both technical and policy standpoints, highlighting the opportunities and implementation challenges. Additionally, it will share why Elixir and the Elixir WebRTC project enabled seamless integration with low operational overhead, along with key technical learnings from the implementation process, including networking considerations.
This talk introduces the audience to the latest developments in Media Over QUIC Transport, WARP Streaming format, CAT-4-MOQT and WebTransport. We'll examine the new features and capabilities being developed in these standards, their applicability to RTC applications and view some demos of the newest protocols in action.
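For a concrete feel of the transport layer these protocols ride on, here is a minimal, hedged sketch of opening a WebTransport session from the browser; the relay URL is made up for the example, and the MoQ framing itself is omitted.

```typescript
// Minimal sketch, assuming a hypothetical relay at relay.example.com
// and a browser that exposes the WebTransport API.
async function connect(): Promise<void> {
  const transport = new WebTransport("https://relay.example.com/moq");
  await transport.ready; // resolves once the QUIC/HTTP3 session is up

  // A bidirectional stream, e.g. for a control channel.
  const stream = await transport.createBidirectionalStream();
  const writer = stream.writable.getWriter();
  await writer.write(new TextEncoder().encode("hello"));
  writer.releaseLock();

  // Datagrams suit latency-critical, loss-tolerant payloads.
  const dgWriter = transport.datagrams.writable.getWriter();
  await dgWriter.write(new Uint8Array([0x01]));
  dgWriter.releaseLock();
}
```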
WebRTC statistics offer a powerful window into media performance—but collecting, analyzing, and interpreting them accurately is far from straightforward. In this talk, we’ll explore the real-world challenges of building observability, debugging tools, and issue detection systems using WebRTC stats. We’ll look at how key metrics behave in practice, the common pitfalls developers run into, and how these challenges surface in real-world environments. Drawing from lessons learned through open-source projects and hands-on experience with production systems, this session will highlight what’s proven useful, what hasn’t, and what to keep in mind as WebRTC continues to evolve.
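As a point of reference for the kind of data involved, below is a rough sketch of sampling inbound video statistics with the standard getStats() API; exact field availability varies by browser and changes over time.

```typescript
// Rough sketch: periodically sample inbound video stats from an
// existing RTCPeerConnection. Field availability varies by browser.
async function sampleInboundVideo(pc: RTCPeerConnection): Promise<void> {
  const report = await pc.getStats();
  report.forEach((stat) => {
    if (stat.type === "inbound-rtp" && stat.kind === "video") {
      console.log({
        packetsLost: stat.packetsLost,         // cumulative counter
        jitter: stat.jitter,                   // seconds
        framesPerSecond: stat.framesPerSecond,
        nackCount: stat.nackCount,
      });
    }
  });
}

// Most counters are cumulative, so rates have to be computed by
// diffing successive samples, e.g.:
// setInterval(() => sampleInboundVideo(pc), 5000);
```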
In this talk we present Juturna, a Python library for creating and managing parallel pipelines for real-time, AI-oriented data applications. While specifically conceived as a companion framework for the Janus WebRTC Server, and already used in production for providing real-time transcriptions of IETF meetings, Juturna quickly proved to be a flexible and generic component, suitable for a variety of customisable tasks. Juturna leverages Python's built-in threading library, so it can be used to instantiate multiple pipelines that consume different data sources simultaneously. The Juturna core framework is fully parallel, modular, and real-time oriented. These characteristics make Juturna a very flexible and versatile tool in scenarios where audio and video streams from distributed sources have to be processed live by a variety of heterogeneous AI tasks. Juturna offers native components explicitly designed for real-time applications. In particular, a set of RTP source nodes can be deployed to consume remote media streams and make them available to processing nodes. However, Juturna components are open-ended entities that can be designed to address any type of task. Because of this flexibility, nodes can easily incorporate ML and AI resources and apply them to the streaming data they consume.
This talk will present the challenges and benefits of building a video composition pipeline using Vulkan Video to achieve a GPU-only workflow. After a quick introduction to Vulkan, we'll discuss the current state of Vulkan Video. Then, we'll dive into an overview of the intersections between the video codec and Vulkan Video specifications. We'll also briefly talk about other approaches to using the GPU's encoding and decoding hardware with a rendering API, and the performance benefits of GPU-only workflows compared to more popular approaches to video composition, such as using Chromium.
The WebCodecs API is close to becoming supported across all browsers, with Safari being the last browser with missing audio support. But the API only deals with encoding and decoding media, while in practice encoded media is usually wrapped in some sort of container format, and these containers come in a myriad of different flavors. It's therefore no surprise that the GitHub issue discussing container support for WebCodecs was one of the oldest and most commented on; it was ultimately closed as out of scope. Luckily, dealing with media containers is not a new problem, and there are many long-standing libraries which have been ported to JavaScript or can be compiled to WebAssembly. But most of these libraries were written without thinking about tree shaking, lazy loading, or custom builds. This talk is about what can be done with existing libraries to minimize the bundle size or to load them on demand. I also want to present some ideas for a new container library explicitly designed for the web, with extensibility and flexibility in mind.
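To illustrate the on-demand loading idea, here is a small sketch using a dynamic import so a container library is only fetched when a matching file shows up; the package name and its demux function are placeholders, not a real library.

```typescript
// Sketch of "load on demand": pull in a demuxer only when the user
// actually opens an MP4. "some-mp4-demuxer" and its demux() export
// are placeholder names, not a real package.
async function demuxIfNeeded(file: File): Promise<void> {
  if (file.type === "video/mp4") {
    // Most bundlers split a dynamic import into its own chunk,
    // so the demuxer never ships in the initial bundle.
    const { demux } = await import("some-mp4-demuxer");
    const samples = await demux(await file.arrayBuffer());
    // From here, each encoded sample could be fed to a WebCodecs VideoDecoder.
    console.log(`demuxed ${samples.length} samples`);
  }
}
```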
Have you ever imagined programming your own song using nothing but code? In this talk, we’ll explore the world of the Web Audio API — a powerful tool for sound synthesis, manipulation, and real-time audio programming. I’ll walk you through what makes it so flexible, what makes it challenging, and what makes it downright weird. We'll look at its architecture beneath the JavaScript interface, and at the pitfalls that come with re-creating the Web Audio API in environments other than the browser.
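As a taste of what a "song in nothing but code" looks like, here is a tiny sketch built from the standard Web Audio node graph (oscillator into gain into the speakers); note that browsers require a user gesture before an AudioContext will produce sound.

```typescript
// One note at a time: oscillator -> gain -> destination.
// Browsers only start an AudioContext after a user gesture.
const ctx = new AudioContext();

function playNote(frequency: number, start: number, duration: number): void {
  const osc = ctx.createOscillator();
  const gain = ctx.createGain();

  osc.type = "sawtooth";
  osc.frequency.value = frequency;
  // A simple envelope: start at 0.2 gain, decay towards silence.
  gain.gain.setValueAtTime(0.2, ctx.currentTime + start);
  gain.gain.exponentialRampToValueAtTime(0.001, ctx.currentTime + start + duration);

  osc.connect(gain).connect(ctx.destination);
  osc.start(ctx.currentTime + start);
  osc.stop(ctx.currentTime + start + duration);
}

// A three-note arpeggio: A4, C#5, E5.
[440, 554.37, 659.25].forEach((f, i) => playNote(f, i * 0.3, 0.25));
```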
This talk is a fun and practical exploration of how far we can push WebRTC on low-powered and inexpensive hardware. From Raspberry Pis to unexpected embedded platforms, we’ll dive into the journey of getting real-time communication working on constrained devices. We'll explore how to make it work, what breaks, and how to hook up real hardware to make it all useful. Expect demos, pitfalls, hacks, and plenty of surprises along the way.
Remote operation or supervision of vehicles requires a consistently low-latency video feed. This is especially difficult when using commodity components and public 5G networks. This talk is based on the experience gained with our 5G WebRTC camera at racetracks, test tracks, and city streets around the world. I will describe and demonstrate our target latency, how we came up with it, and how we measure it. I'll cover the various layers that contribute to latency and how we can reduce it in each layer; some of these reductions are obvious, and some came as a surprise.
We will present our implementation of secure, collaborative cloud application sharing using WebRTC. Our customers include many film and television studios that need to rapidly present and review high-resolution HDR assets from dozens of diverse applications. Because of their strict content security requirements, they don't want copies of pre-release assets being downloaded and shared for review sessions, and they want tight forensic tracking of who views the assets. We designed a solution that maintains content security while allowing session hosts to hand off control of applications in a fluid, collaborative way. Our general cloud-containerization solution is used to present web-based content systems such as DAMs (digital asset management, like Frame.io) and ECMs (enterprise content management, like Box.com), as well as applications for project management, animation, and graphics. We’ll talk about how we implemented our cloud infrastructure and sidecar instances, and demonstrate how the application sharing can be used on destination platforms such as web, desktop, and Apple Vision Pro.
This talk covers the technical challenges encountered while developing and scaling real-time livestreaming to 4K resolution at 60 frames per second for audiences exceeding 20,000 subscribers. Key issues include optimizing bandwidth usage, minimizing latency, achieving robust infrastructure scalability, and ensuring consistent, high-quality video delivery under heavy load. We'll discuss practical solutions and innovative approaches to overcoming these challenges, ultimately enhancing the user experience in large-scale, real-time streaming scenarios.
Jitsi Meet has had real-time transcriptions since around 2017. With the advent of gen AI technology, we rebuilt our transcriber to leverage state-of-the-art tech. Not all that glitters is gold, however, and we also found ourselves building an "async" or "deferred" transcriber to optimize for cost and user expectations. In this presentation I'll go through our journey, weaving together the tech and product aspects.
In this session, we’ll explore how large language models (LLMs) can be integrated into real-time video workflows to detect critical conditions within live streams—from traffic monitoring and crowd congestion control to forest fire detection, firearms recognition, and content moderation for user-generated video. I’ll walk through real-world use cases and technical architecture, focusing on how Red5 Pro’s Brew API enables raw video frame extraction for analysis. We'll also look ahead at the evolving role of AI in real-time video systems and the broader implications for safety, automation, and platform integrity.
From hacked-together roots, the LLM-based voice AI industry has gone from nowhere to quite a developed landscape over the last two years. This talk is a developer's-eye view of the fundamentals of where we seem to be, and where we may be headed. Supported by tales from the trenches of large-scale deployments and at least one live demo of new stuff that hasn't been seen before, it will attempt to demystify the architectural choices for building systems on mostly open-source software.
TURN servers are a critical component of real-time communication infrastructure—but when misconfigured or overlooked, they can become a serious liability. In this talk, we’ll explore the security threat model of TURN deployments, including how attackers can leverage them for denial-of-service attacks, access to internal or cloud metadata services, and even to relay malicious traffic externally. We’ll share real-world examples of vulnerabilities encountered during security assessments and outline practical, effective hardening techniques. From network isolation and feature minimization to access control and DoS resilience, you’ll walk away with actionable guidance to ensure your TURN server is a secure and reliable part of your media architecture.
LLMs, image or video generators, speech-to-text and text-to-speech: simply speaking, AI has seen a meteoric rise in general adoption in just the past few years. It is most commonly hosted on large cloud servers with significant computing power available. But is that really the way forward? With edge devices becoming increasingly capable and a growing focus in software development on creating new frameworks for edge AI, we believe the future of AI lies at the edge. In this talk, we will walk through the main trade-offs of each approach and demonstrate what is possible today!
Using AI in real time poses a number of challenges. These include handling multimodal input from chat, audio, video, community settings, and web searches; addressing latency, since live discussions move on quickly; and ensuring scalability and reliability. We have built an AI platform that combines these data types and has a plug-and-play model to leverage any LLM. We can optimize for latency, quality, and cost, depending on the requirements of each feature. Information is distributed to viewers using web sockets as well as WebRTC data channels. We will present an overview of this and dig into some example features. Gigaverse is a live interactive platform that uses AI to support the creator in real time during a live session. We combine data from voice, chat, video, and community settings to help the creator run live sessions that can scale to thousands of participants. The platform provides a live summary of the chat, highlights topics being discussed, identifies questions asked by the audience, and finds the most engaged viewers so they can be invited to join the session. It augments the discussion with extra information for the viewers and fact-checks statements in real time. Creators can engage their audience by running a poll just by using their voice. Moderation is flexible, allowing creators to set the tone for the discussion themselves.
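Purely as an illustration of the data-channel fan-out mentioned above (not Gigaverse's actual implementation), a sketch of pushing AI-generated session events to a viewer over an unordered, lossy WebRTC data channel might look like this; the event shape and channel name are invented for the example.

```typescript
// Illustrative only: the event shape and channel name are made up.
interface SessionEvent {
  kind: "summary" | "question" | "poll";
  payload: unknown;
  ts: number;
}

function attachEventChannel(pc: RTCPeerConnection): (e: SessionEvent) => void {
  // Unordered + no retransmits is acceptable for frequently refreshed
  // state such as a rolling chat summary.
  const channel = pc.createDataChannel("session-events", {
    ordered: false,
    maxRetransmits: 0,
  });
  return (event: SessionEvent) => {
    if (channel.readyState === "open") {
      channel.send(JSON.stringify(event));
    }
  };
}
```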
Tired of buffering ruining your binge-worthy brilliance? In this talk, we’re cooking up some fresh, low-latency streams with Media over QUIC—no half-measures. Join us as we break bad with legacy protocols and show you how to stream like a Heisen-pro.





