The conference for audio and video devs

WebRTC
Streaming
Broadcasting
Video
Audio
AI
MoQ
QUIC
and more

About

Main themes in the 2024 edition

Generative AI in Computer Vision

learn more

AI Image Processing

*The final agenda may vary and we’ll try to adjust it to the attendees’ level of experience and knowledge.

13th October

Talks & Networking

9:00 - 11:00

Łukasz Wala
Saúl Ibarra Corretgé
Rob Pickering
Bartosz Studnik

11:00 - 11:30

11:30 - 13:00

Lorenzo Miniero
Rishit Bansal
Javi B
Łukasz Kita

13:00 - 14:30

14:30 - 16:00

Jonny Burger
Wojtek Barczyński
Paula Osés
Wojciech Jasiński

16:00 - 16:30

16:30 - 17:30

Dan Jenkins
Ritvi Mishra
Michał Śledź

14th October

Workshops, hackathon & afterparty

9:00 - 11:00

Membrane training space
Training room

11:00 - 11:15

11:15 - 13:15

Membrane training space
Training room

13:30 - 14:30

14:30 - 20:00

20:00

Important!

You don’t have to buy a separate ticket for that day. The Hackathon Day and Afterparty are included in the Combo and Conference tickets. You can sign up for a specific workshop at a later stage – we will contact every ticket holder personally.

2025 workshops

Multimedia 101

Description:
Our best-selling workshop from RTC.ON 2023 is coming back! In this workshop, we’re covering all the multimedia basics to fully prepare you for the upcoming audio & video talks.

Level: Beginner

Local LLMs & STT with ExecuTorch in C++

Description:
An end-to-end workshop on building your own Speech-to-Text and LLM service in C++ using ExecuTorch. Learn how to export models from the HuggingFace Transformers library and run them locally in C++. By the end, you'll be able to create your own fully offline AI solution, powered entirely on-device.

Level: Intermediate

WebRTC on Kubernetes - From Pods to Production

Description:
Running real-time media applications in the cloud is notoriously hard – especially when Kubernetes is involved. This full-day, hands-on workshop is designed to demystify the complexities of deploying WebRTC-based services in Kubernetes environments. We'll start from the ground up with a practical introduction to Kubernetes fundamentals and build toward a deep understanding of the networking challenges unique to real-time communication systems.

Level: Intermediate

Thursday, 12th September

2025 talks

From Super Bowl to Olympics: How CyanView Powers the World's Biggest Broadcasts with Elixir
Watch the talk

This talk will provide an overview of CyanView’s architecture and how Elixir is used throughout the system. We’ll explain how the codebase is organized and how builds are tailored for different device targets. The session will detail the number and variety of camera integrations we support, as well as the specific controls we are able to implement on those cameras. We'll walk through examples of how these integrations are handled in practice. Connectivity examples will also be covered, including setups such as RCP to camera, RCP to VP4, and RCP to RCP. Our use of MQTT as the main message broker will be discussed, along with custom extensions and the scale of event processing observed during the Olympic Games. We’ll then focus on our LiveView-based applications, including how we achieved reactive performance on devices with dual 650 MHz processors and how this approach is being used in other areas. The talk will conclude with a look at future plans, including streaming capabilities and cloud integration.

WhatsApp realtime calling, WebRTC, and how it's being used to drive important social impact programmes in Global South countries
Watch the talk

This talk explores how real-time calling technologies – specifically WhatsApp and WebRTC – are being used to support impactful social programmes in health across the Global South. It will cover why scalability, both technical and financial, is essential for meaningful impact, and examine the importance of security and privacy, especially in light of regional policy and governance considerations when deploying audio and video calling for health. The session will also touch on how AI is shaping conversational systems in this context, from both technical and policy standpoints, highlighting the opportunities and implementation challenges. Additionally, it will share why Elixir and the Elixir WebRTC project enabled seamless integration with low operational overhead, along with key technical learnings from the implementation process, including networking considerations.

A QUIC update on MOQ and WebTransport
Watch the talk

This talk introduces the audience to the latest developments in Media over QUIC Transport, the WARP streaming format, CAT-4-MOQT, and WebTransport. We'll examine the new features and capabilities being developed in these standards, discuss their applicability to RTC applications, and see demos of the newest protocols in action.

Observability in WebRTC: Between Metrics and Meaning
Watch the talk

WebRTC statistics offer a powerful window into media performance—but collecting, analyzing, and interpreting them accurately is far from straightforward. In this talk, we’ll explore the real-world challenges of building observability, debugging tools, and issue detection systems using WebRTC stats. We’ll look at how key metrics behave in practice, the common pitfalls developers run into, and how these challenges surface in real-world environments. Drawing from lessons learned through open-source projects and hands-on experience with production systems, this session will highlight what’s proven useful, what hasn’t, and what to keep in mind as WebRTC continues to evolve.

From RTP Streams to AI Insights: Building Real-Time AI Pipelines with Juturna and Janus
Watch the talk

In this talk we present Juturna, a Python library for creating and managing parallel pipelines for real-time, AI-oriented data applications. While specifically conceived as a companion framework for the Janus WebRTC Server, and already used in production for providing real-time transcriptions of IETF meetings, Juturna quickly proved to be a flexible and generic component, suitable for a variety of customisable tasks. Juturna leverages the built-in Python multithreading library, so it can be used to instantiate multiple pipelines that consume different data sources simultaneously. The Juturna core framework is fully parallel, modular, and real-time oriented. These characteristics make Juturna a very flexible and versatile tool in scenarios where audio and video streams from distributed sources have to be processed live according to a variety of heterogeneous AI tasks. Juturna offers native components explicitly designed for real-time applications. In particular, a set of RTP source nodes can be deployed to consume remote media streams and make them available to processing nodes. However, Juturna components are open-ended entities that can be designed to address any type of task. Because of this flexibility, nodes can easily incorporate ML and AI resources and apply them to the streaming data they consume.

Video composition using the GPU: a look at Vulkan Video
Watch the talk

This talk will present the challenges and benefits of building a video composition pipeline using Vulkan Video to achieve a GPU-only workflow. After a quick introduction to Vulkan, we'll discuss the current state of Vulkan Video. Then we'll dive into an overview of the intersections between the video codec and Vulkan Video specifications. We'll also briefly cover other approaches to using the GPU's encoding and decoding hardware with a rendering API, and the performance benefits of GPU-only workflows compared to more popular approaches to video composition, such as using Chromium.

Designing a media container library for the web
Watch the talk

The WebCodecs API is close to being supported across all browsers, with Safari the last one missing audio support. But the API deals only with encoding and decoding media. In practice, encoded media is usually wrapped in some sort of container format, and these containers come in a myriad of different flavors. It's therefore no surprise that the GitHub issue discussing container support for WebCodecs was one of the oldest and most commented on; it was ultimately closed as out of scope. Luckily, dealing with media containers is not a new problem, and there are many long-standing libraries that have been ported to JavaScript or can be compiled to WebAssembly. But most of these libraries were written without thinking about tree shaking, lazy loading, or custom builds. This talk is about what can be done with existing libraries to minimize the bundle size or to load them on demand. I also want to present some ideas for a new container library explicitly designed for the web, with extensibility and flexibility in mind.

Finding the perfect balance between an easy and a flexible audio interface – Web Audio API: The Good, the Bad, and the Ugly
Watch the talk

Have you ever imagined programming your own song using nothing but code? In this talk, we’ll explore the world of the Web Audio API — a powerful tool for sound synthesis, manipulation, and real-time audio programming. I’ll walk you through what makes it so flexible, what makes it challenging, and what makes it downright weird. We'll look at its architecture beneath the JavaScript interface, and at the pitfalls that come with re-creating the Web Audio API in environments other than the browser.

How Low Can You Go? Running WebRTC on Low-Powered (and Cheap) Devices
Watch the talk

This talk is a fun and practical exploration of how far we can push WebRTC on low-powered and inexpensive hardware. From Raspberry Pis to unexpected embedded platforms, we’ll dive into the journey of getting real-time communication working on constrained devices. We'll explore how to make it work, what breaks, and how to hook up real hardware to make it all useful. Expect demos, pitfalls, hacks, and plenty of surprises along the way.

Trimming Glass-to-Glass Latency of a Video Stream, One Layer at a Time
Watch the talk

Remote operation or supervision of vehicles requires a consistently low-latency video feed. This is especially difficult when using commodity components and public 5G networks. This talk is based on the experience we gained with our 5G WebRTC camera on racetracks, test tracks, and city streets around the world. I will describe and demonstrate our target latency, how we came up with it, and how we measure it. I'll cover the various layers that contribute to latency and how we can reduce it in each layer: some of the reductions are obvious, and some came as a surprise.

Secure Collaborative Cloud Application Sharing with WebRTC
Watch the talk

We will present our implementation of secure, collaborative cloud application sharing using WebRTC. Our customers include many film and television studios that need to rapidly present and review high-resolution HDR assets from dozens of diverse applications. Because of their strict content security requirements, they don't want copies of pre-release assets being downloaded and shared for review sessions, and they want tight forensic tracking over who views the assets. We designed a solution that maintains content security while allowing session hosts to hand off control of applications in a fluid, collaborative way. Our general cloud-containerization solution is used to present web-based content systems such as DAMs (digital asset management, like Frame.io) and ECMs (enterprise content management, like Box.com), as well as applications for project management, animation, and graphics. We’ll talk about how we implemented our cloud infrastructure and sidecar instances, and we'll demonstrate how application sharing can be used on destination platforms such as web, desktop, and Apple Vision Pro.

Challenges in Realtime Livestreaming at 4K / 60 FPS
Watch the talk

This talk covers the technical challenges encountered while developing and scaling real-time livestreaming at 4K resolution and 60 frames per second for audiences exceeding 20,000 subscribers. Key issues include optimizing bandwidth usage, minimizing latency, achieving robust infrastructure scalability, and ensuring consistent, high-quality video delivery under heavy load. We'll discuss practical solutions and innovative approaches to overcoming these challenges, ultimately enhancing the user experience in large-scale, real-time streaming scenarios.

AI assisted transcriptions in Jitsi Meet: our journey
Watch the talk

Jitsi Meet has had realtime transcriptions since around 2017. With the advent of gen AI technology, we rebuilt our transcriber to leverage state-of-the-art tech. Not all that glitters is gold, however, and we also found ourselves building an "async" or "deferred" transcriber to optimize for cost and user expectations. In this presentation I'll go through our journey, weaving together the tech and product aspects.

The Future in Focus: AI and the Next Wave of Real-Time Video Intelligence
Watch the talk

In this session, we’ll explore how large language models (LLMs) can be integrated into real-time video workflows to detect critical conditions within live streams—from traffic monitoring and crowd congestion control to forest fire detection, firearms recognition, and content moderation for user-generated video. I’ll walk through real-world use cases and technical architecture, focusing on how Red5 Pro’s Brew API enables raw video frame extraction for analysis. We'll also look ahead at the evolving role of AI in real-time video systems and the broader implications for safety, automation, and platform integrity.

Where are WebRTC and telephony voice agents headed?
Watch the talk

From hacked-together roots, the LLM-based voice AI industry has gone from nowhere to quite a developed landscape over the last two years. This talk is a developer's-eye view of the fundamentals of where we seem to be and where we may be headed. Supported by tales from the trenches of large-scale deployments and at least one live demo of new stuff that hasn't been seen before, it will attempt to demystify the architectural choices for building systems on mostly open-source software.

TURNed inside out: a hacker’s view of your media relay
Watch the talk

TURN servers are a critical component of real-time communication infrastructure—but when misconfigured or overlooked, they can become a serious liability. In this talk, we’ll explore the security threat model of TURN deployments, including how attackers can leverage them for denial-of-service attacks, access to internal or cloud metadata services, and even to relay malicious traffic externally. We’ll share real-world examples of vulnerabilities encountered during security assessments and outline practical, effective hardening techniques. From network isolation and feature minimization to access control and DoS resilience, you’ll walk away with actionable guidance to ensure your TURN server is a secure and reliable part of your media architecture.

The Future of AI Is Distributed: Tradeoffs in Performance, Privacy, and Power
Watch the talk

LLMs, image and video generators, speech-to-text and text-to-speech – simply speaking, AI – has seen a meteoric rise in general adoption in just the past few years. It is most commonly hosted on large cloud servers with significant computing power available. But is that really the way forward? With edge devices becoming increasingly capable and a growing focus in software development on creating new frameworks for edge AI, we believe the future of AI lies at the edge. In this talk, we will walk through the main trade-offs of each approach and demonstrate what is possible today!

Multimodal AI for Real-Time Creator Experiences
Watch the talk

Using AI in realtime poses a number of challenges. These include handling multimodal input from chat, audio, video, community settings, and web searches; addressing latency, since live discussions move on quickly; and ensuring scalability and reliability. We have built an AI platform that combines these data types and has a plug-and-play model to leverage any LLM. We can optimize for latency, quality, and costs depending on the requirements for each feature. Information gets distributed to viewers using WebSockets as well as WebRTC data channels. We will present an overview of this and dig into some example features.

Gigaverse is a live interactive platform that uses AI to support the creator in realtime during a live session. We combine data from voice, chat, video, and community settings to help the creator run live sessions that can scale to thousands of participants. The platform provides a live summary of chat, highlights topics being discussed, identifies questions asked by the audience, and finds the most engaged viewers so they can be invited to join the session. It augments the discussion with extra information for the viewers and fact-checks statements in realtime. Creators can engage their audience by running a poll just by using their voice. Moderation is flexible, allowing creators to set the tone for the discussion themselves.

Streaming Bad: Breaking Latency with MOQ
Watch the talk

Tired of buffering ruining your binge-worthy brilliance? In this talk, we’re cooking up some fresh, low-latency streams with Media over QUIC—no half-measures. Join us as we break bad with legacy protocols and show you how to stream like a Heisen-pro.

catch up on the 2024 talks

watch the talks from 2024

RTC.ON on social media

follow us on X

Organizers

Meet the team
Hey there!
It’s us – the folks behind RTC.ON. If you’ve been to one of our past conferences, chances are we’ve already crossed paths! We might be a small crew, but we give it our all to make RTC.ON something special. Still, you – the amazing devs who show up each year – are what truly make it great.

Thanks for being part of RTC.ON.
See you next September!
~ Karolina, Maciej, Mateusz & the Software Mansion team