How to record multitrack hybrid podcasts

Managing remote and studio recordings is relatively straightforward – but things get tricky when you try to mix the two.

These days, the actual process of recording a conversation is arguably one of the least challenging parts of launching a new podcast. Recording everyone sitting round a table is pretty easy once you make the right hardware choices, and while recording remotely can be more of a pain for editing, once you’ve got a virtual studio setup with Riverside or Zencastr working, it can run reasonably smoothly.

Where things start to get difficult, however, is where the two intersect. If you have multiple people in a studio trying to talk to one or more remote participants, getting high-quality audio that’s easy and flexible to edit is a challenge. 

If all you’re doing is dialling in one guest or a co-host who happens to be away one week, that’s usually fairly easy – and we’ll cover that too – but as soon as there’s more than one person at either end of a hybrid recording, it can get decidedly tricky.

If it all seems a bit too difficult, you might be minded to fall back to recording multiple remote participants on a single stereo mix, such as by recording a Skype call, or indeed doing an entire recording as a single stereo track. Avoid this temptation, however, as you’ll have far fewer opportunities to clean up individual voices or unpick awkwardness and crosstalk than if you recorded each voice on its own track. In a fully hybrid recording solution, you’re aiming for all participants’ voices to be recorded separately.

That syncing feeling

A key challenge when dealing with hybrid recordings is keeping everything in sync. Of course, if you have three XLR mics attached to a recording desk in a physical location and you start and stop the recording on that, your three participants’ tracks look like this in simplified form:

And you might think that if you have two people in a studio and one joining remotely, the tracks might look something like this. Maybe your remote co-host pressed record a little after you did, and stopped a little later too, but basically, you’re expecting this:

But it’s possible the tracks actually come out looking a little like this, even if in theory the studio and the remote host pressed record and stop at precisely the same time.

Now, we’ve exaggerated the problem so you can see it, but what you’re looking at here is a phenomenon called audio drift. Even if you keep the settings as similar as possible on both ends, such as by standardising on a sample rate and bit depth, the clocks that time each recording are never perfectly accurate, so recordings made on different computers can come out slightly longer or shorter than each other. That doesn’t sound like a significant issue until you realise it means that some voices gradually slide out of sync with the others over time.
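To get a feel for the scale of the problem, here is a rough back-of-envelope sketch in Python. The 50 parts-per-million clock mismatch is purely an illustrative assumption, not a measurement of any particular device, and the function name is made up for this example.

```python
def drift_ms(duration_minutes: float, clock_mismatch_ppm: float) -> float:
    """Milliseconds by which one recording ends up longer or shorter than another.

    clock_mismatch_ppm is the assumed difference between the two recorders'
    clocks, in parts per million (an illustrative figure, not a measured one).
    """
    seconds = duration_minutes * 60
    return seconds * clock_mismatch_ppm / 1_000_000 * 1000


# Print the assumed drift for a few typical episode lengths.
for minutes in (10, 30, 60, 120):
    print(f"{minutes:>3} min recording: about {drift_ms(minutes, 50):.0f} ms of drift")
```

On those assumptions, a ten-minute recording is only around 30 ms adrift, while an hour-long one is out by roughly 180 ms, which is why the problem tends to show up late in long sessions.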

This is often not noticeable on short recordings, but the longer you record, the more apparent it can be. If you’re recording multitrack, you can resolve this in the edit – manually trimming out the silences when your out-of-sync voice isn’t speaking, and then sliding the clips where they are speaking back into place – but it can be a lot of tedious work.

Recording hybrid podcasts with one remote participant

The simplest hybrid recording setups are where there are multiple hosts around a table and just one remote participant – usually a guest.

The easiest way to achieve this is to use a dedicated recording desk such as the RØDECaster Pro or PodTrak P4, which have inputs to hook up a phone or computer as an external source, as well as outputs for all in-person guests to monitor the audio via headphones. This way, the mics connected up using XLR are each recorded onto their own track, and so is your remote guest, and everything will stay perfectly in sync (albeit with a slight delay to your caller).

The problem with this, though, is that even if you’re making that call to the remote guest over a high-quality VoIP service such as FaceTime or Skype rather than a plain old phone line, not only are they less likely to be using a high-quality mic, but what you’ll record is the sound of their voice after it’s been transmitted over the internet.

That’s potentially an issue if the connection dips or gets saturated – which will manifest as stutters, glitches or audio artefacts – but even if it doesn’t, it’s still a problem because the audio is compressed as part of the transmission process, so the quality is reduced.

Honestly, though, this is still a completely valid solution for a single remote guest, especially if it makes it easy for you to get people on the show, or if you want the vibe of a regular co-host joining you from somewhere exciting, say.

Recording fully hybrid podcasts

The best way to achieve true multitrack recordings when you have both local participants in a studio and remote guests dialling in virtually is by pairing a multitrack recorder in the studio with a virtual studio platform such as Riverside or Zencastr. You’ll record the studio voices on the hardware recorder, and the remote guests at each of their locations (so you avoid the issue of internet compression and bandwidth) using the virtual studio platform, and then marry them all up in the edit.

Audio drift is still a potential issue, but the virtual studio platform should help keep the remote participants in sync, and at the very least give you a reference for the studio tracks to make lining up simpler.

For this to work best of all, your studio recording desk should also be able to interface directly with the virtual studio on your PC or Mac, allowing remote participants to hear the local audio as recorded, rather than falling back on a laptop mic. This is useful for anyone in a producer role, but also gives you a sync track to help align the hardware and virtual studio recordings.

Here’s what that might look like. In this scenario, we have four participants: two hosts gathered around a table, with their hardware recorder acting as an interface to pipe their audio into (and the other participants’ audio out of) their computer. Hosts one and two get recorded onto the hardware by one of them pressing the record button.

We also have a third co-host joining remotely over a virtual studio platform, as well as a guest, and that virtual studio platform gets recorded when one of the hosts clicks the record button there.

There are five tracks here rather than four because there were technically three participants in the virtual studio: co-host three, the guest, plus co-hosts one and two who joined together from the same location – that’s track 5.

You can see that the two hardware-recorded tracks (1 and 2) start and stop at the same time as each other, of course, and the three virtual studio recordings likewise. But in this example at least, the virtual studio was started a little before the hardware recorder, and stopped a little later.

The important thing to keep straight in your head is which tracks are in sync with each other when you come to add them to your DAW. If you dropped them in and lined up the start of them all, then the hardware-recorded tracks (1 and 2) would sit earlier than they should and be out of sync with the rest. This is where you can use track 5, the mix of hosts one and two that was recorded through the virtual studio: select tracks 1 and 2 together and move them until their waveforms line up visually with those in track 5. Once you’ve confirmed there’s minimal or no drift between the different tracks (scroll to near the end and check), you can discard track 5.
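If you would rather not line the waveforms up purely by eye, the same idea can be sketched in code: slide one track against the reference and keep the shift where the two match best. This is a minimal pure-Python illustration on short lists of loudness values, not how any particular DAW or virtual studio does it. The function name and the toy data are invented for the example, and a real single-host track will only roughly resemble the mixed reference track, so treat the result as a starting point to check by ear.

```python
def best_shift(reference, track, max_shift):
    """Return how many samples `track` must move later to line up with `reference`.

    Both arguments are lists of non-negative loudness values taken at the same
    rate. A negative result means `track` must move earlier.
    """
    best, best_score = 0, float("-inf")
    for shift in range(-max_shift, max_shift + 1):
        # Score this shift by multiplying overlapping values and summing them.
        score = 0.0
        for i, value in enumerate(track):
            j = i + shift
            if 0 <= j < len(reference):
                score += value * reference[j]
        if score > best_score:
            best, best_score = shift, score
    return best


# Toy data: a made-up loudness pattern standing in for the studio hosts.
pattern = [0, 3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]

# Track 5 (virtual studio mix) started recording 7 samples before the hardware.
track_5 = [0] * 7 + pattern + [0] * 5
# Track 1 (hardware recorder) starts right at the pattern.
track_1 = pattern + [0] * 3

shift = best_shift(track_5, track_1, max_shift=10)
print(f"Move tracks 1 and 2 later by {shift} samples")
```

Dividing the shift by the rate of the loudness values gives the offset in seconds. The brute-force loop is only practical for short, heavily downsampled data; it is here to show the principle rather than to process full-length recordings.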

Why not just use track 5 alongside 3 and 4 in our example? That would require less manual alignment, but because hosts one and two are mixed down onto a single track, you can’t treat them separately. That mightn’t be a huge problem if you get levels and processing dialled in perfectly on your recording desk, but even then it will give you less flexibility, such as fewer options for reducing unwanted sounds.

It should go without saying, but: everyone should always be wearing headphones, ideally big, closed-back, over-ear cans.

‘Double-ender’ recording

Before virtual studio platforms existed, podcasters who needed to record remotely or with a hybrid setup would simply have everyone record themselves in their own location and then sync everything up later. This is still an option that can work well if, for example, you have a remote participant with a high-quality recorder and microphone, but no way to connect it to a PC.

To give yourself fewer headaches when working with ‘double-ender’ recordings, ensure everyone is using the same recording settings: 24-bit, 48kHz is a good common ground, but the specific settings you standardise on matter less, in the context of this discussion at least, than the fact that they’re all the same.
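Once the files come in, a quick check that everyone really did use the same settings can save confusion later. The sketch below uses Python’s built-in wave module, which reads uncompressed PCM WAV files; the file names and helper functions are hypothetical, and the demo writes two tiny silent files itself so it has something to inspect.

```python
import os
import tempfile
import wave


def wav_settings(path):
    """Return (sample_rate_hz, bit_depth) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.getsampwidth() * 8


def mismatched(paths):
    """Return {path: settings} for every file whose settings differ from the first."""
    expected = wav_settings(paths[0])
    return {p: wav_settings(p) for p in paths[1:] if wav_settings(p) != expected}


def write_silence(path, sample_rate, bytes_per_sample):
    """Write a very short silent mono WAV, purely so the demo has files to read."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(bytes_per_sample)
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(bytes_per_sample * 100))


# Demo with hypothetical file names in a temporary folder.
folder = tempfile.mkdtemp()
host = os.path.join(folder, "host.wav")
guest = os.path.join(folder, "guest.wav")
write_silence(host, 48000, 3)   # 24-bit, 48kHz
write_silence(guest, 44100, 2)  # 16-bit, 44.1kHz

problems = mismatched([host, guest])
for path, (rate, bits) in problems.items():
    print(f"{os.path.basename(path)}: {rate} Hz, {bits}-bit does not match the first file")
```

In practice you would pass in the paths of the recordings each participant sends you; anything the check reports can then be converted to the agreed settings before editing.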

A common technique is to do a ‘sync clap’ when you’ve all pressed record – literally just one of you counting down and then everyone clapping their hands – so that you have a spike to line everything up to, and again at the end so you can check audio drift.
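As a rough sketch of how the two claps can be used, the Python below finds the loudest sample near the start and end of each track, then compares the gap between the claps on each recording. It assumes the clap is the loudest thing in the first and last second of every track, and it works on toy data at a made-up sample rate; all names are invented for the example.

```python
def clap_index(samples, start, stop):
    """Index of the loudest sample in samples[start:stop], assumed to be the clap."""
    return max(range(start, stop), key=lambda i: abs(samples[i]))


def clap_positions(samples, sample_rate, search_seconds):
    """Find the opening and closing claps within search_seconds of each end."""
    window = min(len(samples), int(search_seconds * sample_rate))
    first = clap_index(samples, 0, window)
    last = clap_index(samples, len(samples) - window, len(samples))
    return first, last


def compare(track_a, track_b, sample_rate, search_seconds=1.0):
    """Return (start_offset_s, drift_s) of track_b relative to track_a.

    start_offset_s: how much later the opening clap sits in track_b.
    drift_s: how much longer the clap-to-clap gap is in track_b.
    """
    first_a, last_a = clap_positions(track_a, sample_rate, search_seconds)
    first_b, last_b = clap_positions(track_b, sample_rate, search_seconds)
    start_offset = (first_b - first_a) / sample_rate
    drift = ((last_b - first_b) - (last_a - first_a)) / sample_rate
    return start_offset, drift


def toy_track(length, claps):
    """Silence with a single full-scale sample at each clap position."""
    samples = [0.0] * length
    for position in claps:
        samples[position] = 1.0
    return samples


# Toy tracks at a made-up 1,000 samples per second to keep the lists small.
RATE = 1000
studio = toy_track(10_000, claps=[500, 9_500])
remote = toy_track(10_500, claps=[800, 9_803])

offset_s, drift_s = compare(studio, remote, RATE)
print(f"Slide the remote track {offset_s:.3f} s earlier; "
      f"by the closing clap it has drifted {drift_s * 1000:.0f} ms")
```

The start offset tells you how far to slide one track so the opening claps coincide, and the drift figure tells you whether the tracks will still agree by the end or need correcting along the way.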

But since these days you’ll often be joining each other over a voice or video call on your computer anyway, using a proper virtual studio can take some of the pain out of this process.

Whatever option you choose, if you get the hardware, studio setup, editing and – most importantly of all – the vibes right, there’s no reason your audience should ever be able to guess that you’re not all sitting around the same table. And since the best podcasts make the listener feel like they’re sitting at that table themselves, that can only be a good thing.