
From Click Tracks to Rubato: The Hidden Architecture of Musical Time

By Lance Glasser | May 14, 2026 | 8 min read

Every song you’ve ever loved lives on a spectrum. On one end: rigid, metronomic precision. On the other: the kind of expressive timing freedom that makes a Chopin nocturne feel like it’s breathing. Understanding where different music falls on that spectrum—and why—turns out to be fundamental to the challenge of making music together remotely.

So let’s walk through it. From the click to the rubato, with a few stops along the way.

The Click Track: Music’s Invisible Backbone

A click track is essentially a high-tech metronome. It’s a series of audio cues—usually a sharp “click” or beep—that marks the tempo of a song. The audience never hears it. Musicians get it piped into their headphones (in the studio) or their in-ear monitors (on stage), and it keeps everyone locked to the same beats per minute from start to finish.

Why bother? Because humans naturally speed up during exciting choruses and slow down during sad verses. That’s beautiful in a live performance, but it creates real problems in the studio. If a drummer records their part on Monday and a guitarist overdubs on Friday, that click track ensures they’re playing at the exact same speed. It’s also what makes modern digital editing possible—when a song lives “on the grid,” you can copy-paste a chorus or fix a small timing error without the music sounding disjointed.

And it isn’t always literally a “click.” Most musicians customize the sound so it cuts through the mix in their headphones. Some use a cowbell tone (great for punching through heavy guitar frequencies). Others prefer a high-pitched sine wave on the downbeat with a lower tone on the other beats. For complex orchestral scores, you might even hear a recorded voice counting “one, two, three, four...” to help players track time signature changes.
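To make the arithmetic concrete, here is a minimal sketch of a click generator. The beat spacing is just 60 divided by the BPM; everything else (the function names, the 1760 Hz and 880 Hz tones, the 20 ms burst length) is an assumption chosen for illustration, not a standard.

```python
import math

def click_schedule(bpm, beats_per_bar, n_bars, accent_hz=1760.0, beat_hz=880.0):
    """Return a (time_in_seconds, tone_frequency_hz) pair for every click.

    The tone frequencies are illustrative assumptions: a higher pitch marks
    the downbeat of each bar, a lower pitch marks the remaining beats.
    """
    seconds_per_beat = 60.0 / bpm
    schedule = []
    for i in range(beats_per_bar * n_bars):
        is_downbeat = (i % beats_per_bar == 0)
        tone = accent_hz if is_downbeat else beat_hz
        schedule.append((i * seconds_per_beat, tone))
    return schedule

def render_click(freq_hz, duration_s=0.02, sample_rate=44100):
    """Render one short sine burst as a list of float samples in [-1, 1]."""
    n_samples = round(duration_s * sample_rate)
    return [math.sin(2 * math.pi * freq_hz * k / sample_rate)
            for k in range(n_samples)]

# Two bars of 4/4 at 120 BPM: one click every 0.5 seconds.
schedule = click_schedule(bpm=120, beats_per_bar=4, n_bars=2)
```

Swapping the sine burst for a cowbell sample or a recorded count would only change `render_click`; the schedule stays the same.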

Here’s the debate, though: some purists argue that click tracks kill the groove. Led Zeppelin and The Beatles rarely used them, which is partly why their songs “breathe”—they speed up and slow down in ways that feel organic and alive. But for most professional work today, the click is non-negotiable. It’s the invisible backbone.

The Backing Track: Your Virtual Band

If the click track is the skeleton, the backing track is the rest of the body. A backing track contains actual musical content—orchestral strings, synthesizer pads, layered harmonies, percussion loops—that fills in whatever a live performer or small band can’t (or doesn’t want to) reproduce on stage.

These two are best friends: for a backing track to work, the performers must follow a click track. If the drummer drifts even a fraction of a second, they fall out of sync with the pre-recorded audio. Musical train wreck. This is why you see modern drummers wearing substantial headphones during live shows—they’re locked into the click to stay “on the grid” with the backing audio.

Is it “cheating”? This is a heated debate in the music world. The pro-track camp calls it an artistic tool, no different from high-end lighting or pyrotechnics. The purist camp feels it diminishes the liveness of a performance. Reality check: almost every major pop tour relies heavily on sophisticated backing tracks. And there are different flavors—a karaoke track (the whole song minus the lead vocal), individual stems that a sound engineer can mix live, or subtle enhancement tracks that just thicken the sound of the live instruments.

Speaking of which—

Quick Detour: What’s a Stem?

A stem is a single audio file containing a “sub-mix” of multiple related tracks. If a song is a meal, the individual tracks (one mic on the snare, one on the kick drum) are the raw ingredients. Stems are the completed components—the sauce, the protein—already prepared and ready to plate.

Most songs break down into four standard stems: drums and percussion, bass, instruments (guitars, keyboards, synths), and vocals. A modern pop song might have over a hundred individual tracks, but it usually reduces to four to eight stems. This is the “middle ground” of control that makes collaboration practical—a drummer can mute the drum stem and play along to everything else, a DJ can grab just the vocal stem for a remix, and a film editor can pull down the vocal stem during dialogue while keeping the instrumental bed.

The hierarchy is simple: individual tracks get mixed into stems, stems get mixed into the final stereo master.
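That hierarchy can be sketched in a few lines. This is a toy model under loud assumptions: each "track" is a four-sample list, mixing is plain summation with optional gain, and the track and stem names are made up for the example. Real mixes involve panning, effects, and far longer signals.

```python
def mix(signals, gains=None):
    """Sum equal-length sample lists, applying an optional gain to each."""
    gains = gains or [1.0] * len(signals)
    length = len(signals[0])
    return [sum(g * s[i] for g, s in zip(gains, signals)) for i in range(length)]

# Hypothetical individual tracks (the raw ingredients).
tracks = {
    "kick":       [0.5, 0.0, 0.5, 0.0],
    "snare":      [0.0, 0.25, 0.0, 0.25],
    "bass_di":    [0.25, 0.25, 0.25, 0.25],
    "lead_vocal": [0.0, 0.5, 0.5, 0.0],
}

# Which tracks feed which stem (the prepared components).
stem_layout = {
    "drums":  ["kick", "snare"],
    "bass":   ["bass_di"],
    "vocals": ["lead_vocal"],
}

# Tracks are mixed into stems, and stems are mixed into the master.
stems = {name: mix([tracks[t] for t in members])
         for name, members in stem_layout.items()}
master = mix(list(stems.values()))

# A drummer's play-along mix: every stem except the drums.
play_along = mix([s for name, s in stems.items() if name != "drums"])
```

The play-along mix at the end is the drummer's use case from above: mute one stem and keep everything else.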

Rubato: The Click Track’s Rebellious Cousin

Now here’s where it gets interesting. If the click track is a rigid robot, rubato is its romantic, slightly rebellious cousin.

Rubato—Italian for “stolen”—is a technique where the performer subtly speeds up and slows down the tempo for emotional effect. The time “stolen” from one part of a phrase is usually “paid back” by slowing down elsewhere, so the overall length stays roughly the same. Think of the difference between a GPS voice giving directions (perfectly rhythmic) and a great actor delivering a monologue (speeding up with excitement, pausing for drama).
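The "stolen and paid back" idea is easy to check with arithmetic. In the sketch below, each beat gets a stretch factor: below 1.0 the beat is rushed, above 1.0 it is held. The function name and the specific factors are assumptions for illustration; real rubato is rarely this tidy.

```python
def rubato_onsets(bpm, stretch):
    """Return (onset_times, total_duration) for a phrase.

    Beat i lasts stretch[i] times its nominal length of 60 / bpm seconds.
    """
    nominal = 60.0 / bpm
    onsets = []
    t = 0.0
    for factor in stretch:
        onsets.append(t)
        t += nominal * factor
    return onsets, t

# Four beats at 120 BPM, played strictly on the click.
strict_onsets, strict_total = rubato_onsets(120, [1.0, 1.0, 1.0, 1.0])

# The same phrase with rubato: rush the first two beats, then pay the
# time back by stretching the last two.
free_onsets, free_total = rubato_onsets(120, [0.75, 0.75, 1.25, 1.25])
```

Both versions last exactly two seconds, but the rubato version reaches its third beat a quarter of a second early. The borrowed time is returned by the end of the phrase.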

[Photo: a three-week-old baby on a Very Hungry Caterpillar quilt, with a sunburst Les Paul guitar collaged into her arms, mid-wail. Caption: Rosemetal, 3 weeks old. She has opinions about tempo.]

There are two flavors. The classical approach, popularized by Mozart and Chopin, keeps the accompaniment at a steady tempo while the melody floats and wavers above it—like a tightrope walker above a steady platform. The modern approach lets the entire ensemble breathe together, with the structure of the beat itself expanding and contracting.

Here’s the rub: click tracks are the enemy of true rubato. If you’re locked to a click, you cannot steal or give back time. And backing tracks are equally rigid—if a singer decides to take a long, dramatic pause, the backing track keeps right on playing without them. This is exactly why orchestras have conductors. The conductor can feel a soloist’s rubato and bring the rest of the ensemble along in real-time.

Want to hear rubato in action? Listen to almost any Chopin nocturne—the melodies float just slightly off-beat from the left hand. Or the opening of Queen’s “Bohemian Rhapsody,” which is wonderfully fluid before the ballad section settles in. Or Billie Holiday singing behind the beat on a jazz ballad—that’s vocal rubato, and it’s what makes her unmistakable.

The golden rule: rubato should feel like a heartbeat. It speeds up when you’re excited and slows down when you’re resting, but it never stops being a rhythm. Overdo it, and the music just sounds messy.

The Spectrum: From Grid-Locked to Free-Floating

Here’s a rough ordering of musical genres from most metronomically rigid to most temporally flexible:

Locked to the grid: EDM, techno, and house music are literally quantized to a digital grid. Hip-hop and trap use programmed beats with nearly zero deviation. Drum corps and marching bands make precision the whole point.

Tight but human: Bluegrass keeps very tight time with its driving banjo and mandolin chop. Funk grooves are tight and repetitive by design, though the subtle micro-timing swing matters enormously. Pop and rock studio recordings are click-tracked.

Moderate flexibility: Jazz swing has a steady pulse but with deliberate micro-timing. Blues is looser than funk, with expressive timing pulls. Country generally stays steady but allows more rubato than bluegrass. Folk and singer-songwriter music lets tempo follow the vocalist’s phrasing.

Rubato-rich: Classical orchestral music—conductors shape tempo constantly with accelerandos, ritardandos, and fermatas. Art song stretches time for expression. Opera lets dramatic pacing override metronomic consistency. And choral music—with its breath phrases, fermatas, and conductor interpretation—creates wide timing variation. At the far end, Gregorian chant has no fixed meter at all; rhythm follows the text.

“Jack & Diane”: A Case Study in Beautiful Imprecision

John Mellencamp’s “Jack & Diane” is a perfect example of how these concepts play out in a real recording.

The song opens with rhythmic freedom—Mellencamp lingers on acoustic guitar chords and stretches the vocal line: “Little ditty... about Jack and Diane...” He’s not singing to a strict beat. He’s stealing time for storytelling, in the folk busker tradition. If he’d been fighting a click track during this section, it would have killed the vibe.

Then there’s the famous drum part. Kenny Aronoff’s drumming on this track was notoriously difficult to record because Mellencamp wanted a specific “human” feel—not mechanical, not sloppy, but alive. Even when the beat is steady, it has a push-pull quality, like it’s breathing. And listen to how Mellencamp sings the chorus: “Oh yeah, life goes on...” He often starts phrases slightly late and rushes to catch up by the end of the line. That’s micro-rubato, and it gives the song its laid-back, front-porch-in-Indiana character.

Here’s the thing: this was 1982, years before grid-based digital editing became a studio staple. If you loaded “Jack & Diane” into modern recording software today, the beats wouldn’t align perfectly with the vertical grid lines. It drifts, because the humans were playing with feeling rather than mathematical precision.
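"Not aligning with the grid" can be put into numbers. The sketch below measures how far each played beat sits from its nearest grid line, and shows what snapping to the grid would do. The onset times are invented for illustration and are not measurements from the actual recording; the function names are likewise assumptions.

```python
def grid_deviation_ms(onsets_s, bpm):
    """Distance from each onset to its nearest beat grid line, in milliseconds.

    Positive values mean the note landed late, negative values mean early.
    """
    step = 60.0 / bpm
    return [(t - round(t / step) * step) * 1000.0 for t in onsets_s]

def quantize(onsets_s, bpm):
    """Snap every onset to its nearest beat grid line."""
    step = 60.0 / bpm
    return [round(t / step) * step for t in onsets_s]

# Hypothetical human performance against a 120 BPM grid.
played = [0.0, 0.51, 0.99, 1.52]
deviations = grid_deviation_ms(played, bpm=120)   # roughly 0, +10, -10, +20 ms
snapped = quantize(played, bpm=120)               # what the grid would force
```

Deviations of ten or twenty milliseconds are exactly the push and pull that quantizing removes.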

Why This Matters for Remote Music

Here’s where I’ll put on my Lyrekos hat for a moment.

This entire spectrum—from click-locked EDM to free-floating choral music—is the landscape we’re navigating when we try to get people singing together across the internet. Real-time collaboration tools like JackTrip and Jamulus work best at the rigid end of the spectrum, where everyone can lock to the same beat despite network latency. But the music that most needs remote collaboration tools—choral music, worship music, ensemble vocal performance—lives at the flexible end, where a conductor’s breath and a shared glance normally hold everything together.

That tension is exactly what led us to our approach at Lyrekos. Instead of trying to make the internet fast enough for real-time sync (a physics problem you can’t win; see “why Zoom fails for music” for the full story), we use backing tracks as a shared temporal reference. Each singer performs with the same track, and our system aligns everything automatically, with enough precision to satisfy even the rubato-rich world of choral performance.
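This post doesn’t go into how that alignment works, so the following is not a description of the Lyrekos system. It is only a generic sketch of one textbook way to estimate the offset between two signals that share a common reference: brute-force cross-correlation. The function names, the toy signals, and the idea that each recording contains a recognizable copy of the reference are all assumptions for illustration.

```python
def best_lag(reference, recording, max_lag):
    """Estimate how many samples the recording is delayed relative to the reference.

    Tries every lag in [-max_lag, max_lag] and keeps the one with the
    highest cross-correlation score.
    """
    def score(lag):
        return sum(reference[i] * recording[i + lag]
                   for i in range(len(reference))
                   if 0 <= i + lag < len(recording))
    return max(range(-max_lag, max_lag + 1), key=score)

def align(recording, lag):
    """Shift the recording so that it lines up with the reference."""
    if lag >= 0:
        return recording[lag:]          # recording started late: trim the front
    return [0.0] * (-lag) + recording   # recording started early: pad the front

# Toy reference signal and a copy that arrives three samples late.
reference = [0.0, 0.0, 1.0, 0.0, -1.0, 0.0, 0.0, 0.0]
recording = [0.0, 0.0, 0.0] + reference

lag = best_lag(reference, recording, max_lag=5)
aligned = align(recording, lag)
```

A production system would need something far more efficient and robust than this loop, but the principle is the same: find the shift at which two signals agree best, then undo it.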

The backing track, it turns out, isn’t just a convenience. It’s the bridge between the click track’s precision and the conductor’s expressiveness. The rubato is baked into the track itself—every fermata, every ritardando, every breath phrase. Singers follow it naturally, the way they’d follow a conductor. And we handle the rest. If you want to see how that plays out across all of Lyrekos’ operating modes, take a look at Lyrekos’ flexible building blocks.

From click tracks to rubato, the architecture of musical time is more nuanced than most people realize. And understanding that nuance is what makes building tools for musicians so endlessly fascinating.

Lance Glasser


Lance is CEO and Co-founder of Kinetic Audio Innovations. He was previously a faculty member at MIT, Director of Electronics Technology at DARPA, and CTO at KLA. He also makes sculpture, which has nothing to do with audio but explains the hundreds of pounds of bronze in his house.
