
Jamming with Udio

Today, I had my first jam session with Udio. With the introduction of audio prompting, I can now use my own sound as the starting point for a generated track. This seems like a leap forward, even though the product itself is still quite clunky. It's a leap because, by introducing audio upload, Udio has managed to merge casino creativity with the other, more traditional kind. Let me walk you through what I found.

For my first try, I fed Udio one of my old abandoned loops. When making music, I often arrive at dead ends: promising loops that I can't figure out what to do with. I have a bunch of them.

Then I extended this loop from both ends with Udio, producing a decent trance track in a matter of minutes. It's not going to win any awards, but it's definitely further than I've been able to take it on my own.

Here’s the original loop that I made a while back:

Here’s the finished Udio track: https://www.udio.com/songs/usCmcABg3yP4aC1J8S5WCA

Extending existing audio clips fits well into the standard Udio process.

In the standard Udio process, we get 32 seconds of audio as a starting point, and then we iteratively extend this audio from either end until we have music that we like. Each extension is an opportunity to make some choices – casino creativity at its finest.

When we upload an audio prompt, it becomes the first N seconds of the track, where N depends on the length of the prompt we load.

Udio tries to match the prompt's tempo and style, expanding it and, in the process, riffing on it. It feels like extrusion: pushing more music through the template that I defined. As Udio expands the clip, it adds new details to what was in the original, trying to predict what might have been playing before or after it.

You can still hear the original loop in the finished track at 3:12. It is bookended by entirely new sound that now fits seamlessly around it. The music around it is something that Udio extruded, generating it using the original loop as a template.

The presence of the original loop hints at the connection between the two kinds of creativity that I mentioned earlier. For instance, I could imagine myself sitting down with Ableton and building out a catchy loop, then switching to Udio to help me imagine a track that would contain it. I could then go back to Ableton and use the results of our little jam session as inspiration.

For my next try, I did something slightly different. I gave a simple melody to Udio and then rolled the casino dice until I got the right sound. For now, Udio anchors quite firmly on the audio prompt, so if you give it a piano (like I did) and ask for a saxophone, it might take a few attempts to get a rendition of the melody on a different instrument.

This time, I wanted something that sounded like a film score, so I was after strings. After a little while, Udio relented and gave me an extension that seemed right.

At that point, I trimmed the original clip from the Udio track. Now that Udio had learned my melody, I no longer needed the original material, which didn't fit the vibe I was going for anyway. I expect this trick of removing the original to become pretty common. For instance, I could hum a melody or peck it out one note at a time on my keyboard. My intuition is that Udio will (sooner or later) have a “remix” feature for audio prompts, where we can start with a recording of me whistling a tune and then shape it directly, rather than waiting for the right extension to happen.

Once the right vibe was established, the rest of the process was quite entertaining. It was fun to watch Udio reimagine my original melody in a minor key for the “scary” part of the movie and boost it with drums and a full orchestra at the climax.

Here’s the original melody:

Here’s the finished track: https://www.udio.com/songs/eCMEFFSGnoicnHR4S5fRK1

In both cases, the process felt a lot more like a jam session than a creative casino, because the final product included a distinct contribution from me. It wasn't something I just told Udio to do. I gave it raw material to riff on. And it did a pretty darned good job.

Casino Creativity

I've been geeking out on the AI-generated music services Suno and Udio, and it's been super-interesting to see them iterate and ship quickly. There might be a real value niche in this particular neck of the woods of the larger generative AI space. Both have tons of users and very active Discord communities, and interest does not seem to be waning.

The overall arc of the generative music story seems to follow that of Midjourney, with the interest primarily fueled by the phenomenon that I would like to name “casino creativity”. Let’s see if I can define what I mean by that.

I would like to start by positing that the craving to create is in every one of us. Some of us are more blessed than others in also having the skills to satisfy this craving. Moreover, I am going to proclaim that most of us are unable to fully embrace our creative selves because we lack some of the skills required to take flight.

For instance, I can make music. I have been making music since I was a teenager. For me, satisfying my craving for creativity is just a matter of firing up Ableton. When I am skilled in the medium, the friction to create is low. All it takes is being next to my keyboard (and Push), a little inspiration – and a track begins to emerge.

However, I can't sing. Like, not at all. Like, don't even ask me. In music school, when I was trying out for the choir or orchestra, I was asked to sing. After I belted out a few words (not even a full verse!), the teacher yelled: “The Orchestra! The Orchestra!”

Being a music producer without a voice is a story of unrequited love. I have to settle for tracks without lyrics. The instrumentals are nice, but it’s just not the same feeling without a voice.

So obviously, ever since the current generative AI spring blossomed, I've been on a quest to find a way to sate this creative craving. I played with Melodyne and Synth V, and while they both offered a path forward, the barrier to entry was just too high. Gaining a voice is not the same as knowing how to sing. It's about the same distance as the one between buying a violin and knowing how to play it.

Things started shifting with Chirp. This was the original model created by Suno, and it was Discord-only, very similar to Midjourney – feed it the lyrics alongside a description of the vibe, and out comes a 30-second clip of music. Not just music – it also sang out the lyrics I gave it!

Brain-splosion. Sort of. The output quality of Chirp was pretty weak-sauce. It wasn't music I could share with anyone, except for minor giggles and an eye roll. I forgot about Chirp for a little while, until this spring, when Suno came out with v3 of their sound model. I heard about it from Alex, whose work colleagues composed various songs to celebrate his last day at Stripe.

Ok, now we were getting somewhere. Songs generated with Suno v3 possessed that extra emotional weight that made them nearly passable as listenable music. When Udio came out shortly after with its own model, it raised the bar even more. I was blown away by some of the output. Just like that, my age of voiceless musicing was over. I could type in some lyrics and get back something that expressed them back to me as music.

Every generation took only about a minute and produced two variants for me to pick from. I could choose the one I liked and extend it or remix it – or roll the dice again. All it took was a click.

It's this metaphorical rolling of the dice that gives the titular term its name. As I was pushing the “Create 🎶” button, I realized that the anticipation of the output had a pronounced dopamine hit. What will come out? Will it be something like Duran Duran? Or maybe more like Bono? Will it go in a completely different direction? Gimme gimme gimme. I was hooked on Suno.

Casino creativity is a form of creative expression that emerges when the creative environment has such a low barrier to entry that the main way to express my creativity is by stating a preference: selecting one choice out of a few offered. A creative casino is a place where all I need to bring is my money and my vibes: everything else will be provided.

Midjourney is one of the first environments where I experienced casino creativity. There’s something subtly addictive about looking for that prompt and seeing those 4-up images that pop out. I know peeps who can spend a very long time tweaking and tuning their inputs. We could argue that prompt craftsmanship itself is a skill that must be acquired. But this skill has a short expiration date – as the models improve and change, the need for prompt-foo diminishes rapidly.

In the end, what we're left with is pressing the button and making choices. Casino creativity is less about the skill and more about the vibes.

Not to say that casino creativity isn't able to produce interesting – and perhaps even beautiful – things. Vibes are important – and some of us have more latent vibes hidden within us than we could ever realize. Ultimately, casino creativity is very similar in spirit to the democratization of writing that we saw with the Web. I am not yet ready to proclaim that casino creativity is somehow less intriguing or less full of potential than any other type of creativity. Just like my Midjourney-obsessed friends, I can see how unleashing one's creative energy might lead to surprising and wonderful results.

Here’s a twist though. As long as I have the credits to roll the dice, I can see if my vibes work for others. Both Suno and Udio are vying to be the place where music happens. I can look at what’s popular and peruse the top charts. It’s all very naive and simplistic at the moment. 

Yet, when executed ruthlessly (and it’s inevitable that somebody will do this), the creative casino is not just the place where I can express my creativity. It’s also the place where I can get the extra dopamine release of seeing my song climb the charts – of my vibes becoming recognized. Come for the vibes, stay for the likes.

An interesting effect of introducing generative AI, it seems, is that we’re likely to see more creative casinos and more ventures capitalizing on casino creativity itself. And we have to ponder the implications of that.