Let’s start by saying that this article is not an endpoint. It is a step in our ongoing research into developing responsible AI applications for artists. One of these artists is Eveline Ypma, with whom we are organizing a live performance on April 4 in OT301. Together with Eveline, we are investigating the potential of text-to-music AI technology, and we share our findings in this article.
Eveline created an EP that combines field recordings sampled from Icelandic nature with her own vocals and bass guitar. The result is a harmonious 15-minute soundscape. Our challenge was to extend and translate this EP into a 30-to-45-minute live performance using generative AI. Together, we decided to experiment with AI tools that can generate similar-sounding field recordings and sound effects that Eveline could use to extend her live performance.
How did we start?
Our goal was to generate new audio files (10-20 seconds) that sounded similar to her own Icelandic music samples. To do so, we started by looking into different ways to generate new music with AI. What AI models are already out there? Which existing tools can we test? And how do we make sure that the technology providers do not claim rights to Eveline's data?
First, we conducted a series of experiments with existing AI models. Inspired by Dadabots and their infinite stream of AI-generated death metal, we started working with SampleRNN models. SampleRNN is an audio-to-audio model: you upload a music file and get similar-sounding music files in return. Unfortunately, we were not happy with the results, because the output was too noisy. The process was also very time-consuming and complex.
We moved on to Stability AI’s Dance Diffusion. This is also an audio-to-audio system that allows you to create audio samples that sound like your input files. Unfortunately, like the previous model, it produced a lot of noise and was very glitchy.
Our aim was to find off-the-shelf AI models that we could immediately use to create a workflow for Eveline – without having to train a customized AI model of our own. Unfortunately, this turned out to be more difficult than expected. That is why we decided to change course and look at ready-made AI tools.
First, we tried Stability AI’s text-to-music application Stable Audio, which creates audio files based on text prompts – a ChatGPT for music. For the first time, we produced AI-generated output that actually sounded like a usable music sample. Still, we could not really use the output: the terms of use prevented us from continuing to use the tool.
We also tried Meta’s MusicGen and AudioGen, two similar prompt-based AI models that allow you to generate music and audio files. Anyone with a Google account can use these models in a Google Colab environment. MusicGen gave us the best results so far: it generated high-quality audio samples that we could work with right away. Unfortunately, these models came with similar terms of use.
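To give an idea of how lightweight this workflow is, here is a minimal sketch of generating a short sample with MusicGen through Meta’s open-source audiocraft library. The model size, duration, and prompt text are illustrative choices of ours, not the exact settings we used for Eveline’s material.

```python
# Minimal MusicGen sketch using Meta's open-source audiocraft library.
# Model size, duration, and prompt are illustrative choices.
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained("facebook/musicgen-small")
model.set_generation_params(duration=15)  # seconds, within our 10-20s target

# A text prompt describing the kind of field recording we are after.
descriptions = ["windswept Icelandic coastline, distant waves, soft ambient drone"]
wavs = model.generate(descriptions)  # returns a batch of audio tensors

for i, wav in enumerate(wavs):
    # Writes sample_0.wav etc., loudness-normalized.
    audio_write(f"sample_{i}", wav.cpu(), model.sample_rate, strategy="loudness")
```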
Terms of use
In our opinion, the terms of use of too many generative AI music tools are misleading. Most product websites tell you that you retain full ownership of your input and output. But once you dive into their legal documentation, it often becomes clear that you also grant the AI platform a "sublicense" to your work. Technically, you always remain the owner of your input and output – but at the same time, you give someone else far-reaching rights to use it.
In the case of Eveline Ypma, this is problematic. Eveline is an artist, and she should own the rights to her own creative work. That is why we eventually decided to download the underlying MusicGen AI model from GitHub and host a local version on a private server ourselves. This is possible because Meta published the code open-source on GitHub under an MIT License.
The Open Culture Tech "text-to-music" app
At this moment, we are working with a front-end developer to build our own text-to-music application on top of the MusicGen AI model. Our goal is to host the underlying AI model on a European server and make sure that we don't save the user's input and output data. That way, anyone can use the AI technology for free – without having to give away their creative work.
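To illustrate the privacy principle behind the app, below is a minimal sketch of a stateless text-to-music endpoint. It assumes a Python server built with FastAPI on top of the same audiocraft library; the endpoint name, parameters, and framework are our own illustrative choices, not the final implementation. The point is that the generated audio is streamed straight back to the user, and neither the prompt nor the output is written to disk.

```python
# Sketch of a stateless text-to-music endpoint (illustrative, not the final app).
# Assumes FastAPI, torchaudio, and audiocraft are installed.
import io
import torchaudio
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel
from audiocraft.models import MusicGen

app = FastAPI()
model = MusicGen.get_pretrained("facebook/musicgen-small")  # loaded once at startup

class Prompt(BaseModel):
    text: str
    duration: int = 15  # seconds

@app.post("/generate")
def generate(prompt: Prompt):
    model.set_generation_params(duration=prompt.duration)
    wav = model.generate([prompt.text])[0].cpu()  # [channels, samples]

    # Encode to WAV in memory and stream it back:
    # the prompt and the audio are never stored server-side.
    buf = io.BytesIO()
    torchaudio.save(buf, wav, model.sample_rate, format="wav")
    buf.seek(0)
    return StreamingResponse(buf, media_type="audio/wav")
```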
We plan to launch this app on April 4 in OT301.