The situation with OpenAI and Scarlett Johansson reads like a movie script. Some time ago, OpenAI asked the actress to lend her distinctive voice to ChatGPT. Scarlett Johansson considered the offer but ultimately turned it down. OpenAI went on to use a voice that closely resembled hers anyway. The actress has engaged lawyers, and OpenAI is already withdrawing the Sky voice.
Copying Scarlett Johansson's voice, or our cyberpunk reality
The story of the dispute between Scarlett Johansson and OpenAI begins with the 2013 film “Her”. In it, the actress voiced an artificial intelligence with which the protagonist, played by Joaquin Phoenix, falls in love. The film, now over a decade old, predicted the development of artificial intelligence very accurately: ChatGPT and the ability to hold voice conversations with generative AI are almost a reproduction of the film's script. Sam Altman and the OpenAI team wanted ChatGPT-4o to speak with the voice of the film's Samantha, that is, Scarlett Johansson. OpenAI contacted the actress and offered her a collaboration.
According to the actress's account, she initially considered the offer but ultimately rejected it for “personal reasons”. All this happened in September 2023, long before ChatGPT-4o was presented to the world. The story did not end with her refusal, however: after the OpenAI Spring Update presentation, friends, family and Internet users said that Sky, the new voice of ChatGPT-4o, sounded very similar to the actress.
OpenAI's explanations and further controversy around the voice (r)evolution of Altman's LLM
The whole situation is made even more interesting by the fact that right after the presentation of the latest voice version of ChatGPT, Sam Altman posted a cryptic yet unambiguous one-word tweet: “Her”. This was an obvious reference to the film and to the idea that users will now be able to form relationships like the one it depicts. An exaggerated claim? Well, the new version of ChatGPT responds to audio in as little as 232 milliseconds, and in about 320 milliseconds on average, which is comparable to human response time in natural conversation. This was clearly visible and audible during the presentation. ChatGPT-4o's voice is natural, and during a conversation you can hear not only nuanced intonation but also laughter, hesitations, and changes of pace. It sounds so realistic that the association with a film from over a decade ago comes naturally.
The voice of Sky, the new voice of ChatGPT-4o, really does resemble that of Scarlett Johansson. Even her characteristic huskiness is audible. This is how the actress commented to NBC News on Sky's resemblance to her own voice:
When I heard the public demo, I was shocked, angry, and couldn't believe that Mr. Altman had chosen to use a voice that sounded so eerily similar to mine that my closest friends and the media couldn't tell the difference.
It is worth mentioning that, according to the actress, two days before the OpenAI Spring Update, Sam Altman contacted her agent, asking her to reconsider the proposal. After the presentation, Scarlett Johansson responded not with an answer but with letters from her legal counsel. In them, her lawyers asked the head of OpenAI to explain and describe in detail the process by which the Sky voice was created. On Monday, May 20, OpenAI published a statement explaining that Sky does not imitate Scarlett Johansson's voice. According to the company, the model was trained on the voice of another actress, hired specifically for this task. “Out of respect for Ms. Johansson”, the company decided to stop using the Sky voice.
The future is now, old man: voice cloning is easier than ever before
The situation with Scarlett Johansson is just the tip of the iceberg when it comes to how easy it is to copy voices today. Tools such as Eleven Labs can copy any person's voice from a sample only a few minutes long. Many such cloned voices are already circulating on social media, and legislation clearly lags behind the technology. Eleven Labs has released tools that determine with almost 100% certainty whether a given audio recording was created with its software, but this alone does not solve the problem. We live in times when copying someone's voice is easier than ever before and requires virtually no specialist knowledge. Just two clicks and you're done.