Being towards death

AI Voice Cloning

Today we will take a look at an impressive AI voice cloning tool called Speaking AI, which converts text into natural-sounding speech close to that of a real person. It also lets users clone their own voices for free.

I. Introduction to Speaking AI

Speaking AI is a startup founded by Harry Zheng, and its team members are also Chinese. The company was established on the belief that conversational speech synthesis is the future interface between humans and artificial intelligence. Its goal is to make cloned voices sound more natural and to fundamentally change the way humans interact with AI.

Speaking AI currently has two main features: text-to-speech and voice cloning. Voice cloning is the standout, producing speech that sounds close to a real human speaker.

Text-to-speech currently works in Chinese and English, with 5 celebrity voice templates to choose from. This feature is free for now, but you may have to wait in a queue when many users are online.

Speaking AI also lets users clone their own voices and the voices of others. You can either record 10 seconds of audio online or upload a local audio file, and the conversion then runs in real time, so the process is very convenient. The emotion and intonation in the recording affect the final synthesized result, and Speaking AI's model automatically selects an appropriate emotional tone based on the text content.

Note: When uploading someone else's voice, you need to obtain their consent, and the synthesized voice cannot be used for any illegal, fraudulent, or harmful purposes.

According to the founder, Speaking AI is currently at the V1 model stage, which is closer to a trial demo with limited performance. The team expects to improve on this in the coming weeks. It is already developing a V2 model, which is planned to support more languages, clone voices faster, and produce higher-quality output.

Besides Speaking AI, today's recommendation, there are already many mature AI voice generation applications and open-source tools on the market.

For example, ElevenLabs, which has been receiving a lot of attention, can clone a user's personal voice and synthesize new voices in just a few minutes. It supports text-to-speech in 28 languages, including Chinese, and can deliver the speech with different emotions. It recently launched an automatic video translation and dubbing feature. ElevenLabs is widely used in translation, film and game dubbing, audiobook production, and chatbot conversations. Its text-to-speech feature can be used for free, and you can try it by registering an account on the official website.

In China, NetEase Youdao's technical team open-sourced its self-developed text-to-speech (TTS) engine, "EmotiVoice", in November. It currently supports both Chinese and English and offers over 2000 different voices. EmotiVoice also supports emotional synthesis, so it can generate speech that conveys feelings such as happiness, excitement, sadness, and anger. Its GitHub page provides a web interface and a scripting interface for batch generation, and it is free to use once installed.

I also recently came across a new AI music synthesis tool online called Musicfy AI. It can convert hummed vocals into accompaniment played by different types of instruments, which is very interesting. This may change the traditional music composition workflow, since people can create music easily with nothing but their voices. Anyone interested in music can give it a try.

