15.1K Stars! Fish Speech 1.5 is officially released! A leading multilingual TTS tool that can be deployed locally and fine-tuned.

Dec 6, 2024 #AI

AI-generated summary

Fish Speech is a text-to-speech (TTS) tool developed by the FishAudio team, whose members are known for their work on AI voice cloning. Key features include:

• Zero-shot & few-shot TTS: Generates high-quality speech from just 10-30 seconds of audio samples, which suits voice cloning.
• Phoneme-independent generalization: Handles text in any language without relying on phonemes, which broadens the range of applications.
• High accuracy: Achieves character and word error rates of around 2% on 5 minutes of English text.
• User-friendly interfaces: Offers a web UI compatible with major browsers and a PyQt6 GUI that works with the API server.
• Easy deployment: Supports quick deployment both locally and in the cloud, with minimal loss of inference speed.

For more information, visit the official website or GitHub page.

Project Introduction

Fish Speech is a text-to-speech (TTS) tool developed by the FishAudio team. Along with ChatTTS, it was one of the most popular open-source TTS projects to appear in June-July 2024. The team's members include several SVC (singing voice conversion) developers who are well known on GitHub and were among the pioneers of AI voice cloning.

Main Features

• Zero-shot & few-shot TTS: Only 10-30 seconds of voice samples are needed to generate high-quality speech, which covers typical voice cloning needs.
• Strong generalization without phoneme dependency: The Fish Speech model does not rely on phonemes, so it can handle text written in any language, which widens the range of TTS applications.
• High accuracy: On 5 minutes of English text, the character error rate (CER) and word error rate (WER) are both only about 2%.
• User-friendly multi-interface support:
  • WebUI: A Gradio-based web user interface, compatible with mainstream browsers (Chrome, Firefox, Edge).
  • GUI inference: A PyQt6 graphical interface that works seamlessly with the API server.
• Easy deployment: Supports quick deployment both locally and in the cloud with minimal loss of inference speed, which is convenient for developers.
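
To make the accuracy figures above more concrete, here is a minimal sketch of how CER and WER are commonly computed: the edit distance between a reference text and a transcript, divided by the reference length. This is an illustration only, not Fish Speech's own evaluation code. The function names (`edit_distance`, `wer`, `cer`) and the sample sentences are assumptions made for this example, and TTS evaluations typically obtain the transcript by running a speech recognizer on the generated audio.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion from the reference
                curr[j - 1] + 1,     # insertion into the reference
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical example: one word ("fox") is misrecognized as "box".
    reference = "the quick brown fox jumps"
    transcript = "the quick brown box jumps"
    print(f"WER: {wer(reference, transcript):.2%}")
    print(f"CER: {cer(reference, transcript):.2%}")
```

In this toy example one wrong word out of five gives a WER of 20%, while the single wrong character out of 25 gives a CER of 4%. A rate of about 2% therefore corresponds to roughly one error per 50 words or characters of reference text.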

Official Homepage: https://fish.audio

GitHub Project Address: https://github.com/fishaudio/fish-speech

HF Demo: https://huggingface.co/spaces/fishaudio/fish-speech-1