The delay of OpenAI's impressive voice mode for ChatGPT has worried many AI chatbot enthusiasts, but they may now have an alternative. French artificial intelligence developer Kyutai has introduced a real-time voice AI assistant called Moshi.
Moshi is designed to hold lifelike conversations with users by voice, similar to Alexa or Google Assistant, but it is powered by a large language model of the kind behind ChatGPT and its competitors, in this case the Helium 7B model. According to Kyutai, Moshi can speak in different accents and has 70 different emotional and speaking styles. The AI can also handle two audio streams at once, allowing Moshi to listen and talk simultaneously.
The development of Kyutai's Moshi included fine-tuning on more than 100,000 synthetic conversations created using text-to-speech (TTS) technology. The goal was to teach Moshi the nuances and subtleties of human communication. The company even collaborated with a professional voice artist to improve the quality of Moshi's voice.
The assistant combines text and audio training and supports multiple backends, meaning it can run locally on devices such as laptops without connecting to the cloud. The company touts this as a way to maintain privacy and security by keeping sensitive data from being transmitted over the internet. Kyutai has also released a demo of Moshi.
Open Talk
Kyutai announced that Moshi will be an open-source project, including the model's code and framework, which will provide a basis for further innovation. An open-source approach could also help ease the safety and ethics criticism that big AI companies face over their closed models. Kyutai's backers, including French billionaire Xavier Niel, are pushing the open-source approach.
Kyutai is also working on systems for AI audio identification, watermarking, and signature tracking that will be included in Moshi. These features are meant to support the identification, accountability, and traceability of AI-generated audio, so that such content can be monitored and verified.
Moshi is still in development, but the audio demonstration is impressive. If Moshi catches on, its voice-first approach could serve as a catalyst for voice-enabled versions of ChatGPT's competitors or accelerate the addition of LLMs to Alexa and other voice assistants.
If you want to try Moshi, a demo is available online, and you can sign up for early access to the full chatbot there too.