Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec. Mimi processes 24 kHz audio, down to a 12.5 Hz ...
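As a rough illustration of what the two rates quoted above imply, the following sketch works out the arithmetic between 24 kHz audio and a 12.5 Hz frame rate. The function names are hypothetical and are not part of Moshi or Mimi; only the two rates come from the text.

```python
import math

# Rates taken from the description above; everything else is illustrative.
SAMPLE_RATE_HZ = 24_000   # input audio sample rate
FRAME_RATE_HZ = 12.5      # rate of the compressed representation


def samples_per_frame(sample_rate_hz: float = SAMPLE_RATE_HZ,
                      frame_rate_hz: float = FRAME_RATE_HZ) -> float:
    """Number of audio samples covered by one frame."""
    return sample_rate_hz / frame_rate_hz


def frame_duration_ms(frame_rate_hz: float = FRAME_RATE_HZ) -> float:
    """Duration of one frame in milliseconds."""
    return 1000.0 / frame_rate_hz


def frames_for(duration_s: float, frame_rate_hz: float = FRAME_RATE_HZ) -> int:
    """Frames needed to cover a clip, rounding up to a whole frame."""
    return math.ceil(duration_s * frame_rate_hz)


if __name__ == "__main__":
    print(samples_per_frame())   # samples per frame
    print(frame_duration_ms())   # milliseconds per frame
    print(frames_for(10.0))      # frames in a 10-second clip
```

Under these numbers each frame spans 1920 samples, or 80 ms of audio, which sets a floor on how finely a streaming consumer can slice the signal.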
Not all platforms support the same features. For instance, Tensor Core acceleration isn't supported on WebGPU yet. Using an instruction that isn't available on a ...
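One common way to cope with uneven platform support is to consult a capability table before choosing a code path. The sketch below is a generic illustration, not the API of any particular framework: the `Feature` enum, the `SUPPORTED` table, and the kernel names are all hypothetical, and only the WebGPU entry reflects the statement above.

```python
from enum import Enum, auto


class Feature(Enum):
    """Hypothetical optional hardware features."""
    TENSOR_CORES = auto()


# Illustrative capability table. The empty WebGPU entry mirrors the text;
# the CUDA entry is an assumption made for the sake of the example.
SUPPORTED = {
    "cuda": {Feature.TENSOR_CORES},
    "webgpu": set(),
}


def supports(platform: str, feature: Feature) -> bool:
    """Return True if the platform is listed as supporting the feature."""
    return feature in SUPPORTED.get(platform, set())


def pick_matmul(platform: str) -> str:
    """Choose a (hypothetical) kernel name, falling back when unsupported."""
    if supports(platform, Feature.TENSOR_CORES):
        return "tensor_core_matmul"
    return "generic_matmul"


if __name__ == "__main__":
    for name in ("cuda", "webgpu"):
        print(name, pick_matmul(name))
```

Checking up front keeps the unsupported instruction out of the generated code path entirely, rather than relying on a failure later.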