Abstract: While waveform-domain speech enhancement (SE) has been extensively investigated in recent years and achieves state-of-the-art performance on many datasets, spectrogram-based SE tends to show ...
Abstract: The integration of electroencephalography (EEG) and functional near-infrared spectroscopy (fNIRS) can facilitate the advancement of brain-computer interfaces (BCIs). However, existing ...
Diffusion Speech is a diffusion-based text-to-speech model. Our speech synthesis pipeline is quite simple. We use a diffusion transformer model (DiT) to predict the duration of each phoneme. Then we ...
This repository contains the implementation of MQGAN for audio synthesis. The project is structured to support the entire workflow, from data preparation to model deployment.