Abstract: While waveform-domain speech enhancement (SE) has been extensively investigated in recent years and achieves state-of-the-art performance in many datasets, spectrogram-based SE tends to show ...
Abstract: In this paper, we propose a deep learning (DL)-based task-driven spectrum prediction framework, named DeepSPred. The DeepSPred comprises a feature encoder and a task predictor, where the ...
Diffusion Speech is a diffusion-based text-to-speech model. Our speech synthesis pipeline is quite simple. We use a diffusion transformer model (DiT) to predict the duration of each phoneme. Then we ...
This repository contains the implementation of (MQGAN) for audio synthesis. The project is structured to facilitate the entire workflow from data preparation to model deployment.
Some results have been hidden because they may be inaccessible to you
Show inaccessible results