Paper
PRVAE-VC2+: Improving PRVAE-VC2 Training With Time-Invariant and Time-Variant Data Augmentations
Shoma Kanno, Kou Tanaka, Hirokazu Kameoka, Takuhiro Kaneko, Yuto Kondo, Toshie Matsui
Audio Samples
Clean Input
Source speech | |||
---|---|---|---|
Target speech (X) |
Baseline vs. Proposed | |||
---|---|---|---|
B | |||
P |
Ablation study 1 | |||
---|---|---|---|
A1 | |||
A2 | |||
A3 | |||
A4 | |||
A5 | |||
A6 |
Ablation study 2 | |||
---|---|---|---|
BF | |||
PF |
Noisy Input
Target speech |
---|
SNR=0 dB | |||
---|---|---|---|
Input | |||
P | |||
A4 |
SNR=5 dB | |||
---|---|---|---|
Input | |||
P | |||
A4 |
SNR=10 dB | |||
---|---|---|---|
Input | |||
P | |||
A4 |
SNR=20 dB | |||
---|---|---|---|
Input | |||
P | |||
A4 |