Paper
PRVAE-VC2+: Improving PRVAE-VC2 Training With Time-Invariant and Time-Variant Data Augmentations
Shoma Kanno, Kou Tanaka, Hirokazu Kameoka, Takuhiro Kaneko, Yuto Kondo, Toshie Matsui
Audio Samples
Clean Input
| Source speech | | | |
|---|---|---|---|
| Target speech (X) | | | |
| Baseline vs. Proposed | | | |
|---|---|---|---|
| B (Baseline) | | | |
| P (Proposed) | | | |
| Ablation study 1 | | | |
|---|---|---|---|
| A1 | | | |
| A2 | | | |
| A3 | | | |
| A4 | | | |
| A5 | | | |
| A6 | | | |
| Ablation study 2 | | | |
|---|---|---|---|
| BF | | | |
| PF | | | |
Noisy Input
| Target speech |
|---|
| SNR = 0 dB | | | |
|---|---|---|---|
| Input | | | |
| P | | | |
| A4 | | | |
| SNR = 5 dB | | | |
|---|---|---|---|
| Input | | | |
| P | | | |
| A4 | | | |
| SNR = 10 dB | | | |
|---|---|---|---|
| Input | | | |
| P | | | |
| A4 | | | |
| SNR = 20 dB | | | |
|---|---|---|---|
| Input | | | |
| P | | | |
| A4 | | | |