Zero-Shot Unsupervised and Text-Based Audio Editing Using DDPM Inversion

Supplemental Examples Page


  1.1.2. The Effect of Tstart Used

# | Source Prompt | Target Prompt | Original Audio | Tstart=110 | Tstart=100 | Tstart=90 | Tstart=80 | Tstart=70
1 | A recording of a happy upbeat song in a Latin jazz style. | A recording of a happy upbeat song in a retro arcade game soundtrack style.
2 | A recording of a funky jazz song.
3 | Trumpets playing alongside a piano, bass and drums in an upbeat old-timey cool jazz song. | A banjo playing alongside a piano, bass and drums in an upbeat old-timey cool country song.


  3. Comparison of Unsupervised Editing Directions With Random Directions

# | Type | Inversion Prompt | Edited Audios -γPC | Original Audio | Edited Audios +γPC | PC Interpretation | Edit Parameters
1 | Random | A high quality recording of a man singing with a rock band accompaniment. | γ = -12, -8, -2 | | γ = 2, 8, 12 | | t'∈[200, -1]; Specific t=80 used; PC #1
1 | Ours | A high quality recording of a man singing with a rock band accompaniment. | γ = -3, -2, -1 | | γ = 1, 2, 3 | Drum-beat style | t'∈[200, -1]; Specific t=80 used; PC #1
3 | Random | | γ = -240, -120, -40 | | γ = 40, 120, 240 | | t'∈[115, 95]; PC #1
3 | Ours | | γ = -60, -40, -20 | | γ = 20, 40, 60 | Isolate Woman/Man | t'∈[115, 95]; PC #1
5 | Random | A recording of an old timey rock song from the sixties. | γ = -12, -8, -2 | | γ = 2, 8, 12 | | t'∈[200, -1]; Specific t=65 used; PCs 1+2+3
5 | Ours | A recording of an old timey rock song from the sixties. | γ = -2, -1, -0.5 | | γ = 0.5, 1, 2 | Guitar/Singer emphasis | t'∈[200, -1]; Specific t=65 used; PCs 1+2+3