WO2022023130A1 - Multiple percussive sources separation for remixing. - Google Patents

Multiple percussive sources separation for remixing.

Info

Publication number
WO2022023130A1
Authority
WO
WIPO (PCT)
Prior art keywords
percussive
separation
audio
separations
electronic device
Prior art date
Application number
PCT/EP2021/070306
Other languages
French (fr)
Inventor
Thomas Kemp
Giorgio FABBRO
Marc FERRAS FONT
Falk-Martin HOFFMANN
Stefan Uhlich
Original Assignee
Sony Group Corporation
Sony Europe B.V.
Priority date
Filing date
Publication date
Application filed by Sony Group Corporation, Sony Europe B.V. filed Critical Sony Group Corporation
Publication of WO2022023130A1 publication Critical patent/WO2022023130A1/en

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H: ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00: Details of electrophonic musical instruments
    • G10H1/36: Accompaniment arrangements
    • G10H1/40: Rhythm
    • G10H1/0008: Associated control or indicating means
    • G10H1/0025: Automatic or semi-automatic music composition, e.g. producing random music, applying rules from music theory or modifying a musical piece
    • G10H2210/00: Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/031: Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H2210/051: Musical analysis for extraction or detection of onsets of musical sounds or notes, i.e. note attack timings
    • G10H2210/056: Musical analysis for extraction or identification of individual instrumental parts, e.g. melody, chords, bass; identification or separation of instrumental parts by their characteristic voices or timbres
    • G10H2210/071: Musical analysis for rhythm pattern analysis or rhythm style recognition
    • G10H2210/076: Musical analysis for extraction of timing, tempo; beat detection
    • G10H2210/101: Music composition or musical creation; tools or processes therefor
    • G10H2210/125: Medley, i.e. linking parts of different musical pieces in one single piece, e.g. sound collage, DJ mix
    • G10H2250/00: Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
    • G10H2250/025: Envelope processing of music signals in, e.g. time domain, transform domain or cepstrum domain
    • G10H2250/035: Crossfade, i.e. time domain amplitude envelope control of the transition between musical sounds or melodies, obtained for musical purposes, e.g. for ADSR tone generation, articulations, medley, remix
    • G10H2250/311: Neural networks for electrophonic musical instruments or musical processing, e.g. for musical recognition or control, automatic composition or improvisation

Definitions

  • the present disclosure generally pertains to the field of audio processing, and in particular, to devices, methods and computer programs for audio enhancement.
  • audio content is already mixed from original audio source signals, e.g. for a mono or stereo setting, without keeping original audio source signals from the original audio sources which have been used for production of the audio content.
  • the disclosure provides an electronic device comprising circuitry configured to perform audio source separation on an audio input signal to extract one or more percussive separations and to perform rhythmic enhancement on the one or more percussive separations to obtain at least one enhanced percussive separation.
  • the disclosure provides a method comprising performing audio source separation on an audio input signal to extract one or more percussive separations; and performing rhythmic enhancement on the one or more percussive separations to obtain at least one enhanced percussive separation.
  • the disclosure provides a computer program comprising instructions, the instructions when executed on a processor causing the processor to perform audio source separation on an audio input signal to extract one or more percussive separations and to perform rhythmic enhancement on the one or more percussive separations to obtain at least one enhanced percussive separation.
  • Fig. 1 schematically shows a general approach of audio upmixing/remixing by means of blind source separation (BSS), such as music source separation (MSS);
  • Fig. 2 schematically shows a process of audio signal enhancement based on audio source separation and rhythm amplification
  • Fig. 3 schematically shows another embodiment of a process of audio signal enhancement based on audio source separation and rhythm amplification
  • Fig. 4 schematically shows another embodiment of a process of audio signal enhancement based on audio source separation and rhythm amplification
  • Fig. 5 schematically describes in more detail an embodiment of the beat detection process performed in the process of audio enhancement
  • Fig. 6 visualizes the beat detection process described in Fig. 5;
  • Fig. 7 schematically shows a representation of the beat frequency obtained from beat detection as a beat line
  • Fig. 8 schematically describes in more detail an embodiment of the rhythm pattern estimation process performed in the process of audio enhancement described in Fig. 3;
  • Fig. 9a schematically shows a representation of the rhythm pattern estimation described in Fig. 8.
  • Fig. 9b schematically shows a representation of the snare drums estimated rhythm pattern obtained by the rhythm pattern estimation described in Fig. 8;
  • Fig. 10 schematically shows a representation of the beat frequency, expressed in beats per minute, in a 16th note quantization pattern
  • Fig. 11a shows a rhythm pattern of a rock music song on an 8th note quantization pattern representation
  • Fig. 11b shows a rhythm pattern of a rock music song on an 8th note quantization pattern representation
  • Fig. 12 schematically shows another embodiment of a process of audio signal enhancement based on audio source separation and rhythm amplification
  • Fig. 13 shows in more detail an embodiment of a rhythm similarity estimation process performed as described in Fig. 12;
  • Fig. 14 shows a flow diagram visualizing a method for signal mixing related to audio signal enhancement based on source separation and rhythm amplification to obtain an enhanced audio signal
  • Fig. 15 schematically describes an embodiment of an electronic device that can implement the processes of audio enhancement based on rhythm amplification and music source separation.
  • remixing, upmixing, and downmixing can refer to the overall process of generating output audio content on the basis of separated audio source signals originating from mixed input audio content
  • mixing can refer to the mixing of the separated audio source signals.
  • the “mixing” of the separated audio source signals can result in a “remixing”, “upmixing” or “downmixing” of the mixed audio sources of the input audio content.
  • the embodiments disclose an electronic device comprising circuitry configured to perform audio source separation on an audio input signal to extract one or more percussive separations and to perform rhythmic enhancement on the one or more percussive separations to obtain at least one enhanced percussive separation.
  • the electronic device may for example be any multi-purpose mobile computing device such as a smartphone, a wearable device in the form of a wristwatch, e.g. a smartwatch, a DMX controller, or the like.
  • the circuitry of the electronic device may include a processor (for example a CPU), a memory (RAM, ROM or the like), storage, interfaces, etc.
  • Circuitry may comprise or may be connected with input means (mouse, keyboard, camera, etc.), output means (display (e.g. liquid crystal, (organic) light emitting diode, etc.), loudspeakers, etc.), a (wireless) interface, etc., as is generally known for electronic devices (computers, smartphones, etc.).
  • circuitry may comprise or may be connected with sensors for sensing still images or video image data (image sensor, camera sensor, video sensor, etc.), for sensing environmental parameters (e.g. radar, humidity, light, temperature), etc.
  • In audio source separation, an input signal comprising a number of sources (e.g. instruments, voices, or the like) is decomposed into separations.
  • Audio source separation may be unsupervised (called "blind source separation", BSS) or partly supervised. "Blind" means that the blind source separation does not necessarily have information about the original sources. For example, it may not necessarily know how many sources the original signal contained, or which sound information of the input signal belongs to which original source.
  • the aim of blind source separation is to decompose the original signal into separations without knowing the separations beforehand.
  • a blind source separation unit may use any of the blind source separation techniques known to the skilled person.
  • source signals may be searched that are minimally correlated or maximally independent in a probabilistic or information-theoretic sense, or, on the basis of non-negative matrix factorization, structural constraints on the audio source signals can be found.
  • Methods for performing (blind) source separation are known to the skilled person and are based on, for example, principal components analysis, singular value decomposition, independent component analysis, non-negative matrix factorization, artificial neural networks, etc.
  • the present disclosure is not limited to embodiments where no further information is used for the separation of the audio source signals, but in some embodiments, further information is used for generation of separated audio source signals.
  • further information can be, for example, information about the mixing process, information about the type of audio sources included in the input audio content, information about a spatial position of audio sources included in the input audio content, etc.
  • the input signal can be an audio signal of any type. It can be in the form of analog signals, digital signals, it can originate from a compact disk, digital video disk, or the like, it can be a data file, such as a wave file, mp3-file or the like, and the present disclosure is not limited to a specific format of the input audio content.
  • An input audio content may for example be a stereo audio signal having a first channel input audio signal and a second channel input audio signal, without the present disclosure being limited to input audio contents with two audio channels.
  • the input audio content may include any number of channels, such as a 5.1 audio signal or the like.
  • the input signal may comprise one or more source signals.
  • the input signal may comprise several audio sources.
  • An audio source can be any entity which produces sound waves, for example, music instruments, voice, vocals, artificially generated sound, e.g. originating from a synthesizer, etc.
  • the input audio content may represent or include mixed audio sources, which means that the sound information is not separately available for all audio sources of the input audio content, but that the sound information for different audio sources at least partially overlaps or is mixed.
  • the circuitry may be configured to perform rhythmic enhancement of at least one of the separations included in the audio input signal.
  • the rhythmic enhancement may be performed by estimating a rhythm pattern of one or more separations and selectively amplifying at least one of the separations by a gain factor, or the like.
  • the circuitry may be configured to perform the remixing or upmixing based on at least one enhanced separated source and based on other separated sources obtained by the blind source separation to obtain the remixed or upmixed signal.
  • the remixing or upmixing may be configured to perform remixing or upmixing of the separated sources, here percussive separations and non-percussive separations, to produce a remixed or upmixed enhanced signal, which may be sent to the loudspeaker system.
  • the circuitry of the electronic device may for example be configured to amplify at least one of the percussive separations by a gain factor to obtain the at least one enhanced percussive separation.
  • the gain factor may control the contribution of the one or more percussive separations to the mix, that is, one percussive separation may be enhanced, while another percussive separation may be unchanged in the mix, or the like, without limiting the present disclosure to that result.
  • the gain factor may not be selected by a binary system; instead, the gain factor may be selected by a weighting system that sets the value of the gain factor in an arbitrary way.
  • the at least one percussive separation which may contain the main beat of a music piece may for example be amplified by a static gain factor, or the signal may be analyzed in real-time, while the music is playing, and the gain factor may be dynamically adapted, or the like.
  • a gain factor to be applied to the percussive separation that has beats in a certain speed range that do not vary over time may be higher than the gain factor to be applied to the other percussive separations, without limiting the present disclosure in that regard.
  • the circuitry of the electronic device may for example be configured to perform rhythm pattern estimation on the percussive separations to obtain estimated rhythm patterns.
  • the rhythm pattern estimation may be performed, at the beginning of a song, on the percussive separations contained in the audio signal.
  • a beat may be extracted using an a-priori known percussive instrument, or the like.
  • the circuitry of the electronic device may for example be configured to perform rhythm pattern selection based on the rhythm patterns to obtain the at least one enhanced percussive separation.
  • the rhythm pattern of at least one of the percussive separations may be compared with several or all of the remaining percussive separations. The percussive separation with the strongest and easiest pattern may be selected and may be used for an accurate rhythm estimation.
  • the circuitry of the electronic device may for example be configured to perform a spectral analysis of a percussive separation to estimate a rhythm pattern.
  • a rhythm pattern may be estimated using a classical energy-based rhythm extractor, or the like.
  • the circuitry of the electronic device may for example be configured to perform beat detection on the audio input signal to obtain a beat frequency and to perform the rhythm pattern estimation based on a percussive separation and the beat frequency.
  • the gain factors may control the contribution of the one or more percussive separations to the mix, that is, one percussive separation may be enhanced, while another percussive separation may be unchanged in the mix, or the like, without limiting the present disclosure to that result.
  • the gain factors may not be selected by a binary system; instead, the gain factors may be selected by a weighting system that sets the values of the gain factors in an arbitrary way.
  • the circuitry of the electronic device may for example be configured to determine by a rhythm similarity estimation a similarity between a percussive separation and a beat signal to obtain at least one enhanced percussive separation.
  • the circuitry of the electronic device may for example be configured to perform similarity-to-gain mapping based on the similarity to obtain gain factors used for obtaining the at least one enhanced percussive separation.
  • the circuitry of the electronic device may for example be configured to perform a spectral comparison between a percussive separation and a beat signal to obtain at least one enhanced percussive separation.
  • the circuitry of the electronic device may for example be configured to perform the source separation on the audio input signal to obtain one or more percussive separations and one or more non-percussive separations, and to perform mixing of the enhanced separated source with the one or more non-percussive separations to obtain an enhanced audio signal.
  • the enhanced audio signal may be obtained by amplifying the desirable percussive separations, without deleting the remaining separations, e.g. the singing voice, or the like.
  • the non-percussive separations may include melodic and harmonic parts.
  • the non-percussive separation may include the remaining sources of the audio input signal, apart from the percussive separations, e.g. vocals, guitar, bass, or the like.
  • the one or more percussive separations comprise drums.
  • the circuitry of the electronic device may for example comprise a microphone to acquire the audio input signal.
  • the circuitry of the electronic device may for example comprise a loudspeaker system to output the enhanced audio signal.
  • the loudspeaker system may be any loudspeaker system, such as for example a Bluetooth in-ear earphone, or the like.
  • the enhanced audio signal may be provided as live (real-time) playback to a user, in which the enhanced percussive separations may be mixed in time sync with the remaining separations, or the like.
  • the circuitry of the electronic device may for example comprise a vibration motor to output vibrations based on a signal of the enhanced separation.
  • a secondary notification signal, such as a vibration from a cellphone, may be generated.
  • the vibrator of a smartphone that a user wears in his pocket may be controlled from a beat signal derived from a percussive separation, without limiting the present disclosure in that regard.
  • the embodiments also disclose a method comprising performing audio source separation on an audio input signal to extract one or more percussive separations and performing rhythmic enhancement on the one or more percussive separations to obtain at least one enhanced percussive separation.
  • the embodiments also disclose a computer program comprising instructions, the instructions when executed on a processor causing the processor to perform the processes disclosed here.
  • Fig. 1 schematically shows a general approach of audio upmixing/remixing by means of blind source separation (BSS), such as music source separation (MSS).
  • First, source separation (also called "demixing") is performed, which decomposes a source audio signal 1 comprising multiple channels I and audio from multiple audio sources Source 1, Source 2, ..., Source K (e.g. instruments, voice, etc.) into source estimates 2a-2d for each channel i, wherein K is an integer number and denotes the number of audio sources.
  • a residual signal 3 (r(n)) is generated in addition to the separated audio source signals 2a-2d.
  • the residual signal may for example represent a difference between the input audio content and the sum of all separated audio source signals.
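As an illustration of this relation, a minimal sketch, not part of the patent text, of how the residual signal r(n) can be computed as this difference (the helper name is illustrative):

```python
import numpy as np

def residual(x, separations):
    """Compute r(n) = x(n) - sum of all separated source estimates.

    x           : np.ndarray, the mixed input signal
    separations : list of np.ndarray, separated source estimates,
                  each the same length as x (illustrative helper)
    """
    return x - np.sum(separations, axis=0)
```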
  • the audio signal emitted by each audio source is represented in the input audio content 1 by its respective recorded sound waves.
  • spatial information for the audio sources is typically included in or represented by the input audio content, e.g. by the proportion of the audio source signal included in the different audio channels.
  • the separation of the input audio content 1 into separated audio source signals 2a-2d and a residual 3 is performed based on blind source separation or other techniques which are able to separate audio sources.
  • the separations 2a-2d and the possible residual 3 are remixed and rendered to a new loudspeaker signal 4, here a signal comprising five channels 4a-4e, namely a 5.0 channel system.
  • an output audio content is generated by mixing the separated audio source signals and the residual signal taking into account spatial information.
  • the output audio content is illustrated by way of example and denoted with reference number 4 in Fig. 1.
  • the number of audio channels of the input audio content is referred to as M_in and the number of audio channels of the output audio content is referred to as M_out; in the example of Fig. 1, M_out = 5 (channels 4a-4e).
  • the approach in Fig. 1 is generally referred to as remixing, and in particular as upmixing if M_in < M_out.
  • Audio signal enhancement based on source separation and rhythm amplification: Fig. 2 schematically shows a process of audio signal enhancement based on audio source separation and rhythm amplification. The process performs audio enhancement using source separation and rhythm amplification by combining (online) audio source separation with audio gain amplification.
  • the audio input signal x(n) is decomposed into percussive separations, here drums: bass drums s_BD(n), snare drums s_SD(n), and rest drums separation s_RD(n), and into non-percussive separations, here non-percussive instruments s_NP(n), which include melodic and harmonic parts.
  • the non-percussive separation s_NP(n) includes the remaining sources of the audio input signal, apart from the percussive separations, e.g. vocals, guitar, bass, or the like.
  • Each of the percussive separations, here the bass drums s_BD(n), the snare drums s_SD(n), and the rest drums separation s_RD(n), is amplified 102 by a gain factor, which is set statically, to obtain an enhanced percussive separation, here enhanced bass drums s'_BD(n), enhanced snare drums s'_SD(n), and an enhanced rest drums separation s'_RD(n).
  • the bass drums s_BD(n) are amplified by a gain factor equal to +5 dB, the snare drums s_SD(n) are amplified by a gain factor equal to +3 dB, and the rest drums separation s_RD(n) is amplified by a gain factor equal to +1 dB.
  • a mixer 103 mixes the enhanced bass drums s'_BD(n), the enhanced snare drums s'_SD(n), and the enhanced rest drums separation s'_RD(n) with the non-percussive separations s_NP(n) to obtain an enhanced audio signal x'(n).
  • the enhanced audio signal x'(n) is output to a loudspeaker system 104.
  • all the above described processes, namely the music source separation 101 and the gain factor amplification 102, can be performed in real-time, e.g. "online", with some latency.
  • For example, they could be run directly on the smartphone or smartwatch of the user, in his headphones, on a Bluetooth device, or the like.
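For illustration only, a minimal Python sketch of this static gain amplification and remixing step; the helper names are illustrative and the dB-to-linear conversion is the usual amplitude convention, not a detail prescribed by the patent:

```python
import numpy as np

def db_to_linear(gain_db):
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)

def remix_with_static_gains(s_bd, s_sd, s_rd, s_np):
    """Apply the static +5/+3/+1 dB gains from the example above to the
    percussive separations and mix them back with the non-percussive
    separation s_NP(n) to obtain the enhanced signal x'(n)."""
    return (db_to_linear(5.0) * s_bd    # enhanced bass drums s'_BD(n)
            + db_to_linear(3.0) * s_sd  # enhanced snare drums s'_SD(n)
            + db_to_linear(1.0) * s_rd  # enhanced rest drums s'_RD(n)
            + s_np)                     # non-percussive parts unchanged
```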
  • the music source separation 101 process may for example be implemented as described in more detail in the published paper Uhlich, Stefan, et al., "Improving music source separation based on deep neural networks through data augmentation and network blending," 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2017.
  • Fig. 3 schematically shows another embodiment of a process of audio signal enhancement based on audio source separation and rhythm amplification.
  • the process performs music personalization using source separation and rhythm amplification by combining (online) audio source separation with rhythm pattern estimation.
  • the audio input signal x(n) is decomposed into percussive separations, here drums: bass drums s_BD(n), snare drums s_SD(n), and rest drums separation s_RD(n), and into non-percussive separations, here non-percussive instruments s_NP(n), which include melodic and harmonic parts.
  • the non-percussive separation s_NP(n) includes the remaining sources of the audio input signal, apart from the percussive separations, e.g. vocals, guitar, bass, or the like.
  • a rhythm pattern estimation 202 process is performed on each of the percussive separations, here the bass drums s_BD(n), the snare drums s_SD(n), and the rest drums separation s_RD(n), to obtain an estimated rhythm pattern RP[], here a bass drums estimated rhythm pattern RP_BD[], a snare drums estimated rhythm pattern RP_SD[], and a rest drums separation estimated rhythm pattern RP_RD[].
  • An embodiment of the rhythm pattern estimation 202 process is described in more detail with regard to Fig. 8 below.
  • a rhythm pattern selection 203 process is performed on the bass drums estimated rhythm pattern RP_BD[], the snare drums estimated rhythm pattern RP_SD[], and the rest drums estimated rhythm pattern RP_RD[], to obtain gain factors g(n).
  • a separation enhancement 204 process is performed on the bass drums s_BD(n), the snare drums s_SD(n), and the rest drums separation s_RD(n), based on the gain factors, to obtain an enhanced drums separation s'_D(n).
  • a mixer 103 mixes the enhanced drums separation s'_D(n) with the non-percussive separations s_NP(n) to obtain an enhanced audio signal x'(n).
  • the enhanced audio signal x'(n) is output to a loudspeaker system 104.
  • In another embodiment (Fig. 4), the audio input signal x(n) is decomposed into percussive separations, here bass drums s_BD(n), snare drums s_SD(n), and rest drums separation s_RD(n), and into non-percussive separations, here non-percussive instruments s_NP(n), which include melodic and harmonic parts.
  • the non-percussive separation s_NP(n) includes the remaining sources of the audio input signal, apart from the percussive separations, e.g. vocals, guitar, bass, or the like.
  • a beat detection 201 process is performed on the audio input signal x(n) to obtain a beat frequency ω. An embodiment of the beat detection 201 process is described in more detail with regard to Figs. 6 and 7 below.
  • a rhythm pattern estimation 202 process is performed on each of the percussive separations, here the bass drums s_BD(n), the snare drums s_SD(n), and the rest drums separation s_RD(n), based on the beat frequency ω, to obtain a percussive separation estimated rhythm pattern RP[], here a bass drums estimated rhythm pattern RP_BD[], a snare drums estimated rhythm pattern RP_SD[], and a rest drums separation estimated rhythm pattern RP_RD[].
  • An embodiment of the rhythm pattern estimation 202 process is described in more detail with regard to Figs. 8 to 9b below.
  • a rhythm pattern selection 203 process is performed on the bass drums estimated rhythm pattern RP_BD[], the snare drums estimated rhythm pattern RP_SD[], and the rest drums estimated rhythm pattern RP_RD[], to obtain gain factors g(n).
  • a separation enhancement 204 process is performed on the bass drums s_BD(n), the snare drums s_SD(n), and the rest drums separation s_RD(n), based on the gain factors, to obtain an enhanced drums separation s'_D(n).
  • a mixer 103 mixes the enhanced drums separation s'_D(n) with the non-percussive separations s_NP(n) to obtain an enhanced audio signal x'(n).
  • the enhanced audio signal x'(n) is output to a loudspeaker system 104.
  • all the above described processes, namely the music source separation 101, the beat detection 201, the rhythm pattern estimation 202, the rhythm pattern selection 203 and the separation enhancement 204, can be performed in real-time, e.g. "online". For example, they could be run directly on the smartphone or smartwatch of the user, in his headphones, on a Bluetooth device, or the like.
  • the music source separation 101 process may for example be implemented as described in more detail in the published paper Uhlich, Stefan, et al., "Improving music source separation based on deep neural networks through data augmentation and network blending," 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2017.
  • There also exist programming toolkits for performing blind source separation, such as Open-Unmix, DEMUCS, Spleeter, Asteroid, or the like, which allow the skilled person to perform a source separation process as described in Fig. 1 above.
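As a hedged illustration, the percussive stem could be obtained with one of these toolkits; the sketch below assumes the Open-Unmix Python package and its predict.separate helper, whose exact signature and stem names may differ between versions:

```python
# Sketch only: assumes the Open-Unmix package (pip install openunmix)
# and its predict.separate helper; API details may vary by version.
import torch
from openunmix import predict

def separate_drums(audio: torch.Tensor, rate: int = 44100):
    """audio: tensor of shape (channels, samples).

    Returns the drums estimate and the sum of the remaining stems as a
    simple stand-in for the non-percussive separation s_NP(n)."""
    estimates = predict.separate(audio.unsqueeze(0), rate=rate)
    drums = estimates["drums"]
    non_percussive = sum(v for k, v in estimates.items() if k != "drums")
    return drums, non_percussive
```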
  • the rhythm pattern estimation 202 process may for example be implemented as described in more detail in the published papers Ilya Shmulevich, et al., "Perceptual Issues in Music Pattern Recognition: Complexity of Rhythm and Key Finding," February 23, 1999; Sankalp Gulati and Preeti Rao, "Rhythm Pattern Representations for Tempo Detection in Music," Proc. of the First International Conference on Intelligent Interactive Technologies and Multimedia, Dec. 2010, Allahabad, India; and Makarand Velankar and Parag Kulkarni, "Pattern recognition for computational music," December 2017.
  • Fig. 5 schematically describes in more detail an embodiment of the beat detection process performed in the process of audio enhancement described in Fig. 4 above, in which beat detection is performed on the audio input signal to obtain a beat frequency.
  • a beat detection 201 process is performed on the audio input signal x(n) to obtain a beat frequency ω.
  • a process of windowing 401 is performed on the audio input x(n) to obtain windowed audio x_n(i).
  • a process of total energy determination 402 is performed on the windowed audio x_n(i) to obtain a signal energy curve of the audio x_n(i).
  • a Fast Fourier Transform (FFT) 403 is applied to the windowed audio x_n(i) to obtain the FFT spectrum X(ω), also known as the power spectral density.
  • a spectral analysis 404 is performed on the FFT spectrum X(ω) to obtain a beat frequency ω. The spectral analysis 404 looks for maxima in the FFT spectrum X(ω).
  • the signal energy E_x(n) of each windowed audio x_n(i) can be obtained by E_x(n) = Σ_i x_n(i)^2, where x_n(i) is the windowed audio.
  • the energy curve E_x(n) of the windowed audio is converted into a respective short-term power spectrum.
  • the sum can be applied on the energy curve of the entire windowed audio or only on parts of the energy curve of the windowed audio, suitably chosen by the skilled person.
  • the maxima in the FFT power spectrum X(ω) are detected in order to obtain the beat frequency ω.
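A minimal sketch of such an energy-based beat detector (window length, hop size and the tempo search band are illustrative assumptions, not values from the patent):

```python
import numpy as np

def detect_beat_bpm(x, sr, win=1024, hop=512):
    """Windowing 401, total energy 402, FFT 403 and peak picking 404,
    sketched: compute a short-term energy curve, take its FFT and pick
    the strongest peak in a plausible tempo range."""
    n_frames = (len(x) - win) // hop + 1
    # E_x(n) = sum_i x_n(i)^2 for each window position
    energy = np.array([np.sum(x[k * hop:k * hop + win] ** 2)
                       for k in range(n_frames)])
    energy -= energy.mean()                 # remove DC so the peak is rhythmic
    spectrum = np.abs(np.fft.rfft(energy))  # power spectrum of the energy curve
    freqs = np.fft.rfftfreq(len(energy), d=hop / sr)  # frequency axis in Hz
    band = (freqs >= 0.5) & (freqs <= 5.0)  # search roughly 30..300 bpm
    peak_hz = freqs[band][np.argmax(spectrum[band])]
    return peak_hz * 60.0                   # beat frequency in bpm
```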
  • Fig. 6 visualizes the beat detection 201 process described in Fig. 5 above.
  • the upper part of Fig. 6 shows an audio signal, here the audio input x(n) being input into the beat detection 201, that comprises several bars of length T.
  • a first bar starts at time instance 0 and ends at time instance T.
  • a second bar subsequent to the first bar starts at time instance T and ends at time instance 2T.
  • a third bar subsequent to the second bar starts at time instance 2T and ends at time instance 3T.
  • the audio signal x(n) represents percussive sounds.
  • Each bar comprises several percussive sounds, namely a bass drum BD at the start of each bar (first beat of a bar, expressed in eighth-note quantization), a snare drum SD in the middle of each bar (fifth beat of a bar, expressed in eighth-note quantization), and a hi-hat HH playing an eighth-note rhythm (second, third, fourth, sixth, seventh, and eighth beats of a bar, expressed in eighth-note quantization).
  • the middle part of Fig. 6 shows the power spectrum obtained by the FFT spectrum determination 403 (see Fig. 5), in which the y-axis of the power spectrum represents the spectral density X(ω) of the audio input x(n) and the x-axis of the power spectrum represents the frequency ω expressed in beats per minute (bpm).
  • the power spectrum shows four peaks: a first peak at 20 bpm, a second peak at 40 bpm, a third peak at 80 bpm, and a fourth peak at 160 bpm.
  • the speed of the rhythm in the audio input x(n) can be determined from these peaks.
  • one may for example relate the third peak of the power spectrum of Fig. 6 to an eighth-note beat, so that the fourth peak relates to sixteenth notes, the second peak relates to quarter notes, and so on.
  • the lower part of Fig. 6 shows the output of the beat detection 201, that is, the beat frequency ω expressed in beats per minute (bpm), here 80 bpm, as obtained by the spectral analysis 404 described above.
  • the beat detection 201 process may also for example be implemented by analyzing audio spectrograms, as described in more detail in the published paper Jonathan Foote, Shingo Uchihashi, "The Beat Spectrum: a New Approach To Rhythm Analysis," IEEE International Conference on Multimedia & Expo 2001, IEEE (2001).
  • Fig. 7 schematically shows a representation of the beat frequency obtained from beat detection as a beat line b(n).
  • Fig. 8 schematically describes in more detail an embodiment of the rhythm pattern estimation process performed in the process of audio enhancement described in Fig. 4 above, in which rhythm pattern estimation is performed on the drums separations to obtain estimated rhythm patterns for each drums separation.
  • a rhythm pattern estimation 202 process is performed on the drums separations, here the snare drums s_SD(n), to obtain an estimated rhythm pattern for each drums separation, here a snare drums estimated rhythm pattern RP_SD[].
  • a process of windowing 401 is performed on the snare drums s_SD(n) to obtain windowed snare drums s_SD,n(i).
  • a process of total energy determination 402 is performed on the windowed snare drums s_SD,n(i) to obtain an energy curve E_SD(n) of the snare drums s_SD(n).
  • An analysis of the total energy 403 is performed on the energy curve E_SD(n) of each windowed snare drums s_SD,n(i) to obtain a snare drums estimated rhythm pattern RP_SD[].
  • a windowed percussive separation, such as the windowed snare drums, can be obtained by s_SD,n(i) = s_SD(n + i) · h(i), where s_SD(n + i) represents the discretized snare drums signal (i representing the sample number and thus time) shifted by n samples, and h(i) is a framing function around time n (respectively sample n), like for example the Hamming function, which is well-known to the skilled person.
  • the energy curve E_SD(n) of each windowed snare drums s_SD,n(i) can be obtained by E_SD(n) = Σ_i s_SD,n(i)^2, where s_SD,n(i) is the windowed snare drums.
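The two formulas above translate directly into code; a short sketch with an assumed window length and hop size:

```python
import numpy as np

def energy_curve(s_sd, win=1024, hop=512):
    """Windowing 401 and total energy 402, sketched:
    s_SD,n(i) = s_SD(n + i) * h(i) with a Hamming frame h(i), and
    E_SD(n) = sum_i s_SD,n(i)^2 for each frame position n."""
    h = np.hamming(win)                     # framing function h(i)
    starts = range(0, len(s_sd) - win + 1, hop)
    return np.array([np.sum((s_sd[n:n + win] * h) ** 2) for n in starts])
```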
  • Fig. 9a schematically shows a representation of the rhythm pattern estimation, described in Fig. 8 above.
  • the upper part of Fig. 9a shows a percussive signal, here the snare drums s_SD(n) as obtained by the music source separation 101 described in Fig. 1.
  • This percussive signal s_SD(n), which is input into the rhythm pattern estimation 202, comprises several bars of length T, wherein each bar comprises a snare drums sound SN in the middle of each bar (fifth beat of a bar, expressed in eighth-note quantization).
  • a first bar starts at time instance 0 and ends at time instance T.
  • a second bar subsequent to the first bar starts at time instance T and ends at time instance 2T.
  • a third bar subsequent to the second bar starts at time instance 2T and ends at time instance 3T.
  • the middle part of Fig. 9a shows the output of the total energy determination 402 (see Fig. 8), that is, the energy curve E_SD(n) of the snare drums s_SD(n).
  • the energy curve E_SD(n) of the snare drums s_SD(n) is presented on the beat line b(n), where b(n) represents the position of the beats on the time line expressed in sample numbers n.
  • the lower part of Fig. 9a shows the energy thresholding performed in the analysis of total energy 403 (see Fig. 8).
  • the energy thresholding, which is performed on the isolated snare drums, filters the beats with much louder or much softer energy in the beat line b(n).
  • the snare drums estimated rhythm pattern RP SD [ ] is obtained as described in Fig. 9b below.
  • Fig. 9b schematically shows a representation of the snare drums estimated rhythm pattern RP SD [ ] obtained by the rhythm pattern estimation, described in Fig. 8 above.
  • Fig. 9b shows the snare drums estimated rhythm pattern RP_SD[] of the audio signal x(n), obtained after the energy thresholding (see lower part of Fig. 9a) performed in the analysis of total energy 403 (see Fig. 8).
  • Fig. 10 schematically shows a representation of the beat frequency, expressed in beats per minute, in a 16th note quantization pattern.
  • the beat frequency ω obtained by beat detection 201 (see Fig. 5) and a 16th note quantization pattern (see upper part of Fig. 10) are represented by a beat pattern, e.g. a 16th note beat pattern array PP[1...16] (see middle part of Fig. 10), which is used to obtain the estimated rhythm pattern RP[], here the bass drum rhythm pattern RP_BD[].
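A sketch of how detected onsets could be mapped onto such a 16th-note pattern array; the 4/4 bar assumption and the nearest-slot rounding are illustrative choices, not prescribed by the text:

```python
import numpy as np

def quantize_rhythm_pattern(onset_times, bpm, slots=16):
    """Map onset times (in seconds) onto a 16th-note pattern array
    PP[1..16] for one bar, given the beat frequency in bpm."""
    bar_len = 4 * 60.0 / bpm                 # one 4/4 bar in seconds
    pattern = np.zeros(slots, dtype=int)
    for t in onset_times:
        slot = int(round((t % bar_len) / bar_len * slots)) % slots
        pattern[slot] = 1                    # mark the nearest 16th-note slot
    return pattern
```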
  • Fig. 11a shows a rhythm pattern of a rock music song on an 8th note quantization pattern representation.
  • Fig. 11b shows a rhythm pattern of a rock music song on an 8th note quantization pattern representation.
  • Fig. 12 schematically shows another embodiment of a process of audio signal enhancement based on audio source separation and rhythm amplification.
  • the process performs audio enhancement using source separation and rhythm amplification by combining (online) audio source separation with rhythm similarity estimation and beat detection.
  • the audio input signal x(n) is decomposed into percussive separations, here drums: bass drums s_BD(n), snare drums s_SD(n), and rest drums separation s_RD(n), and into non-percussive separations, here non-percussive instruments s_NP(n), which include melodic and harmonic parts.
  • the non-percussive separation s_NP(n) includes the remaining sources of the audio input signal, apart from the percussive separations, e.g. vocals, guitar, bass, or the like.
  • a beat detection 201 process is performed on the audio input signal x(n) to obtain a beat frequency ω.
  • a rhythm similarity estimation 900 process is performed on each of the percussive separations, here the bass drums s_BD(n), the snare drums s_SD(n), and the rest drums separation s_RD(n), based on a beat signal b(n) obtained from the beat frequency ω, to obtain a rhythm similarity, here a bass drums rhythm similarity, a snare drums rhythm similarity, and a rest drums rhythm similarity.
  • An embodiment of the rhythm similarity estimation 900 process is described in more detail with regard to Fig. 13 below.
  • a similarity-to-gain mapping 901 process is performed on the bass drums rhythm similarity, the snare drums rhythm similarity, and the rest drums rhythm similarity, to obtain gain factors g(n).
  • a separation enhancement 204 process is performed on the bass drums s_BD(n), the snare drums s_SD(n), and the rest drums separation s_RD(n), based on the gain factors, to obtain an enhanced drums separation s'_D(n).
  • a mixer 103 mixes the enhanced drums separation s'_D(n) with the non-percussive separations s_NP(n) to obtain an enhanced audio signal x'(n).
  • the enhanced audio signal x'(n) is output to a loudspeaker system 104.
  • the similarity-to-gain mapping 901 maps the rhythm similarity to a gain value.
  • the gain factors can control the contribution of the percussive separations to the mix, that is, the snare drums may be enhanced by +5 dB, the bass drums may be unchanged in the mix, or the like.
  • the gain factors may not be a binary system, i.e. the bass drums may be enhanced by +1.3 dB, the snare drums may be enhanced by +1.1 dB, and the rest drums may be enhanced by +0.5 dB, depending on the online result of the rhythm similarity estimation 900.
  • Fig. 13 shows in more detail an embodiment of a rhythm similarity estimation process performed as described in Fig. 12 above.
  • a rhythm similarity estimation 900 process is performed on each of the bass drums s_BD(n), the snare drums s_SD(n), and the rest drums separation s_RD(n), based on a beat signal b(n) obtained from the beat frequency ω, to obtain a similarity.
  • a rhythm similarity estimation 900 process is performed on the snare drums s_SD(n) obtained by the music source separation 101 (see Fig. 12), to obtain a snare drums similarity.
  • an FFT 902 is performed on both the beat signal b(n) and the snare drums s_SD(n) signal to obtain the magnitudes of their short-term power spectra.
  • a process of an energy spectrum comparator 903 is performed on both power spectra to obtain a similarity 904.
  • the magnitude of the short-term power spectrum may be obtained by |X_n(ω)| = |Σ_i x_n(i) · e^(−jωi)|, where x_n(i) is the signal in the windowed audio, as defined above, and ω are the frequencies in the frequency domain.
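A sketch of this spectral comparison and the subsequent similarity-to-gain mapping; the normalized correlation used for the comparator 903 and the linear map used for the mapping 901 are illustrative design choices that the text does not pin down:

```python
import numpy as np

def rhythm_similarity(sep, beat, win=2048):
    """Compare the short-term power-spectrum magnitudes of a percussive
    separation and of the beat signal b(n) via normalized correlation.
    A single frame is compared here; a full implementation would
    average the comparison over successive frames."""
    h = np.hamming(win)
    spec_sep = np.abs(np.fft.rfft(sep[:win] * h))
    spec_beat = np.abs(np.fft.rfft(beat[:win] * h))
    num = np.dot(spec_sep, spec_beat)
    den = np.linalg.norm(spec_sep) * np.linalg.norm(spec_beat) + 1e-12
    return num / den                         # similarity in [0, 1]

def similarity_to_gain(similarity, max_gain_db=5.0):
    """Map a similarity value to a gain in dB (simple linear map)."""
    return similarity * max_gain_db
```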
  • Fig. 14 shows a flow diagram visualizing a method for signal mixing related to audio signal enhancement based on source separation and rhythm amplification to obtain an enhanced audio signal.
  • the music source separation receives an audio input signal x(n) (see stereo file 1 in Fig. 1).
  • music source separation is performed based on the received audio input signal x(n) to obtain percussive separations, here bass drums s_BD(n), snare drums s_SD(n), and rest drums separation s_RD(n), and non-percussive separations s_NP(n).
  • beat detection (see 201 in Figs. 4 and 12) is performed on the received audio input signal to obtain a detected beat signal b(n).
  • rhythm pattern estimation (see 202 in Figs. 3 and 4) is performed on the percussive separations based on the beat frequency to obtain percussive separations estimated rhythm patterns.
  • rhythm pattern selection (see 203 in Fig. 4) is performed based on the percussive separations estimated rhythm patterns to obtain gain factors g(n).
  • separation enhancement (see 204 in Figs. 3, 4, and 12) is performed on the percussive separations based on the gain factors g(n) to obtain an enhanced drums separation s'_D(n).
  • mixing of the enhanced drums separation s'_D(n) with the non-percussive separations s_NP(n) is performed to obtain an enhanced audio signal x'(n).
  • the enhanced audio signal x'(n) is output to a loudspeaker system (see 104 in Figs. 2, 4, and 12), such as a loudspeaker system of a smartphone, of a smartwatch, of a Bluetooth device, or the like.
  • In Fig. 14, a flow diagram visualizing a method for signal mixing using beat detection and rhythm pattern estimation is described; however, the present disclosure is not limited to the method steps described above.
  • For example, the beat detection process, rhythm pattern estimation process and rhythm pattern selection process may be omitted, and instead of the separation enhancement process (see 204 in Figs. 3, 4, and 12) a static gain amplification process (see 102 in Fig. 2) can be performed, or the like.
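Putting the sketched helpers together, an illustrative glue function following the flow of Fig. 14; the impulse-train beat signal and all helper names are assumptions carried over from the sketches above:

```python
import numpy as np

def enhance(x, sr, separations, s_np):
    """Glue sketch of Fig. 14 using the illustrative helpers above
    (detect_beat_bpm, rhythm_similarity, similarity_to_gain), which
    stand in for the patent's blocks 201, 900, 901, 204 and 103.

    x           : np.ndarray, mixed input signal
    separations : list of percussive separations (s_BD, s_SD, s_RD)
    s_np        : non-percussive separation s_NP
    """
    bpm = detect_beat_bpm(x, sr)             # beat detection 201
    beat = np.zeros_like(x)                  # synthetic beat signal b(n):
    period = max(1, int(sr * 60.0 / bpm))    # one impulse per beat
    beat[::period] = 1.0
    enhanced = np.zeros_like(x)
    for sep in separations:                  # similarity 900 -> gain 901 -> 204
        gain_db = similarity_to_gain(rhythm_similarity(sep, beat))
        enhanced += 10.0 ** (gain_db / 20.0) * sep
    return enhanced + s_np                   # mixer 103
```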
  • Fig. 15 schematically describes an embodiment of an electronic device that can implement the processes of audio enhancement based on rhythm amplification and music source separation, as described above.
  • the electronic device 1200 comprises a CPU 1201 as processor.
  • the electronic device 1200 further comprises a microphone array 1210, a loudspeaker array 1211 and a convolutional neural network unit 1220 that are connected to the processor 1201.
  • the processor 1201 may for example implement a gain amplification 102 and a separation enhancement 204 that realize the processes described with regard to Fig. 2, Fig. 3, Fig. 4, and Fig. 12 in more detail.
  • the CNN 1220 may for example be an artificial neural network implemented in hardware.
  • the CNN 1220 may for example implement a source separation 101, a beat detection 201, a rhythm pattern estimation 202, and a rhythm pattern selection 203 that realize the processes described with regard to Fig. 4 and Fig. 12 in more detail.
  • Loudspeaker array 1211 consists of one or more loudspeakers that are distributed over a predefined space and is configured to render any kind of audio, such as 3D audio.
  • the electronic device 1200 further comprises a user interface 1212 that is connected to the processor 1201. This user interface 1212 acts as a man-machine interface and enables a dialogue between an administrator and the electronic system.
  • the electronic device 1200 further comprises an Ethernet interface 1221, a Bluetooth interface 1204, and a WLAN interface 1205. These units 1204, 1205 act as I/O interfaces for data communication with external devices. For example, additional loudspeakers, microphones, and video cameras with Ethernet, WLAN or Bluetooth connection may be coupled to the processor 1201 via these interfaces 1221, 1204, and 1205.
  • the electronic device 1200 further comprises a data storage 1202 and a data memory 1203 (here a RAM).
  • the data memory 1203 is arranged to temporarily store or cache data or computer instructions for processing by the processor 1201.
  • the data storage 1202 is arranged as a long-term storage, e.g. for recording sensor data obtained from the microphone array 1210 and provided to or retrieved from the CNN 1220.
  • An electronic device comprising circuitry configured to perform audio source separation (101) on an audio input signal to extract one or more percussive separations (s_BD(n), s_SD(n), s_RD(n)) and to perform rhythmic enhancement on the one or more percussive separations (s_BD(n), s_SD(n), s_RD(n)) to obtain at least one enhanced percussive separation (s'_D(n)).
  • circuitry configured to amplify at least one of the percussive separations (s_BD(n), s_SD(n), s_RD(n)) by a gain factor (g(n)) to obtain the at least one enhanced percussive separation (s'_D(n)).
  • circuitry is configured to perform rhythm pattern selection (203) based on the rhythm patterns (RP[]) to obtain the at least one enhanced percussive separation (s'_D(n)).
  • circuitry is configured to perform a spectral analysis of a percussive separation to estimate a rhythm pattern.
  • circuitry is configured to perform beat detection (201) on the audio input signal (x(n)) to obtain a beat frequency (ω) and to perform the rhythm pattern estimation (202) based on a percussive separation (s_BD(n), s_SD(n), s_RD(n)) and the beat frequency (ω).
  • circuitry is configured to determine by a rhythm similarity estimation (900) a similarity (904) between a percussive separation (s_BD(n), s_SD(n), s_RD(n)) and a beat signal (b(n)) to obtain at least one enhanced percussive separation.
  • circuitry is configured to perform similarity-to-gain mapping (901) based on the similarity (904) to obtain gain factors (g(n)) used for obtaining the at least one enhanced percussive separation (s'_D(n)).
  • circuitry is configured to perform the source separation (101) on the audio input signal (x(n)) to obtain one or more percussive separations (s_D(n)) and one or more non-percussive separations (s_NP(n)), and to perform mixing of the enhanced separated source (s'_D(n)) with the one or more non-percussive separations (s_NP(n)) to obtain an enhanced audio signal (x'(n)).
  • the one or more percussive separations (s_D(n)) comprise drums.
  • circuitry comprises a microphone to acquire the audio input signal (x(n)).
  • circuitry further comprises a loudspeaker system (104) to output the enhanced audio signal (x'(n)).
  • circuitry further comprises a vibration motor to output vibrations based on a signal of the enhanced separation (s'_D(n)).
  • a method comprising: performing audio source separation (101) on an audio input signal (x(n)) to extract one or more percussive separations (s_BD(n), s_SD(n), s_RD(n)); and performing rhythmic enhancement on the one or more percussive separations (s_BD(n), s_SD(n), s_RD(n)) to obtain at least one enhanced percussive separation (s'_D(n)).
  • a computer program comprising instructions, the instructions when executed on a processor causing the processor to perform the method of (15).

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Auxiliary Devices For Music (AREA)

Abstract

An electronic device having circuitry configured to perform audio source separation on an audio input signal to extract one or more percussive separations, and to perform rhythmic enhancement on the one or more percussive separations to obtain at least one enhanced percussive separation.

Description

MULTIPLE PERCUSSIVE SOURCES SEPARATION FOR REMIXING
TECHNICAL FIELD
The present disclosure generally pertains to the field of audio processing, and in particular, to devices, methods and computer programs for audio enhancement.
TECHNICAL BACKGROUND
There is a lot of audio content available, for example, in the form of compact disks (CD), tapes, audio data files which can be downloaded from the internet, but also in the form of sound tracks of videos, e.g. stored on a digital video disk or the like, etc.
Typically, audio content is already mixed from original audio source signals, e.g. for a mono or stereo setting, without keeping original audio source signals from the original audio sources which have been used for production of the audio content.
However, there exist situations or applications where a remixing or upmixing of the audio content would be desirable. For instance, in situations where the audio content is to be played on a device having more audio channels available than the audio content provides, e.g. mono audio content to be played on a stereo device, stereo audio content to be played on a surround sound device having six audio channels, etc. In other situations, the perceived spatial position of an audio source shall be amended, or the perceived loudness of an audio source shall be amended.
Although there generally exist techniques for remixing audio content, it is generally desirable to improve methods and apparatus for audio enhancement.
SUMMARY
According to a first aspect, the disclosure provides an electronic device comprising circuitry configured to perform audio source separation on an audio input signal to extract one or more percussive separations and to perform rhythmic enhancement on the one or more percussive separations to obtain at least one enhanced percussive separation.
According to a second aspect, the disclosure provides a method comprising performing audio source separation on an audio input signal to extract one or more percussive separations; and performing rhythmic enhancement on the one or more percussive separations to obtain at least one enhanced percussive separation.
According to a third aspect, the disclosure provides a computer program comprising instructions, the instructions when executed on a processor causing the processor to perform audio source separation on an audio input signal to extract one or more percussive separations and to perform rhythmic enhancement on the one or more percussive separations to obtain at least one enhanced percussive separation.
Further aspects are set forth in the dependent claims, the following description and the drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
Embodiments are explained by way of example with respect to the accompanying drawings, in which:
Fig. 1 schematically shows a general approach of audio upmixing/ remixing by means of blind source separation (BSS), such as music source separation (MSS);
Fig. 2 schematically shows a process of audio signal enhancement based on audio source separation and rhythm amplification;
Fig. 3 schematically shows another embodiment of a process of audio signal enhancement based on audio source separation and rhythm amplification;
Fig. 4 schematically shows another embodiment of a process of audio signal enhancement based on audio source separation and rhythm amplification;
Fig. 5 schematically describes in more detail an embodiment of the beat detection process performed in the process of audio enhancement;
Fig. 6 visualizes the beat detection process described in Fig. 5;
Fig. 7 schematically shows a representation of the beat frequency obtained from beat detection as a beat line;
Fig. 8 schematically describes in more detail an embodiment of the rhythm pattern estimation process performed in the process of audio enhancement described in Fig. 3;
Fig. 9a schematically shows a representation of the rhythm pattern estimation described in Fig. 8;
Fig. 9b schematically shows a representation of the snare drums estimated rhythm pattern obtained by the rhythm pattern estimation described in Fig. 8;
Fig. 10 schematically shows a representation of the beat frequency, expressed in beats per minute, in a 16th-note quantization pattern;
Fig. 11a shows a rhythm pattern of a rock music song on an 8th-note quantization pattern representation;
Fig. 11b shows a rhythm pattern of a rock music song on an 8th-note quantization pattern representation;
Fig. 12 schematically shows another embodiment of a process of audio signal enhancement based on audio source separation and rhythm amplification;
Fig. 13 shows in more detail an embodiment of a process of a rhythm similarity estimation process performed as described in Fig. 12;
Fig. 14 shows a flow diagram visualizing a method for signal mixing related to audio signal enhancement based on source separation and rhythm amplification to obtain an enhanced audio signal; and
Fig. 15 schematically describes an embodiment of an electronic device that can implement the processes of audio enhancement based on rhythm amplification and music source separation.
DETAILED DESCRIPTION OF EMBODIMENTS
Before a detailed description of the embodiments under reference of Fig. 1 to Fig. 15, some general explanations are made.
In the following, the terms remixing, upmixing, and downmixing can refer to the overall process of generating output audio content on the basis of separated audio source signals originating from mixed input audio content, while the term “mixing” can refer to the mixing of the separated audio source signals. Hence the “mixing” of the separated audio source signals can result in a “remixing”, “upmixing” or “downmixing” of the mixed audio sources of the input audio content.
The embodiments disclose an electronic device comprising circuitry configured to perform audio source separation on an audio input signal to extract one or more percussive separations and to perform rhythmic enhancement on the one or more percussive separations to obtain at least one enhanced percussive separation.
The electronic device may for example be any multi-purpose mobile computing device such as a smartphone, a wearable device in the form of a wristwatch, e.g. a smartwatch or the like, a DMX controller, or the like.
The circuitry of the electronic device may include a processor, which may for example be a CPU, a memory (RAM, ROM or the like) and/or storage, interfaces, etc. Circuitry may comprise or may be connected with input means (mouse, keyboard, camera, etc.), output means (display (e.g. liquid crystal, (organic) light emitting diode, etc.), loudspeakers, etc.), a (wireless) interface, etc., as is generally known for electronic devices (computers, smartphones, etc.). Moreover, circuitry may comprise or may be connected with sensors for sensing still images or video image data (image sensor, camera sensor, video sensor, etc.), for sensing environmental parameters (e.g. radar, humidity, light, temperature), etc.
In audio source separation, an input signal comprising a number of sources (e.g. instruments, voices, or the like) is decomposed into separations. Audio source separation may be unsupervised (called “blind source separation”, BSS) or partly supervised. “Blind” means that the blind source separation does not necessarily have information about the original sources. For example, it may not necessarily know how many sources the original signal contained, or which sound information of the input signal belongs to which original source. The aim of blind source separation is to decompose the original signal into separations without knowing the separations beforehand. A blind source separation unit may use any of the blind source separation techniques known to the skilled person. In (blind) source separation, source signals may be searched that are minimally correlated or maximally independent in a probabilistic or information-theoretic sense, or structural constraints on the audio source signals may be found on the basis of a non-negative matrix factorization. Methods for performing (blind) source separation are known to the skilled person and are based on, for example, principal components analysis, singular value decomposition, independent component analysis, non-negative matrix factorization, artificial neural networks, etc.
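For illustration only, the following minimal Python sketch extracts a percussive and a non-percussive component from a mixed signal using the harmonic-percussive separation of the librosa toolkit as a simple stand-in for the source separation techniques named above; the file name is a placeholder:

```python
import librosa

# Load a mixed audio input signal x(n); "song.wav" is a placeholder path.
x, sr = librosa.load("song.wav", sr=None, mono=True)

# Simple harmonic/percussive split as a stand-in for full music source
# separation: y_percussive corresponds to the percussive separations,
# y_harmonic to the non-percussive (melodic/harmonic) separations.
y_harmonic, y_percussive = librosa.effects.hpss(x)
```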
Although some embodiments use blind source separation for generating the separated audio source signals, the present disclosure is not limited to embodiments where no further information is used for the separation of the audio source signals, but in some embodiments, further information is used for generation of separated audio source signals. Such further information can be, for example, information about the mixing process, information about the type of audio sources included in the input audio content, information about a spatial position of audio sources included in the input audio content, etc.
The input signal can be an audio signal of any type. It can be in the form of analog signals, digital signals, it can originate from a compact disk, digital video disk, or the like, it can be a data file, such as a wave file, mp3-file or the like, and the present disclosure is not limited to a specific format of the input audio content. An input audio content may for example be a stereo audio signal having a first channel input audio signal and a second channel input audio signal, without the present disclosure being limited to input audio contents with two audio channels. In other embodiments, the input audio content may include any number of channels, such as a remixing of a 5.1 audio signal or the like.
The input signal may comprise one or more source signals. In particular, the input signal may comprise several audio sources. An audio source can be any entity which produces sound waves, for example, music instruments, voice, vocals, artificially generated sound, e.g. originating from a synthesizer, etc. The input audio content may represent or include mixed audio sources, which means that the sound information is not separately available for all audio sources of the input audio content, but that the sound information for different audio sources at least partially overlaps or is mixed.
The circuitry may be configured to perform rhythmic enhancement of at least one of the separations included in the audio input signal. For example, the rhythmic enhancement may be performed by estimating a rhythm pattern of one or more separations and selectively amplifying at least one of the separations by a gain factor, or the like.
The circuitry may be configured to perform the remixing or upmixing based on at least one enhanced separated source and based on other separated sources obtained by the blind source separation to obtain the remixed or upmixed signal. The remixing or upmixing may be configured to perform remixing or upmixing of the separated sources, here percussive separations and non-percussive separations, to produce a remixed or upmixed enhanced signal, which may be sent to the loudspeaker system.
The circuitry of the electronic device may for example be configured to amplify at least one of the percussive separations by a gain factor to obtain the at least one enhanced percussive separation.
The gain factor may control the contribution of the one or more percussive separations to the mix, that is, one percussive separation may be enhanced, while another percussive separation may be unchanged in the mix, or the like, without limiting the present disclosure to that result. Alternatively, the gain factor may not be selected by a binary system; instead, the gain factor may be selected by a weighting system that sets the value of the gain factor in an arbitrary way.
The at least one percussive separation, which may contain the main beat of a music piece, may for example be amplified by a static gain factor, or the signal may be analyzed in real-time while the music is playing and the gain factor may be dynamically adapted, or the like. A gain factor to be applied to a percussive separation that has beats in a certain speed range that do not vary over time may be higher than the gain factor to be applied to the other percussive separations, without limiting the present disclosure in that regard.
The circuitry of the electronic device may for example be configured to perform rhythm pattern estimation on the percussive separations to obtain estimated rhythm patterns. The rhythm pattern estimation may be performed, at the beginning of a song, on the percussive separations contained in the audio signal. A beat may be extracted using an a-priori known percussive instrument, or the like.
The circuitry of the electronic device may for example be configured to perform rhythm pattern selection based on the rhythm patterns to obtain the at least one enhanced percussive separation. For example, the rhythm pattern of at least one of the percussive separations may be compared with several or all of the other percussive separations. The percussive separation with the strongest and simplest pattern may be selected and may be used for an accurate rhythm estimation.
The circuitry of the electronic device may for example be configured to perform a spectral analysis of a percussive separation to estimate a rhythm pattern. Alternatively, a rhythm pattern may be estimated using a classical energy-based rhythm extractor, or the like.
The circuitry of the electronic device may for example be configured to perform beat detection on the audio input signal to obtain a beat frequency and to perform the rhythm pattern estimation based on a percussive separation and the beat frequency.
The gain factors may control the contribution of the one or more percussive separations to the mix, that is, one percussive separation may be enhanced, while another percussive separation may be unchanged in the mix, or the like, without limiting the present disclosure to that result. Alternatively, the gain factors may not be selected by a binary system; instead, the gain factors may be selected by a weighting system that sets the values of the gain factors in an arbitrary way.
The circuitry of the electronic device may for example be configured to determine by a rhythm similarity estimation a similarity between a percussive separation and a beat signal to obtain at least one enhanced percussive separation.
The circuitry of the electronic device may for example be configured to perform similarity-to-gain mapping based on the similarity to obtain gain factors used for obtaining the at least one enhanced percussive separation.
The circuitry of the electronic device may for example be configured to perform a spectral comparison between a percussive separation and a beat signal to obtain at least one enhanced percussive separation.
The circuitry of the electronic device may for example be configured to perform the source separation on the audio input signal to obtain one or more percussive separations and one or more non-percussive separations and to perform mixing of the enhanced separated source with the one or more non-percussive separations to obtain an enhanced audio signal. For example, the enhanced audio signal may be obtained by amplifying the desirable percussive separations, without deleting the remaining separations, e.g. the singing voice, or the like. The non-percussive separations may include melodic and harmonic parts. For example, the non-percussive separation may include the remaining sources of the audio input signal, apart from the percussive separations, that is, e.g. vocals, guitar, bass, or the like.
According to an embodiment, the one or more percussive separations comprise drums. The circuitry of the electronic device may for example comprise a microphone to acquire the audio input signal.
The circuitry of the electronic device may for example comprise a loudspeaker system to output the enhanced audio signal. The loudspeaker system may be any loudspeaker system, such as for example a Bluetooth in-ear earphone, or the like. The enhanced audio signal may be provided as live (real-time) playback to a user, in which the enhanced percussive separations may be mixed in time sync with the remaining separations, or the like.
The circuitry of the electronic device may for example comprise a vibration motor to output vibra- tions based on a signal of the enhanced separation. For example, a secondary notification signal such as a vibration from a cellphone, may be generated. The vibrator of a smartphone that a user wears in his pocket may be controlled from a beat signal derived from a percussive separation, without limit- ing the present disclosure in that regard.
Still further, in the case of a DMX (Digital Multiplex) controller, which is used to control stage lighting and effects, different percussive separations having different percussive rhythm patterns may be related to the control of different light effects, or the like.
The embodiments also disclose a method comprising performing audio source separation on an audio input signal to extract one or more percussive separations and performing rhythmic enhancement on the one or more percussive separations to obtain at least one enhanced percussive separation.
The embodiments also disclose a computer program comprising instructions, the instructions when executed on a processor causing the processor to perform the processes disclosed here.
Embodiments are now described by reference to the drawings.
Audio remixing/upmixing by means of audio source separation
Fig. 1 schematically shows a general approach of audio upmixing/remixing by means of blind source separation (BSS), such as music source separation (MSS).
First, source separation (also called “demixing”) is performed which decomposes a source audio signal 1 comprising multiple channels i and audio from multiple audio sources Source 1, Source 2, ... Source K (e.g. instruments, voice, etc.) into “separations”, here into source estimates 2a-2d for each channel i, wherein K is an integer number and denotes the number of audio sources. In the embodiment here, the source audio signal 1 is a stereo signal having two channels i = 1 and i = 2. As the separation of the audio source signal may be imperfect, for example, due to the mixing of the audio sources, a residual signal 3 (r(n)) is generated in addition to the separated audio source signals 2a-2d. The residual signal may for example represent a difference between the input audio content and the sum of all separated audio source signals. The audio signal emitted by each audio source is represented in the input audio content 1 by its respective recorded sound waves. For input audio content having more than one audio channel, such as stereo or surround sound input audio content, spatial information for the audio sources is typically also included or represented by the input audio content, e.g. by the proportion of the audio source signal included in the different audio channels. The separation of the input audio content 1 into separated audio source signals 2a-2d and a residual 3 is performed based on blind source separation or other techniques which are able to separate audio sources.
In a second step, the separations 2a-2d and the possible residual 3 are remixed and rendered to a new loudspeaker signal 4, here a signal comprising five channels 4a-4e, namely a 5.0 channel system. Based on the separated audio source signals and the residual signal, an output audio content is generated by mixing the separated audio source signals and the residual signal taking into account spatial information. The output audio content is exemplarily illustrated and denoted with reference number 4 in Fig. 1.
In the following, the number of audio channels of the input audio content is referred to as Min and the number of audio channels of the output audio content is referred to as Mout. As the input audio content 1 in the example of Fig. 1 has two channels i = 1 and i = 2 and the output audio content 4 in the example of Fig. 1 has five channels 4a-4e, Min = 2 and Mout = 5. The approach in Fig. 1 is generally referred to as remixing, and in particular as upmixing if Min < Mout. In the example of Fig. 1 the number of audio channels Min = 2 of the input audio content 1 is smaller than the number of audio channels Mout = 5 of the output audio content 4, which is, thus, an upmixing from the stereo input audio content 1 to 5.0 surround sound output audio content 4.
Technical details about the source separation process described in Fig. 1 above are known to the skilled person. An exemplifying technique for performing blind source separation is for example disclosed in European patent application EP 3 201 917, or by Uhlich, Stefan, et al. "Improving music source separation based on deep neural networks through data augmentation and network blending." 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2017. There also exist programming toolkits for performing blind source separation, such as Open-Unmix, DEMUCS, Spleeter, Asteroid, or the like, which allow the skilled person to perform a source separation process as described in Fig. 1 above.
Audio signal enhancement based on source separation and rhythm amplification

Fig. 2 schematically shows a process of audio signal enhancement based on audio source separation and rhythm amplification. The process allows to perform audio enhancement using source separation and rhythm amplification by combining (online) audio source separation with audio gain amplification.
An audio input signal (see 1 in Fig. 1) containing multiple sources (see 1, 2, ..., K in Fig. 1), with, for example, multiple channels (e.g. Min = 2), e.g. a piece of music, is input to music source separation 101 and decomposed into separations (see separated sources 2a-2d and residual signal 3 in Fig. 1) as described with regard to Fig. 1 above. In the present embodiment, the audio input signal x(n) is decomposed into percussive separations, here drums, namely bass drums sBD(n), snare drums sSD(n), and a rest drums separation sRD(n), and into non-percussive separations, here non-percussive instruments sNP(n), which include melodic and harmonic parts. The non-percussive separation sNP(n) includes the remaining sources of the audio input signal, apart from the percussive separations, e.g. vocals, guitar, bass, or the like. Each of the percussive separations, here the bass drums sBD(n), the snare drums sSD(n), and the rest drums separation sRD(n), is amplified 102 by a gain factor, which is set statically, to obtain an enhanced percussive separation, here enhanced bass drums s'BD(n), enhanced snare drums s'SD(n), and an enhanced rest drums separation s'RD(n). In the present embodiment, the bass drums sBD(n) are amplified by a gain factor equal to +5 dB, the snare drums sSD(n) are amplified by a gain factor equal to +3 dB, and the rest drums separation sRD(n) is amplified by a gain factor equal to +1 dB. A mixer 103 mixes the enhanced bass drums s'BD(n), the enhanced snare drums s'SD(n), and the enhanced rest drums separation s'RD(n) with the non-percussive separations sNP(n) to obtain an enhanced audio signal x'(n). The enhanced audio signal x'(n) is output to a loudspeaker system 104.
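A minimal Python sketch of this static gain amplification 102 and mixing 103, assuming the separations are available as equal-length NumPy arrays (all names are placeholders):

```python
import numpy as np

def db_to_gain(db):
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)

def enhance_static(s_bd, s_sd, s_rd, s_np):
    """Static gain amplification 102 and mixing 103 as in Fig. 2:
    +5 dB bass drum, +3 dB snare, +1 dB rest drums."""
    e_bd = s_bd * db_to_gain(5.0)     # enhanced bass drums s'BD(n)
    e_sd = s_sd * db_to_gain(3.0)     # enhanced snare drums s'SD(n)
    e_rd = s_rd * db_to_gain(1.0)     # enhanced rest drums s'RD(n)
    return e_bd + e_sd + e_rd + s_np  # enhanced audio signal x'(n)
```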
It is to be noted that all the above described processes, namely the music source separation 101 and the gain factor amplification 102, can be performed in real-time, e.g. “online” with some latency. For example, they could be run directly on the smartphone or smartwatch of the user, in his headphones, on a Bluetooth device, or the like.
The music source separation 101 process may for example be implemented as described in more detail in the published paper Uhlich, Stefan, et al. "Improving music source separation based on deep neural networks through data augmentation and network blending." 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2017. There also exist programming toolkits for performing blind source separation, such as Open-Unmix, DEMUCS, Spleeter, Asteroid, or the like, which allow the skilled person to perform a source separation process as described in Fig. 1 above.

Rhythm amplification with rhythm pattern estimation
Fig. 3 schematically shows another embodiment of a process of audio signal enhancement based on audio source separation and rhythm amplification. The process allows to perform music personalization using source separation and rhythm amplification by combining (online) audio source separation with rhythm pattern estimation.
An audio input signal (see 1 in Fig. 1) containing multiple sources (see 1, 2, ..., K in Fig. 1), with, for example, multiple channels (e.g. Min = 2), e.g. a piece of music, is input to music source separation 101 and decomposed into separations (see separated sources 2a-2d and residual signal 3 in Fig. 1) as described with regard to Fig. 1 above. In the present embodiment, the audio input signal x(n) is decomposed into percussive separations, here drums, namely bass drums sBD(n), snare drums sSD(n), and a rest drums separation sRD(n), and into non-percussive separations, here non-percussive instruments sNP(n), which include melodic and harmonic parts. The non-percussive separation sNP(n) includes the remaining sources of the audio input signal, apart from the percussive separations, e.g. vocals, guitar, bass, or the like. A rhythm pattern estimation 202 process is performed on each of the percussive separations, here the bass drums sBD(n), the snare drums sSD(n), and the rest drums separation sRD(n), to obtain an estimated rhythm pattern RP[ ], here a bass drums estimated rhythm pattern RPBD[ ], a snare drums estimated rhythm pattern RPSD[ ], and a rest drums separation estimated rhythm pattern RPRD[ ]. An embodiment of the rhythm pattern estimation 202 process is described in more detail with regard to Fig. 8 below. A rhythm pattern selection 203 process is performed on the bass drums estimated rhythm pattern RPBD[ ], the snare drums estimated rhythm pattern RPSD[ ], and the rest drums estimated rhythm pattern RPRD[ ], to obtain gain factors g(n). A separation enhancement 204 process is performed on the bass drums sBD(n), the snare drums sSD(n), and the rest drums separation sRD(n), based on the gain factors, to obtain an enhanced drums separation s'D(n). A mixer 103 mixes the enhanced drums separation s'D(n) with the non-percussive separations sNP(n) to obtain an enhanced audio signal x'(n). The enhanced audio signal x'(n) is output to a loudspeaker system 104.
It is to be noted that all the above described processes, namely the music source separation 101, the rhythm pattern estimation 202, the rhythm pattern selection 203 and the separation enhancement 204, can be performed in real-time, e.g. “online”. For example, they could be run directly on the smartphone or smartwatch of the user, in his headphones, on a Bluetooth device, or the like.

Fig. 4 schematically shows another embodiment of a process of audio signal enhancement based on audio source separation and rhythm amplification. The process allows to perform audio enhancement using source separation and rhythm amplification by combining (online) audio source separation with rhythm pattern estimation and beat detection.
An audio input signal (see 1 in Fig. 1) containing multiple sources (see 1, 2, ..., K in Fig. 1), with, for example, multiple channels (e.g. Min = 2), e.g. a piece of music, is input to music source separation 101 and decomposed into separations (see separated sources 2a-2d and residual signal 3 in Fig. 1) as described with regard to Fig. 1 above. In the present embodiment, the audio input signal x(n) is decomposed into percussive separations, here bass drums sBD(n), snare drums sSD(n), and a rest drums separation sRD(n) (e.g. all other drums that are neither bass drum nor snare), and into non-percussive separations, here non-percussive instruments sNP(n), which include melodic and harmonic parts. The non-percussive separation sNP(n) includes the remaining sources of the audio input signal, apart from the percussive separations, e.g. vocals, guitar, bass, or the like. A beat detection 201 process is performed on the audio input signal x(n) to obtain a beat frequency ω. An embodiment of the beat detection 201 process is described in more detail with regard to Figs. 6 and 7 below. A rhythm pattern estimation 202 process is performed on each of the percussive separations, here the bass drums sBD(n), the snare drums sSD(n), and the rest drums separation sRD(n), based on the beat frequency ω, to obtain a percussive separation estimated rhythm pattern RP[ ], here a bass drums estimated rhythm pattern RPBD[ ], a snare drums estimated rhythm pattern RPSD[ ], and a rest drums separation estimated rhythm pattern RPRD[ ]. An embodiment of the rhythm pattern estimation 202 process is described in more detail with regard to Figs. 8 to 9b below. A rhythm pattern selection 203 process is performed on the bass drums estimated rhythm pattern RPBD[ ], the snare drums estimated rhythm pattern RPSD[ ], and the rest drums estimated rhythm pattern RPRD[ ], to obtain gain factors g(n). A separation enhancement 204 process is performed on the bass drums sBD(n), the snare drums sSD(n), and the rest drums separation sRD(n), based on the gain factors, to obtain an enhanced drums separation s'D(n). A mixer 103 mixes the enhanced drums separation s'D(n) with the non-percussive separations sNP(n) to obtain an enhanced audio signal x'(n). The enhanced audio signal x'(n) is output to a loudspeaker system 104.
It is to be noted that all the above described processes, namely the music source separation 101, the beat detection 201, the rhythm pattern estimation 202, the rhythm pattern selection 203 and the separation enhancement 204, can be performed in real-time, e.g. “online”. For example, they could be run directly on the smartphone or smartwatch of the user, in his headphones, on a Bluetooth device, or the like.

The music source separation 101 process may for example be implemented as described in more detail in the published paper Uhlich, Stefan, et al. "Improving music source separation based on deep neural networks through data augmentation and network blending." 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2017. There also exist programming toolkits for performing blind source separation, such as Open-Unmix, DEMUCS, Spleeter, Asteroid, or the like, which allow the skilled person to perform a source separation process as described in Fig. 1 above.
The rhythm pattern estimation 202 process may for example be implemented as described in more detail in the published papers Ilya Shmulevich et al., “Perceptual Issues in Music Pattern Recognition: Complexity of Rhythm and Key Finding”, February 23, 1999; Sankalp Gulati and Preeti Rao, “Rhythm Pattern Representations for Tempo Detection in Music”, Proc. of the First International Conference on Intelligent Interactive Technologies and Multimedia, Dec. 2010, Allahabad, India; and Makarand Velankar and Parag Kulkarni, “Pattern Recognition for Computational Music”, December 2017.
Rhythm amplification with beat detection and rhythm pattern estimation
Fig. 5 schematically describes in more detail an embodiment of the beat detection process performed in the process of audio enhancement described in Fig. 4 above, in which beat detection is performed on the audio input signal to obtain a beat frequency.
As described in Fig. 4, a beat detection 201 process is performed on the audio input signal x(n) to obtain a beat frequency ω. In particular, a process of windowing 401 is performed on the audio input x(n) to obtain windowed audio xn(i). A process of total energy determination 402 is performed on the windowed audio xn(i) to obtain a signal energy curve of the audio xn(i). A Fast Fourier Transform (FFT) spectrum 403 is applied to the windowed audio xn(i) to obtain the FFT spectrum x(ω), also known as power spectral density. A spectral analysis 404 is performed on the FFT spectrum x(ω) to obtain a beat frequency ω. The spectral analysis 404 looks for maxima in the FFT spectrum x(ω).
At the windowing 401, windowed audio xn(i) can be obtained by

$$x_n(i) = x(n + i)\,h(i)$$

where x(n + i) represents the discretized audio signal (i representing the sample number and thus time) shifted by n samples, and h(i) is a framing function around time n (respectively sample n), like for example the Hamming function, which is well-known to the skilled person.
At the total energy estimation 402, the signal energy $E_{x_n}$ of each windowed audio xn(i) can be obtained by

$$E_{x_n} = \sum_{i=0}^{N-1} x_n(i)^2$$

where xn(i) is the windowed audio and N is the number of samples in a windowed audio.
At the FFT spectrum analysis 403, the energy curve Ex(n) of each windowed audio is converted into a respective short-term power spectrum. The short-term power spectrum x(ω), as obtained by the discrete Fourier transform and also known as power spectral density, may be obtained by

$$x(\omega) = \sum_{n=0}^{N-1} E_x(n)\, e^{-j\omega n}$$

where Ex(n) is the energy curve of the windowed audio, as defined above, ω are the frequencies in the frequency domain, |x(ω)| are the components of the short-term power spectrum x(ω) and N is the number of samples in a windowed audio. The sum

$$\sum_{n=0}^{N-1} E_x(n)\, e^{-j\omega n}$$

can be applied to the energy curve of the entire windowed audio or only to parts of the energy curve of the windowed audio, suitably chosen by the skilled person. At the spectral analysis 404, the maxima in the FFT power spectrum x(ω) are detected in order to obtain the beat frequency ω.
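For illustration, this beat detection chain (windowing, energy curve, FFT, peak picking) may be sketched in Python as follows; the frame length, hop size and tempo search range are illustrative assumptions, not values prescribed by the embodiment:

```python
import numpy as np

def detect_beat_bpm(x, sr, frame_len=1024, hop=512):
    """Minimal beat-frequency estimator along the lines of Fig. 5:
    windowed energy curve -> FFT -> peak picking."""
    window = np.hamming(frame_len)                    # framing function h(i)
    n_frames = 1 + (len(x) - frame_len) // hop
    # Energy curve E_x(n): energy of each windowed frame x_n(i)
    energy = np.array([
        np.sum((x[n * hop:n * hop + frame_len] * window) ** 2)
        for n in range(n_frames)
    ])
    energy -= energy.mean()                           # remove DC before the FFT
    spectrum = np.abs(np.fft.rfft(energy))            # power spectral density
    freqs = np.fft.rfftfreq(len(energy), d=hop / sr)  # Hz per FFT bin
    bpm = freqs * 60.0
    valid = (bpm >= 30) & (bpm <= 240)                # plausible tempo range
    return bpm[valid][np.argmax(spectrum[valid])]     # position of the maximum
```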
Fig. 6 visualizes the beat detection 201 process described in Fig. 5 above. The upper part of Fig. 6 shows an audio signal, here the audio input x(n) being input into the beat detection 201, that comprises several bars of length T. A first bar starts at time instance 0 and ends at time instance T. A second bar subsequent to the first bar starts at time instance T and ends at time instance 2T. A third bar subsequent to the second bar starts at time instance 2T and ends at time instance 3T. The audio signal x(n) represents percussive sounds. Each bar comprises several percussive sounds, namely a bass drum BD at the start of each bar (first beat of a bar, expressed in eighth-note quantization), a snare drum SD in the middle of each bar (fifth beat of a bar, expressed in eighth-note quantization), and a hi-hat HH playing an eighth-note rhythm (second, third, fourth, sixth, seventh, and eighth beat of a bar, expressed in eighth-note quantization).
The middle part of Fig. 6 shows the power spectrum obtained by the FFT spectrum determination 403 (see Fig. 5), in which the y-axis of the power spectrum represents the spectral density x(ω) of the audio input x(n) and the x-axis of the power spectrum represents the frequency ω expressed in beats per minute (bpm). The power spectrum shows four peaks, a first peak at 20 bpm, a second peak at 40 bpm, a third peak at 80 bpm and a fourth peak at 160 bpm. By analyzing the position of the maxima of the power spectrum, the speed of the rhythm in the audio input x(n) can be determined. As a schematic example, one may for example relate the third peak of the power spectrum of Fig. 6 to an eighth-note beat, so that the fourth peak relates to sixteenth notes, the second peak relates to quarter notes, and so on.
The lower part of Fig. 6 shows the output of the beat detection 201, that is, the beat frequency ω expressed in beats per minute (bpm), here 80 bpm, as obtained by the spectral analysis 404 described above.
The beat detection 201 process may also for example be implemented by analyzing audio spectrograms, such as described in more detail in the published paper Jonathan Foote and Shingo Uchihashi, “The Beat Spectrum: a New Approach To Rhythm Analysis”, IEEE International Conference on Multimedia & Expo 2001, IEEE (2001).
Fig. 7 schematically shows a representation of the beat frequency obtained from beat detection as a beat line b(n). The upper part of Fig. 7 shows the beat frequency ω = 80 bpm obtained by beat detection 201 (see Fig. 5). The lower part of Fig. 7 shows a beat signal b(n) obtained from this beat frequency ω = 80 bpm, where b(n) represents the position of the beats on the time line expressed in sample numbers n.
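For illustration, such a beat signal b(n) may be generated from the detected beat frequency as an impulse train; a minimal sketch, assuming the beat phase starts at sample 0:

```python
import numpy as np

def beat_line(bpm, sr, n_samples):
    """Turn the detected beat frequency into a beat signal b(n):
    an impulse train with ones at the beat positions."""
    b = np.zeros(n_samples)
    samples_per_beat = int(round(sr * 60.0 / bpm))
    b[::samples_per_beat] = 1.0  # beat positions on the sample time line
    return b
```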
Fig. 8 schematically describes in more detail an embodiment of the rhythm pattern estimation process performed in the process of audio enhancement described in Fig. 4 above, in which rhythm pattern estimation is performed on the drums separations to obtain estimated rhythm patterns for each drums separation.
As described in Fig. 4, a rhythm pattern estimation 202 process is performed on the drums separations, here the snare drums sSD(n), to obtain an estimated rhythm pattern for each drums separation, here a snare drums estimated rhythm pattern RPSD[ ]. In particular, a process of windowing 401 is performed on the snare drums sSD(n) to obtain windowed snare drums sSDn(i). A process of total energy determination 402 is performed on the windowed snare drums sSDn(i) to obtain an energy curve ESD of the snare drums sSD(n). An analysis of the total energy 403 is performed on the energy curve ESD of each windowed snare drums sSDn(i) to obtain a snare drums estimated rhythm pattern RPSD[ ].
At the windowing 401, a windowed percussive separation, such as the windowed snare drums sSDn(i), can be obtained by

$$s_{SD,n}(i) = s_{SD}(n + i)\,h(i)$$

where sSD(n + i) represents the discretized snare drums signal (i representing the sample number and thus time) shifted by n samples, and h(i) is a framing function around time n (respectively sample n), like for example the Hamming function, which is well-known to the skilled person.

At the total energy determination 402, the energy curve ESD(n) of the windowed snare drums sSDn(i) can be obtained by

$$E_{SD}(n) = \sum_{i=0}^{N-1} s_{SD,n}(i)^2$$

where sSDn(i) is the windowed snare drums.
At the analysis of total energy 403, an analysis of the total energy is performed on the isolated snare drum track, as in this example, for detecting the beats. The analysis of total energy 403 filters the beats with a much louder or much softer energy signal on the beat line b(n). Thereby, the snare drums estimated rhythm pattern RPSD[ ] is obtained.

Fig. 9a schematically shows a representation of the rhythm pattern estimation described in Fig. 8 above.
The upper part of Fig. 9a shows a percussive signal, here the snare drums sSD(n) as obtained by the music source separation 101 described in Fig. 1. This percussive signal sSD(n), which is input into the rhythm pattern estimation 202, comprises several bars of length T, wherein each bar comprises a snare drums sound SD in the middle of each bar (fifth beat of a bar, expressed in eighth-note quantization). A first bar starts at time instance 0 and ends at time instance T. A second bar subsequent to the first bar starts at time instance T and ends at time instance 2T. A third bar subsequent to the second bar starts at time instance 2T and ends at time instance 3T.
The middle part of Fig. 9a shows the output of the total energy determination 402 (see Fig. 8), that is, the energy curve ESD(n) of the snare drums sSD(n). The arrow between the upper part of Fig. 9a and the middle part of Fig. 9a represents the process performed in the total energy determination 402 described in Fig. 8. The energy curve ESD(n) of the snare drums sSD(n) is presented on the beat line b(n), where b(n) represents the position of the beats on the time line expressed in sample numbers n.
The lower part of Fig. 9a shows the energy thresholding performed in the analysis of total energy 403 (see Fig. 8). The energy thresholding, which is performed on the isolated snare drums, filters the beats with a much louder or much softer energy signal on the beat line b(n). For example, the analysis of total energy 403 described in Fig. 8 keeps the beats having an energy higher than a threshold, here represented by a dashed line. Thereby, a rhythm pattern array RPSD[1 ... 8] for each bar (4/4 time) of an audio signal is obtained, wherein RPSD[1 ... 4] = 0, RPSD[6 ... 8] = 0 and RPSD[5] = 1. Thus, the snare drums estimated rhythm pattern RPSD[ ] is obtained as described in Fig. 9b below.
Fig. 9b schematically shows a representation of the snare drums estimated rhythm pattern RPSD[ ] obtained by the rhythm pattern estimation described in Fig. 8 above.
The upper part of Fig. 9b shows the snare drums estimated rhythm pattern RPSD[ ] of the audio signal x(n), obtained after the energy thresholding (see lower part of Fig. 9a) performed in the analysis of total energy 403 (see Fig. 8). A rhythm pattern array RPSD[1 ... 8] for each bar of the audio signal x(n) is obtained, wherein for each bar expressed in eighth-note quantization RPSD[1 ... 4] = 0, RPSD[6 ... 8] = 0 and RPSD[5] = 1.
The lower part of Fig. 9b shows the snare drums estimated rhythm pattern RPSD[ ] of a bar, expressed in eighth-note quantization. That is, a rhythm pattern array RPSD[1 ... 8], wherein RPSD[1 ... 4] = 0, RPSD[6 ... 8] = 0 and RPSD[5] = 1.
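For illustration, this energy-based rhythm pattern estimation (Figs. 8 to 9b) may be sketched in Python as follows; the eighth-note grid, the relative threshold and all function and variable names are illustrative assumptions:

```python
import numpy as np

def estimate_rhythm_pattern(sep, sr, bpm, slots_per_bar=8, rel_threshold=0.5):
    """Quantize the energy of one percussive separation onto an
    eighth-note grid (8 slots per 4/4 bar) and keep only slots whose
    energy exceeds a threshold, as in the thresholding of Fig. 9a."""
    # Samples per grid slot: a 4/4 bar has 4 quarter notes of sr*60/bpm samples
    samples_per_slot = int(round(sr * 60.0 / bpm * 4 / slots_per_bar))
    n_slots = len(sep) // samples_per_slot
    # Energy per grid slot (coarse energy curve on the beat line b(n))
    energy = np.array([
        np.sum(sep[k * samples_per_slot:(k + 1) * samples_per_slot] ** 2)
        for k in range(n_slots)
    ])
    # Fold all complete bars onto one bar and apply the energy threshold
    n_bars = max(1, n_slots // slots_per_bar)
    bar_energy = energy[:n_bars * slots_per_bar].reshape(n_bars, -1).mean(axis=0)
    return (bar_energy > rel_threshold * bar_energy.max()).astype(int)
```

Applied to the snare drums separation of Fig. 9a, this sketch would ideally return the pattern [0, 0, 0, 0, 1, 0, 0, 0], i.e. RPSD[5] = 1.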
Fig. 10 schematically shows a representation of the beat frequency, expressed in beats per minute, in a 16th-note quantization pattern. The beat frequency ω obtained by beat detection 201 (see Fig. 5) and a 16th-note quantization pattern (see upper part of Fig. 10) are represented by a beat pattern, e.g. a 16th-note beat pattern array PP[1 ... 16] (see middle part of Fig. 10), which is used to obtain the estimated rhythm pattern RP[ ], here the bass drum rhythm pattern RPBD[ ]. The bass drum rhythm pattern RPBD[ ] is represented by a rhythm pattern array RP[1 ... 16] (see lower part of Fig. 10), wherein RP[1] = 1, RP[9] = 1, RP[2 ... 8] = 0, and RP[10 ... 16] = 0.
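Alternatively, onset positions detected in a percussive separation may be quantized onto the 16th-note grid directly; a minimal sketch, assuming the first onset falls on beat one of the bar and all names are illustrative:

```python
import numpy as np

def onsets_to_pattern(onset_samples, sr, bpm, slots_per_bar=16):
    """Map detected onset positions (in samples) onto the 16th-note
    quantization grid of Fig. 10 (one 4/4 bar)."""
    samples_per_slot = sr * 60.0 / bpm * 4 / slots_per_bar
    pattern = np.zeros(slots_per_bar, dtype=int)
    for s in onset_samples:
        slot = int(round(s / samples_per_slot)) % slots_per_bar
        pattern[slot] = 1  # mark the nearest grid slot, folded into one bar
    return pattern
```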
Fig. 11a shows a rhythm pattern of a rock music song on an 8th-note quantization pattern representation. The upper rhythm pattern of Fig. 11a is a snare drums rhythm pattern, in which RP[1] = 1, RP[3] = 1, RP[5] = 1, RP[7] = 1, and RP[2] = 0, RP[4] = 0, RP[6] = 0, RP[8] = 0. The middle rhythm pattern of Fig. 11a is a bass drums rhythm pattern, in which RP[1] = 1, RP[8] = 1, and RP[2 ... 7] = 0. The lower rhythm pattern of Fig. 11a is a hi-hat rhythm pattern, in which RP[1 ... 8] = 1.
Fig. 11b shows a rhythm pattern of a rock music song on an 8th-note quantization pattern representation. The upper rhythm pattern of Fig. 11b is a snare drums rhythm pattern, in which RP[3] = 1, RP[4] = 1, RP[7] = 1, and RP[1] = 0, RP[2] = 0, RP[5] = 0, RP[6] = 0, RP[8] = 0. The middle rhythm pattern of Fig. 11b is a bass drums rhythm pattern, in which RP[1] = 1, RP[6] = 1, and RP[2 ... 5] = 0, RP[7 ... 8] = 0. The lower rhythm pattern of Fig. 11b is a hi-hat rhythm pattern, in which RP[1 ... 8] = 1.

Rhythm amplification using similarity-to-gain mapping
Fig. 12 schematically shows another embodiment of a process of audio signal enhancement based on audio source separation and rhythm amplification. The process allows to perform audio enhancement using source separation and rhythm amplification by combining (online) audio source separation with rhythm similarity estimation and beat detection.
An audio input signal (see 1 in Fig. 1) containing multiple sources (see 1, 2, ..., K in Fig. 1), with, for example, multiple channels (e.g. Min = 2), e.g. a piece of music, is input to music source separation 101 and decomposed into separations (see separated sources 2a-2d and residual signal 3 in Fig. 1) as described with regard to Fig. 1 above. In the present embodiment, the audio input signal x(n) is decomposed into percussive separations, here drums, namely bass drums sBD(n), snare drums sSD(n), and a rest drums separation sRD(n), and into non-percussive separations, here non-percussive instruments sNP(n), which include melodic and harmonic parts. The non-percussive separation sNP(n) includes the remaining sources of the audio input signal, apart from the percussive separations, e.g. vocals, guitar, bass, or the like. A beat detection 201 process is performed on the audio input signal x(n) to obtain a beat frequency ω. An embodiment of the beat detection 201 process is described in more detail with regard to Figs. 5 and 6 above. A rhythm similarity estimation 900 process is performed on each of the percussive separations, here the bass drums sBD(n), the snare drums sSD(n), and the rest drums separation sRD(n), based on a beat signal b(n) obtained from the beat frequency ω, to obtain a rhythm similarity, here a bass drums rhythm similarity, a snare drums rhythm similarity, and a rest drums rhythm similarity. An embodiment of the rhythm similarity estimation 900 process is described in more detail with regard to Fig. 13 below. A similarity-to-gain mapping 901 process is performed on the bass drums rhythm similarity, the snare drums rhythm similarity, and the rest drums rhythm similarity, to obtain gain factors g(n). A separation enhancement 204 process is performed on the bass drums sBD(n), the snare drums sSD(n), and the rest drums separation sRD(n), based on the gain factors, to obtain an enhanced drums separation s'D(n). A mixer 103 mixes the enhanced drums separation s'D(n) with the non-percussive separations sNP(n) to obtain an enhanced audio signal x'(n). The enhanced audio signal x'(n) is output to a loudspeaker system 104.
It is to be noted that all the above described processes, namely the music source separation 101, the beat detection 201, the rhythm similarity estimation 900, the similarity-to-gain mapping 901 and the separation enhancement 204, can be performed in real-time, e.g. “online”. For example, they could be run directly on the smartphone or smartwatch of the user, in his headphones, on a Bluetooth device, or the like.

The similarity-to-gain mapping 901 maps the rhythm similarity to a gain value. The gain factors can control the contribution of the percussive separations to the mix, that is, the snare drums may be enhanced by +5 dB, the bass drums may be unchanged in the mix, or the like. The gain factors need not form a binary system, i.e. the bass drums may be enhanced by +1.3 dB, the snare drums may be enhanced by +1.1 dB, and the rest drums may be enhanced by +0.5 dB, depending on the online result of the rhythm similarity estimation 900.
Fig. 13 shows in more detail an embodiment of a rhythm similarity estimation process performed as described in Fig. 12 above. A rhythm similarity estimation 900 process is performed on each of the bass drums sBD(n), the snare drums sSD(n), and the rest drums separation sRD(n), based on a beat signal b(n) obtained from the beat frequency ω, to obtain a similarity. In the present embodiment, a rhythm similarity estimation 900 process is performed on the snare drums sSD(n) obtained by the music source separation 101 (see Fig. 12), to obtain a snare drums similarity. An FFT spectrum 902 is computed on both the beat signal b(n) and the snare drums sSD(n) signal to obtain a magnitude of the short-term power spectrum |Pf(n)|. An energy spectrum comparator 903 process is performed on both power spectra |Pf(n)| to obtain a similarity between the percussive separation (sBD(n), sSD(n), sRD(n)) and the beat signal b(n).
As already described above, the magnitude of the short-term power spectrum may be obtained by

$$|x(\omega)| = \left|\sum_{i=0}^{N-1} x_n(i)\, e^{-j\omega i}\right|$$

where xn(i) is the signal in the windowed audio, as defined above, ω are the frequencies in the frequency domain, |x(ω)| are the components of the short-term power spectrum x(ω) and N is the number of samples in a windowed audio.
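For illustration, the spectral comparison 902/903 and the similarity-to-gain mapping 901 may be sketched as follows; the normalized spectral correlation as similarity measure and the linear mapping with an assumed +5 dB ceiling are illustrative choices, not prescribed by the embodiment:

```python
import numpy as np

def rhythm_similarity(sep, beat, frame_len=4096):
    """Compare the FFT magnitude spectra of a percussive separation and
    of the beat signal b(n) via a normalized correlation (Fig. 13)."""
    spec_sep = np.abs(np.fft.rfft(sep, n=frame_len))
    spec_beat = np.abs(np.fft.rfft(beat, n=frame_len))
    num = np.dot(spec_sep, spec_beat)
    den = np.linalg.norm(spec_sep) * np.linalg.norm(spec_beat) + 1e-12
    return num / den  # similarity in [0, 1]

def similarity_to_gain(similarity, max_gain_db=5.0):
    """Similarity-to-gain mapping 901 as a simple linear map from
    similarity to an amplification factor."""
    gain_db = max_gain_db * similarity
    return 10.0 ** (gain_db / 20.0)  # linear gain factor g
```

A percussive separation whose spectrum closely matches that of the beat signal b(n) then receives a gain factor close to +5 dB, while unrelated separations stay near 0 dB.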
Method and Implementation
Fig. 14 shows a flow diagram visualizing a method for signal mixing related to audio signal enhancement based on source separation and rhythm amplification to obtain an enhanced audio signal. At 1100, the music source separation (see 101 in Figs. 2, 4, and 12) receives an audio input signal x(n) (see stereo file 1 in Fig. 1). At 1101, music source separation (see 101 in Figs. 2, 4, and 12) is performed based on the received audio input signal x(n) to obtain percussive separations, here bass drums sBD(n), snare drums sSD(n), and a rest drums separation sRD(n), and non-percussive separations sNP(n). At 1102, beat detection (see 201 in Figs. 4 and 12) is performed on the received audio input signal to obtain a detected beat signal b(n). At 1103, rhythm pattern estimation (see 202 in Figs. 3 and 4) is performed on the percussive separations based on the beat frequency to obtain percussive separation estimated rhythm patterns. At 1104, rhythm pattern selection (see 203 in Fig. 4) is performed based on the percussive separation estimated rhythm patterns to obtain gain factors g(n). At 1105, separation enhancement (see 204 in Figs. 3, 4, and 12) is performed on the percussive separations based on the gain factors g(n) to obtain an enhanced drums separation s'D(n). At 1106, mixing (see 103 in Figs. 2, 4, and 12) of the enhanced drums separation s'D(n) with the non-percussive separations sNP(n) is performed to obtain an enhanced audio signal x'(n). At 1107, the enhanced audio signal x'(n) is output to a loudspeaker system (see 104 in Figs. 2, 4, and 12), such as a loudspeaker system of a smartphone, of a smartwatch, of a Bluetooth device, or the like.
In the embodiment of Fig. 14, a flow diagram visualizing a method for signal mixing using beat detection and rhythm pattern estimation is described; however, the present disclosure is not limited to the method steps described above. For example, the beat detection process, rhythm pattern estimation process and rhythm pattern selection process (see 201, 202 and 203 in Figs. 4 and 12) may be omitted, and instead of the separation enhancement process (see 204 in Figs. 3, 4, and 12) a static gain amplification process (see 102 in Fig. 2) can be performed, or the like.
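For illustration, the static-gain variant of this method may be sketched end-to-end as follows; `separate` is a placeholder for any music source separation back-end, and the stem names and default gains are illustrative assumptions taken from Fig. 2:

```python
import numpy as np

def enhance(x, sr, separate, gains_db=None):
    """End-to-end sketch of the method of Fig. 14 in its static-gain
    variant: separate, amplify percussive stems, mix, normalize."""
    # e.g. {'bass_drum': ..., 'snare': ..., 'rest_drums': ..., 'non_percussive': ...}
    stems = separate(x)
    gains_db = gains_db or {'bass_drum': 5.0, 'snare': 3.0, 'rest_drums': 1.0}
    out = stems['non_percussive'].copy()
    for name, g_db in gains_db.items():
        out += stems[name] * 10.0 ** (g_db / 20.0)  # separation enhancement + mix
    return out / max(1.0, np.max(np.abs(out)))      # guard against clipping
```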
Fig. 15 schematically describes an embodiment of an electronic device that can implement the processes of audio enhancement based on rhythm amplification and music source separation, as described above. The electronic device 1200 comprises a CPU 1201 as processor. The electronic device 1200 further comprises a microphone array 1210, a loudspeaker array 1211 and a convolutional neural network unit 1220 that are connected to the processor 1201. The processor 1201 may for example implement a gain amplification 102 and a separation enhancement 204 that realize the processes described with regard to Fig. 2, Fig. 3, Fig. 4, and Fig. 12 in more detail. The CNN 1220 may for example be an artificial neural network in hardware, e.g. a neural network on GPUs or any other hardware specialized for the purpose of implementing an artificial neural network. The CNN 1220 may for example implement a source separation 101, a beat detection 201, a rhythm pattern estimation 202, and a rhythm pattern selection 203 that realize the processes described with regard to Figs. 4 and 12 in more detail. Loudspeaker array 1211 consists of one or more loudspeakers that are distributed over a predefined space and is configured to render any kind of audio, such as 3D audio. The electronic device 1200 further comprises a user interface 1212 that is connected to the processor 1201. This user interface 1212 acts as a man-machine interface and enables a dialogue between an administrator and the electronic system. For example, an administrator may make configurations to the system using this user interface 1212. The electronic device 1200 further comprises an Ethernet interface 1221, a Bluetooth interface 1204, and a WLAN interface 1205. These units 1204, 1205 act as I/O interfaces for data communication with external devices. For example, additional loudspeakers, microphones, and video cameras with Ethernet, WLAN or Bluetooth connection may be coupled to the processor 1201 via these interfaces 1221, 1204, and 1205. The electronic device 1200 further comprises a data storage 1202 and a data memory 1203 (here a RAM). The data memory 1203 is arranged to temporarily store or cache data or computer instructions for processing by the processor 1201. The data storage 1202 is arranged as a long-term storage, e.g. for recording sensor data obtained from the microphone array 1210 and provided to or retrieved from the CNN 1220.
It should be noted that the description above is only an example configuration. Alternative configurations may be implemented with additional or other sensors, storage devices, interfaces, or the like.
It should be recognized that the embodiments describe methods with an exemplary ordering of method steps. The specific ordering of method steps is, however, given for illustrative purposes only and should not be construed as binding.
It should also be noted that the division of the electronic device of Fig. 15 into units is only made for illustration purposes and that the present disclosure is not limited to any specific division of functions in specific units. For instance, at least parts of the circuitry could be implemented by a respectively programmed processor, field programmable gate array (FPGA), dedicated circuits, and the like.
All units and entities described in this specification and claimed in the appended claims can, if not stated otherwise, be implemented as integrated circuit logic, for example, on a chip, and functionality provided by such units and entities can, if not stated otherwise, be implemented by software.
In so far as the embodiments of the disclosure described above are implemented, at least in part, using software-controlled data processing apparatus, it will be appreciated that a computer program providing such software control and a transmission, storage or other medium by which such a computer program is provided are envisaged as aspects of the present disclosure.
Note that the present technology can also be configured as described below.
(1) An electronic device comprising circuitry configured to perform audio source separation (101) on an audio input signal to extract one or more percussive separations (sBD(n), sSD(n), sRD(n)) and to perform rhythmic enhancement on the one or more percussive separations (sBD(n), sSD(n), sRD(n)) to obtain at least one enhanced percussive separation (s'D(n)).

(2) The electronic device of (1), wherein the circuitry is configured to amplify at least one of the percussive separations (sBD(n), sSD(n), sRD(n)) by a gain factor (g(n)) to obtain the at least one enhanced percussive separation (s'D(n)).

(3) The electronic device of (1) or (2), wherein the circuitry is configured to perform rhythm pattern estimation (202) on the percussive separations (sBD(n), sSD(n), sRD(n)) to obtain estimated rhythm patterns (RP[ ]).

(4) The electronic device of (3), wherein the circuitry is configured to perform rhythm pattern selection (203) based on the rhythm patterns (RP[ ]) to obtain the at least one enhanced percussive separation (s'D(n)).

(5) The electronic device of (3), wherein the circuitry is configured to perform a spectral analysis (404) of a percussive separation (sBD(n), sSD(n), sRD(n)) to estimate a rhythm pattern (RP[ ]).

(6) The electronic device of (3), wherein the circuitry is configured to perform beat detection (201) on the audio input signal (x(n)) to obtain a beat frequency (ω) and to perform the rhythm pattern estimation (202) based on a percussive separation (sBD(n), sSD(n), sRD(n)) and the beat frequency (ω).

(7) The electronic device of any one of (1) to (6), wherein the circuitry is configured to determine by a rhythm similarity estimation (900) a similarity (904) between a percussive separation (sBD(n), sSD(n), sRD(n)) and a beat signal (b(n)) to obtain at least one enhanced percussive separation (s'D(n)).

(8) The electronic device of (7), wherein the circuitry is configured to perform similarity-to-gain mapping (901) based on the similarity (904) to obtain gain factors (g(n)) used for obtaining the at least one enhanced percussive separation (s'D(n)).

(9) The electronic device of any one of (1) to (8), wherein the circuitry is configured to perform a spectral comparison (902, 903) between a percussive separation (sBD(n), sSD(n), sRD(n)) and a beat signal (b(n)) to obtain at least one enhanced percussive separation (s'D(n)).

(10) The electronic device of any one of (1) to (9), wherein the circuitry is configured to perform the source separation (101) on the audio input signal (x(n)) to obtain one or more percussive separations (sD(n)) and one or more non-percussive separations (sNP(n)), and to perform mixing of the enhanced separated source (s'D(n)) with the one or more non-percussive separations (sNP(n)), to obtain an enhanced audio signal (x'(n)).

(11) The electronic device of any one of (1) to (10), wherein the one or more percussive separations (sD(n)) comprise drums.

(12) The electronic device of any one of (1) to (11), wherein the circuitry comprises a microphone to acquire the audio input signal (x(n)).

(13) The electronic device of any one of (1) to (12), wherein the circuitry further comprises a loudspeaker system (104) to output the enhanced audio signal (x'(n)).

(14) The electronic device of any one of (1) to (13), wherein the circuitry further comprises a vibration motor to output vibrations based on a signal of the enhanced separation (s'D(n)).

(15) A method comprising: performing audio source separation (101) on an audio input signal (x(n)) to extract one or more percussive separations (sBD(n), sSD(n), sRD(n)); and performing rhythmic enhancement on the one or more percussive separations (sBD(n), sSD(n), sRD(n)) to obtain at least one enhanced percussive separation (s'D(n)).

(16) A computer program comprising instructions, the instructions when executed on a processor causing the processor to perform the method of (15).

Claims

1. An electronic device comprising circuitry configured to perform audio source separation on an audio input signal to extract one or more percussive separations and to perform rhythmic enhancement on the one or more percussive separations to obtain at least one enhanced percussive separation.
2. The electronic device of claim 1, wherein the circuitry is configured to amplify at least one of the percussive separations by a gain factor to obtain the at least one enhanced percussive separation.
3. The electronic device of claim 1, wherein the circuitry is configured to perform rhythm pattern estimation on the percussive separations to obtain estimated rhythm patterns.
4. The electronic device of claim 3, wherein the circuitry is configured to perform rhythm pattern selection based on the rhythm patterns to obtain the at least one enhanced percussive separation.
5. The electronic device of claim 3, wherein the circuitry is configured to perform a spectral analysis of a percussive separation to estimate a rhythm pattern.
6. The electronic device of claim 3, wherein the circuitry is configured to perform beat detection on the audio input signal to obtain a beat frequency and to perform the rhythm pattern estimation based on a percussive separation and the beat frequency.
7. The electronic device of claim 1, wherein the circuitry is configured to determine by a rhythm similarity estimation a similarity between a percussive separation and a beat signal to obtain at least one enhanced percussive separation.
8. The electronic device of claim 7, wherein the circuitry is configured to perform similarity-to-gain mapping based on the similarity to obtain gain factors used for obtaining the at least one enhanced percussive separation.
9. The electronic device of claim 1, wherein the circuitry is configured to perform a spectral comparison between a percussive separation and a beat signal to obtain at least one enhanced percussive separation.
10. The electronic device of claim 1, wherein the circuitry is configured to perform the source separation on the audio input signal to obtain one or more percussive separations and one or more non-percussive separations, and to perform mixing of the enhanced separated source with the one or more non-percussive separations, to obtain an enhanced audio signal.
11. The electronic device of claim 1, wherein the one or more percussive separations comprise drums.
12. The electronic device of claim 1, wherein the circuitry comprises a microphone to acquire the audio input signal.
13. The electronic device of claim 1, wherein the circuitry further comprises a loudspeaker system to output the enhanced audio signal.
14. The electronic device of claim 1, wherein the circuitry further comprises a vibration motor to output vibrations based on a signal of the enhanced separation.
15. A method comprising:
performing audio source separation on an audio input signal to extract one or more percussive separations; and
performing rhythmic enhancement on the one or more percussive separations to obtain at least one enhanced percussive separation.
16. A computer program comprising instructions, the instructions when executed on a processor causing the processor to perform the method of claim 15.
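By way of a non-limiting illustration of claims 6 to 8 above (and, loosely, of the spectral comparison of claim 9): a possible Python sketch, again assuming librosa, that detects beats on the input signal, estimates the rhythm similarity of one percussive separation to the beat signal via a normalized correlation of its onset-strength envelope (a spectral-flux analysis in the spirit of claim 5) with beat-aligned impulses, and maps the similarity linearly to a gain factor. The mapping constants g_min and g_max and the helper name similarity_to_gain are illustrative assumptions, not values taken from the application.

    import librosa
    import numpy as np

    def similarity_to_gain(x, stem, sr, g_min=1.0, g_max=2.0):
        # Beat detection on the audio input signal x(n) (claim 6).
        _, beat_frames = librosa.beat.beat_track(y=x, sr=sr)
        # Onset-strength envelope of the percussive separation, derived from a
        # spectral analysis of the stem (claims 3 and 5).
        env = librosa.onset.onset_strength(y=stem, sr=sr)
        # Beat signal: unit impulses at the detected beat frames.
        beat_signal = np.zeros_like(env)
        beat_signal[beat_frames[beat_frames < len(env)]] = 1.0
        # Rhythm similarity: normalized correlation between the envelope and
        # the beat signal (claim 7).
        denom = np.linalg.norm(env) * np.linalg.norm(beat_signal)
        similarity = float(env @ beat_signal) / denom if denom > 0 else 0.0
        # Similarity-to-gain mapping (claim 8): separations that follow the
        # beat more closely receive a larger gain.
        return g_min + similarity * (g_max - g_min)

The resulting gain factor would then scale the percussive separation before remixing, as in the sketch following embodiment (16) above.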
PCT/EP2021/070306 2020-07-30 2021-07-20 Multiple percussive sources separation for remixing. WO2022023130A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP20188599.3 2020-07-30
EP20188599 2020-07-30

Publications (1)

Publication Number Publication Date
WO2022023130A1 (en) 2022-02-03

Family

ID=71894670

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2021/070306 WO2022023130A1 (en) 2020-07-30 2021-07-20 Multiple percussive sources separation for remixing.

Country Status (1)

Country Link
WO (1) WO2022023130A1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010054802A (en) * 2008-08-28 2010-03-11 Univ Of Tokyo Unit rhythm extraction method from musical acoustic signal, musical piece structure estimation method using this method, and replacing method of percussion instrument pattern in musical acoustic signal
US20170034624A1 (en) * 2013-07-12 2017-02-02 Wim Buyens Pre-Processing of a Channelized Music Signal
EP3201917A1 (en) 2014-10-02 2017-08-09 Sony Corporation Method, apparatus and system

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
Ilya Shmulevich et al., "Perceptual Issues in Music Pattern Recognition: Complexity of Rhythm and Key Finding", 23 February 1999 (1999-02-23)
Jonathan Foote, Shingo Uchihashi, "The Beat Spectrum: A New Approach to Rhythm Analysis", IEEE International Conference on Multimedia & Expo, IEEE, 2001
Makarand Velankar, Parag Kulkarni, "Pattern Recognition for Computational Music", December 2017 (2017-12-01)
Sankalp Gulati, Preeti Rao, "Rhythm Pattern Representations for Tempo Detection in Music", Proc. of the First International Conference on Intelligent Interactive Technologies and Multimedia, December 2010 (2010-12-01)
Stefan Uhlich et al., "Improving Music Source Separation Based on Deep Neural Networks Through Data Augmentation and Network Blending", 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2017
Yongwei Zhu et al., "Drum Loop Pattern Extraction from Polyphonic Music Audio", IEEE International Conference on Multimedia and Expo (ICME 2009), Piscataway, NJ, USA, 28 June 2009 (2009-06-28), pages 482-485, XP031510795, ISBN: 978-1-4244-4290-4 *

Similar Documents

Publication Publication Date Title
WO2019229199A1 (en) Adaptive remixing of audio content
JP5642296B2 (en) Input interface for generating control signals by acoustic gestures
US20230186782A1 (en) Electronic device, method and computer program
US20180219521A1 (en) Sound Processing Device and Sound Processing Method
EP3255904A1 (en) Distributed audio mixing
US12014710B2 (en) Device, method and computer program for blind source separation and remixing
US20230057082A1 (en) Electronic device, method and computer program
US9445210B1 (en) Waveform display control of visual characteristics
US20220076687A1 (en) Electronic device, method and computer program
KR20150080740A (en) Method and apparatus for generating audio signal and vibration signal based on audio signal
WO2022023130A1 (en) Multiple percussive sources separation for remixing.
Wieczorkowska et al. Identification of a dominating instrument in polytimbral same-pitch mixes using SVM classifiers with non-linear kernel
WO2017135350A1 (en) Recording medium, acoustic processing device, and acoustic processing method
US20220392461A1 (en) Electronic device, method and computer program
KR20210148916A (en) Techniques for audio track analysis to support audio personalization
JP2021097406A (en) Audio processing apparatus and audio processing method
CN113348508B (en) Electronic device, method and computer program
Canfer Music Technology in Live Performance: Tools, Techniques, and Interaction
US20230135778A1 (en) Systems and methods for generating a mixed audio file in a digital audio workstation
US20230215454A1 (en) Audio transposition
WO2023062865A1 (en) Information processing apparatus, method, and program
WO2023052345A1 (en) Audio source separation
CN114827886A (en) Audio generation method and device, electronic equipment and storage medium
WO2014142201A1 (en) Device and program for processing separating data
CA3235626A1 (en) Generating tonally compatible, synchronized neural beats for digital audio files

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 21746737; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 21746737; Country of ref document: EP; Kind code of ref document: A1)