CN101604526B - Weight-based system and method for calculating audio frequency attention - Google Patents


Info

Publication number: CN101604526B (grant of CN101604526A)
Application number: CN2009100630452A
Authority: CN (China)
Original language: Chinese (zh)
Prior art keywords: attention, sound, weight, interference
Inventors: 胡瑞敏, 杭波, 马晔, 高戈, 杨玉红, 周成, 王晓晨
Original and current assignee: Wuhan University (WHU); application filed by Wuhan University (WHU)
Priority to: CN2009100630452A
Legal status: Expired - Fee Related (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)

Landscapes

  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention relates to the technical field of audio signal processing, in particular to a weight-based system and method for calculating audio attention. The system comprises an initialization module, a sound-of-interest attention calculation module, a switch module, an interference-sound attention calculation module and an attention fusion module. The attention fusion module combines the intermediate results of the other modules to obtain the weight-based attention M of the sounds of interest and the weight-based attention N of the interference sounds, then computes the fused attention Ma obtained after attention fusion and outputs it. The advantage of the invention is that a suitable attention judgment threshold can easily be set no matter how many types of sounds of interest the audio contains, which reduces the difficulty of judging audio attention and increases the accuracy of its calculation and judgment.

Description

System and method for calculating audio attention based on weight
Technical Field
The invention relates to the technical field of audio signal processing, in particular to a system and a method for calculating audio attention based on weight.
Background
Attention is the degree to which humans attend to objects; audio attention is attention to audio objects and their features. Existing audio attention calculation methods fall into two groups: top-down and bottom-up. A top-down method decides whether an object is attended to by classifying it, exploiting the fact that people pay different amounts of attention to different classes of sound, such as explosions versus speech. A bottom-up method decides whether an object is of interest by analyzing its features, such as energy and frequency, exploiting human attention to those features.
The following describes in detail the technical scheme of a common prior-art audio attention calculation method.
When a piece of audio contains multiple types of sounds, the attentions M_1, M_2, … of the various types of sounds of interest in that audio are usually obtained first, based on audio classification, and the audio attention M_a of the audio file at a given moment is then calculated as:

M_a = Σ_{i=1}^{n} λ_i · M_i,   with   Σ_{i=1}^{n} λ_i = 1.
where M_i is the attention of the i-th class of sound of interest in the audio segment, λ_i is the weight of the i-th class when calculating the audio attention M_a, and n is the total number of sound-of-interest types contained in the segment. M_i is calculated as follows:

M_i = E · P_i

where E = E_avr / Max_E_avr, E_avr is the average energy of the i-th class of sound of interest in the segment, and Max_E_avr is the maximum of E_avr over the whole audio file containing the segment;

and P_i = exp(s_i - Max_s_i), where s_i is the log-likelihood score of the i-th class of sound of interest in the segment, and Max_s_i is the maximum of s_i over the whole audio file.
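To make the prior-art scheme concrete, here is a minimal Python sketch of the calculation just described; the function and variable names are illustrative, not taken from the patent.

```python
import math

def interest_attention(e_avr, max_e_avr, s_i, max_s_i):
    """Per-class attention in the prior-art scheme: M_i = E * P_i,
    with E = E_avr / Max_E_avr and P_i = exp(s_i - Max_s_i)."""
    energy_term = e_avr / max_e_avr
    likelihood_term = math.exp(s_i - max_s_i)
    return energy_term * likelihood_term

def prior_art_attention(m_scores, weights):
    """Fused attention M_a = sum(lambda_i * M_i); the weights must sum to 1."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(w * m for w, m in zip(weights, m_scores))

# Three classes of interest with equal weights lambda_i = 1/3:
m_scores = [0.9, 0.8, 0.7]
ma = prior_art_attention(m_scores, [1/3] * 3)  # (0.9 + 0.8 + 0.7) / 3 = 0.8
```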
If only a few types of sound of interest need to be included in the calculation, for example 2 to 3 types, this method judges audio attention reasonably well. However, when the user attends to many sound types, the weights λ_i assigned to the different types decline along a reciprocal curve: as n increases, M_a becomes much less than 1 and may even approach 0. Moreover, since the number of sound-of-interest types differs from case to case, it is inconvenient to select a suitable attention judgment threshold for audio sequences under different conditions, which makes the judgment of audio attention by this prior-art method complex and inaccurate.
Furthermore, some types of sounds in audio are not sounds of interest but share features with the sounds of interest that must be included in the attention calculation, so they may be misjudged as sounds of interest; we call these interference sounds. If their influence is not excluded from the calculation, the accuracy of the audio attention suffers.
Disclosure of Invention
The invention aims to provide a weight-based audio attention calculation system and method that make it easy to set a suitable attention judgment threshold regardless of the total number of sound-of-interest types, and that exclude the influence of interference sound types similar to the sounds of interest when calculating audio attention.
In order to achieve the purpose, the invention adopts the following technical scheme:
a weight-based audio attention calculation system, comprising:
an initialization module, for setting the total number i of sound-of-interest types and the weight λ_i of each sound of interest, and, if there are interference sounds to be excluded, setting the total number j of interference-sound types and the weight ω_j of each interference sound;

a sound-of-interest attention calculation module, for receiving the total number i of sound-of-interest types output by the initialization module, detecting the i classes of sounds of interest, and calculating the attention M_i of each class;

a switch module, for receiving the total number j of interference-sound types output by the initialization module and judging whether j needs to be passed on to the interference-sound attention calculation module;

an interference-sound attention calculation module, for receiving the total number j of interference-sound types output by the switch module, detecting the j classes of interference sounds, and calculating the attention N_j of each class;

an attention fusion module, for receiving the weight λ_i of each sound of interest and the weight ω_j of each interference sound output by the initialization module, the attention M_i of each class of sound of interest output by the sound-of-interest attention calculation module, and the attention N_j of each class of interference sound output by the interference-sound attention calculation module; it combines these intermediate results into the weight-based attention M of the sounds of interest and the weight-based attention N of the interference sounds, finally computes the fused attention M_a obtained after attention fusion, and outputs it,
wherein the total number i of sound-of-interest types is greater than 0, and the total number j of interference-sound types is greater than or equal to 0.
When calculating the attention of a sound of interest, the sound-of-interest attention calculation module may use any existing audio attention calculation method.
In the sound-of-interest attention calculation module, the weight of each type of sound of interest is chosen from the closed interval [0, 1]. In the interference-sound attention calculation module, the weight of each type of interference sound is likewise chosen from [0, 1].
The method for calculating the audio attention based on the weight comprises the following steps:
Step 1: input the audio signal to be calculated, and initialize the total number i of sound-of-interest types and the total number j of interference-sound types;

Step 2: detect the i classes of sounds of interest in the audio signal, and calculate the attention M_i of each class;

Step 3: set the weight λ_i of each class of sound of interest in the audio signal, and calculate the weight-based audio attention of the sounds of interest, that is,

M = 1 - Π_{i=1}^{n} (1 - λ_i · M_i),   λ_i ∈ [0, 1];

Step 4: judge whether j is 0, using the total number j of interference-sound types initialized in step 1; if j is 0, the weight-based audio attention N of the interference sounds is 0, and go to step 7; otherwise, go to step 5;

Step 5: detect the j classes of interference sounds in the audio signal, and calculate the attention N_j of each class;

Step 6: set the weight ω_j of each class of interference sound in the audio signal, and calculate the weight-based audio attention of the interference sounds, that is,

N = 1 - Π_{j=1}^{m} (1 - ω_j · N_j),   ω_j ∈ [0, 1];

Step 7: from the weight-based audio attention M of the sounds of interest and the weight-based attention N of the interference sounds obtained above, calculate the fused attention M_a, that is,

M_a = M(1 - N).
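Steps 1 to 7 can be sketched in Python as follows; the helper names are illustrative, and the per-class scores M_i and N_j are assumed to come from whatever detectors an implementation uses.

```python
def weighted_attention(scores, weights):
    """Weight-based fusion M = 1 - prod(1 - lambda_i * M_i); the same
    form gives N from the interference scores and their weights omega_j."""
    product = 1.0
    for w, s in zip(weights, scores):
        product *= 1.0 - w * s
    return 1.0 - product

def fused_attention(m_scores, lambdas, n_scores=(), omegas=()):
    """Steps 3-7: M from the sounds of interest, N from the interference
    sounds (N = 0 when j = 0), then M_a = M * (1 - N)."""
    M = weighted_attention(m_scores, lambdas)
    N = weighted_attention(n_scores, omegas) if n_scores else 0.0
    return M * (1.0 - N)

# Two sounds of interest, no interference: M_a = 1 - 0.5 * 0.5 = 0.75
ma = fused_attention([0.5, 0.5], [1.0, 1.0])
# One interference sound with N_1 = 0.5 halves the result to 0.375:
ma_with_noise = fused_attention([0.5, 0.5], [1.0, 1.0], [0.5], [1.0])
```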
the invention has the following advantages and positive effects:
1) a suitable attention judgment threshold is easy to set for the audio attention, no matter how many types of sounds of interest the audio contains;
2) the influence of interference sound types that resemble the sounds of interest is excluded;
3) the difficulty of judging audio attention is reduced, and the accuracy of its calculation and judgment is improved.
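Advantage 1) can be checked numerically: under the prior-art weighted sum with λ_i = 1/n, a single strong detection is diluted as n grows, while the weight-based form keeps it near its full value. A small comparison sketch (the scores here are made up for illustration):

```python
def prior_art(m_scores):
    """Existing method with equal weights lambda_i = 1/n."""
    n = len(m_scores)
    return sum(m / n for m in m_scores)

def weight_based(m_scores, weights=None):
    """Proposed method: M = 1 - prod(1 - lambda_i * M_i)."""
    weights = weights or [1.0] * len(m_scores)
    product = 1.0
    for w, m in zip(weights, m_scores):
        product *= 1.0 - w * m
    return 1.0 - product

# One strong detection (0.9) among 10 classes of interest:
scores = [0.9] + [0.0] * 9
old = prior_art(scores)      # 0.09: diluted, hard to threshold
new = weight_based(scores)   # 0.9: still close to the per-class value
```

Because the weight-based value stays bounded in [0, 1] and near 1 for strong detections, one fixed threshold works regardless of n.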
Drawings
FIG. 1 is a block diagram of a system for calculating audio attention based on weight according to the present invention.
Fig. 2 is a flowchart of a method for calculating audio attention based on weight according to the present invention.
Fig. 3 is a graph showing the calculation results of the audio attention of 10 sounds of interest in the embodiment of the present invention.
Fig. 4 is a graph showing the calculation result of the audio attention of 10 kinds of interesting sounds and 1 kind of disturbing sounds to be eliminated in the embodiment of the present invention.
Fig. 5 is a graph of the result of audio attention calculation based on weights in the embodiment of the present invention.
Fig. 6 is a diagram of the result of weight-based audio attention calculation without considering sounds to be excluded in the embodiment of the present invention.
Fig. 7 is a diagram of an audio attention calculation result obtained by using a conventional audio attention calculation method according to an embodiment of the present invention.
In the figures: 1, initialization module; 2, sound-of-interest attention calculation module; 3, switch module; 4, interference-sound attention calculation module; 5, attention fusion module.
Detailed Description
The invention is further illustrated by the following specific examples in conjunction with the accompanying drawings:
the invention provides a weight-based audio attention calculation system, which specifically adopts the following technical scheme, referring to fig. 1, and comprises an initialization module 1, an interesting sound attention calculation module 2, a switch module 3, an interference sound attention calculation module 4 and an attention fusion module 5.
The initialization module 1 sets the total number i of sound-of-interest types and the weight λ_i of each sound of interest, and, if there are interference sounds to be excluded, sets the total number j of interference-sound types and the weight ω_j of each interference sound.

The sound-of-interest attention calculation module 2 receives the total number i of sound-of-interest types output by the initialization module 1, detects the i classes of sounds of interest, and calculates the attention M_i of each class.

The switch module 3 receives the total number j of interference-sound types output by the initialization module 1 and judges whether j needs to be passed on to the interference-sound attention calculation module 4.

The interference-sound attention calculation module 4 receives the total number j of interference-sound types output by the switch module 3, detects the j classes of interference sounds, and calculates the attention N_j of each class.

The attention fusion module 5 receives the weight λ_i of each sound of interest and the weight ω_j of each interference sound output by the initialization module 1, the attention M_i of each class of sound of interest output by the sound-of-interest attention calculation module 2, and the attention N_j of each class of interference sound output by the interference-sound attention calculation module 4; it combines these intermediate results into the weight-based attention M of the sounds of interest and the weight-based attention N of the interference sounds, finally computes the fused attention M_a obtained after attention fusion, and outputs M_a.
In one embodiment of the invention, every type of sound of interest may be treated as equally attended to, each type receiving the same weight in the attention calculation; for example, λ_i = ω_j = 1 may be taken.
The method for calculating the audio attention based on the weight provided by the invention specifically adopts the following technical scheme, referring to fig. 2, and comprises the following steps:
Step 1: input the audio signal to be calculated, and initialize the total number i of sound-of-interest types and the total number j of interference-sound types;

Step 2: detect the i classes of sounds of interest in the audio signal, and calculate the attention M_i of each class;

Step 3: set the weight λ_i of each class of sound of interest in the audio signal, and calculate the weight-based audio attention of the sounds of interest, that is,

M = 1 - Π_{i=1}^{n} (1 - λ_i · M_i),   λ_i ∈ [0, 1];

Step 4: judge whether j is 0, using the total number j of interference-sound types initialized in step 1; if j is 0, the weight-based audio attention N of the interference sounds is 0, and go to step 7; otherwise, go to step 5;

Step 5: detect the j classes of interference sounds in the audio signal, and calculate the attention N_j of each class;

Step 6: set the weight ω_j of each class of interference sound in the audio signal, and calculate the weight-based audio attention of the interference sounds, that is,

N = 1 - Π_{j=1}^{m} (1 - ω_j · N_j),   ω_j ∈ [0, 1];

Step 7: from the weight-based audio attention M of the sounds of interest and the weight-based attention N of the interference sounds obtained above, calculate the fused attention M_a, that is,

M_a = M(1 - N).
one embodiment of the weight-based audio attention calculation method of the present invention is described as follows:
An audio signal contains 11 different sound types: the first 10 are the sound-of-interest types (the 6th being speech), and the 11th is the interference-sound type to be excluded, namely broadcast speech.
Input the audio signal to be calculated, i.e. the audio signal containing the 11 sound types, and set the total number of sound-of-interest types to i = 10 and the total number of interference-sound types to j = 1.

Calculate the attention of each class of sound of interest in the audio signal; the attentions of the 10 sounds of interest are, in order, M_1, M_2, …, M_10 (see FIG. 3).

Calculate the attention of each class of sound to be excluded in the audio signal, i.e. M_11 (see FIG. 4).

Set the weights of the sounds of interest: λ_1 = λ_2 = … = λ_10 = 1.

Set the weights of the interference sounds, i.e. the weight of the 11th sound type (broadcast speech) to be excluded: ω_1 = 1.
Perform the weight-based audio attention calculation on the attentions and weights of the 10 sounds of interest, that is,

M = 1 - Π_{i=1}^{10} (1 - λ_i · M_i),   λ_1 = λ_2 = … = λ_10 = 1;

perform the weight-based audio attention calculation on the attention and weight of the single interference sound, that is,

N = 1 - (1 - ω_1 · N_1),   ω_1 = 1;

and calculate the fused attention M_a3, i.e. M_a3 = M(1 - N).

The output is the weight-based audio attention M_a3 of the audio signal, covering the 10 sounds of interest (gunshots, alarms, speech, car horns, etc.) and excluding the broadcast speech (see FIG. 5).
In the steps above, if the weight of each interference sound in the audio signal is instead set to 0, i.e. the weight of the 11th sound type (broadcast speech) is ω_1 = 0, the influence of the interference sound on the attention is ignored and we obtain (see FIG. 6):

M_a2 = 1 - Π_{i=1}^{10} (1 - M_i).
If instead the calculation follows the existing audio attention calculation method described in the background section, that is,

M_a1 = Σ_{i=1}^{10} λ_i · M_i,   with λ_i = 1/10 = 0.1,

then M_a1 is the audio attention produced by the existing method (see FIG. 7).
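With made-up per-class attentions (the real curves are in FIGS. 3 to 7), the three quantities M_a3, M_a2 and M_a1 compared above can be reproduced in a few lines; every number here is illustrative, not taken from the patent's data.

```python
def weight_based(scores, weights):
    """M = 1 - prod(1 - weight * score)."""
    product = 1.0
    for w, s in zip(weights, scores):
        product *= 1.0 - w * s
    return 1.0 - product

# Hypothetical attentions M_1..M_10 for the sounds of interest
# and N_1 for the broadcast-speech interference sound:
m = [0.6, 0.1, 0.0, 0.0, 0.0, 0.8, 0.0, 0.0, 0.0, 0.0]
n1 = 0.5

M = weight_based(m, [1.0] * 10)     # 1 - 0.4 * 0.9 * 0.2 = 0.928
N = weight_based([n1], [1.0])       # 0.5
ma3 = M * (1.0 - N)                 # proposed method with exclusion: 0.464
ma2 = M                             # omega_1 = 0: interference ignored
ma1 = sum(0.1 * mi for mi in m)     # existing method, lambda_i = 0.1: 0.15
```

Even in this toy example the existing method's value (0.15) is already far below 1 with only a couple of active classes, while the weight-based value stays near 1.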
Comparing FIG. 5 with FIG. 6 shows that the audio attention calculated by the weight-based method of the invention excludes the influence of the interference sound type in the audio (broadcast speech) on the attention calculation.

Comparing FIG. 6 with FIG. 7 shows that the maximum audio attention value produced by the weight-based method stays close to 1, whereas the value produced by the existing method keeps decreasing as the total number of sound-of-interest types grows. Compared with the existing method, the proposed method therefore makes it easier to set a suitable attention judgment threshold when the total number of sounds of interest varies, so attention calculation and judgment become simpler and more accurate.

Claims (4)

1. A weight-based audio attention calculation system, comprising:
an initialization module (1), for setting the total number i of sound-of-interest types and the weight λ_i of each sound of interest, and, if there are interference sounds to be excluded, setting the total number j of interference-sound types and the weight ω_j of each interference sound;

a sound-of-interest attention calculation module (2), for receiving the total number i of sound-of-interest types output by the initialization module (1), detecting the i classes of sounds of interest, and calculating the attention M_i of each class;

a switch module (3), for receiving the total number j of interference-sound types output by the initialization module (1) and judging whether j needs to be passed on to the interference-sound attention calculation module (4);

an interference-sound attention calculation module (4), for receiving the total number j of interference-sound types output by the switch module (3), detecting the j classes of interference sounds, and calculating the attention N_j of each class;

an attention fusion module (5), for receiving the weight λ_i of each sound of interest and the weight ω_j of each interference sound output by the initialization module (1), the attention M_i of each class of sound of interest output by the sound-of-interest attention calculation module (2), and the attention N_j of each class of interference sound output by the interference-sound attention calculation module (4); the attention fusion module combines these inputs into the weight-based attention M of the sounds of interest and the weight-based attention N of the interference sounds, finally computes the fused attention M_a obtained after attention fusion, and outputs M_a, where:

M_a = M(1 - N),

M = 1 - Π_{i=1}^{n} (1 - λ_i · M_i),   λ_i ∈ [0, 1];

N = 1 - Π_{j=1}^{m} (1 - ω_j · N_j),   ω_j ∈ [0, 1];   m and n are positive integers;
wherein the total number i of sound-of-interest types is greater than 0, and the total number j of interference-sound types is greater than or equal to 0.
2. The weight-based audio attention computing system of claim 1, wherein:
in the sound-of-interest attention calculation module, the weight of each type of sound of interest is chosen from the closed interval [0, 1].
3. The weight-based audio attention computing system of claim 1, wherein:
in the interference-sound attention calculation module, the weight of each type of interference sound is chosen from the closed interval [0, 1].
4. A method for calculating audio attention based on weight is characterized by comprising the following steps:
Step 1: input the audio signal to be calculated, and initialize the total number i of sound-of-interest types and the total number j of interference-sound types;

Step 2: detect the i classes of sounds of interest in the audio signal, and calculate the attention M_i of each class;

Step 3: set the weight λ_i of each class of sound of interest in the audio signal, and calculate the weight-based audio attention of the sounds of interest, that is,

M = 1 - Π_{i=1}^{n} (1 - λ_i · M_i),   λ_i ∈ [0, 1], n a positive integer;

Step 4: judge whether j is 0, using the total number j of interference-sound types initialized in step 1; if j is 0, the weight-based audio attention N of the interference sounds is 0, and go to step 7; otherwise, go to step 5;

Step 5: detect the j classes of interference sounds in the audio signal, and calculate the attention N_j of each class;

Step 6: set the weight ω_j of each class of interference sound in the audio signal, and calculate the weight-based audio attention of the interference sounds, that is,

N = 1 - Π_{j=1}^{m} (1 - ω_j · N_j),   ω_j ∈ [0, 1], m a positive integer;

Step 7: from the weight-based audio attention M of the sounds of interest and the weight-based attention N of the interference sounds obtained above, calculate the fused attention M_a, that is,

M_a = M(1 - N).
CN2009100630452A 2009-07-07 2009-07-07 Weight-based system and method for calculating audio frequency attention Expired - Fee Related CN101604526B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2009100630452A CN101604526B (en) 2009-07-07 2009-07-07 Weight-based system and method for calculating audio frequency attention


Publications (2)

Publication Number Publication Date
CN101604526A CN101604526A (en) 2009-12-16
CN101604526B true CN101604526B (en) 2011-11-16

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109040778B (en) * 2018-09-12 2021-01-22 武汉轻工大学 Video cover determining method, user equipment, storage medium and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1282114A1 (en) * 2001-01-25 2003-02-05 Sony Corporation Data processing apparatus
CN1459093A (en) * 2001-01-25 2003-11-26 索尼公司 Data processing device
CN101419801A (en) * 2008-12-03 2009-04-29 武汉大学 Method for subband measuring correlation sensing characteristic between ears and device thereof


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
黄庆明; 郑轶佳; 蒋树强; 高文. 基于用户关注空间与注意力分析的视频精彩摘要与排序 (Video highlight summarization and ranking based on user attention space and attention analysis). 《计算机学报》 (Chinese Journal of Computers). 2008. *
Ruan Ruolin; Xiao Xuqing. A rate control scheme based on MAD weighted model for H.264/AVC. 《2007 3rd International Conference on Wireless Communications, Networking, and Mobile Computing - WiCOM 07》. 2007. *
Yang Yuhong; Hu Ruimin; Zhang Yong; Zhang Wei. Analysis and application of perceptual weighting for AVS-M audio coder. 《2007 3rd International Conference on Wireless Communications, Networking, and Mobile Computing - WiCOM 07》. 2007. *



Legal Events

C06 / PB01: Publication
C10 / SE01: Entry into substantive examination
C14 / GR01: Patent grant (granted publication date: 2011-11-16)
CF01: Termination of patent right due to non-payment of annual fee