Summary of the Invention
In view of this, the object of the present invention is to provide a residual echo detection method and system that solve the prior-art problems of choppy near-end speech and incomplete residual echo cancellation. The technical solution is as follows:
A residual echo detection method, comprising:
calculating statistics related to the speech signal condition, the statistics comprising the reference signal energy, the smoothed energy of the residual signal, the smoothed energy of the near-end input signal, and the ratio of the smoothed energy of the residual signal to the smoothed energy of the near-end input signal;
determining the current speech signal condition according to the statistics and, with reference to a predefined correspondence between speech signal conditions and detection thresholds, selecting the detection threshold corresponding to the current speech signal condition.
Preferably, in the above method, determining the current speech signal condition comprises:
judging whether the ratio of the smoothed energy of the residual signal to the smoothed energy of the near-end input signal is less than a first preset value, or whether the smoothed energy of the residual signal is less than a second preset value; if so, determining that no near-end speech is currently present; otherwise, determining that near-end speech is currently present.
Preferably, in the above method, determining the current speech signal condition further comprises:
when the ratio of the smoothed energy of the residual signal to the smoothed energy of the near-end input signal is less than the first preset value, or the smoothed energy of the residual signal is less than the second preset value, that is, when no near-end speech is currently present, further judging whether the reference signal energy is less than a third preset value; if so, determining that the far-end speech is a weak signal; otherwise, determining that the far-end speech is not a weak signal.
Preferably, in the above method, determining the current speech signal condition further comprises:
when no near-end speech is currently present and the far-end speech is not a weak signal, further judging whether the residual echo frame flag of the previous frame is 1; if so, determining that the frame preceding the current frame is a residual echo frame; otherwise, determining that the frame preceding the current frame is not a residual echo frame.
Preferably, in the above method, determining the current speech signal condition further comprises:
when the ratio of the smoothed energy of the residual signal to the smoothed energy of the near-end input signal is not less than the first preset value and the smoothed energy of the residual signal is not less than the second preset value, that is, when near-end speech is currently present, further judging whether the residual echo frame flag of the previous frame is 1; if so, determining that the frame preceding the current frame is a residual echo frame; otherwise, determining that the frame preceding the current frame is not a residual echo frame.
A residual echo detection system, comprising:
a computing unit, configured to calculate statistics related to the speech signal condition, the statistics comprising the reference signal energy, the smoothed energy of the residual signal, the smoothed energy of the near-end input signal, and the ratio of the smoothed energy of the residual signal to the smoothed energy of the near-end input signal;
a storage unit, configured to store a predefined correspondence between speech signal conditions and detection thresholds;
a selection unit, configured to determine the current speech signal condition according to the statistics and, with reference to the predefined correspondence between speech signal conditions and detection thresholds, to select the detection threshold corresponding to the current speech signal condition.
Preferably, in the above system, the selection unit comprises:
a first judging unit, configured to judge whether the ratio of the smoothed energy of the residual signal to the smoothed energy of the near-end input signal is less than a first preset value, or whether the smoothed energy of the residual signal is less than a second preset value;
a first determining unit, configured to determine that no near-end speech is currently present when the judgment result of the first judging unit is yes, and otherwise to determine that near-end speech is currently present.
Preferably, in the above system, the selection unit further comprises:
a second judging unit, configured to further judge, when the judgment result of the first judging unit is yes, whether the reference signal energy is less than a third preset value;
a second determining unit, configured to determine that the far-end speech is a weak signal when the judgment result of the second judging unit is yes, and otherwise to determine that the far-end speech is not a weak signal.
Preferably, in the above system, the selection unit further comprises:
a third judging unit, configured to further judge, when the judgment result of the first judging unit is yes and the judgment result of the second judging unit is no, whether the residual echo frame flag of the previous frame is 1, so as to determine whether the frame preceding the current frame is a residual echo frame;
a third determining unit, configured to determine that the frame preceding the current frame is a residual echo frame when the judgment result of the third judging unit is yes, and otherwise to determine that the frame preceding the current frame is not a residual echo frame.
Preferably, in the above system, the selection unit further comprises:
a fourth judging unit, configured to further judge, when the judgment result of the first judging unit is no, whether the residual echo frame flag of the previous frame is 1;
a fourth determining unit, configured to determine that the frame preceding the current frame is a residual echo frame when the judgment result of the fourth judging unit is yes, and otherwise to determine that the frame preceding the current frame is not a residual echo frame.
As can be seen from the above technical solution, compared with the prior art, the embodiments of the invention calculate statistics related to the speech signal, determine the current speech signal condition according to these statistics, and then, with reference to the predefined correspondence between speech signal conditions and detection thresholds, select the detection threshold corresponding to the current speech signal condition. Residual echo is thus detected with a threshold that follows the dynamic changes of the speech signal condition, which greatly improves detection accuracy and solves the prior-art problems of choppy near-end speech and incomplete residual echo cancellation.
Detailed Description of the Embodiments
First, the residual echo detection method provided by the present invention is described. It comprises:
calculating statistics related to the speech signal condition, the statistics comprising the reference signal energy, the smoothed energy of the residual signal, the smoothed energy of the near-end input signal, and the ratio of the smoothed energy of the residual signal to the smoothed energy of the near-end input signal;
determining the current speech signal condition according to the statistics and, with reference to a predefined correspondence between speech signal conditions and detection thresholds, selecting the detection threshold corresponding to the current speech signal condition.
The embodiments of the invention calculate statistics related to the speech signal, determine the current speech signal condition according to these statistics, and then, with reference to the predefined correspondence between speech signal conditions and detection thresholds, select the detection threshold corresponding to the current speech signal condition. Residual echo is thus detected with a threshold that follows the dynamic changes of the speech signal condition, which greatly improves detection accuracy and solves the prior-art problems of choppy near-end speech and incomplete residual echo cancellation.
The technical solution of the present invention is described clearly and completely below with reference to the accompanying drawings.
Embodiment One:
Referring to Figure 3, the residual echo detection method provided by this embodiment of the invention may comprise the following steps:
S301: calculate statistics related to the speech signal condition, the statistics comprising the reference signal energy, the smoothed energy of the residual signal, the smoothed energy of the near-end input signal, and the ratio of the smoothed energy of the residual signal to the smoothed energy of the near-end input signal.
Before residual echo is detected, the statistics related to the speech signal condition are first calculated so that the current speech signal condition can be judged. They are calculated as follows:
The energy $P_x$ of the reference signal $x(n)$:
$$P_x = \sum_{n} x^2(n) \qquad \text{(Formula 2)}$$
where $n$ takes the values $0, 1, 2, \ldots, 159$.
The smoothed energy $P_{e\_avg}$ of the residual signal $e(n)$:
$$P_{e\_avg} = 0.1 \sum_{n} e^2(n) + 0.9 \sum_{n} e^2(n-N) \qquad \text{(Formula 3)}$$
where $N = 160$ and $n$ takes the values $0, 1, 2, \ldots, N-1$.
The smoothed energy $P_{d\_avg}$ of the near-end input signal $d(n)$:
$$P_{d\_avg} = 0.1 \sum_{n} d^2(n) + 0.9 \sum_{n} d^2(n-N) \qquad \text{(Formula 4)}$$
where $N = 160$ and $n$ takes the values $0, 1, 2, \ldots, N-1$.
The ratio of the smoothed energy $P_{e\_avg}$ of the residual signal $e(n)$ to the smoothed energy $P_{d\_avg}$ of the near-end input signal $d(n)$:
$$\frac{P_{e\_avg}}{P_{d\_avg}} \qquad \text{(Formula 5)}$$
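The statistics of Formulas 2 to 5 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function and field names are hypothetical, each frame is assumed to be a sequence of 160 samples, the "previous" frame is assumed to hold the samples at indices n-N, and the small `eps` term that guards against division by zero is an addition not found in the text.

```python
from typing import NamedTuple, Sequence

N = 160  # frame length in samples, as stated for Formulas 2 to 4


class FrameStats(NamedTuple):
    p_x: float      # reference signal energy (Formula 2)
    p_e_avg: float  # smoothed energy of the residual signal (Formula 3)
    p_d_avg: float  # smoothed energy of the near-end input signal (Formula 4)
    ratio: float    # p_e_avg / p_d_avg (Formula 5)


def frame_energy(frame: Sequence[float]) -> float:
    """Sum of squared samples over one frame."""
    return sum(s * s for s in frame)


def smoothed_energy(current: Sequence[float], previous: Sequence[float]) -> float:
    """0.1 * current-frame energy + 0.9 * previous-frame energy."""
    return 0.1 * frame_energy(current) + 0.9 * frame_energy(previous)


def compute_stats(x, e_cur, e_prev, d_cur, d_prev, eps=1e-12) -> FrameStats:
    """Compute the four statistics for one frame.

    eps is a hypothetical guard against a zero denominator.
    """
    p_x = frame_energy(x)
    p_e_avg = smoothed_energy(e_cur, e_prev)
    p_d_avg = smoothed_energy(d_cur, d_prev)
    return FrameStats(p_x, p_e_avg, p_d_avg, p_e_avg / (p_d_avg + eps))
```

Formula 3 and Formula 4 are implemented exactly as written, that is, as a weighted sum of the current and previous frame energies rather than as a recursive average of earlier smoothed values.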
S302: determine the current speech signal condition according to the statistics and, with reference to the predefined correspondence between speech signal conditions and detection thresholds, select the detection threshold corresponding to the current speech signal condition.
After the statistics related to the speech signal condition have been calculated, the current speech signal condition is judged from these statistics and the residual echo flag of the previous frame, for example: whether near-end speech is present, whether the far-end speech is a weak signal, and whether the previous frame is a residual echo frame. Then, according to the speech signal condition so determined, and with reference to the predefined correspondence between speech signal conditions and detection thresholds, the detection threshold corresponding to the current speech signal condition is selected.
As can be seen, this embodiment of the invention calculates statistics related to the speech signal, determines the current speech signal condition according to these statistics, and then, with reference to the predefined correspondence between speech signal conditions and detection thresholds, selects the detection threshold corresponding to the current speech signal condition. Residual echo is thus detected with a threshold that follows the dynamic changes of the speech signal condition, which greatly improves detection accuracy and solves the prior-art problems of choppy near-end speech and incomplete residual echo cancellation.
Embodiment Two:
In practical applications, whether near-end speech is present is an important basis for selecting the residual echo detection threshold. In view of this, the present invention provides a residual echo detection method that determines from the statistics whether near-end speech is present and then selects the corresponding residual echo detection threshold. Referring to Figure 4, the method comprises the following steps:
S401: calculate statistics related to the speech signal condition, the statistics comprising the reference signal energy, the smoothed energy of the residual signal, the smoothed energy of the near-end input signal, and the ratio of the smoothed energy of the residual signal to the smoothed energy of the near-end input signal.
The specific implementation of S401 is described in detail under S301 of Embodiment One and is not repeated here.
S402: judge whether the ratio of the smoothed energy of the residual signal to the smoothed energy of the near-end input signal is less than the first preset value, or whether the smoothed energy of the residual signal is less than the second preset value; if so, perform S403; otherwise, perform S404.
The ratio of the smoothed energy of the residual signal to the smoothed energy of the near-end input signal, and the smoothed energy of the residual signal, both calculated in S401, are compared with the first preset value and the second preset value respectively. When the ratio is less than the first preset value, or the smoothed energy of the residual signal is less than the second preset value, S403 is performed; otherwise, S404 is performed. In current practical communication applications, the preferred value of the first preset value is 0.5 and the preferred value of the second preset value is -65 dB.
S403: determine that no near-end speech is currently present, and select the detection threshold corresponding to the absence of near-end speech.
When the judgment result of S402 is yes, that is, when the ratio of the smoothed energy of the residual signal to the smoothed energy of the near-end input signal is less than the first preset value, or the smoothed energy of the residual signal is less than the second preset value, it is determined that no near-end speech is currently present, and the corresponding detection threshold is selected according to the preset correspondence between the absence of near-end speech and the detection threshold. Because no near-end speech is present, there is no concern about choppy near-end speech, and a larger detection threshold can be selected. In the practical application environment of this embodiment, the preferred value of the detection threshold corresponding to the absence of near-end speech is 0.3.
S404: determine that near-end speech is currently present, and select the detection threshold corresponding to the presence of near-end speech.
When the judgment result of S402 is no, that is, when the ratio of the smoothed energy of the residual signal to the smoothed energy of the near-end input signal is not less than the first preset value and the smoothed energy of the residual signal is not less than the second preset value, it is determined that near-end speech is currently present, and the corresponding detection threshold is selected according to the preset correspondence between the presence of near-end speech and the detection threshold. Because near-end speech is present, choppy near-end speech must be taken into account, and a smaller detection threshold is selected. In the practical application environment of this embodiment, the preferred value of the detection threshold corresponding to the presence of near-end speech is 0.2.
As can be seen from the above embodiment, the present invention predefines a correspondence between the presence or absence of near-end speech and the detection threshold and, after judging whether near-end speech is present, selects the corresponding detection threshold. This fundamentally avoids the choppy near-end speech caused by residual echo cancellation and, when no near-end speech is present, effectively improves detection accuracy.
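Steps S402 to S404 can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes that the smoothed residual energy is passed in already expressed in dB on the same scale as the -65 dB preset value; the text does not state the dB reference level, so any conversion is left to the caller.

```python
def near_end_absent(ratio: float, p_e_avg_db: float,
                    first_preset: float = 0.5,
                    second_preset_db: float = -65.0) -> bool:
    """S402: 'yes' when either test indicates that no near-end speech is present."""
    return ratio < first_preset or p_e_avg_db < second_preset_db


def select_threshold_emb2(ratio: float, p_e_avg_db: float) -> float:
    """Return the preferred detection threshold of Embodiment Two."""
    if near_end_absent(ratio, p_e_avg_db):
        return 0.3  # S403: no near-end speech, a larger threshold is acceptable
    return 0.2      # S404: near-end speech present, use a smaller threshold
```

For example, `select_threshold_emb2(0.8, -30.0)` fails both tests of S402, so near-end speech is taken to be present and the smaller threshold is returned.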
Embodiment Three:
In practical applications, when no near-end speech is present, the far-end speech may be a weak signal that is difficult to detect. To avoid missed detections, this embodiment of the invention also provides a residual echo detection mechanism that selects the detection threshold according to whether the far-end speech is a weak signal. Referring to Figure 5, it comprises the following steps:
S501: calculate statistics related to the speech signal condition, the statistics comprising the reference signal energy, the smoothed energy of the residual signal, the smoothed energy of the near-end input signal, and the ratio of the smoothed energy of the residual signal to the smoothed energy of the near-end input signal.
S502: judge whether the ratio of the smoothed energy of the residual signal to the smoothed energy of the near-end input signal is less than the first preset value, or whether the smoothed energy of the residual signal is less than the second preset value; if so, perform S503; otherwise, perform S504.
S503: no near-end speech is currently present; further judge whether the reference signal energy is less than the third preset value; if so, perform S505; otherwise, perform S506.
When the ratio of the smoothed energy of the residual signal to the smoothed energy of the near-end input signal is less than the first preset value, or the smoothed energy of the residual signal is less than the second preset value, it is determined that no near-end speech is currently present, and it is further judged whether the reference signal energy is less than the third preset value; if so, S505 is performed; otherwise, S506 is performed. In current practical communication applications, the preferred value of the third preset value is -15 dB.
S504: determine that near-end speech is currently present, and select the detection threshold corresponding to the presence of near-end speech.
The specific implementations of S501, S502 and S504 correspond one to one to S401, S402 and S404 of the previous embodiment and are not repeated here.
S505: determine that the far-end speech is a weak signal, and select the corresponding detection threshold.
When the judgment result of S503 is yes, that is, when no near-end speech is present and the far-end speech is a weak signal, the weak signal is difficult to detect; to avoid missed detections, the preset detection threshold for this case is a larger value. In the practical application environment, its preferred value is 0.5.
S506: determine that the far-end speech is not a weak signal, and select the corresponding detection threshold.
When the judgment result of S503 is no, that is, when no near-end speech is present and the far-end speech is not a weak signal, missed detection of a weak signal need not be considered, and the preferred value of the preset detection threshold for this case is 0.3.
As can be seen from the above embodiment, in addition to the beneficial effects of the previous embodiment, this embodiment predefines a correspondence between the detection threshold and whether the far-end speech is a weak signal when no near-end speech is present and, after judging whether the far-end speech is a weak signal, selects the corresponding detection threshold. This effectively avoids missed detections when the far-end speech is a weak signal and further improves the accuracy of residual echo detection.
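The decision flow of S502 to S506 can be sketched in Python as follows. The function name is hypothetical, and both energies are assumed to be supplied in dB on the same scale as the -65 dB and -15 dB preset values, whose reference level the text does not specify.

```python
def select_threshold_emb3(ratio: float, p_e_avg_db: float, p_x_db: float,
                          first_preset: float = 0.5,
                          second_preset_db: float = -65.0,
                          third_preset_db: float = -15.0) -> float:
    """Return the preferred detection threshold of Embodiment Three."""
    # S502: is near-end speech absent?
    absent = ratio < first_preset or p_e_avg_db < second_preset_db
    if not absent:
        return 0.2  # S504: near-end speech present
    # S503: is the far-end (reference) signal weak?
    if p_x_db < third_preset_db:
        return 0.5  # S505: weak far-end signal, larger threshold to avoid misses
    return 0.3      # S506: far-end signal is not weak
```

The weak-signal test is only reached on the branch where near-end speech is absent, which mirrors the ordering of S502 and S503.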
Embodiment Four:
In practical applications, to improve detection accuracy when no near-end speech is present and the far-end speech is not a weak signal, this embodiment of the invention also provides a residual echo detection mechanism that selects the detection threshold according to whether the frame preceding the current frame is a residual echo frame. Referring to Figure 6, it comprises the following steps:
S601: calculate statistics related to the speech signal condition, the statistics comprising the reference signal energy, the smoothed energy of the residual signal, the smoothed energy of the near-end input signal, and the ratio of the smoothed energy of the residual signal to the smoothed energy of the near-end input signal.
S602: judge whether the ratio of the smoothed energy of the residual signal to the smoothed energy of the near-end input signal is less than the first preset value, or whether the smoothed energy of the residual signal is less than the second preset value; if so, perform S603; otherwise, perform S604.
S603: no near-end speech is currently present; further judge whether the reference signal energy is less than the third preset value; if so, perform S605; otherwise, perform S606.
S604: determine that near-end speech is currently present, and select the detection threshold corresponding to the presence of near-end speech.
S605: determine that the far-end speech is a weak signal, and select the corresponding detection threshold.
The specific implementations of S601 to S605 correspond one to one to S501 to S505 of the previous embodiment and are not repeated here.
S606: determine that the far-end speech is not a weak signal, and further judge whether the residual echo frame flag of the previous frame is 1; if so, perform S607; otherwise, perform S608.
After it has been determined that no near-end speech is present and the far-end speech is not a weak signal, it is further judged whether the frame preceding the current frame is a residual echo frame, in order to further improve the accuracy of residual echo detection.
S607: determine that the frame preceding the current frame is a residual echo frame, and select the corresponding detection threshold.
When the judgment result of S606 is yes, that is, when the residual echo flag of the frame preceding the current frame is 1, it is determined that the preceding frame is a residual echo frame. In this case, because the signal energy at the tail of the residual echo is small and very difficult to detect, the residual echo is often not removed completely, so a larger detection threshold needs to be selected. In practical applications, the preferred value of the detection threshold in this case is 0.7.
S608: determine that the frame preceding the current frame is not a residual echo frame, and select the corresponding detection threshold.
When the judgment result of S606 is no, that is, when the residual echo flag of the frame preceding the current frame is 0, it is determined that the preceding frame is not a residual echo frame. The preferred value of the detection threshold in this case is 0.3.
As can be seen from the above embodiment, in addition to the beneficial effects of the previous embodiment, this embodiment also uses a correspondence between the detection threshold and whether the previous frame is a residual echo frame: when the frame preceding the current frame is judged to be a residual echo frame, a larger detection threshold is selected. This effectively avoids missed detections caused by the small signal energy at the tail of the residual echo and further improves detection accuracy.
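The flow of S602 to S608 can be sketched in Python as follows. The function and parameter names are hypothetical, the energies are assumed to be given in dB on the same scale as the preset values, and `prev_flag` is assumed to hold the residual echo frame flag (0 or 1) recorded for the previous frame.

```python
def select_threshold_emb4(ratio: float, p_e_avg_db: float, p_x_db: float,
                          prev_flag: int,
                          first_preset: float = 0.5,
                          second_preset_db: float = -65.0,
                          third_preset_db: float = -15.0) -> float:
    """Return the preferred detection threshold of Embodiment Four."""
    # S602: is near-end speech absent?
    absent = ratio < first_preset or p_e_avg_db < second_preset_db
    if not absent:
        return 0.2  # S604: near-end speech present
    # S603: is the far-end (reference) signal weak?
    if p_x_db < third_preset_db:
        return 0.5  # S605: weak far-end signal
    # S606: was the previous frame flagged as a residual echo frame?
    if prev_flag == 1:
        return 0.7  # S607: echo tail is faint, use a larger threshold
    return 0.3      # S608: previous frame was not a residual echo frame
```

Note that the previous-frame flag only influences the result on the branch where near-end speech is absent and the far-end signal is not weak.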
Embodiment Five:
In practical applications, to improve the accuracy of residual echo detection when near-end speech is present, this embodiment also adopts a residual echo detection mechanism that selects the detection threshold according to whether the frame preceding the current frame is a residual echo frame. Referring to Figure 7, it comprises the following steps:
S701: calculate statistics related to the speech signal condition, the statistics comprising the reference signal energy, the smoothed energy of the residual signal, the smoothed energy of the near-end input signal, and the ratio of the smoothed energy of the residual signal to the smoothed energy of the near-end input signal.
S702: judge whether the ratio of the smoothed energy of the residual signal to the smoothed energy of the near-end input signal is less than the first preset value, or whether the smoothed energy of the residual signal is less than the second preset value; if so, perform S703; otherwise, perform S704.
S703: determine that no near-end speech is currently present, and select the detection threshold corresponding to the absence of near-end speech.
The specific implementations of S701 to S703 correspond one to one to S401 to S403 of Embodiment Two and are not repeated here.
S704: determine that near-end speech is currently present, and further judge whether the residual echo frame flag of the previous frame is 1; if so, perform S705; otherwise, perform S706.
After it has been determined that near-end speech is present, it is further judged whether the frame preceding the current frame is a residual echo frame, in order to further improve the accuracy of residual echo detection.
S705: determine that the frame preceding the current frame is a residual echo frame, and select the corresponding detection threshold.
When the judgment result of S704 is yes, that is, when the residual echo frame flag of the frame preceding the current frame is 1, it is determined that the preceding frame is a residual echo frame. In this case, because the signal energy at the tail of the residual echo is small and very difficult to detect, the residual echo is often not removed completely, so a larger detection threshold needs to be selected. In practical applications, the preferred value of the detection threshold in this case is 0.7.
S706: determine that the frame preceding the current frame is not a residual echo frame, and select the corresponding detection threshold.
When the judgment result of S704 is no, that is, when the residual echo frame flag of the frame preceding the current frame is 0, it is determined that the preceding frame is not a residual echo frame. The preferred value of the detection threshold in this case is 0.2.
As can be seen from the above embodiment, in addition to the beneficial effects of Embodiment Two, this embodiment also uses a correspondence between the detection threshold and whether the previous frame is a residual echo frame: when the frame preceding the current frame is judged to be a residual echo frame, a larger detection threshold is selected. This effectively avoids missed detections caused by the small signal energy at the tail of the residual echo and further improves detection accuracy.
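The flow of S702 to S706 can be sketched in Python as follows. The function and parameter names are hypothetical, the smoothed residual energy is assumed to be given in dB on the same scale as the -65 dB preset value, and `prev_flag` is assumed to hold the residual echo frame flag (0 or 1) of the previous frame.

```python
def select_threshold_emb5(ratio: float, p_e_avg_db: float, prev_flag: int,
                          first_preset: float = 0.5,
                          second_preset_db: float = -65.0) -> float:
    """Return the preferred detection threshold of Embodiment Five."""
    # S702: is near-end speech absent?
    if ratio < first_preset or p_e_avg_db < second_preset_db:
        return 0.3  # S703: no near-end speech (same value as S403)
    # S704: near-end speech present; check the previous frame's flag
    if prev_flag == 1:
        return 0.7  # S705: previous frame was a residual echo frame
    return 0.2      # S706: previous frame was not a residual echo frame
```

Here the previous-frame flag is consulted only when near-end speech is present, which is the complement of the branch refined in Embodiment Four.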
From the above description of the method embodiments, those skilled in the art can clearly understand that the present invention may be implemented by software together with the necessary general-purpose hardware platform, or by hardware alone, although the former is the better implementation in many cases. Based on this understanding, the essence of the technical solution of the present invention, or the part that contributes over the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention. The storage medium includes various media that can store program code, such as read-only memory (ROM), random-access memory (RAM), a magnetic disk, or an optical disc.
Embodiment Six:
Corresponding to the above method embodiments, the embodiments of the invention also provide a residual echo detection system which, referring to Figure 8, comprises:
a computing unit 801, configured to calculate statistics related to the speech signal condition, the statistics comprising the reference signal energy, the smoothed energy of the residual signal, the smoothed energy of the near-end input signal, and the ratio of the smoothed energy of the residual signal to the smoothed energy of the near-end input signal;
a storage unit 802, configured to store a predefined correspondence between speech signal conditions and detection thresholds;
a selection unit 803, configured to determine the current speech signal condition according to the statistics and, with reference to the predefined correspondence between speech signal conditions and detection thresholds, to select the detection threshold corresponding to the current speech signal condition.
Referring to Figure 9, the selection unit 803 may comprise:
a first judging unit 803a, configured to judge whether the ratio of the smoothed energy of the residual signal to the smoothed energy of the near-end input signal is less than the first preset value, or whether the smoothed energy of the residual signal is less than the second preset value;
a first determining unit 803b, configured to determine that no near-end speech is currently present when the judgment result of the first judging unit 803a is yes, and otherwise to determine that near-end speech is currently present.
Referring to Figure 10, the selection unit 803 further comprises:
a second judging unit 803c, configured to further judge, when the judgment result of the first judging unit 803a is yes, whether the reference signal energy is less than the third preset value;
a second determining unit 803d, configured to determine that the far-end speech is a weak signal when the judgment result of the second judging unit 803c is yes, and otherwise to determine that the far-end speech is not a weak signal.
Referring to Figure 11, the selection unit 803 further comprises:
a third judging unit 803e, configured to further judge, when the judgment result of the first judging unit 803a is yes and the judgment result of the second judging unit 803c is no, whether the residual echo frame flag of the previous frame is 1;
a third determining unit 803f, configured to determine that the frame preceding the current frame is a residual echo frame when the judgment result of the third judging unit 803e is yes, and otherwise to determine that the frame preceding the current frame is not a residual echo frame.
Referring to Figure 12, the selection unit 803 further comprises:
a fourth judging unit 803g, configured to further judge, when the judgment result of the first judging unit 803a is no, whether the residual echo frame flag of the previous frame is 1;
a fourth determining unit 803h, configured to determine that the frame preceding the current frame is a residual echo frame when the judgment result of the fourth judging unit 803g is yes, and otherwise to determine that the frame preceding the current frame is not a residual echo frame.
Because the system embodiment substantially corresponds to the method embodiments, reference may be made to the description of the method embodiments for the relevant details.
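One possible way to arrange the storage unit 802 and the selection unit 803 in software is sketched below in Python. The class names, the condition labels and the idea of chaining all four judging units into a single decision tree are assumptions made for illustration; the threshold values are the preferred values given in the method embodiments, and the statistics are assumed to be supplied by a computing unit 801 with the energies in dB.

```python
# Hypothetical condition labels mapped to the preferred thresholds of the
# method embodiments.
DEFAULT_TABLE = {
    "no_near_end/weak_far_end": 0.5,
    "no_near_end/prev_residual": 0.7,
    "no_near_end/prev_not_residual": 0.3,
    "near_end/prev_residual": 0.7,
    "near_end/prev_not_residual": 0.2,
}


class StorageUnit:
    """Unit 802: stores the condition-to-threshold correspondence."""

    def __init__(self, table=None):
        self.table = dict(DEFAULT_TABLE if table is None else table)

    def lookup(self, condition: str) -> float:
        return self.table[condition]


class SelectionUnit:
    """Unit 803: determines the condition, then selects the threshold."""

    def __init__(self, storage, first=0.5, second_db=-65.0, third_db=-15.0):
        self.storage = storage
        self.first, self.second_db, self.third_db = first, second_db, third_db

    def condition(self, ratio, p_e_avg_db, p_x_db, prev_flag) -> str:
        # First judging unit 803a: is near-end speech absent?
        if ratio < self.first or p_e_avg_db < self.second_db:
            # Second judging unit 803c: is the far-end signal weak?
            if p_x_db < self.third_db:
                return "no_near_end/weak_far_end"
            # Third judging unit 803e: previous-frame flag
            prefix = "no_near_end"
        else:
            # Fourth judging unit 803g: previous-frame flag
            prefix = "near_end"
        suffix = "prev_residual" if prev_flag == 1 else "prev_not_residual"
        return f"{prefix}/{suffix}"

    def select(self, ratio, p_e_avg_db, p_x_db, prev_flag) -> float:
        return self.storage.lookup(
            self.condition(ratio, p_e_avg_db, p_x_db, prev_flag))
```

Keeping the table in a separate object means the thresholds can be changed without touching the decision logic, which matches the separation between the storage unit and the selection unit in Figure 8.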
Those skilled in the art will understand that the system embodiment described above is merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.
In the several embodiments provided in this application, it should be understood that the disclosed system and method may be implemented in other ways without departing from the spirit and scope of this application. The present embodiments are merely illustrative examples and should not be taken as limiting, and the specific content given should not limit the purpose of this application. For example, the division into units or sub-units is only a division by logical function; other divisions are possible in an actual implementation, for example by combining multiple units or sub-units. In addition, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed.
The above are only specific embodiments of the present invention. It should be pointed out that those of ordinary skill in the art may make improvements and modifications without departing from the principle of the present invention, and these improvements and modifications shall also be regarded as falling within the protection scope of the present invention.