RU2436173C1 - Method of detecting pauses in speech signals and device for realising said method

Info

Publication number: RU2436173C1
Application number: RU2010124342/08A
Authority: RU
Inventors: Владимир Викторович Витязев (RU); Валерий Иванович Розов (RU); Владимир Андреевич Волченков (RU)
Priority date: 2010-06-15
Filing date: 2010-06-15
Publication date: 2011-12-10

Abstract

FIELD: information technology. SUBSTANCE: a speech signal from the output of an electroacoustic transducer is summed with a new signal of stable frequency and amplitude. The resulting sum is amplified, amplitude-limited, and multiplied by a copy of the original speech signal to produce a new signal, which is compared with a set threshold; a pause in the speech signal is indicated when the amplitude of the resulting signal exceeds the threshold. EFFECT: fewer computational operations in digital processing of speech signals. 2 cl, 3 dwg

Description

The invention relates to digital processing of speech signals and can be used in various applications, for example in audio archiving systems, directory enquiry services, speech transmission systems, and speech recognition.

A method for detecting pauses in a speech signal is known [1] that corrects the spectral characteristics of the speech signal, introduces phase shifts into it, adds this signal to the amplitude- and frequency-corrected signal, detects the positive and negative half-waves separately, and sums them algebraically. The disadvantage of this method is that it is laborious and complex to implement.

Another known method for detecting pauses in a speech signal uses the difference between the spectral characteristics of the speech signal and those of the signal in a pause (noise) [2]. The method determines the spectral deviation of the speech signal from the pause (noise) signal by estimating the parameters of an autoregressive model, compares the sum of the energies of the speech signal and the pause (noise) signal with a threshold, and decides that a pause is present at the input if the level of that sum is below the threshold.

The disadvantages of this method are as follows. The characteristics of the inverse filter are calculated by estimating the parameters of an autoregressive model. Such models work effectively when the noise is "colored"; if the noise is perfectly "white", the order p of the approximating model must be infinitely large, which is physically unrealizable. Under real conditions the observed background noise is, as a rule, "colored" and can therefore be described by a stochastic difference equation of the form:

in which the order of the equation p is finite, and the parameters:

m - the mathematical expectation,

σ₀ - the variance of the signal in a pause,

α_k - the linear prediction coefficients - are determined in advance.

It should be emphasized that when the parameters of this equation change abruptly (a so-called "disorder", or change point), the observed random sequence can still be described by the following equation:

but, in the general case, of a different order and with unknown parameters m₁, σ₁, β_k. In the absence of a priori information about the parameter values of equation (2), a one-dimensional decision function is used, based on comparing the statistic Y=σ₁/σ₀ with a threshold. If the ratio σ₁/σ₀ exceeds the threshold, a disorder is declared, i.e. the sum of the speech signal and the pause (noise) signal is present at the system input. Otherwise it is decided that only the pause (noise) signal is present at the system input.
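The decision rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `disorder_detected`, the use of a single analysis frame, and the example threshold are hypothetical, and σ is computed as a variance because the text defines σ₀ as the variance of the pause signal.

```python
import numpy as np

def disorder_detected(frame, sigma0, threshold):
    """Return True when Y = sigma1 / sigma0 exceeds the threshold.

    frame     -- samples of the current analysis frame (assumed input)
    sigma0    -- variance of the pause (noise) signal, measured in advance
    threshold -- decision threshold for Y (assumed to be chosen by the user)
    """
    sigma1 = float(np.var(frame))   # variance of the observed frame
    y = sigma1 / sigma0             # one-dimensional decision statistic
    return bool(y > threshold)

# Usage: a frame with three times the amplitude has nine times the variance.
noise_frame = np.array([1.0, -1.0, 1.0, -1.0])   # variance 1.0
speech_frame = 3.0 * noise_frame                 # variance 9.0
print(disorder_detected(speech_frame, sigma0=1.0, threshold=4.0))
print(disorder_detected(noise_frame, sigma0=1.0, threshold=4.0))
```

If σ is meant as a standard deviation instead, `np.var` can be swapped for `np.std` without changing the structure of the rule.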

When these decision functions are used, so-called "dead zones" arise: for some combinations of parameters before and after the disorder, the decision function either does not change or grows so slowly that, within an acceptable time, the disorder is detected only with the probability of a false alarm.

Another significant drawback of the method is that it suppresses components of both the pause (noise) signal and the speech signal when their maxima coincide.

In addition, when the energy spectrum of the autoregressive process is calculated, which can be written mathematically as follows:

where σ²_noise is the variance of the signal in a pause (noise),

an inaccuracy in determining α_k shifts the spectrum relative to its true position, which in turn prevents optimal calculation of the inverse filter characteristics. The need to continuously adjust the filter characteristics to the current pause (noise) signal leads to a large computation time.

Finally, to keep the probability of detecting pauses in the speech signal constant when the input noise level changes, the gain of the speech path must be adjusted.

The closest to the proposed method is a method for detecting pauses in a speech signal that uses the difference between the spectral characteristics of the speech signal and those of the pause (noise) signal [3]; it is taken as the prototype. In this method, the spectral deviation of the speech signal from the pause (noise) signal is estimated by determining how the energy ratios of the frequency spectrum of the speech signal deviate from those of the pause (noise) signal, by performing the following steps:

1. The signals from the microphone output are sampled with step Δt and quantized (samples are obtained);

2. A stream of samples of a pause (noise) segment of a given length, taken from the microphone output while the speaker is silent, is written to a storage device;

3. The stream of samples of the pause (noise) segment is divided into a number of sections of length R;

4. The frequency range (1/Δt) of the Fourier energy spectrum of each of these sections is divided into a number of intervals (i=1, …, N);

5. The exact values of the energy fractions of the pause (noise) samples P_{i pause} corresponding to each frequency interval are calculated by the formula

where M=2[R/(2N)]+2;

6. The mean energy fraction of the pause (noise) samples P_{i pause avg} is determined in each frequency interval over the entire stream of samples of the pause (noise) segment;

7. The calculated values P_{i pause avg} are written to the storage device;

8. The stream of speech samples is divided into sections of the same length as in the analysis of the pause (noise) samples;

9. For each section, the energy fractions of the speech samples P_i are calculated in each of the N frequency intervals by the formula

10. The ratios of P_i to P_{i pause avg} are calculated in each of the N frequency intervals of all selected sections, and their maximum max(P_i/P_{i pause avg}) is determined;

11. The maximum max(P_i/P_{i pause avg}) is passed to the input of the threshold detector;

12. The threshold h is determined in the threshold calculation circuit, taking the calculated value of M into account;

13. The threshold detector compares the maximum max(P_i/P_{i pause avg}) with the threshold h;

14. A pause is declared if max(P_i/P_{i pause avg}) is less than or equal to the threshold h;

15. The value P_{i pause avg} is updated using the current value of P_{i pause};

16. The pauses are encoded such that the code of each pause contains only the start time and the duration of the pause.
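The core of the prototype steps (band energy fractions, their ratio to the pause average, and the comparison of the maximum with h) can be sketched as follows. This is a rough illustration, not the prototype's actual formulas: the expressions for P_{i pause} and P_i are not reproduced in this text, so a plain FFT band-energy fraction is used in their place. The names `band_energy_fractions`, `calibrate`, and `is_pause` are hypothetical, the threshold h is passed in rather than derived from M, and the update of step 15 is left out.

```python
import numpy as np

def band_energy_fractions(section, n_bands):
    """Fraction of the section's spectral energy in each of n_bands bands.

    Assumption: equal-width bands over the one-sided FFT energy spectrum.
    """
    spectrum = np.abs(np.fft.rfft(section)) ** 2
    energy = np.array([band.sum() for band in np.array_split(spectrum, n_bands)])
    total = energy.sum()
    if total == 0.0:
        return np.full(n_bands, 1.0 / n_bands)   # silent section: flat fractions
    return energy / total

def calibrate(noise, section_len, n_bands):
    """Steps 3-6: average the band fractions over sections of the noise record."""
    starts = range(0, len(noise) - section_len + 1, section_len)
    fractions = [band_energy_fractions(noise[s:s + section_len], n_bands) for s in starts]
    return np.mean(fractions, axis=0)

def is_pause(section, pause_avg, h):
    """Steps 9-14: pause if max(P_i / P_i_pause_avg) <= h."""
    p = band_energy_fractions(section, len(pause_avg))
    ratios = p / np.maximum(pause_avg, 1e-12)    # guard against empty bands
    return bool(ratios.max() <= h)

# Usage with synthetic data (white noise as the pause signal, a tone as "speech").
rng = np.random.default_rng(0)
noise = rng.normal(size=4096)
pause_avg = calibrate(noise, section_len=256, n_bands=4)
tone = np.sin(2 * np.pi * 0.05 * np.arange(256))
print(is_pause(noise[:256], pause_avg, h=2.0), is_pause(tone, pause_avg, h=2.0))
```

Even in this reduced form, every section needs an FFT and N divisions, which gives a sense of the computation cost the text goes on to criticize.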

This method also has a number of disadvantages, the main ones being:

large computation time;

the need to constantly adjust the values P_{i pause avg};

the need to constantly adjust the threshold h;

a significant delay in detecting pauses in the speech signal.

A device that implements this method is known. The device [3] comprises a threshold detector and a threshold calculation circuit containing an algorithmic module, which includes an analog-to-digital converter, a recording device, a storage device, a reading device, a spectrum energy calculation device, a device for determining the mean energy of the samples in a pause, a device for calculating the ratios of P_i to P_{i pause avg}, a device for determining max(P_i/P_{i pause avg}), an encoding device, and a synchronization device. The first input of the algorithmic module is connected to the microphone output, the first output of the algorithmic module is connected to the first input of the threshold detector, the second output of the algorithmic module is connected to the input of the threshold calculation circuit, the output of which is connected to the second input of the threshold detector, and the output of the threshold detector is connected to the second input of the algorithmic module.

The disadvantages of this device are the same as those of the method it implements.

The objective of the proposed invention is to create a method, and a device for implementing it, that increase the reliability of detecting pauses in a speech signal and generate a synchronizing signal corresponding to the presence of pauses in the speech signal.

The technical result of using the proposed invention is a reduction in the number of computational operations in the digital processing of speech signals, a reduction in the memory needed to store speech, and a reduction in traffic during its transmission.

The objective is achieved in that, in the proposed method for detecting pauses in a speech signal, which includes comparing a signal containing information about pauses with a threshold level, the decision that a pause is present in the speech signal is made from the amplitude level of a new measuring signal containing information about pauses. This new signal is obtained from the speech signal by converting it into the new measuring signal through the following steps:

1. The speech signal S₁(t)=U₁sin(ωt) from electroacoustic transducer 1, for example a microphone, is fed to the first input of adder 3, where it is summed with the signal S₂(t) to give the signal S₃(t)=S₁(t)+S₂(t), which is fed to amplifier-limiter (AL) 4;

2. Generator 2 generates a new signal S₂(t)=f(U₂,f₁) with stable preset amplitude U₂=const and frequency f₁=1/T₁=const, which is fed to the second input of adder 3;

3. The signal S₃(t) is amplified, limited, and amplitude-normalized in AL 4 to give the signal S₄(t), which is fed to the first input of multiplier 6;

4. The speech signal S₁(t) is amplified, limited, and amplitude-normalized in AL 5 to give the signal S₅(t), which is fed to the second input of multiplier 6;

5. The signals S₄(t) and S₅(t) are multiplied, and the signal S₆(t)=f(U₆(t),f₁) is extracted, with amplitude U₆(t) determined by the inverse amplitude of the signal S₁(t) and with frequency f₁;

6. The signal S₆(t)=f(U₆(t),f₁) is fed to low-pass filter 7, and the filter, tuned to frequency f₁, extracts the signal S₇(t)=U₇(t)sin(ω₁t). Threshold device 8 compares the amplitude U₇(t) with the set threshold U_thr(t), calculated beforehand while speech is absent from the condition U_thr(t)=K·U_{7 max}(t), where U_{7 max}(t) is the maximum amplitude of the signal at the output of the filter tuned to frequency f₁ during pauses, and the coefficient K, chosen in advance, is less than or equal to one. The decision on whether a pause is present in the speech signal is made from the result of comparing U₇(t) with U_thr(t).
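The order of the blocks in steps 1 to 6 can be mirrored in a short digital sketch. It reproduces only the wiring of the chain; the sampling rate, the limiter gain, the function names `limiter` and `measuring_signal_envelope`, and the use of quadrature demodulation with a moving average as a stand-in for filter 7 are all assumptions made for illustration, and no detection performance is implied.

```python
import numpy as np

def limiter(x, gain):
    """Amplifier-limiter: amplify by `gain`, then clip to [-1, 1]."""
    return np.clip(gain * x, -1.0, 1.0)

def measuring_signal_envelope(s1, fs, f1, u2, gain=1000.0, win=256):
    """Return an estimate of U7(t), the amplitude at f1 after the chain.

    s1   -- speech samples S1 (assumed already digitized)
    fs   -- sampling rate in Hz (assumption; the text describes analog blocks)
    f1   -- frequency of the measuring signal S2
    u2   -- constant amplitude U2 of the measuring signal
    """
    t = np.arange(len(s1)) / fs
    s2 = u2 * np.sin(2 * np.pi * f1 * t)   # generator 2
    s3 = s1 + s2                           # adder 3
    s4 = limiter(s3, gain)                 # amplifier-limiter 4
    s5 = limiter(s1, gain)                 # amplifier-limiter 5
    s6 = s4 * s5                           # multiplier 6
    # Stand-in for filter 7 (assumption): project S6 onto cos/sin at f1 and
    # smooth with a moving average to get the amplitude of the f1 component.
    kernel = np.ones(win) / win
    i = np.convolve(s6 * np.cos(2 * np.pi * f1 * t), kernel, mode="same")
    q = np.convolve(s6 * np.sin(2 * np.pi * f1 * t), kernel, mode="same")
    return 2.0 * np.hypot(i, q)

# Usage: a 200 Hz test tone as S1 and a 1 kHz measuring signal.
fs = 8000
t = np.arange(4000) / fs
u7 = measuring_signal_envelope(0.5 * np.sin(2 * np.pi * 200 * t), fs, f1=1000.0, u2=0.1)
print(u7.shape)
```

Because both limiter outputs are normalized to the same range, the product S₆ stays within [-1, 1] regardless of the input level, which is what allows a fixed threshold to be applied afterwards.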

The novelty of the proposed method is that pauses in speech signals are detected from the amplitude level of a new stable-frequency measuring signal S₇(t)=U₇(t)sin(ω₁t), obtained by filtering the signal S₆(t)=f(U₆(t),f₁), which in turn is obtained by correlation processing of the signals S₄(t) and S₅(t). The signal S₄(t) is obtained by amplifying and amplitude-limiting the signal S₃(t), which is the sum of the signals S₁(t) and S₂(t), where S₁(t) is the original speech signal and S₂(t) is the auxiliary measuring signal; the signal S₅(t) is obtained by amplifying and amplitude-limiting the original signal S₁(t).

The proposed method meets the "inventive step" criterion because it is based on converting the speech signal into a new stable-frequency measuring signal whose amplitude is used in the threshold device to determine whether pauses are present in the speech signal.

To carry out this method, a device for detecting pauses in a speech signal is proposed. It includes an electroacoustic transducer, a threshold detector, and a speech signal analysis device. Unlike the known device, it does not use a threshold calculation circuit consisting of an algorithmic module (which includes an analog-to-digital converter, a recording device, a storage device, a reading device, a spectrum energy calculation device, a device for determining the mean energy of the signal samples in a pause, a device for calculating the ratios of P_i to P_{i pause avg}, a device for determining max(P_i/P_{i pause avg}), an encoding device, and a synchronization device). Instead, it contains a generator of the new measuring signal, an adder, two amplifier-limiters, a multiplier, and a low-pass filter.

The distinctive features of the proposed device, which confirm its novelty and inventive step, are:

- the presence of a speech signal analysis device instead of the algorithmic module, which improves the quality of speech signal analysis and the probability of detecting pauses in the speech signal;

- the composition of the speech signal analysis device, which includes a generator of the new measuring signal, an adder, two amplifier-limiters, a multiplier, and a low-pass filter.

The essence of the inventions is illustrated by the drawings:

Fig. 1 - oscillogram of the phrase "Начало тестирования аппаратуры" ("Start of equipment testing");

Fig. 2 - oscillogram of the pauses detected in the phrase "Начало тестирования аппаратуры" ("Start of equipment testing");

Fig. 3 - block diagram of the claimed device.

In Fig. 3 the claimed device consists of electroacoustic transducer 1, generator of the new measuring signal 2, adder 3, two amplifier-limiters 4 and 5, multiplier 6, low-pass filter 7, and threshold detector 8. The electroacoustic transducer is the input of the device, and the threshold device is its output. The output of the electroacoustic transducer is connected to the first input of the adder and to the input of the first amplifier-limiter; the output of the generator of the new measuring signal is connected to the second input of the adder; the output of the adder is connected to the input of the second amplifier-limiter; the output of the second amplifier-limiter is connected to the first input of the multiplier; the output of the first amplifier-limiter is connected to the second input of the multiplier; the output of the multiplier is connected to the low-pass filter; and the output of the low-pass filter is connected to the input of the threshold device.

The proposed method is implemented on this device as follows.

The speech signal S₁(t) from the output of the electroacoustic transducer 1 is fed to the first input of the adder 3. Generator 2 produces a new measuring signal S₂(t) = f(U₂, f₁) with a preset stable amplitude U₂ = const and frequency f₁ = 1/T₁ = const, which is fed to the second input of the adder 3. The adder outputs the signal S₃(t) = S₁(t) + S₂(t), which is fed to the amplifier-limiter (UO) 4, where it is amplified by a factor of k₄ and limited to give the signal S₄(t). The signal S₄(t) is fed to the first input of the multiplier 6. The second input of the multiplier 6 receives the signal S₅(t), obtained by passing the signal S₁(t) through the amplifier-limiter (UO) 5, which has the same characteristics as UO 4, i.e. the gain k₅ of UO 5 equals the gain k₄ of UO 4, and the amplitude of the signal S₅(t) equals the amplitude of the signal S₄(t). Multiplying the signals S₄(t) and S₅(t) in the multiplier 6 yields the signal S₆(t) = f(U₆(t), f₁), where U₆(t) and f₁ are the essential parameters used to detect pauses in the speech signal.

The signal S₆(t) = f(U₆(t), f₁) is fed to the low-pass filter 7, which is tuned to the frequency f₁ and extracts the signal S₇(t) = U₇(t)·sin(ω₁t). The amplitude U₇(t) is compared in the threshold device with the set threshold U_thr(t), computed beforehand while no speech is present from the condition U_thr(t) = K·U_{7 max}(t), where U_{7 max}(t) is the maximum value of the signal during pauses at the output of the filter connected to the input of the threshold device and tuned to the frequency f₁, and the coefficient K is less than or equal to one; the value of K is chosen in advance.
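The processing chain described above can be sketched in discrete time as follows. This is only an illustrative sketch under stated assumptions, not the patented implementation: the function names, the sampling rate `fs`, the frame length and the clipping level `limit` are hypothetical, and a single-bin DFT over each frame stands in for filter 7 followed by amplitude measurement. The comparison direction in `is_pause` follows the abstract, which states that a pause is indicated when the amplitude exceeds the threshold.

```python
import cmath
import math


def amplifier_limiter(samples, gain, limit):
    """Amplify by `gain`, then clip to [-limit, +limit] (role of UO 4 and UO 5)."""
    return [max(-limit, min(limit, gain * x)) for x in samples]


def tone_amplitude(frame, f1, fs):
    """Estimate the amplitude of the f1 component of one frame.

    Assumption: a single-bin DFT replaces filter 7 plus amplitude detection.
    """
    acc = sum(x * cmath.exp(-2j * math.pi * f1 * n / fs)
              for n, x in enumerate(frame))
    return 2.0 * abs(acc) / len(frame)


def frame_amplitudes(s1, fs, f1, u2, gain, limit, frame_len):
    """Return one U7 estimate per frame for the speech samples s1."""
    # Generator 2: measuring signal S2 of stable amplitude u2 and frequency f1.
    s2 = [u2 * math.sin(2 * math.pi * f1 * n / fs) for n in range(len(s1))]
    # Adder 3: S3 = S1 + S2.
    s3 = [a + b for a, b in zip(s1, s2)]
    # Two identical amplifier-limiters: S4 from S3, S5 from S1.
    s4 = amplifier_limiter(s3, gain, limit)
    s5 = amplifier_limiter(s1, gain, limit)
    # Multiplier 6: S6 = S4 * S5.
    s6 = [a * b for a, b in zip(s4, s5)]
    return [tone_amplitude(s6[i:i + frame_len], f1, fs)
            for i in range(0, len(s6) - frame_len + 1, frame_len)]


def threshold_from_calibration(u7_during_pauses, k):
    """U_thr = K * U7max, with 0 < K <= 1, measured while no speech is present."""
    if not 0 < k <= 1:
        raise ValueError("K must satisfy 0 < K <= 1")
    return k * max(u7_during_pauses)


def is_pause(u7, u_thr):
    """Pause decision: amplitude above the threshold (direction per the abstract)."""
    return u7 > u_thr
```

In use, `threshold_from_calibration` would be run on frames known to contain no speech, and `is_pause` would then be applied to every value returned by `frame_amplitudes`. The right values of `gain`, `limit`, `u2` and `K` depend on the transducer and noise conditions and are not given in the text.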

In conclusion, we note the following.

1. By converting the speech signal into a new measuring signal of stable frequency, whose amplitude is used in the threshold device to determine the presence of pauses in the speech signal, the proposed method and the device for implementing it make it possible to divide speech signals into periods of active speech and pauses with high probability, without affecting the speech signal that is to be digitally processed.

2. The signal obtained in the threshold device, which carries information about the pauses detected in the speech signal, can be encoded so that the code of each pause contains only its start time and duration, which reduces the amount of memory needed to store speech and reduces the traffic during its transmission.

3. The proposed method and the device for implementing it can be used effectively in speech signal recognition.
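The encoding mentioned in item 2 can be illustrated with a short sketch that reduces a sequence of per-frame pause decisions to (start, duration) pairs. The function name `encode_pauses`, the frame-based representation and the choice of time unit are assumptions made for illustration; the text does not specify a code format.

```python
def encode_pauses(flags, frame_duration):
    """Collapse per-frame pause flags into (start_time, duration) pairs.

    `flags` holds one boolean per frame (True = pause detected);
    `frame_duration` is the length of one frame in any chosen time unit.
    """
    pauses = []
    start = None  # index of the first frame of the pause currently open
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            pauses.append((start * frame_duration, (i - start) * frame_duration))
            start = None
    if start is not None:
        # The recording ended while a pause was still open.
        pauses.append((start * frame_duration,
                       (len(flags) - start) * frame_duration))
    return pauses


# Five 20 ms frames: speech, pause, pause, speech, pause.
print(encode_pauses([False, True, True, False, True], 20))
# prints [(20, 40), (80, 20)]
```

Only two numbers are stored per pause, regardless of how long it lasts, which is what allows the pause intervals themselves to be dropped from storage or transmission.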

References

1. USSR Author's Certificate, application No. 836656, cl. G10L 1/04, 16.07.79.

2. Shelukhin O.I., Lukyantsev N.F. Digital Processing and Transmission of Speech / Ed. O.I. Shelukhin. Moscow: Radio i Svyaz, 2000. 456 p., ill.

3. RF Patent 2317595 C1, cl. G10L 15/00.

Claims

1. A method for detecting pauses in a speech signal, comprising comparing a signal containing pause information with a threshold level, characterized in that an auxiliary signal of preset stable frequency and amplitude is generated and summed with the original speech signal; the summed signal is then amplified, amplitude-limited and multiplied by the original speech signal, which is itself amplified and amplitude-limited before the multiplication; the said stable-frequency signal is then extracted from the signal obtained by the multiplication and its amplitude is compared with the threshold level; and the start, end and duration of a pause are determined from the result of the comparison.

2. A device for detecting pauses in a speech signal, comprising an electroacoustic transducer and a threshold device, characterized in that it contains a generator of a new measuring signal, an adder, two amplifier-amplitude limiters, a multiplier and a low-pass filter, wherein the output of the electroacoustic transducer is connected to the first input of the adder and to the input of the first amplifier-amplitude limiter, the output of the generator of the new measuring signal is connected to the second input of the adder, the output of the adder is connected to the input of the second amplifier-amplitude limiter, the output of the second amplifier-amplitude limiter is connected to the first input of the multiplier, the output of the first amplifier-amplitude limiter is connected to the second input of the multiplier, the output of the multiplier is connected to the low-pass filter, and the output of the low-pass filter is connected to the input of the threshold device.