KR20090026504A

KR20090026504A - Method and apparatus for assessing audio signal spectrum

Info

Publication number: KR20090026504A
Application number: KR1020070091531A
Authority: KR
Inventors: 김현수
Original assignee: 삼성전자주식회사
Priority date: 2007-09-10
Filing date: 2007-09-10
Publication date: 2009-03-13

Abstract

The present invention evaluates the speech signal processing performance of an arbitrary system that uses digital signal processing of speech signals. The overall signal processing environment of the system is set; under that environment, a speech signal is coded and compressed according to the speech codec to be applied to the system; and the compressed speech signal is adapted to the set system environment, decoded according to the speech codec, and reconstructed into a final synthesized speech signal. The original spectrum of the original speech signal and the final synthesized spectrum of the reconstructed speech signal are then compared using a predetermined spectrum evaluation method, and the degree of distortion given by their difference is determined in order to evaluate the speech processing performance of the system.

Description

Method and apparatus for evaluating the spectrum of a speech signal

TECHNICAL FIELD

The present invention relates to a digital signal processing method and apparatus for speech signals, and more particularly, to a method and apparatus for evaluating a spectrum of a speech signal recovered after speech signal processing.

Digital signal processing technology, which converts an analog speech signal into a digital speech signal, codes and compresses it, and then decodes and synthesizes the compressed signal to restore an analog speech signal, is used in a growing range of fields. Examples include voice communication, digital broadcasting, and speech recognition, as well as the storage and playback of video and music content. Accordingly, a variety of digital signal processing schemes for speech signals have been developed, and each scheme is carried out by a corresponding speech codec.

Each speech codec has performance characteristics determined by its own digital signal processing scheme, so the distortion of a speech signal restored by a given codec, relative to the original speech signal, stays at a roughly constant level. However, the systems in which a speech codec is deployed differ, so even speech signals processed by the same codec may be reconstructed with different degrees of distortion. In other words, the distortion of the reconstructed speech signal may be evaluated differently depending on the characteristics of the system to which the codec is applied. A speech codec deals directly with the speech signal itself, that is, with compression and reconstruction through coding and decoding. Depending on the system, however, the compressed speech signal may be transmitted over a wired or wireless communication network, and processing beyond the codec's own speech signal processing may be added to improve transmission characteristics. Therefore, even with the same speech codec, the sound quality of the speech signal finally restored and presented to the user may vary with the system in which the codec is used. Here, the system may be any system in a technical field that uses digital signal processing of speech signals, for example a mobile communication system, a speech recognition system, a remote speech recognition system, or a digital broadcasting system.

A speech signal, meanwhile, consists of voiced sounds, unvoiced sounds, and noise. Of these, voiced sound accounts for the largest share of the entropy of the speech signal and carries most of its characteristic information, and the characteristics of voiced sound are represented most effectively by the spectrum of the speech signal. Accordingly, in digital signal processing of speech, how well the voiced sound is processed has the greatest influence on the degree of distortion; in particular, the coding and decoding of the spectral features largely determine the sound quality of the restored speech signal.

For this reason, a valid evaluation of the speech signal processing performance of a system that uses a speech codec cannot be limited to the codec itself; speech distortion must be evaluated over the entire system to which the codec is applied. In addition, when a specific speech codec is to be applied to a system, it is desirable to evaluate the performance of that system and codec combination before the system is fully implemented. In particular, for a system that transmits and receives speech signals over a communication network, evaluating the speech signal processing performance in advance can prevent wasted resources. Furthermore, if the performance of a system combined with each of various speech codecs can be evaluated in advance, the speech codec best suited to that system can be identified. And if the spectral characteristics of the speech signal are used for the evaluation, the speech signal processing performance of an arbitrary system can be evaluated more accurately with less signal processing.

Accordingly, the present invention can provide a speech spectrum evaluation method and apparatus capable of evaluating speech processing performance of a system to which a speech codec is applied.

In addition, the present invention can provide a speech spectrum evaluation method and apparatus capable of testing which speech codec exhibits good speech signal processing performance when applied to an arbitrary system that uses digital signal processing of speech signals.

In addition, the present invention can provide a speech spectrum evaluation method and apparatus capable of evaluating the speech processing performance of an arbitrary system in combination with various speech codecs, without actually implementing the system. It can further provide a method and apparatus for evaluating the speech processing capability of a system, using speech signals as actually processed, according to the degree of spectral distortion.

To solve the above problems, the present invention provides a speech signal spectrum evaluation method for a spectrum evaluation device that evaluates the speech signal processing performance of an arbitrary system using digital signal processing of speech signals. The method includes: setting a signal processing environment according to the characteristics of the arbitrary system, and setting a speech codec selected for application to the arbitrary system; when a speech signal is input, analyzing the speech signal according to the speech signal processing scheme of the speech codec under the set signal processing environment to detect an original spectrum; processing the speech signal into a final speech signal according to the speech signal processing scheme of the speech codec under the set signal processing environment, and detecting a final spectrum from the final speech signal; and comparing the original spectrum with the final spectrum according to a preset spectrum evaluation method to detect the spectral distortion, thereby evaluating the speech signal processing performance.

The predetermined spectrum evaluation method may be one of a spectral flatness detection method, a segmental signal-to-noise ratio (Seg-SNR) measure, a linear predictive coding (LPC) parameter measure, a log likelihood ratio (LLR) measure, and a cepstrum distance measure.
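As an illustration of one of the listed measures, the following is a minimal sketch of a segmental signal-to-noise ratio, assuming time-aligned original and processed sample sequences. The function name `segmental_snr`, the 160-sample frame length, and the small `eps` guard are assumptions made for this example and are not taken from the patent; practical implementations often also clamp per-frame values, which is omitted here.

```python
import math


def segmental_snr(original, processed, frame_len=160, eps=1e-12):
    """Mean per-frame SNR in dB between two time-aligned sample sequences.

    frame_len and eps are illustrative choices, not values from the patent.
    """
    n_frames = min(len(original), len(processed)) // frame_len
    if n_frames == 0:
        raise ValueError("need at least one full frame")
    total_db = 0.0
    for m in range(n_frames):
        start = m * frame_len
        signal_energy = 0.0
        error_energy = 0.0
        for i in range(start, start + frame_len):
            signal_energy += original[i] ** 2
            error_energy += (original[i] - processed[i]) ** 2
        # eps keeps the ratio finite for silent or perfectly matched frames
        total_db += 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))
    return total_db / n_frames
```

A higher value means the processed signal follows the original more closely, so a pass criterion based on this measure would compare against a minimum rather than a maximum.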

Detecting the original spectrum by analyzing the speech signal may include: receiving a speech signal by activating a microphone of the same type as the microphone set for the arbitrary system; removing the noise contained in the speech signal according to the noise removal method set for the signal processing environment; converting the noise-removed speech signal into a frequency-domain speech signal according to the speech signal processing scheme of the speech codec under the set signal processing environment; and analyzing the converted signal to extract the original spectral features.

Detecting the final spectrum from the final speech signal may include: encoding or decoding the converted speech signal according to the speech signal processing scheme of the speech codec under the signal processing environment; adapting the result to the signal processing environment to convert it into the final speech signal; and detecting the final spectrum from the final speech signal.

Evaluating the speech signal processing performance may include: detecting the degree of distortion of the speech signal by comparing the original spectrum and the final spectrum according to the preset spectrum evaluation method; comparing the detected degree of distortion with a predetermined reference value; and notifying, according to the result, whether the arbitrary system has passed the spectral performance test.

The final speech signal is a speech signal in a format finally converted by performing a final process of speech signal processing in the arbitrary system.

The present invention can evaluate the speech processing performance of a system to which a speech codec is applied according to the degree of distortion of the speech spectrum. It can test which speech codec exhibits good speech signal processing performance when applied to an arbitrary system that uses digital signal processing of speech signals, and it can evaluate the speech processing performance of the system in combination with various speech codecs without implementing the system. As a result, resources needed to build the system can be saved, and whether the system provides speech signal processing suitable for a speech service can be evaluated and verified before it is released as a product.

Hereinafter, exemplary embodiments of the present invention will be described in detail with reference to the accompanying drawings. Note that the same components in the drawings are represented by the same reference numerals and symbols as much as possible even though they are shown in different drawings. In addition, in describing the present invention, when it is determined that a detailed description of a related known function or configuration may unnecessarily obscure the subject matter of the present invention, the detailed description thereof will be omitted.

The present invention evaluates the speech signal processing performance of an arbitrary system that uses digital signal processing of speech signals. The overall signal processing environment of the system is set, a speech signal is processed under that environment according to the speech codec to be applied to the system, and the processed speech signal is adapted to the set system environment, yielding the speech signal that is finally used in the system. The original spectrum of the original speech signal and the final spectrum of this final speech signal are then compared using a predetermined spectrum evaluation method, and the degree of distortion given by their difference is determined in order to evaluate the speech processing performance of the system. The arbitrary system may be any system that uses digital signal processing of speech signals, such as a mobile communication system, a speech recognition system, a remote speech recognition system, or a digital broadcasting system, and the digital signal processing technology may be any of various speech codecs.

For example, if the arbitrary system is a remote speech recognition system, its signal processing environment is set first: the type of microphone that collects the speech signal, the method for removing noise from the signal collected by the microphone, and the signal transmission environment to the server that performs the recognition process. A speech codec A to be included in the remote speech recognition system is also set. With this basic environment in place, the present invention processes the speech signal according to the set microphone type and noise removal method, and codes and compresses it according to speech codec A. The compressed speech signal is then adapted to the set signal transmission environment. For example, if the compressed speech signal is larger than the set signal transmission standard allows, the transmission environment may be set so that the compressed speech signal is recompressed in another manner for transmission and decompressed on reception; in that case, the compressed speech signal is compressed and restored in the manner set for signal transmission. The compressed speech signal is then decoded and synthesized according to speech codec A and converted into the final speech signal, which is the speech signal used directly in a speech service process such as speech recognition or a call.

The original spectrum of the original speech signal and the final spectrum of the final speech signal are compared using the predetermined spectrum evaluation method to determine the degree of distortion, and the speech processing performance of the remote speech recognition system is thereby evaluated. The spectrum evaluation method is a method of measuring distortion and may be set to any one of a spectral flatness detection method, a segmental signal-to-noise ratio (Seg-SNR) measure, a linear predictive coding (LPC) parameter measure, a log likelihood ratio (LLR) measure, a cepstrum distance measure, and the spectral distortion measure newly defined according to the present invention.
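The sequence described above (noise removal, coding, adaptation to the transmission environment, decoding, and spectrum comparison) can be sketched as a small pipeline. This is only an illustrative sketch: `EvaluationSetup`, `evaluate`, and every field name are hypothetical, and each stage is a caller-supplied function standing in for the microphone and noise settings, the codec, and the transport environment that the patent leaves open.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence

Signal = List[float]


@dataclass
class EvaluationSetup:
    """Hypothetical container for the configured environment and codec."""
    denoise: Callable[[Signal], Signal]            # noise removal method
    encode: Callable[[Signal], Any]                # codec: code and compress
    adapt_for_transport: Callable[[Any], Any]      # e.g. recompress and restore
    decode: Callable[[Any], Signal]                # codec: decode and synthesize
    spectrum: Callable[[Signal], Sequence[float]]  # spectral feature extraction
    distortion: Callable[[Sequence[float], Sequence[float]], float]


def evaluate(setup: EvaluationSetup, raw: Signal) -> float:
    """Run one speech signal through the configured chain and return the distortion."""
    clean = setup.denoise(raw)
    original_spectrum = setup.spectrum(clean)
    payload = setup.adapt_for_transport(setup.encode(clean))
    final_signal = setup.decode(payload)
    final_spectrum = setup.spectrum(final_signal)
    return setup.distortion(original_spectrum, final_spectrum)
```

Swapping only the `encode` and `decode` entries while keeping the other stages fixed corresponds to trying different codecs in the same system environment.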

According to embodiments of the present invention, the entire speech signal may be digitally processed according to the speech codec set for the configured system environment, after which the spectral component is extracted and evaluated; alternatively, only the spectral component may be extracted from the speech signal, digitally processed, and evaluated. The embodiment in which the spectral component is extracted and evaluated after digital signal processing of the speech signal itself is described first with reference to FIGS. 1 and 2.

FIG. 1 is a diagram showing the configuration of a first speech signal spectrum evaluation apparatus according to an embodiment of the present invention. Referring to FIG. 1, the first speech signal spectrum evaluation apparatus includes a voice signal input unit 10, a voice signal analyzer 20, a final voice signal detector 30, a spectrum evaluator 40, and an environment setting unit 50.

In response to user selection, the environment setting unit 50 sets, in the voice signal input unit 10, the voice signal analyzer 20, the final voice signal detector 30, and the spectrum evaluator 40, the signal processing environment of the arbitrary system that uses digital signal processing of speech signals, the speech codec, and the spectrum evaluation method.

The voice signal input unit 10 may include various types of microphones according to an embodiment of the present invention, and activates a microphone selected by the environment setting unit 50 to receive a voice signal. Alternatively, the voice signal input unit 10 includes a connection terminal to which a microphone is connected, and receives a voice signal through the connected microphone. Then, the noise included in the input voice signal is removed according to the noise removing method set by the environment setting unit 50 and output to the voice signal analyzing unit 20. In this case, the noise cancellation method and the type of microphone may be determined according to the signal processing environment of the arbitrary system or may be determined according to a user's selection.

The voice signal analyzer 20 includes a spectrum analyzer 21 and a spectrum coding unit 23. According to the signal processing environment and speech codec of the arbitrary system set by the environment setting unit 50, it converts the input time-domain speech signal into a frequency-domain speech signal using a fast Fourier transform (FFT) or the like, analyzes it to extract the original spectral features, and outputs the original spectrum of the speech signal to the spectrum evaluator 40. The voice signal analyzer 20 also codes and compresses the speech signal according to the set signal processing environment and codec, and outputs the compressed speech signal to the final voice signal detector 30. If the final speech signal of the arbitrary system is the compressed speech signal, the compressed spectral characteristic of the compressed speech signal is extracted and output to the spectrum evaluator 40 as the final spectrum.

The final speech signal is the speech signal in the format that results from the last stage of speech signal processing in the arbitrary system. Depending on the type of system, it may be a speech signal compressed one or more times by a speech codec, or a synthesized signal reconstructed by decoding the compressed speech signal. For example, in a speech recognition system, the speech signal that is compared with the reference speech is the final speech signal; in a communication system, it is the speech signal that the receiving terminal outputs to the caller; and in a broadcast system, it is the speech signal output by the broadcast receiving device.
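The frequency-domain conversion attributed to the voice signal analyzer 20 can be illustrated with a minimal sketch that turns one frame of samples into a log power spectrum. The direct DFT loop below stands in for an FFT (same result, slower), and the Hann window, the function name `frame_power_spectrum_db`, and the `floor` guard are assumptions for this example rather than details given in the patent.

```python
import cmath
import math


def frame_power_spectrum_db(frame, floor=1e-12):
    """Log power spectrum (dB) of one frame for bins 0..N/2.

    Uses a direct DFT for clarity; an FFT would give the same values.
    """
    n = len(frame)
    # Periodic Hann window to reduce spectral leakage (an assumed choice)
    windowed = [x * (0.5 - 0.5 * math.cos(2.0 * math.pi * i / n))
                for i, x in enumerate(frame)]
    spectrum_db = []
    for k in range(n // 2 + 1):
        acc = 0j
        for i, x in enumerate(windowed):
            acc += x * cmath.exp(-2j * math.pi * k * i / n)
        # floor avoids log10(0) on empty bins
        spectrum_db.append(10.0 * math.log10(max(abs(acc) ** 2, floor)))
    return spectrum_db
```

Applying the same function to the original frame and to the corresponding frame of the final speech signal gives two spectra that a spectrum evaluation method can then compare.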
According to the signal processing environment and speech codec set by the environment setting unit 50, the final voice signal detector 30 converts the speech signal received from the voice signal analyzer 20 into the final speech signal that will be used in the arbitrary system. To do so, it adapts the compressed speech signal input from the voice signal analyzer 20 to the signal processing environment of the arbitrary system and, where necessary, decodes the compressed speech signal into the final speech signal. In other words, the final voice signal detector 30 may perform not only the codec processing defined by the speech codec but also the signal processing required by the characteristics of the arbitrary system. For example, if the system transmits the compressed speech signal over a communication network, the final voice signal detector 30 performs the signal processing needed for transmission, that is, modulation and demodulation; signal loss during transmission may also be taken into account. Alternatively, if the system broadcasts the speech signal, the compressed speech signal is converted into a form suitable for broadcasting and demodulated at the receiving end according to the received form. If the system requires a restored signal, the speech signal is decoded and converted into a synthesized speech signal. If, on the other hand, the final speech signal is the speech signal output by the voice signal analyzer 20, the final voice signal detector 30 passes that signal through unchanged. The final voice signal detector 30 extracts the final spectrum from the final speech signal and outputs it to the spectrum evaluator 40.

The spectrum evaluator 40 detects the degree of distortion of the speech signal by comparing the original spectrum and the final spectrum according to the spectrum evaluation method set by the environment setting unit 50. It then compares the detected distortion with a predetermined reference value and reports whether the arbitrary system passes the spectral performance test. If the detected distortion is less than the reference value, the system passes: sound quality at or above the reference level can be expected when the currently set speech codec is applied to the system. If the detected distortion exceeds the reference value, the system fails: applying the currently set speech codec would give sub-standard sound quality. Different speech codecs can therefore be set in turn to evaluate the speech signal processing performance of the arbitrary system with each.
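The pass or fail decision, repeated over several candidate codecs, can be sketched as follows. The function name `passing_codecs` and the dictionary input are hypothetical, and treating a distortion exactly equal to the reference value as a pass is an assumption that follows the flowchart description of FIG. 2.

```python
def passing_codecs(distortion_by_codec, reference_db):
    """Return (codec, distortion) pairs that pass, lowest distortion first.

    distortion_by_codec maps a codec label to its measured distortion.
    A value equal to reference_db is treated as a pass (assumed boundary rule).
    """
    passed = [(name, sd) for name, sd in distortion_by_codec.items()
              if sd <= reference_db]
    return sorted(passed, key=lambda item: item[1])
```

For instance, with measured distortions for three candidate codecs and a reference value, the first entry of the returned list would be the codec with the least spectral distortion in the configured system.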

The spectrum evaluation method set in the spectrum evaluator 40 of the first speech signal spectrum evaluation apparatus is preferably the spectral distortion measure newly presented according to an embodiment of the present invention. The equations defining this spectral distortion measure are derived as follows.

In the spectral distortion measure computed between the spectra of the original speech signal and the compressed speech signal, the spectrum S(ω) of a P-th order linear prediction polynomial is given by Equation 1.

[Equation 1]

Here S(ω) is the spectrum and a_n is the n-th linear prediction coefficient; for n > P the corresponding linear prediction coefficient is zero. N is the number of spectral samples and is usually chosen to be much larger than P. The resulting spectral distortion (SD) is given by Equation 2.

[Equation 2] (SD, in dB)

Here M is the total number of frames used, and S_m(ω) and Ŝ_m(ω) denote the original spectral characteristic of the original speech signal and the final spectral characteristic of the final speech signal, respectively, for the m-th frame. The spectral features may be linear-prediction-based features. Depending on the system characteristics, Ŝ_m(ω) may be the compressed spectral characteristic of the compressed speech signal or the synthesized spectral characteristic of the synthesized speech signal.
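For orientation, a commonly used textbook form of a linear-prediction spectrum and of a frame-averaged log spectral distortion, written to match the symbol descriptions around Equation 1 and Equation 2, is given below. This is an assumed reconstruction, not a reproduction of the patent's own equations: the unit gain, the convention a_0 = 1, the sign of the exponent, and the uniform sampling ω_k = 2πk/N are all assumptions. Here a_n is the n-th linear prediction coefficient, and S_m and Ŝ_m are the original and final spectra of the m-th frame.

```latex
% Assumed form of the P-th order linear-prediction spectrum (unit gain)
S(\omega) = \frac{1}{\left|\sum_{n=0}^{N-1} a_n e^{-j\omega n}\right|^{2}},
\qquad a_0 = 1,\quad a_n = 0 \ \text{for } n > P

% Assumed form of the spectral distortion, averaged over M frames
\mathrm{SD} = \frac{1}{M}\sum_{m=1}^{M}
\sqrt{\frac{1}{N}\sum_{k=0}^{N-1}
\left[10\log_{10} S_m(\omega_k) - 10\log_{10}\hat{S}_m(\omega_k)\right]^{2}}
\ \ \text{dB}, \qquad \omega_k = \frac{2\pi k}{N}
```

Under this form the inner square root is a Euclidean (root-mean-square) distance between the two log spectra of a frame, which is consistent with the later remark that the Euclidean norm of Equation 2 could be replaced by another norm.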

The spectral distortion measure can be viewed as a measure of the distance between the smoothed spectra of the original speech signal and of the synthesized speech signal. Cepstral or filter-bank analysis techniques can be used to obtain the smoothed spectra. Other variants of the spectral distortion measure can also be used, such as replacing the Euclidean norm of Equation 2 with a different norm, or applying a fixed frequency-domain weighting filter or variable weights calculated from the original and compressed spectra. For example, spectral distortion can be measured with a spectral flatness detection method, a segmental signal-to-noise ratio (Seg-SNR) measure, a linear predictive coding (LPC) parameter measure, a log likelihood ratio (LLR) measure, or a cepstrum distance measure.
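A minimal sketch of such a distance is shown below: the root-mean-square difference between two log spectra, each derived from linear prediction coefficients, averaged over frames. The exact formula is an assumption (a unit-gain spectrum 1/|A|² sampled at ω_k = 2πk/N, with a Euclidean norm per frame), and the names `lp_spectrum` and `spectral_distortion` are hypothetical.

```python
import cmath
import math


def lp_spectrum(coeffs, n_points=256):
    """Sample 1/|A(e^{jw})|^2 at w_k = 2*pi*k/n_points, where
    A(z) = sum_n coeffs[n] * z^-n (coefficients beyond the order are zero)."""
    spectrum = []
    for k in range(n_points):
        w = 2.0 * math.pi * k / n_points
        a = sum(c * cmath.exp(-1j * w * n) for n, c in enumerate(coeffs))
        # Guard against division by zero for polynomials with unit-circle zeros
        spectrum.append(1.0 / max(abs(a) ** 2, 1e-12))
    return spectrum


def spectral_distortion(orig_frames, final_frames, n_points=256):
    """Mean over frames of the RMS log-spectral difference, in dB."""
    if not orig_frames or len(orig_frames) != len(final_frames):
        raise ValueError("need the same non-zero number of frames")
    total = 0.0
    for a, a_hat in zip(orig_frames, final_frames):
        s = lp_spectrum(a, n_points)
        s_hat = lp_spectrum(a_hat, n_points)
        sq = sum((10.0 * math.log10(x) - 10.0 * math.log10(y)) ** 2
                 for x, y in zip(s, s_hat))
        total += math.sqrt(sq / n_points)
    return total / len(orig_frames)
```

Each frame is passed as a list of coefficients starting with a_0; replacing the squared difference and square root with another norm, or multiplying each term by a frequency weight, would give the variants mentioned above.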

The first speech signal spectrum evaluation apparatus rests on the following assumptions according to an embodiment of the present invention. It is configured to analyze and compress the whole speech signal, not just part of it, and to decode the compressed speech signal; that is, the compressed speech signal coded and compressed by the voice signal analyzer 20 can be regarded as a signal coded by the set speech codec. When distortion is measured according to Equation 2, it can be assumed that the residual signal, obtained by passing the speech signal through the inverse filter (LP filter) of the vocal tract, then passes through a quantized version of the vocal tract filter. It can further be assumed that the excitation is reconstructed correctly by the excitation codebook, or that any resulting error has a white spectrum. Because the first spectrum evaluation apparatus digitally processes the entire speech signal according to the set speech codec, it can compare the original spectral characteristics with final spectral characteristics that have interacted with other components of the speech signal during processing. It can therefore evaluate comprehensively not only the distortion that arises in the spectral conversion process itself but also spectral distortion caused by the influence of other components of the speech signal. In addition, because the entire speech codec is set in the first spectrum evaluation apparatus, the spectral characteristics of the speech signal at an intermediate stage of coding and decoding by the codec can also be compared with the original spectral characteristics.

FIG. 2 is a flowchart illustrating the operation of the first spectrum evaluation apparatus configured as described above, for the case where, according to the system characteristics, the final speech signal is a decoded synthesized speech signal. Referring to FIG. 2, in step 101 the first spectrum evaluation apparatus receives an input selecting the system environment to which the speech codec is applied and the type of speech codec, and proceeds to step 103. In step 103 it sets the selected system environment and speech codec and proceeds to step 105. When a speech signal is received in step 105, it proceeds to step 107, analyzes the speech signal, detects the original spectral feature parameters, and proceeds to step 109. In step 109 it compresses the speech signal, identifies the compressed spectral feature parameters in the compressed speech signal, and proceeds to step 111. In step 111 it synthesizes the final speech signal, extracts the final synthesized spectrum, and proceeds to step 113. In step 113 it compares the original spectrum and the final synthesized spectrum according to the set spectrum evaluation method to determine the degree of spectral distortion, and proceeds to step 115. In step 115 it determines whether the spectral distortion is less than or equal to the reference value; if so, it proceeds to step 117 and notifies that the selected system has passed the spectral performance test, and otherwise it notifies that the system has failed.

According to another embodiment of the present invention, the spectrum evaluation apparatus may be configured as shown in FIG. 3. FIG. 3 illustrates the configuration of a second spectrum evaluation apparatus according to another embodiment of the present invention; the second spectrum evaluation apparatus evaluates the speech spectrum by coding and decoding only the spectral components of the speech signal. Referring to FIG. 3, the second speech signal spectrum evaluation apparatus includes a voice signal input unit 210, a spectrum analyzer 220, a spectrum coding unit 230, a final spectrum synthesizer 240, a spectrum evaluator 250, and an environment setting unit 260.

In response to user selection, the environment setting unit 260 sets, in the voice signal input unit 210, the spectrum analyzer 220, the final spectrum synthesizer 240, and the spectrum evaluator 250, the signal processing environment of the arbitrary system that uses digital signal processing of speech signals, the speech codec, and the spectrum evaluation method.

The voice signal input unit 210 may include various types of microphones according to an embodiment of the present invention, and activates the microphone selected by the environment setting unit 260 to receive a voice signal. Alternatively, the voice signal input unit 210 includes a connection terminal to which a microphone is connected, and receives a voice signal through the connected microphone. The noise included in the input voice signal is then removed according to the noise removal method set by the environment setting unit 260, and the result is output to the spectrum analyzer 220. The noise removal method and the type of microphone may be determined according to the signal processing environment of the arbitrary system or according to a user's selection.

According to the signal processing environment and speech codec of the arbitrary system set by the environment setting unit 260, the spectrum analyzer 220 converts the input time-domain speech signal into a frequency-domain speech signal using a fast Fourier transform (FFT) or the like and analyzes it to extract the original spectral features. The spectrum analyzer 220 outputs the original spectrum of the speech signal to the spectrum coding unit 230 and the spectrum evaluator 250. The spectrum coding unit 230 codes and compresses the input original spectrum, extracts the compressed spectral characteristics, adapts the compressed spectrum to the signal processing environment of the arbitrary system, and outputs it to the final spectrum synthesizer 240. In other words, the spectrum coding unit 230 may perform not only the speech signal processing defined by the speech codec but also the signal processing required by the characteristics of the arbitrary system.

The final spectrum synthesizer 240 decodes the input compressed spectrum according to the signal processing environment and speech codec of the arbitrary system set by the environment setting unit 260, synthesizes it into a final spectrum, and outputs the final spectrum to the spectrum evaluator 250. The final spectrum is the spectrum that is finally synthesized and reconstructed according to the signal processing environment and speech codec of the arbitrary system, that is, the speech signal spectrum finally obtained given the characteristics of the system.

The spectrum evaluator 250 detects the degree of distortion of the speech signal by comparing the original spectrum and the final synthesized spectrum according to the spectrum evaluation method set by the environment setting unit 260. It then compares the detected distortion with a predetermined reference value and reports whether the arbitrary system passes the spectral performance test. If the detected distortion is less than the reference value, the system passes: sound quality at or above the reference level can be expected when the currently set speech codec is applied to the system. If the detected distortion exceeds the reference value, the system fails: applying the currently set speech codec would give sub-standard sound quality. Different speech codecs can therefore be set in turn to evaluate the speech signal processing performance of the arbitrary system with each.

The spectrum evaluator 250 of the second spectrum evaluation apparatus evaluates the spectrum using a spectrum distortion measurement method defined as Equation 2 according to the present invention.

The second spectrum evaluation device does not use the compressed-spectrum parameters of the selected speech codec, is not affected by the static performance of the compressed spectrum, and reflects the dynamic behavior of the compression (inter-frame effects) well. In contrast, the first spectral distortion measurement method reflects static distortion.

In addition, the second spectrum evaluation apparatus may selectively set only the codec process associated with the spectrum without actually implementing or setting a complete voice codec. Therefore, it may not be affected by the distortion caused by other parts of the selected voice codec.

In addition, evaluating a new spectral coding method is very simple, and spectral performance can be evaluated without making any assumptions about the rest of the speech codec algorithm. Put another way, the appropriate choice of short-term spectrum quantization scheme for a particular speech coding method can be made independently of the other parts of the codec. The spectrum evaluation method of the second spectrum evaluation device is also closely related to the results of subjective sound quality judgments, because it takes into account the short-term spectral envelope of the linear prediction model. This can be a good model for the human auditory system.
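The comparison of an original and a final spectrum over M frames of N spectral samples can be illustrated with a conventional log-spectral distortion. Note that this is a swapped-in, widely used measure and not the patent's Equation 2, which is not reproduced in this text; it treats each frame independently and therefore does not capture the inter-frame effects attributed to the second device. The function name and the assumption that spectra are already expressed in dB are illustrative.

```python
import math


def log_spectral_distortion_db(original_frames, final_frames):
    """Mean over M frames of the RMS difference (dB) across N spectral samples.

    Both arguments are lists of frames; each frame is a list of spectral
    values already expressed in dB (an assumption of this sketch).
    """
    if len(original_frames) != len(final_frames) or not original_frames:
        raise ValueError("need the same non-zero number of frames")
    total = 0.0
    for orig, final in zip(original_frames, final_frames):
        if len(orig) != len(final) or not orig:
            raise ValueError("frames must have the same non-zero length")
        mean_sq = sum((o - f) ** 2 for o, f in zip(orig, final)) / len(orig)
        total += math.sqrt(mean_sq)
    return total / len(original_frames)


# Usage: a uniform 2 dB offset in every sample gives a distortion of 2 dB.
original = [[0.0, 10.0, 20.0], [5.0, 5.0, 5.0]]
shifted = [[2.0, 12.0, 22.0], [7.0, 7.0, 7.0]]
distortion = log_spectral_distortion_db(original, shifted)
```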

The operation of the second spectrum evaluation apparatus configured as described above is illustrated in FIG. 4. Referring to FIG. 4, in step 301, the second spectrum evaluation apparatus receives an input selecting the system environment to which the voice codec is applied and the type of the voice codec, and proceeds to step 303. In step 303, it sets the selected system environment and voice codec, and proceeds to step 305. In step 305, when a voice signal is input, it proceeds to step 307, analyzes the voice signal to detect the original spectrum feature parameters, and proceeds to step 309. In step 309, it compresses the original spectrum, determines the spectrum feature parameters of the compressed spectrum, and proceeds to step 311. In step 311, it synthesizes the final spectrum and proceeds to step 313. In step 313, it compares the original spectrum and the final synthesized spectrum according to the set spectrum evaluation method to determine the degree of spectral distortion, and proceeds to step 315. In step 315, it determines whether the spectral distortion is less than or equal to the reference value. If so, it proceeds to step 317 and notifies that the spectral performance of the selected system has passed; otherwise, it notifies that the spectral performance has not passed.
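The flow of FIG. 4 can be sketched as a small driver in which the selected environment and codec (steps 301 and 303) are represented by the callables passed in. Everything named here is hypothetical: the toy "codec" is a uniform 2 dB quantizer, the frames are assumed to be dB spectra already, and the measure is a mean absolute difference rather than the patent's Equation 2.

```python
def run_spectrum_evaluation(voice_frames, analyze, code_spectrum, synthesize,
                            measure, reference_db):
    """Run steps 307 to 317: analyze, compress, synthesize, compare, report."""
    original = [analyze(frame) for frame in voice_frames]      # step 307
    compressed = [code_spectrum(spec) for spec in original]    # step 309
    final = [synthesize(code) for code in compressed]          # step 311
    distortion = measure(original, final)                      # step 313
    passed = distortion <= reference_db                        # step 315
    return {"distortion_db": distortion, "passed": passed}     # step 317


# Toy stand-ins (assumptions for illustration only).
STEP_DB = 2.0


def toy_analyze(frame):
    """Treat the input frame as an already computed dB spectrum."""
    return list(frame)


def toy_code(spectrum):
    """Quantize each spectral value to an integer index on a 2 dB grid."""
    return [round(value / STEP_DB) for value in spectrum]


def toy_synthesize(indices):
    """Map quantizer indices back to dB values."""
    return [index * STEP_DB for index in indices]


def toy_measure(original, final):
    """Mean absolute difference in dB over all frames and samples."""
    diffs = [abs(o - f)
             for orig, fin in zip(original, final)
             for o, f in zip(orig, fin)]
    return sum(diffs) / len(diffs)


frames = [[0.4, 3.1, 6.9], [1.2, 2.0, 5.2]]
result = run_spectrum_evaluation(frames, toy_analyze, toy_code,
                                 toy_synthesize, toy_measure, reference_db=1.0)
strict = run_spectrum_evaluation(frames, toy_analyze, toy_code,
                                 toy_synthesize, toy_measure, reference_db=0.5)
```

Passing the stages as callables keeps the driver independent of any particular codec, which mirrors the idea that only the spectrum-related part of a codec needs to be set for the second apparatus.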

Since the first spectrum evaluation apparatus compresses and restores the speech signal itself according to the set speech codec, the distortion of the final spectrum may be affected, depending on the type of the speech codec, by components of the speech signal other than the spectrum. In contrast, since the second spectrum evaluation apparatus extracts only the spectral components from the speech signal and compresses and restores the extracted spectral components according to the set speech codec, it can determine the degree of distortion of the spectral components themselves more directly than the first spectrum evaluation apparatus. Therefore, the speech processing performance of a system can be evaluated by selecting the appropriate device from the first spectrum evaluation device and the second spectrum evaluation device according to the type and characteristics of the system utilizing the digital signal processing of the voice signal.

An example of applying the present invention to a network robot system is described below. In the network robot system, the network robot processes the voice signal input through the microphone of the robot terminal, removes noise and echo in the voice preprocessing unit, then codes (compresses) either the voice signal itself or the spectral features extracted from the voice signal, and transmits the result to a remote server. The remote server decodes the transmitted voice signal or voice feature signal, restores and synthesizes the voice signal, and performs a voice service. For example, the restored voice signal may be used for voice communication with a recipient when the voice service is a voice call, or as an input signal of a voice recognizer.

The speech performance evaluation method according to the present invention is applied to such a network robot system as follows. First, when the network robot system is simulated with the first spectrum evaluation device, the environment setting unit 50 sets the voice signal input unit 10, the voice signal analyzer 20, and the final voice signal detector 30 according to the signal processing environment of the network robot system and a voice codec selected from the at least one voice codec available to the network robot system, and sets the spectrum evaluation method in the spectrum evaluator 40. That is, the voice signal input unit 10 is configured with a microphone and a preprocessor matching the configuration of the robot terminal, and the voice signal analyzer 20 and the final voice signal detector 30 simulate the voice signal processing method of the robot terminal, the voice signal processing method of the server, and the voice signal transmission environment. Then a voice signal is input and processed, the spectral distortion is measured, and the voice performance of the network robot system under evaluation is assessed. If the speech spectrum distortion falls outside the reference value, the network robot lacks the voice performance needed to provide the voice service. If this evaluation takes place at the production stage of the network robot, an appropriate voice codec can be selected through rework or debugging and the evaluation repeated; if it takes place at the development stage of the overall system, the system components causing the distortion can be corrected and improved and the evaluation repeated. This process can be applied to the second spectrum evaluation apparatus in the same way.
By calculating the spectral distortion between the input and output signals of a specific system, it is possible to measure the spectrum-based sound quality of voiced sections, which carry a large amount of information, and to evaluate the voice performance before the actual system is built. For example, the codec, noise processing, and the like can be simulated.

Although specific embodiments have been described above, various modifications can be made without departing from the scope of the present invention. In the above examples, the first spectrum evaluation device and the second spectrum evaluation device are configured independently; however, in an embodiment of the present invention, the first and second spectrum evaluation devices may be included in one device. In addition, although the final voice signal detector 30 and the final spectrum synthesizer 240 are described as simulating the signal processing environment that follows from the arbitrary system's features, as distinct from the voice signal processing of the voice codec, this simulation may instead be performed by the voice signal analyzer 20 or the spectrum coding unit 230, or by an independent component. Alternatively, the voice signal analyzer 20 and the final voice signal detector 30 may be configured as a single voice signal processor, and the voice signal analyzer 220, the spectrum coding unit 230, and the final spectrum synthesizer 240 may be configured as a single voice signal processor. Therefore, the scope of the present invention should be defined not by the described embodiments but by the claims and their equivalents.

FIG. 1 is a view showing the configuration of a first spectrum evaluation apparatus according to an embodiment of the present invention;

FIG. 2 is a view showing the operation of the first spectrum evaluation apparatus according to an embodiment of the present invention;

FIG. 3 is a view showing the configuration of a second spectrum evaluation apparatus according to another embodiment of the present invention;

FIG. 4 is a view showing the operation of the second spectrum evaluation apparatus according to another embodiment of the present invention.

Claims

1. A voice signal spectrum evaluation method of a spectrum evaluation device for evaluating voice signal processing performance of an arbitrary system utilizing digital signal processing of a voice signal, the method comprising:

Setting a signal processing environment according to system features of the arbitrary system, and setting a voice codec selected to be applied to the arbitrary system;

When a voice signal is input, detecting an original spectrum by analyzing the voice signal according to the voice signal processing method of the voice codec under the set signal processing environment;

Processing the voice signal and converting it into a final voice signal according to the voice signal processing method of the voice codec under the set signal processing environment, and detecting a final spectrum from the final voice signal;

And comparing the original spectrum and the final spectrum according to a predetermined spectrum evaluation method to detect a spectral distortion degree and evaluate voice signal processing performance.

2. The method of claim 1, wherein the spectral distortion degree according to the preset spectrum evaluation method is calculated by Equation 2 below.

[Equation 2] (equation not reproduced in this text; the result is expressed in dB)

Here, M is the total number of frames used, and N is the number of spectral samples. The two remaining terms of the equation, whose symbols are likewise not reproduced, are the original spectral characteristic of the original speech signal for the m-th frame and the final spectral characteristic of the final speech signal for the m-th frame.

3. The method of claim 1, wherein the predetermined spectrum evaluation method is any one of a spectrum flatness detection method, a segmental signal-to-noise ratio (SEG-SNR) measurement method, a linear predictive coding parameter measurement method, a log likelihood ratio (LLR) measurement method, and a cepstrum distance measurement method.

4. The method according to any one of claims 2 and 3, wherein the detecting of the original spectrum by analyzing the voice signal comprises:

Receiving a voice signal by activating a microphone of the same type as the microphone set in the arbitrary system;

Removing noise included in the voice signal according to a noise removing method set according to the signal processing environment;

And converting the noise-removed voice signal into a frequency-domain voice signal according to the voice signal processing method of the voice codec under the set signal processing environment, and extracting the original spectrum features.

5. The method of claim 4, wherein the detecting of the final spectrum from the final speech signal comprises:

Coding or decoding the converted speech signal according to the speech signal processing scheme of the speech codec under the signal processing environment, adapting it to the signal processing environment, and converting it into the final speech signal;

And detecting the final spectrum from the final speech signal.

6. The method of claim 5, wherein the evaluating of the voice signal processing performance comprises:

Comparing the original spectrum and the final spectrum according to the preset spectrum evaluation method and detecting a distortion degree of a speech signal;

Comparing the detected degree of distortion with a predetermined reference value and notifying whether or not the spectral performance has been passed for the arbitrary system according to the result.

7. The method of claim 6, wherein the final speech signal is a speech signal in a format finally converted by performing a final process of speech signal processing in the arbitrary system.

8. A speech signal spectrum evaluation method of a spectrum evaluation device for evaluating speech processing performance of an arbitrary system utilizing digital signal processing of a speech signal, the method comprising:

Converting the original spectrum into a final spectrum according to a speech signal processing scheme of the speech codec under the set signal processing environment;

9. The method of claim 8, wherein the spectral distortion degree according to the predetermined spectrum evaluation method is calculated by Equation 2 below.

[Equation 2] (equation not reproduced in this text; the result is expressed in dB)

10. The method of claim 9, wherein the predetermined spectrum evaluation method is any one of a spectrum flatness detection method, a segmental signal-to-noise ratio (SEG-SNR) measurement method, a linear predictive coding parameter measurement method, a log likelihood ratio (LLR) measurement method, and a cepstrum distance measurement method.

11. The method of claim 9, wherein the detecting of the original spectrum by analyzing the voice signal comprises:

12. The method of claim 11, wherein the converting of the original spectrum into the final spectrum comprises:

Coding the original spectrum and converting it into a compressed spectrum according to the speech signal processing scheme of the speech codec under the signal processing environment;

Converting the compressed spectrum according to the signal processing environment;

And decoding and synthesizing the converted compressed spectrum into the final spectrum according to the speech signal processing scheme of the speech codec under the signal processing environment.

13. The method of claim 12, wherein the evaluating of the speech signal processing performance comprises:

Comparing the original spectrum with the final spectrum according to the preset spectrum evaluation method and detecting a distortion degree of a voice signal;

Comparing the detected degree of distortion with a predetermined reference value and notifying whether the spectral performance has been passed for the arbitrary system according to the result.

14. The method of claim 13, wherein the final speech signal is a speech signal in a format finally converted by performing a final process of speech signal processing in the arbitrary system.

15. A spectrum evaluation apparatus for evaluating speech signal processing performance of an arbitrary system utilizing digital signal processing of speech signals, the apparatus comprising:

An environment setting unit for setting, in the spectrum evaluation apparatus, a signal processing environment according to system characteristics of the arbitrary system and a voice codec selected to be applied to the arbitrary system;

A voice signal input unit configured to receive, process, and output a voice signal according to the signal processing environment and the voice codec setting;

A voice signal processor for analyzing the voice signal input from the voice signal input unit according to the voice signal processing method of the voice codec under the signal processing environment to detect an original spectrum, processing the voice signal according to the voice signal processing method of the voice codec under the set signal processing environment to convert it into a final voice signal, and detecting a final spectrum from the final voice signal;

And a spectrum evaluator for comparing the original spectrum and the final spectrum, input from the voice signal processor, according to a preset spectrum evaluation method to detect a spectral distortion degree and evaluate voice signal processing performance.

16. The apparatus of claim 15, wherein the equation for calculating the spectral distortion degree according to the predetermined spectrum evaluation method is as follows.

[Equation 2] (equation not reproduced in this text; the result is expressed in dB)

17. The apparatus of claim 15, wherein the predetermined spectrum evaluation method in the spectrum evaluator is any one of a spectrum flatness detection method, a segmental signal-to-noise ratio (SEG-SNR) measurement method, a linear predictive coding parameter measurement method, a log likelihood ratio (LLR) measurement method, and a cepstrum distance measurement method.

18. The apparatus according to any one of claims 16 and 17, wherein the voice signal input unit receives the voice signal by activating a microphone of the same type as the microphone set in the arbitrary system, removes noise included in the voice signal according to a noise removing method set according to the signal processing environment, and outputs the voice signal to the voice signal processor.

19. The apparatus of claim 18, wherein the voice signal processor converts the voice signal input from the voice signal input unit into a frequency-domain voice signal according to the voice signal processing method of the voice codec under the set signal processing environment and analyzes it to extract the original spectrum, codes or decodes the converted voice signal, adapts it according to the signal processing environment to convert it into the final voice signal, and detects the final spectrum from the final voice signal.

20. The apparatus of claim 19, wherein the spectrum evaluator detects a distortion degree of the voice signal by comparing the original spectrum and the final spectrum according to the preset spectrum evaluation method, compares the detected distortion degree with a predetermined reference value, and notifies whether the spectral performance of the arbitrary system has passed according to the result.

21. The apparatus of claim 20, wherein the final speech signal is a speech signal in a format finally converted by performing a final process of speech signal processing in the arbitrary system.

22. A spectrum evaluation apparatus for evaluating speech processing performance of an arbitrary system utilizing digital signal processing of speech signals, the apparatus comprising:

A voice signal processor that analyzes the voice signal from the voice signal input unit according to the voice signal processing method of the voice codec under the set signal processing environment, detects an original spectrum, and converts the original spectrum into a final spectrum;

And a spectrum evaluator for comparing the original spectrum and the final spectrum, input from the voice signal processor, according to a preset spectrum evaluation method to detect a spectral distortion degree and evaluate voice signal processing performance.

23. The apparatus of claim 22, wherein the equation for calculating the spectral distortion degree according to the predetermined spectrum evaluation method is as follows.

[Equation 2] (equation not reproduced in this text; the result is expressed in dB)

24. The apparatus of claim 23, wherein the predetermined spectrum evaluation method is any one of a spectrum flatness detection method, a segmental signal-to-noise ratio (SEG-SNR) measurement method, a linear predictive coding parameter measurement method, a log likelihood ratio (LLR) measurement method, and a cepstrum distance measurement method.

25. The apparatus of claim 23, wherein the voice signal input unit receives the voice signal by activating a microphone of the same type as the microphone set in the arbitrary system, removes noise included in the voice signal according to a noise removing method set according to the signal processing environment, and outputs the voice signal to the voice signal processor.

26. The apparatus of claim 25, wherein the voice signal processor converts the noise-removed voice signal into a frequency-domain voice signal according to the voice signal processing method of the voice codec under the set signal processing environment and analyzes it to extract the original spectrum, codes the original spectrum to convert it into a compressed spectrum, converts the compressed spectrum according to the signal processing environment, and decodes and synthesizes the converted compressed spectrum into the final spectrum.

27. The apparatus of claim 26, wherein the spectrum evaluator detects a distortion degree of the voice signal by comparing the original spectrum and the final spectrum according to the preset spectrum evaluation method, compares the detected distortion degree with a predetermined reference value, and notifies whether the spectral performance of the arbitrary system has passed according to the result.

28. The apparatus of claim 27, wherein the final speech signal is a speech signal in a format finally converted by performing a final process of speech signal processing in the arbitrary system.