KR20100036894A - Voice feature code book optimization device and method of vector quantization base - Google Patents

Voice feature code book optimization device and method of vector quantization base

Info

Publication number
KR20100036894A
KR20100036894A (application number KR1020080096316A)
Authority
KR
South Korea
Prior art keywords
codebook
distortion
vector
vector quantization
quantization
Prior art date
Application number
KR1020080096316A
Other languages
Korean (ko)
Inventor
김현수 (Kim Hyun-soo)
Original Assignee
삼성전자주식회사 (Samsung Electronics Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 삼성전자주식회사 (Samsung Electronics Co., Ltd.)
Priority to KR1020080096316A priority Critical patent/KR20100036894A/en
Publication of KR20100036894A publication Critical patent/KR20100036894A/en

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/032 Quantisation or dequantisation of spectral components
    • G10L19/038 Vector quantisation, e.g. TwinVQ audio
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L2019/0001 Codebooks

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

PURPOSE: A vector quantization-based speech feature codebook optimization apparatus and method provide a vector quantization-based speech spectrum compression scheme, so that feature compression for feature transmission is optimized in a distributed speech recognition system. CONSTITUTION: An initialization unit (100) sets the initial codebook. A sorting unit (200) calculates the sorting parameters and arranges the code vectors of the codebook. A classification unit (300) classifies the entire training database into the codebook using vector quantization. If the total distortion level is less than or equal to the threshold, a codebook application unit (500) terminates the process.

Description

Voice feature code book optimization device and method of vector quantization base

The present invention relates to a method for compressing a speech spectrum based on vector quantization, which can be used when speech features need to be compressed for services such as speech codecs and speech recognition.

In general, a distributed speech recognition system extends an embedded speech recognition system: the front end is a client implemented in a terminal, and the back end is a server providing network speech recognition (NSR). Speech features extracted at the client are transmitted over the network, and recognition is performed at the remote server.

FIG. 1 shows an LSF-based quantization method for a conventional speech codec.

However, the conventional LSF-based quantization method has the problem that it offers no feature compression scheme for the speech spectrum applicable to a distributed speech recognition system.

Accordingly, an object of the present invention is to provide a vector quantization-based speech spectral feature compression apparatus and method that can be applied to a distributed speech recognition system.

According to an aspect of the present invention, an apparatus for optimizing a speech feature codebook based on vector quantization comprises: an initialization unit that sets the initial codebook to C(0); a sorting unit that calculates the sorting parameters and arranges the code vectors of the codebook C(m); a classification unit that classifies the entire training database into the codebook C(m) using vector quantization; a distortion determination unit that determines whether or not the total distortion level is less than or equal to a threshold; and a codebook application unit that terminates the process if the total distortion level is less than or equal to the threshold in the distortion determination step and, if not, sets m + 1 to m, replaces each code vector Cn(m) with the median of the training vectors assigned to Cn during classification, and then returns to the classification step.

The distortion determination unit compresses the entire training database by the selected quantization table, determines the initial codebook C, sets a lookup table T that maps each codeword to itself, removes the codewords associated with combinations prohibited in the codebook, sets the resulting codebook size to N and n = 1, temporarily removes the code vector Cn from the codebook and replaces it with its maps in the lookup table, changes the compressed training set according to the new lookup table, evaluates dn (the total distortion between the original and quantized training databases), and determines whether n < N. If n < N does not hold, it searches for m = index{min(dn), 1 <= n <= N}, removes Cm from the codebook, updates the cells of the lookup table that were mapped to Cm to the nearest code vector, replaces the compression values of the associated training set with the new values, applies the lookup table, and determines whether N is the size required by the codebook; if N is the required size, the process ends.

On the other hand, if n < N after the distortion is evaluated, the distortion determination unit restores the code vector Cn and its original map in the lookup table, sets n + 1 to n, and returns to the temporary removal.

On the other hand, if N is not the size required by the codebook in the termination test, the distortion determination unit sets N - 1 to N, sets n = 1, and returns to the temporary removal.

This distortion determination unit uses a distortion evaluation algorithm.

According to an aspect of the present invention, a method of optimizing a speech feature codebook based on vector quantization comprises: an initialization step of setting the initial codebook to C(0); a sorting step of calculating the sorting parameters and sorting the code vectors of the codebook C(m); a classification step of classifying the entire training database into the codebook C(m) using vector quantization; a distortion determination step of determining whether or not the total distortion level is less than or equal to a threshold; and a codebook application step of terminating the process if the total distortion level is less than or equal to the threshold in the distortion determination step and, if not, setting m + 1 to m, replacing each code vector Cn(m) with the median of the training vectors assigned to Cn during classification, and then returning to the classification step.

The distortion determination step may include: a table initialization step of compressing the entire training database by the selected quantization table, determining an initial codebook C, and setting a lookup table T that maps each codeword to itself; an initial removal step of removing the codewords associated with the forbidden combinations in the codebook and setting the resulting codebook size to N and n = 1; a temporary removal step of temporarily removing the code vector Cn from the codebook and replacing it with its maps in the lookup table; a distortion evaluation step of changing the compressed training set according to the new lookup table, evaluating dn (the total distortion between the original and quantized training databases), and determining whether n < N; a detailed codebook application step of, when n < N does not hold, searching for m = index{min(dn), 1 <= n <= N}, removing Cm from the codebook, updating the cells of the lookup table that were mapped to Cm to the nearest code vector, simultaneously changing the compression values of the associated training set to the new values, and applying the lookup table; and a termination test step of determining whether N is the size required by the codebook and ending if N is the size required by the codebook.

If n <N in the distortion evaluation step, in the lookup table

Figure 112008068909093-PAT00020
The method may further include a first setting step of restoring the code vector and the original map, setting n + 1 to n, and then proceeding to a temporary elimination step. , N-1 may be set to N, and n = 1 may be further included.

In addition, the distortion determination step uses a distortion evaluation algorithm.

As described above, the vector quantization-based speech feature codebook optimization apparatus and method according to the present invention provide a vector quantization-based speech spectrum compression scheme, and thus have the excellent effect of optimizing feature compression for the feature transmission used in a distributed speech recognition system.

Hereinafter, a preferred embodiment of the vector quantization-based speech feature codebook optimization apparatus and method according to the present invention will be described in detail with reference to the accompanying drawings. It will be understood by those of ordinary skill in the art that the system configuration described below is cited for purposes of illustration and does not limit the present invention to the following system.

FIG. 4 is a diagram showing the configuration of a speech feature codebook optimization apparatus based on vector quantization according to an embodiment of the present invention. The apparatus for optimizing a speech feature codebook based on vector quantization according to the present invention includes an initialization unit 100, a sorting unit 200, a classification unit 300, a distortion determination unit 400, and a codebook application unit 500.

First, indexes can be regarded as scalar variables, whereas the neighbors of multidimensional vectors can only be represented as vectors. The structure of a lookup table for storing neighbor information is therefore more complicated than a single small lookup table.

This process of vector quantization is described and illustrated in H. Abut, ed., Vector Quantization, New York, USA: IEEE Press, 1990, and E. Agrell, "Spectral coding by fast vector quantization", IEEE Workshop on Speech Coding for Telecommunications, pp. 61-62, 1993.

The second point to consider is the approximation process of selecting the central codeword.

This selection process can be simplified sufficiently at low computational cost.

On the other hand, it is desirable that the approximation not discard the exact boundaries of the vector space entirely.

Finally, the number of code vectors that are neighbors of the central codeword in the internal search must not be too large, or the computational efficiency of the quantization method is reduced.

It should also not be too small, or the best codeword may be missed in the internal search.

There is thus a tradeoff between these aspects in selecting the number of code vectors for the internal search, and a compromise is needed.

In the following description, classified codebook vector quantization is introduced as a way to realize these basic concepts efficiently.

Sorting the codewords using a suitable function is the key idea: it simplifies the structure of the neighbor-information lookup table and, in some cases, eliminates the memory requirements of such tables altogether.

Furthermore, the central codeword is chosen by scalar quantization based on a stepwise approach, which is a low-complexity technique.

We begin with a general formal definition of vector quantization. Assume that a P-dimensional input vector x (for example, x belonging to R^P) is given together with a codebook C = {c_1, ..., c_N} of size N. The sorting parameter is then defined as follows:

[Equation 1]

s = g(x)

Here, g(.) is, by definition, a scalar-valued function selected by the above-mentioned method so that object vectors that are adjacent in the vector space have neighboring values of 's'.

This function will later be called the sorting function.

For example, the sum of the elements of the input vector, or an individual element of the input vector, can be used as the sorting function. This will be discussed in more detail later.

First, the sorting parameter is computed for the input vector x and for each code vector c_n, and the indices of the codebook are sorted in ascending order of the sorting parameters of the code vectors.

In the first step of quantization, s is scalar-quantized using the set S of sorted parameters as the quantization table that codes the object vector.

Assume that i is the index obtained from s by scalar quantization; in other words, it is given by [Equation 2]:

[Equation 2]

i = arg min over 1 <= n <= N of |s - s_n|

Here, i is the central index; it is associated with the central codeword.

The central index is an input-dependent parameter and changes with each input vector.

It is therefore, in general, unrelated to the index of the code vector in the middle of the codebook (for example, a codeword with an index such as N/2).

In the next step, the object vector is quantized using an extensive search in the vicinity of the central index.

For example, one can search the code vectors with indices in the range "i - k + 1" to "i + k", where "k" is an offset value.

The computational complexity of this method increases linearly with "k", which is therefore usually set much smaller than N.
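To make the two-step search concrete, here is a minimal Python sketch. The function name, the use of np.searchsorted as the binary-search scalar quantizer, and the unweighted squared-error distance are illustrative assumptions, not the patent's specification.

```python
import numpy as np

def scvq_search(x, codebook, sort_params, g, k=8):
    # Sorted-codebook VQ search (a sketch). `codebook` (N x P) is assumed
    # pre-sorted so that `sort_params` (the values g(c_n)) ascend; `g` maps
    # a vector to its scalar sorting parameter, and `k` is the offset.
    N = len(codebook)
    s = g(x)  # sorting parameter of the input vector
    # Central index: scalar-quantize s against the sorted parameters.
    # np.searchsorted is a binary search, i.e. O(log N) comparisons.
    i = min(int(np.searchsorted(sort_params, s)), N - 1)
    # Extensive search only over indices i-k+1 .. i+k.
    lo, hi = max(i - k + 1, 0), min(i + k, N - 1)
    window = codebook[lo:hi + 1]
    dists = np.sum((window - x) ** 2, axis=1)  # unweighted squared error
    return lo + int(np.argmin(dists))  # index of the chosen code vector
```

With k much smaller than N, the distance computation touches only about 2k code vectors instead of all N.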

Obviously, some weighting function can be used in the internal search at this stage.

As will be explained later, for application to spectral coding, the selection of the sorting function is an important issue in the sorted codebook vector quantization method.

Beyond the performance of the quantization process, the computational complexity and the required memory also depend on the choice of this element.

For example, if an improper sorting function is selected, the arrangement of the code vectors in the codebook does not reflect their true neighborhoods.

In other words, code vectors with adjacent indices will not be neighboring code vectors in the vector space, causing poor quantization quality for a given offset value k.

On the other hand, the selection of a suitable sorting function results in sorting parameters that provide the desired neighborhood structure.

However, it may require complex mathematical functions, which increase the computational cost (in the internal search), expand the memory requirements (for storage of the sorting parameters associated with the code vectors), or both.

Since classified codebook vector quantization is an entirely new vector quantization method, the LSF quantization presented here is a first benchmark for future applications. Moreover, the method can be improved in the future.

We propose to divide the entire LSF vector into three subvectors, similar to the grouping scheme of conventional split vector quantization methods.

The division of the LSFs into three groups provides subvectors with strong correlations between their elements, which in turn increases the correlation between the sorting parameters and the components of each subvector. This strong correlation enhances the performance of the vector quantization method.
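For illustration only, a sketch of such a split; the 3/3/4 grouping of a 10-dimensional LSF vector is an assumption, since the text does not fix the group sizes.

```python
import numpy as np

def split_lsf(lsf):
    # Split one LSF vector into three subvectors; the 3/3/4 grouping of a
    # 10-dimensional vector is an assumed example, not the patent's choice.
    return lsf[0:3], lsf[3:6], lsf[6:10]

lsf = np.sort(np.random.rand(10))  # LSFs are ordered, here scaled to (0, 1)
sub1, sub2, sub3 = split_lsf(lsf)  # each subvector is quantized separately
```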

The quantization method for each subvector is as follows.

First, the sorting parameter for each input subvector is calculated by evaluating the function g(.).

Second, the central codeword is selected by evaluating the distances between the sorting parameters associated with the codewords in the codebook and the sorting parameter of the input subvector.

The nearest neighbor rule is used for this choice.

Since the codewords in the codebook are presorted in ascending order of their sorting parameters, the selection of the central codeword (that is, of the central index) is only a simple scalar quantization.

The sorting parameters are not stored; they are simply recalculated during quantization.

Note that, because the scalar quantization is applied with a stepwise (binary search) approach, only about log(N) sorting parameters need to be calculated in this situation.

This technique is therefore particularly suited to very simple sorting functions g(.), for example the sum of the elements of the subvector.

This option is normally chosen when low storage cost and low computational complexity are the main concerns.

This tradeoff between the required memory and the computational cost is one of the features that has attracted interest in classified codebook vector quantization.

FIG. 2 is a diagram illustrating LSF 1-3 subvector quantization by the general vector quantization method.

After the central codeword is found, the distances between the input subvector and the code vectors whose indices lie around the central index are evaluated by the extensive internal search.

This step of the quantization method is clarified with an example.

Assume that the LSF 1-3 subvector is quantized by the vector quantization method and that the codebook size is 256 (N = 256).

Assume further that the central index for the input subvector is 89 and that the offset value k is 8.

In this example, codewords having indices in the range of 82 to 97 will be designated for extensive searching.

Therefore, 16 code vectors (2k) are evaluated at this stage, and the final codeword describing the quantized vector is selected from among them by the code vector selection block.

In the present example, the codeword with index 93 is selected.

A weighted distortion measure can be used in both the training and the coding steps of vector quantization with this codebook.

In coding, there are two steps in which it can be used: first, in evaluating the sorting function, and second, in the final extensive internal search.

Clearly, to achieve the smallest distortion, the weights should be considered in both steps; however, using the weighted distortion measure in both increases the computational load of the vector quantization algorithm.

To achieve sufficient performance, this was determined for each subvector: regardless of the central index, 2k codewords are always searched in the last step.

In this method, for an input subvector with a central index smaller than k, the first 2k codewords are searched.

In the same way, for an input subvector with a central index greater than N - k, the last 2k code vectors are searched.

This technique will be called equal search opportunity.
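A short sketch of this clamped window (a hypothetical helper; the patent describes the behavior, not this code):

```python
def search_window(i, k, N):
    # Clamp the 2k-wide window: inputs whose central index i falls within
    # k of either end of the codebook still search 2k code vectors there.
    lo = max(i - k + 1, 0)
    hi = lo + 2 * k - 1
    if hi > N - 1:          # central index near the top of the codebook
        hi = N - 1
        lo = hi - 2 * k + 1
    return lo, hi

# The example from the text: N = 256, i = 89, k = 8 -> indices 82..97.
assert search_window(89, 8, 256) == (82, 97)
```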

Vector quantization of image signals can be considered a specific example of classified codebook vector quantization in which a dedicated sorting function is used; there, however, the approach is used not as a new type of vector quantization but only for fast codebook search.

Nevertheless, this demonstrates that sorted codebook vector quantization is a powerful technique that can be applied either as a vector quantization method or as a fast codebook search method.

Obviously, in the current realization of the vector quantization method, three codebooks, one for each of the three groups of LSFs, must be trained with a suitable method, which is described next.

Codebook training for ordered codebook vector quantization has some similarities with, as well as some differences from, codebook training for other vector quantization methods.

The main point of distinction in the training phase is the presence of the sorting parameters, which play an important role in this quantization method.

This problem is explained in detail below, after which an optimization method is provided.

In principle, classified codebook vector quantization can be used without any further training, with either a codebook that has not been systematically optimized for it or a systematically trained one.

From this standpoint, it can be regarded as vector quantization whose codebook is not systematically organized for the method.

Although any sorting function can be applied to the quantization of LSFs, as mentioned earlier, the choice of this function directly affects the performance of the quantizer.

Unfortunately, no analytic solution has been found for calculating the optimal sorting function, either for the general case or for special cases of vector quantization.

However, the statistical method proposed here provides a good solution.

This method is based on evaluating the correlation coefficients between an arbitrary candidate function and the elements of the input vectors.

The correlation coefficient between two time series 'x' and 'y' is defined as shown in [Equation 3].

[Equation 3]

rho(x, y) = Cov(x, y) / (sigma_x * sigma_y)

where Cov(x, y) is the covariance between the two time series and sigma_x, sigma_y are their standard deviations.
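Equation 3 can be evaluated directly; a minimal sketch (numerically equivalent to np.corrcoef(x, y)[0, 1]):

```python
import numpy as np

def corr_coef(x, y):
    # Correlation coefficient of Equation 3: Cov(x, y) / (sigma_x * sigma_y).
    x, y = np.asarray(x, float), np.asarray(y, float)
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return cov / (x.std() * y.std())
```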

The following procedure is applied to choose the appropriate sorting function from among several candidates.

For simplicity, consider a simple example in which a two-dimensional input vector V = (v1, v2) is quantized.

If the vector "V" is regarded as time-varying, the elements of this vector ('v1' and 'v2') can be viewed as two time series.

In the present example, the variables 'v1' and 'v2' are random, with a uniform distribution in the range 0 to 1.

The joint distribution of 'v1' and 'v2' is shown in FIG. 3A.

The points represent the code vectors of a codebook assumed to contain 256 codewords, together with the input vector.

If a full search is performed over the codebook, the circled codeword is the selected code vector.

Now consider the possible sorting functions for this case.

The three functions g1, g2, and g3 selected in this example define the sorting parameters x, y, and z, respectively:

[Equation 4]

x = g1(V) = v1,  y = g2(V) = v2,  z = g3(V) = v1 + v2

Using these three sorting functions results in different central codewords and different neighbor selections.

FIGS. 3B to 3D show, for the same codebook and input vector, the central codeword and part of the neighborhood for the sorting functions 'g1' through 'g3' in sequence.

The central codeword is represented by a circle, and the code vectors between the two boundaries constitute the neighborhood of the central codeword.

This example is merely a representative illustration of the relationship between the sorting function, the central codeword, and the neighboring code vectors.

No attempt was made in this example to use a trained codebook or to fix the offset value.

The choice of the sorting function is now examined by means of correlation.

The problem raised here is how to choose the sorting function from among 'k' candidate functions.

Assume a training database T containing 'L' vectors, each of dimension 'P', from which the sorting function is to be chosen.

As a first step, each element of the training vectors is regarded as a separate time series, giving 'P' time series for the training database.

For example, the p-th time series x_p is formed by gathering the p-th components of the training vectors into a vector.

In the second step, the correlation algorithm uses the first candidate function to compute the sorting parameters for all training vectors.

This yields a vector, or time series, s(k) of the same length 'L' as the training set. This step is repeated for all candidate functions.

FIGS. 3A to 3D are diagrams showing examples of using different sorting functions in a planar vector space.

In the next step, the correlation vectors for all candidate functions are calculated. The vector for the k-th candidate function is defined by [Equation 5].

[Equation 5]

r(k) = [rho(s(k), x_1), rho(s(k), x_2), ..., rho(s(k), x_P)]

where rho(s(k), x_p) denotes the correlation coefficient between s(k) and the p-th component time series x_p.

Finally, the algorithm selects the function that provides the maximum correlation vector.

The selected sorting function is used to calculate the sorting parameters associated with each codeword in the codebook. Finally, the code vectors are sorted in ascending order of their sorting parameters.
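The whole selection procedure can be sketched as follows; how the "maximum correlation vector" is ranked is not specified above, so the Euclidean norm of the vector in Equation 5 is an assumption here, as are the function names.

```python
import numpy as np

def select_sorting_function(train, candidates):
    # Choose among candidate sorting functions g(.) by the correlation
    # criterion above. `train` is the L x P training database.
    best_g, best_score = None, -np.inf
    for g in candidates:
        s = np.array([g(v) for v in train])   # time series s(k), length L
        # Correlation vector of Equation 5: one coefficient per component.
        r = np.array([np.corrcoef(s, train[:, p])[0, 1]
                      for p in range(train.shape[1])])
        score = np.linalg.norm(r)             # assumed ranking criterion
        if score > best_score:
            best_g, best_score = g, score
    return best_g

# Candidate functions in the spirit of Equation 4, for 2-D vectors:
candidates = [lambda v: v[0], lambda v: v[1], lambda v: v[0] + v[1]]
```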

When the training database (10240 vectors) was replaced by the trained codebook (256 vectors), the same sorting function was chosen.

As a final remark on the correlation algorithm, note that it does not guarantee an optimal sorting function.

It is also not clear that a function whose sorting parameters have a higher correlation vector always performs better.

This can be verified by a simple test.

For the LSF 1-3 subvector, the sorting functions of [Equation 6] have different correlation-vector values; however, since they produce the same arrangement of the codewords in the codebook, they yield the same vector quantization performance.

[Equation 6]

[equation image not recoverable]

This occurs because the correlation vector depends not only on the sorting parameters but also on the training vectors themselves.

● Optimization

As emphasized earlier, the codebook for vector quantization can be trained by any conventional method, such as the generalized Lloyd algorithm or any training algorithm that provides a globally optimized codebook.

However, quantization by ordered codebook vector quantization with such a codebook does not necessarily provide minimum distortion.

Nevertheless, if a suitable optimization technique is applied, the additional distortion resulting from quantization by this vector quantization method can be reduced.

The following optimization algorithm is proposed to prepare a better codebook and improve the performance of the vector quantization method.

An original codebook is selected and used as the starting point for optimization.

This may be a codebook trained by conventional training algorithms.

The sorting parameters are calculated for each codeword using the sorting function, and the codebook is then rearranged in ascending order of the sorting parameters.

Then all training vectors are quantized with the prepared codebook by the vector quantization method.

This step classifies the entire training database and creates a partition consisting of clusters, each containing the subset of the training database assigned to one codeword.

The codewords are then replaced by the centers of the training vectors in their clusters, whereby a new codebook is prepared.

All these steps are repeated until the total distortion falls below a certain threshold or until the code vectors converge to a final state, at which point the algorithm ends.

That is, the process of coding and decoding is repeated until the final condition is satisfied, as will now be described with reference to FIG. 4.
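A sketch of this loop (a Lloyd-style iteration with re-sorting; the stopping rule and the per-cluster median follow the text above, while the names and threshold handling are illustrative):

```python
import numpy as np

def optimize_codebook(train, codebook, g, threshold, max_iter=100):
    # Sort -> classify -> distortion test -> median update, per FIG. 4.
    cb = np.array(codebook, float)
    for _ in range(max_iter):
        cb = cb[np.argsort([g(c) for c in cb])]       # sorting step (200)
        d = ((train[:, None, :] - cb[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)                          # classification (300)
        total = d[np.arange(len(train)), labels].sum()
        if total <= threshold:                        # distortion test (400)
            break
        for n in range(len(cb)):                      # codebook update (500)
            members = train[labels == n]
            if len(members):
                cb[n] = np.median(members, axis=0)    # median of cluster Cn
    return cb
```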

This configuration is for optimizing codebooks.

First, the initialization unit 100 sets the initial codebook to C(0).

The sorting unit 200 then calculates the sorting parameters and sorts the code vectors of the codebook C(m).

Next, the classification unit 300 classifies the entire training database into the codebook C(m) using vector quantization.

Subsequently, the distortion determination unit 400 determines whether or not the total distortion level is less than or equal to the threshold.

If the total distortion is less than or equal to the threshold, the codebook application unit 500 terminates the process; if not, it sets m + 1 to m, replaces each code vector Cn(m) with the median of the training vectors assigned to Cn during classification, and then provides the result to the classification unit 300.

In addition, the distortion determination unit 400 determines the initial codebook C after compressing the entire training database by the selected quantization table.

The distortion determination unit 400 sets a lookup table that maps each codeword to itself.

Thereafter, the distortion determination unit 400 removes the codewords associated with the combinations prohibited in the codebook, sets the resulting codebook size to N, and sets n = 1.

Subsequently, the distortion determination unit 400 temporarily removes the code vector Cn from the codebook.

In addition, the distortion determination unit 400 replaces it with the corresponding maps in the lookup table.

The distortion determination unit 400 then changes the compressed training set according to the new lookup table.

On the other hand, the distortion determination unit 400 evaluates dn (the total distortion between the original and the quantized training databases) and determines whether n < N. If n < N does not hold, it searches for m = index{min(dn), 1 <= n <= N}, removes Cm from the codebook, updates the cells of the lookup table that were mapped to Cm to the nearest code vector, applies the lookup table while changing the compression values of the associated training set to the new values, and determines whether N is the size required by the codebook; if N is the required size, the process ends.

On the other hand, if n < N after the distortion is evaluated, the distortion determination unit 400 restores the code vector Cn and its original map in the lookup table, sets n + 1 to n, and provides the result to the temporary removal.

On the other hand, if N is not the size required by the codebook in the termination test, the distortion determination unit 400 sets N - 1 to N, sets n = 1, and provides the result to the temporary removal unit.

The distortion determining unit 400 uses a distortion evaluation algorithm.

Next, a method for optimizing a speech feature codebook based on vector quantization according to an embodiment of the present invention having the above configuration will be described with reference to FIG. 5.

First, the initial codebook is set to C(0) (S100).

Then, the sorting parameters are calculated and the code vectors of the codebook C(m) are sorted (S200).

Thereafter, the entire training database is classified into the codebook C(m) using vector quantization (S300).

Next, it is determined whether or not the total distortion level is less than or equal to the threshold (S400).

If the total distortion level is less than or equal to the threshold (YES) in the distortion determination step (S400), the process is terminated; if not (NO), m + 1 is set to m, each code vector Cn(m) is replaced with the median of the training vectors assigned to Cn during classification, and the process proceeds to the classification step S300 (S500).

A detailed process of the distortion determination step S400 will be described with reference to FIG. 6.

First, after the entire training database is compressed by the selected quantization table, an initial codebook C is determined, and a lookup table that maps each codeword to itself is set to T (S301).

Subsequently, the codewords associated with the forbidden combinations in the codebook are removed, and the resulting codebook size is set to N and n = 1 (S302).

Then, the code vector Cn is temporarily removed from the codebook and replaced with its maps in the lookup table (S303).

Subsequently, the compressed training set is changed according to the new lookup table, and dn (the total distortion between the original and the quantized training databases) is evaluated to determine whether n < N (S304).

If n <N is not (NO) in the distortion evaluation step (S304), m = index {min (dm / 1 = <n <N)} is searched for, and in the codebook

Figure 112008068909093-PAT00059
Removes the
Figure 112008068909093-PAT00060
The cells in the lookup table mapped to are updated to the nearest code vector, and the compression value of the associated training set is changed to the new value and the lookup table is applied (S305).

Next, it is determined whether N is the size required by the codebook, and if N is the size required by the codebook (YES), the process ends (S306).

On the other hand, if n> N (YES) in the distortion evaluation step (S304), the lookup table

Figure 112008068909093-PAT00061
The code vector and the original map are restored, and n + 1 is set to n, and then the temporary removal step is performed (S307).

On the other hand, if N is not the size required by the codebook in the termination test step (S306) (NO), N - 1 is set to N and n = 1 is set, after which the process returns to the temporary removal step (S308).
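A condensed sketch of this size-reduction loop (steps S301 to S308); the lookup-table bookkeeping of the flowchart is replaced by direct re-quantization for clarity, so this reflects the logic rather than an efficient implementation.

```python
import numpy as np

def prune_codebook(train, codebook, target_size):
    # Repeatedly: evaluate the total distortion d_n left by temporarily
    # removing each code vector Cn (S303-S304), permanently remove the one
    # with minimum d_n, i.e. m = argmin d_n (S305), and repeat until the
    # codebook reaches the required size (S306, S308).
    cb = [np.asarray(c, float) for c in codebook]
    while len(cb) > target_size:
        best_m, best_d = None, np.inf
        for n in range(len(cb)):
            rest = np.array(cb[:n] + cb[n + 1:])      # temporary removal
            d = ((train[:, None, :] - rest[None, :, :]) ** 2).sum(-1)
            d_n = d.min(1).sum()                      # total distortion d_n
            if d_n < best_d:
                best_m, best_d = n, d_n
        del cb[best_m]                                # permanent removal
    return np.array(cb)
```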

Although the present invention has been described with reference to specific embodiments, it will be apparent to those skilled in the art that various changes and modifications can be made without departing from the scope of the present invention, and such changes and modifications fall within the scope of the appended claims.

FIG. 1 illustrates an LSF-based method for a conventional speech codec.

FIG. 2 illustrates LSF 1-3 subvector quantization by a general vector quantization method.

FIGS. 3A to 3D show examples of using different sorting functions in a planar vector space.

FIG. 4 is a functional block diagram illustrating the configuration of a speech feature codebook optimization apparatus based on vector quantization according to the present invention.

FIG. 5 is a flowchart illustrating a method for optimizing a speech feature codebook based on vector quantization according to the present invention.

FIG. 6 is a flowchart illustrating the detailed process of the distortion determination step S400 in the vector quantization-based speech feature codebook optimization method of FIG. 5.

<Explanation of symbols for the main parts of the drawings>

100: initialization unit 200: sorting unit

300: classification unit 400: distortion determination unit

500: codebook application unit

Claims (10)

1. A vector quantization-based speech feature codebook optimization method for server-based distributed speech recognition, comprising:
an initialization step of setting the initial codebook to C(0);
a sorting step of calculating the sorting parameters and sorting the code vectors of the codebook C(m);
a classification step of classifying the entire training database into the codebook C(m) using vector quantization;
a distortion determination step of determining whether or not the total distortion level is less than or equal to a threshold; and
a codebook application step of terminating the process if the total distortion level is less than or equal to the threshold in the distortion determination step and, if not, setting m + 1 to m, replacing each code vector Cn(m) with the median of the training vectors assigned to Cn during classification, and then returning to the classification step.
2. The method of claim 1, wherein the distortion determination step comprises:
a table initialization step of compressing the entire training database by the selected quantization table, determining an initial codebook C, and setting a lookup table T that maps each codeword to itself;
an initial removal step of removing the codewords associated with the forbidden combinations in the codebook and setting the resulting codebook size to N and n = 1;
a temporary removal step of temporarily removing the code vector Cn from the codebook and replacing it with its maps in the lookup table;
a distortion evaluation step of changing the compressed training set according to the new lookup table, evaluating dn (the total distortion between the original and quantized training databases), and determining whether n < N;
a detailed codebook application step of, when n < N does not hold, searching for m = index{min(dn), 1 <= n <= N}, removing Cm from the codebook, updating the cells of the lookup table that were mapped to Cm to the nearest code vector, simultaneously changing the compression values of the associated training set to the new values, and applying the lookup table; and
a termination test step of determining whether N is the size required by the codebook and ending if N is the size required by the codebook.
3. The method of claim 2, further comprising a first setting step of, if n < N in the distortion evaluation step, restoring the code vector Cn and its original map in the lookup table, setting n + 1 to n, and then returning to the temporary removal step.
4. The method of claim 2, further comprising a second setting step of, if N is not the size required by the codebook in the termination test step, setting N - 1 to N, setting n = 1, and then returning to the temporary removal step.

5. The method of claim 1, wherein the distortion determination step uses a distortion evaluation algorithm.

6. A vector quantization-based speech feature codebook optimization apparatus for server-based distributed speech recognition, comprising:
an initialization unit that sets the initial codebook to C(0);
a sorting unit that calculates the sorting parameters and sorts the code vectors of the codebook C(m);
a classification unit that classifies the entire training database into the codebook C(m) using vector quantization;
a distortion determination unit that determines whether or not the total distortion level is less than or equal to a threshold; and
a codebook application unit that terminates the process if the total distortion level is less than or equal to the threshold and, if not, sets m + 1 to m, replaces each code vector Cn(m) with the median of the training vectors assigned to Cn during classification, and then returns to the classification step.
7. The apparatus of claim 6, wherein the distortion determination unit:
compresses the entire training database by the selected quantization table, determines an initial codebook C, and sets a lookup table T that maps each codeword to itself;
removes the codewords associated with the forbidden combinations in the codebook and sets the resulting codebook size to N and n = 1;
temporarily removes the code vector Cn from the codebook and replaces it with its maps in the lookup table;
changes the compressed training set according to the new lookup table, evaluates dn (the total distortion between the original and quantized training databases), and determines whether n < N; when n < N does not hold, searches for m = index{min(dn), 1 <= n <= N}, removes Cm from the codebook, updates the cells of the lookup table that were mapped to Cm to the nearest code vector, replaces the compression values of the associated training set with the new values, and applies the lookup table; and
determines whether N is the size required by the codebook and terminates when N is the size required by the codebook.
8. The apparatus of claim 6, wherein the distortion determination unit, if n < N after the distortion is evaluated, restores the code vector Cn and its original map in the lookup table, sets n + 1 to n, and then returns to the temporary removal.
9. The apparatus of claim 7, wherein the distortion determination unit, if N is not the size required by the codebook in the termination test, sets N - 1 to N, sets n = 1, and then returns to the temporary removal.

10. The apparatus of claim 6, wherein the distortion determination unit uses a distortion evaluation algorithm.
KR1020080096316A 2008-09-30 2008-09-30 Voice feature code book optimization device and method of vector quantization base KR20100036894A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020080096316A KR20100036894A (en) 2008-09-30 2008-09-30 Voice feature code book optimization device and method of vector quantization base

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
KR1020080096316A KR20100036894A (en) 2008-09-30 2008-09-30 Voice feature code book optimization device and method of vector quantization base

Publications (1)

Publication Number Publication Date
KR20100036894A true KR20100036894A (en) 2010-04-08

Family

ID=42214336

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020080096316A KR20100036894A (en) 2008-09-30 2008-09-30 Voice feature code book optimization device and method of vector quantization base

Country Status (1)

Country Link
KR (1) KR20100036894A (en)


Legal Events

Date Code Title Description
WITN Withdrawal due to no request for examination