WO2016108722A1 - Method to restore the vocal tract configuration

Method to restore the vocal tract configuration

Info

Publication number
WO2016108722A1
WO2016108722A1 (PCT/RU2015/000198)
Authority
WO
WIPO (PCT)
Prior art keywords
vocal tract
configuration
acoustic characteristics
lengths
speech
Application number
PCT/RU2015/000198
Other languages
French (fr)
Inventor
Ilja Sergeevich MAKAROV
Original Assignee
Obshestvo S Ogranichennoj Otvetstvennostyu "Integrirovannye Biometricheskie Reshenija I Sistemy"
Application filed by Obshestvo S Ogranichennoj Otvetstvennostyu "Integrirovannye Biometricheskie Reshenija I Sistemy"
Publication of WO2016108722A1


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 - G10L 21/00
    • G10L 25/75: Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 - G10L 21/00, for modelling vocal tract parameters

Definitions

  • Fig. 1 shows a diagram of one of the options of the method of the vocal tract configuration restoration.
  • The input signal is a digitized voice record of an arbitrary person speaking any language.
  • Input signal sampling rate should be at least 8,000 Hz and minimum quantization level should be 8 bit/sample.
  • The person can pronounce arbitrary speech material (speech material means separate sounds, combinations of sounds, words, phrases, or texts in the given language; non-speech sounds such as coughing, breathing, chirruping, etc. are not speech material).
  • The digitized record can have any acceptable sound format such as wav, mpeg, mp4, etc. Any voice recording device can be used, e.g. a microphone, dictaphone, telephone, video camera, etc.
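The minimum input format stated above (sampling rate of at least 8,000 Hz, quantization of at least 8 bit/sample) can be checked with a short sketch using Python's standard `wave` module. The in-memory test tone and the helper name `meets_requirements` are illustrative, not part of the patented method:

```python
import io
import math
import struct
import wave

MIN_RATE_HZ = 8000     # minimum sampling rate required by the method
MIN_SAMPLE_BITS = 8    # minimum quantization level

def meets_requirements(framerate: int, sample_bits: int) -> bool:
    """Check that a digitized record satisfies the minimum input format."""
    return framerate >= MIN_RATE_HZ and sample_bits >= MIN_SAMPLE_BITS

# Build a short 8 kHz, 16-bit mono test tone entirely in memory.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)          # 2 bytes = 16 bit/sample
    w.setframerate(8000)
    for n in range(800):       # 0.1 s of a 440 Hz tone
        sample = int(10000 * math.sin(2 * math.pi * 440 * n / 8000))
        w.writeframes(struct.pack("<h", sample))

buf.seek(0)
with wave.open(buf, "rb") as w:
    ok = meets_requirements(w.getframerate(), 8 * w.getsampwidth())
```

The same check applies unchanged to a record read from a `.wav` file on disk.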
  • The record of the English word "seed" pronounced by a male speaker is used here as an example.
  • Fig. 2 shows a chart of acoustic wave of this word (oscillogram).
  • The digitized human voice record is filtered to remove noise and distortions (additive noise, distortions induced by the communication channel, reverberation, etc.).
  • Any noise and distortion reduction algorithms can be used as filtration algorithms (for example, algorithms described in S. Vaseghi, Advanced Digital Signal Processing and Noise Reduction, 2nd ed. John Wiley & Sons, Ltd, 2000).
  • Fig. 3 shows a chart of the speech wave of the word "seed" after filtration of external noise using the spectral subtraction algorithm (see S. Vaseghi, Chapter 11, P. 333-352, Advanced Digital Signal Processing and Noise Reduction, 2nd ed. John Wiley & Sons, Ltd, 2000).
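A minimal numpy sketch of magnitude spectral subtraction in the spirit of the algorithm cited above; the frame length, spectral floor, and the assumption that the first frames of the record are speech-free are illustrative choices, not the published parameters:

```python
import numpy as np

def spectral_subtraction(signal, frame_len=256, noise_frames=5, floor=0.01):
    """Magnitude spectral subtraction: estimate an average noise spectrum
    from the first `noise_frames` frames (assumed speech-free) and subtract
    it from every frame, keeping a small spectral floor."""
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    spectra = np.fft.rfft(frames, axis=1)
    mag, phase = np.abs(spectra), np.angle(spectra)
    noise_mag = mag[:noise_frames].mean(axis=0)            # noise estimate
    clean_mag = np.maximum(mag - noise_mag, floor * mag)   # spectral floor
    clean = np.fft.irfft(clean_mag * np.exp(1j * phase), n=frame_len, axis=1)
    return clean.reshape(-1)

rng = np.random.default_rng(0)
t = np.arange(4096) / 8000.0
speech_like = np.sin(2 * np.pi * 300 * t)        # stands in for a vowel
noise = lambda n: 0.3 * rng.standard_normal(n)
signal = np.concatenate([noise(1280), speech_like + noise(4096)])
denoised = spectral_subtraction(signal)

# energy of the noise-only lead-in drops sharply after subtraction
noise_energy_before = float(np.mean(signal[:1280] ** 2))
noise_energy_after = float(np.mean(denoised[:1280] ** 2))
```

Non-overlapping rectangular frames are used here only to keep the sketch short; a practical implementation would use overlapped, windowed frames with overlap-add resynthesis.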
  • The digitized voice record, filtered from noise, is analysed by an automatic speech/non-speech detection algorithm, which determines the boundaries of the beginning and end of all pauses inside the digitized record (a pause means any section of the digitized record where the person is silent).
  • Any speech-nonspeech detection algorithm described in the international literature can be used (for example, algorithm described in Q. Li, J. Zheng, A. Tsai, and Q. Zhou, Robust Endpoint Detection and Energy Normalization for Real-Time Speech and Speaker Recognition, IEEE Transactions on Speech and Audio Process., vol. 10, No. 3, 2002, P. 146-157).
  • Fig. 4 shows as an example the result of the application of the pause detection algorithm described in Q. Li, J. Zheng, A. Tsai, and Q. Zhou (op. cit.).
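A simple frame-energy thresholding sketch of speech/non-speech segmentation, producing "pause-speech-pause" boundaries as in Fig. 4. This is a much-simplified stand-in for the endpoint detector cited above; the frame length and threshold are illustrative:

```python
import numpy as np

def detect_speech(signal, frame_len=160, threshold_ratio=0.1):
    """Label each frame 'speech' when its energy exceeds a fraction of the
    maximum frame energy; return (start, end) sample indices of speech runs."""
    n = len(signal) // frame_len
    frames = signal[: n * frame_len].reshape(n, frame_len)
    energy = (frames ** 2).mean(axis=1)
    active = energy > threshold_ratio * energy.max()
    segments, start = [], None
    for i, a in enumerate(active):
        if a and start is None:
            start = i * frame_len                 # speech run begins
        elif not a and start is not None:
            segments.append((start, i * frame_len))
            start = None
    if start is not None:                         # speech runs to the end
        segments.append((start, n * frame_len))
    return segments

# pause - speech - pause, as in Fig. 4
sig = np.concatenate([np.zeros(800),
                      np.sin(2 * np.pi * 200 * np.arange(1600) / 8000.0),
                      np.zeros(800)])
segments = detect_speech(sig)
```

For the synthetic signal above, the detector returns a single speech segment spanning the sine-tone region.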
  • The acoustic characteristics of each vowel or vowel-like sound are determined automatically using short-term analysis (see L. Rabiner, R. Schafer, Digital Processing of Speech Signals. Prentice-Hall, Inc. 1976).
  • The acoustic characteristics are calculated in a moving analysis window with a duration of 15 to 40 ms and a step of at least 1 ms.
  • The window can have any shape (in particular, popular windows such as the Hamming window can be used).
  • Resonance frequencies of the vocal tract or any other parameters describing the short-term amplitude-frequency spectrum of the speech signal can be used as acoustic characteristics (for example, Fast Fourier transform (FFT) spectra, linear predictive coding (LPC) coefficients, mel-frequency cepstral coefficients (MFCC), etc.).
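The short-term analysis described above can be sketched as follows. The window and step sizes chosen here (25 ms window, 10 ms step) fall inside the stated ranges, and the per-frame magnitude spectrum stands in for whichever acoustic characteristics are ultimately chosen:

```python
import numpy as np

def short_term_spectra(signal, fs=8000, win_ms=25, step_ms=10):
    """Slice the signal into overlapping Hamming-windowed frames and return
    the magnitude spectrum of each frame (one row per analysis window)."""
    win = int(fs * win_ms / 1000)     # 200 samples at 8 kHz
    step = int(fs * step_ms / 1000)   # 80 samples
    window = np.hamming(win)
    starts = range(0, len(signal) - win + 1, step)
    frames = np.stack([signal[s : s + win] * window for s in starts])
    return np.abs(np.fft.rfft(frames, axis=1))

fs = 8000
t = np.arange(fs) / fs                         # 1 s of signal
vowel_like = np.sin(2 * np.pi * 500 * t)
spectra = short_term_spectra(vowel_like, fs)

# each row is one acoustic-characteristics vector; for this test tone the
# peak bin of every frame should sit near 500 Hz (500 * 200 / 8000 = 12.5)
peak_bins = spectra.argmax(axis=1)
```

Each row of `spectra` corresponds to one analysis window; real features (LPC, MFCC, resonance frequencies) would be derived from these frames in the same framing scheme.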
  • If resonance frequencies of the vocal tract are used as acoustic characteristics, any algorithm for automatic estimation of resonance frequencies described in the international literature can be used for their detection (for example, the method based on linear prediction of speech described in J. Markel, A. Gray, Linear Prediction of Speech. Springer-Verlag. 1976). If parameters describing the short-term amplitude-frequency spectrum of the speech signal are used as acoustic characteristics, they can be computed using any algorithm described in the international sources (for example, see X. Huang, A. Acero, H.-W. Hon, Spoken Language Processing: a Guide to Theory, Algorithm, and System Development. Prentice-Hall, Inc. 2001).
  • Fig. 6 shows an example of the dynamic spectrum (sonogram) of the vowel in the word "seed" with the values of the first three resonance frequencies computed using the algorithm described in J. Markel, A. Gray, Linear Prediction of Speech. Springer-Verlag. 1976. Further, the acoustic characteristics computed in N successive analysis windows will be designated as {a_1, ..., a_N}, where a_i is the set (vector) of acoustic characteristics computed in the i-th time window.
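A sketch of LPC-based resonance-frequency estimation in the spirit of Markel and Gray: fit an all-pole model by the autocorrelation method, then read resonances off the angles of the complex roots of A(z). The model order and the synthetic test signal (white noise through a single resonator) are illustrative:

```python
import numpy as np

def lpc(signal, order):
    """Autocorrelation-method LPC: solve the normal equations R a = r."""
    r = np.correlate(signal, signal, mode="full")[len(signal) - 1:][: order + 1]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1 : order + 1])
    return np.concatenate([[1.0], -a])        # A(z) = 1 - sum a_k z^-k

def resonance_frequencies(signal, fs, order=8):
    """Resonances = angles of complex roots of A(z) in the upper half-plane."""
    roots = np.roots(lpc(signal, order))
    roots = roots[np.imag(roots) > 0.01]
    return np.sort(np.angle(roots) * fs / (2 * np.pi))

# synthesize a signal with one known resonance at 700 Hz (bandwidth ~100 Hz)
fs = 8000
f0, bw = 700.0, 100.0
pole_r = np.exp(-np.pi * bw / fs)
a1 = 2 * pole_r * np.cos(2 * np.pi * f0 / fs)
a2 = -pole_r * pole_r
rng = np.random.default_rng(1)
x = rng.standard_normal(4000)
y = np.zeros_like(x)
for n in range(2, len(x)):                    # two-pole resonator
    y[n] = x[n] + a1 * y[n - 1] + a2 * y[n - 2]

freqs = resonance_frequencies(y, fs, order=4)
```

For real speech, the LPC order is usually tied to the sampling rate (e.g. 8-12 at 8 kHz) and weak or wide-bandwidth roots are filtered out before labelling formants.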
  • The articulatory code book, i.e. a special database containing a large number of pairs (configuration of the vocal tract model, acoustic characteristics corresponding to that configuration), is loaded from ROM. The model is based on the fact that one of the sound generation sources is the voice source produced by the oscillation of the vocal cords; this source participates in the generation of several groups of sounds, and by the participation of this voice source the sounds are divided into vowels and consonants.
  • For the notion of an articulatory code book see, for example, J. Schroeter, M. M. Sondhi, Techniques for estimating vocal tract shapes from the speech signal. IEEE Trans. on Speech and Audio Processing. 1994. Vol. 2. No. 1, Pt. 2. P. 133-150.
  • Each such configuration of the vocal tract from the articulatory code book should be approximated by a sequence of cylinder tubes of different lengths and variable cross-section areas.
  • Any algorithms described in the international literature can be used for such approximation (e.g. algorithm developed in P. Badin, I.S. Makarov, V.N. Sorokin, Algorithm for calculating the cross-section areas of the vocal tract // Acoustical Physics. Vol. 51. No. 1. 2005. P. 38-43).
  • Fig. 7 shows, as an example, a configuration of the vocal tract, the corresponding distribution of the cross-section areas of the approximating cylinder tubes, and the corresponding acoustic spectrum.
  • For each vector of acoustic characteristics a_i calculated for a vowel or vowel-like segment of the digitized speech, the most similar vector of acoustic characteristics a* is selected from the articulatory code book. Any metric can be used as a measure of similarity (e.g. the Euclidean distance). The cross-section area distribution S_0 and lengths l_0 of the cylinder tubes corresponding to a* are used as the first approximation for further calculations.
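The code-book lookup can be sketched as a nearest-neighbour search under the Euclidean metric. All numeric values in this toy code book (formant triples loosely resembling /i/, /a/, /u/, and four-tube area/length vectors) are illustrative, not taken from any real articulatory code book:

```python
import numpy as np

# toy articulatory code book: each entry pairs an acoustic-characteristics
# vector (here: three resonance frequencies, Hz) with the cross-section
# areas (cm^2) and lengths (cm) of the approximating cylinder tubes
code_book = [
    {"acoustic": np.array([270.0, 2290.0, 3010.0]),   # /i/-like entry
     "areas": np.array([4.0, 0.7, 0.5, 3.2]),
     "lengths": np.array([4.4, 4.4, 4.4, 4.4])},
    {"acoustic": np.array([730.0, 1090.0, 2440.0]),   # /a/-like entry
     "areas": np.array([0.6, 1.0, 4.5, 6.0]),
     "lengths": np.array([4.2, 4.2, 4.2, 4.2])},
    {"acoustic": np.array([300.0, 870.0, 2240.0]),    # /u/-like entry
     "areas": np.array([5.0, 3.0, 1.0, 0.8]),
     "lengths": np.array([4.5, 4.5, 4.5, 4.5])},
]

def first_approximation(a_i):
    """Return (S_0, l_0) from the code-book entry whose acoustic vector
    is closest to a_i in the Euclidean metric."""
    best = min(code_book, key=lambda e: np.linalg.norm(e["acoustic"] - a_i))
    return best["areas"], best["lengths"]

measured = np.array([280.0, 2250.0, 3000.0])   # measured formants of a vowel
S0, l0 = first_approximation(measured)
```

For the measured formants above, the /i/-like entry is the nearest, so its areas and lengths become the initial approximation for the iteration.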
  • c is the speed of propagation of sound waves in the vocal tract
  • ρ is the density of air in the vocal tract
  • α and β are coefficients introduced to account for acoustic losses due to viscous friction and heat conductivity in the vocal tract and for the acoustic impedance of the walls of the cylinder tubes (different formulae for these coefficients and specific constant values are provided in M. M. Sondhi, J. Schroeter, A Hybrid Time-Frequency Domain Articulatory Speech Synthesizer // IEEE Trans. Acoust., Speech, and Signal Process. ASSP-35. 1987. P. 955-967; I.S. Makarov, Approximating the vocal tract by conical horns // Acoustical Physics, 2009. Vol. 55. No. 2. P. 261-269).
  • The symbol "+" denotes the operation of computing the pseudo-inverse matrix.
  • Step 3. Using S_Iter and l_Iter and equations (1-2), calculate T_Iter.
  • Step 4. Calculate the measure of similarity d between T_Iter and a_i.
  • Step 5. If d < Thr or Iter > Iter_max, move to Step 6. Otherwise move to Step 1.
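Since equations (1)-(2) are not reproduced in this excerpt, the structure of the iteration (Steps 3-5 above) can only be sketched with a placeholder forward map. Only the loop structure, the discrepancy test against Thr and Iter_max, and the use of the pseudo-inverse ("+") reflect the description; the map `forward` and its Jacobian are invented stand-ins for the acoustic model:

```python
import numpy as np

# hypothetical mixing matrix standing in for the acoustic model of the tubes
M = np.array([[1.0, 0.5, 0.2],
              [0.3, 1.0, 0.4],
              [0.1, 0.6, 1.0]])

def forward(S):
    """Placeholder for equations (1)-(2): areas -> acoustic characteristics."""
    return M @ np.log(S)

def jacobian(S):
    """Derivative of the placeholder map: column j of M divided by S_j."""
    return M / S

def restore_areas(a_target, S0, thr=1e-8, iter_max=50):
    """Gauss-Newton-style loop: update S with the pseudo-inverse ("+") of
    the Jacobian until the discrepancy d falls below the threshold Thr."""
    S = S0.copy()
    for _ in range(iter_max):
        d = np.linalg.norm(forward(S) - a_target)       # Step 4
        if d < thr:                                     # Step 5
            break
        S = S + np.linalg.pinv(jacobian(S)) @ (a_target - forward(S))
        S = np.maximum(S, 1e-3)    # cross-section areas must stay positive
    return S

S_true = np.array([4.0, 1.5, 2.5])
a_target = forward(S_true)                 # "measured" acoustic vector
S_opt = restore_areas(a_target, S0=np.array([2.0, 2.0, 2.0]))
```

With the invertible placeholder map the loop reduces to Newton's method and recovers `S_true`; with the real acoustic model the pseudo-inverse also handles non-square Jacobians (more tubes than acoustic characteristics).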
  • the vocal tract configuration is determined based on the functions of the cross-section areas and lengths of cylinder tubes approximating the vocal tract.
  • {S_opt, l_opt} is recalculated into the respective configuration of the vocal tract.
  • Any algorithm described in the literature can be used for recalculation, e.g., B. Story, On the ability of a physiologically constrained area function model of the vocal tract to produce normal formant patterns under perturbed conditions // J. Acoust. Soc. Amer. 115 (4), April 2004. P. 1760-1770.
  • Fig. 9 shows as an example the original configuration of the vocal tract (corresponding to S_0, dashed line) and the calculated configuration of the vocal tract (corresponding to S_opt, solid line).

Abstract

This invention pertains to the automatic processing of the human voice and can be used in different applications of speech technologies. The technical result of this invention is improved accuracy of articulation restoration and faster data processing during restoration of the vocal tract configuration. The method of restoration of the vocal tract configuration includes the following steps: preliminary processing of the audio signal; determination of the vector of acoustic characteristics for vowels and vowel-like segments; determination of the most similar vectors of acoustic characteristics using an articulatory code book; determination of the functions of areas and lengths of the cylinder tubes from the vectors of acoustic characteristics; determination of the vocal tract configuration based on the functions of the areas and lengths of the cylinder tubes approximating the vocal tract.

Description

METHOD TO RESTORE THE VOCAL TRACT CONFIGURATION
TECHNICAL FIELD
This invention pertains to the automatic processing of the human voice and can be used in different speech technology applications, including the areas related to the following tasks: automatic correction of pronunciation in foreign language training systems or in rehabilitation of various voice and hearing disorders, automatic speech recognition, automatic personal identification and verification based on voice, automatic speech synthesis from arbitrary text, and speech coding in mobile communication and VoIP systems.
BACKGROUND
The problem of automatic restoration of the vocal tract configuration using only an acoustic record of the human voice is called in the specialized literature the speech inverse problem. More precisely, the speech inverse problem is formulated as the problem of finding the shape of the vocal tract, the articulation parameters, the cross-section area function, or the articulation control from measured acoustic parameters of the speech signal.
The prior art includes a known method based on so-called sensitivity functions, which determines the cross-section area function by an iterative procedure minimizing the discrepancy between measured resonance frequencies and the resonance frequencies of the articulatory model (B. Story, Technique for "tuning" vocal tract area functions based on acoustic sensitivity functions // J. Acoust. Soc. Am. 119 (2), February 2006. P. 715-718; S. Adachi, H. Takemoto, T. Kitamura, P. Mokhtari, and K. Honda, Vocal tract length perturbation and its application to male-female vocal tract shape conversion // J. Acoust. Soc. Am. 121 (6), June 2007. P. 3874-3885). This method has the following drawbacks. Firstly, it uses only the resonance frequencies of the vocal tract as acoustic parameters; unfortunately, automatic determination of the resonance frequencies of the tract is a difficult and non-trivial problem, and no generalizations of this method to other acoustic parameters are known to us. Secondly, to launch the iteration process this method uses the same cross-section area function for different sounds; in many cases this leads to a large number of iterations being needed to achieve the required accuracy, which in turn significantly increases processing time.
There is also the known regularization method, which is based on minimization of the discrepancy between parameters of the measured acoustic signal and parameters calculated using mathematical models of articulation and acoustics, together with an additional stabilizing functional (J. Schroeter, M. M. Sondhi, Techniques for estimating vocal tract shapes from the speech signal. IEEE Trans. on Speech and Audio Processing. 1994. Vol. 2. No. 1, Pt. 2. P. 133-150; V. Sorokin, A. Leonov, A. Trushkin, Estimation of stability and accuracy of inverse problem solution for the vocal tract // Speech Communication. Vol. 30. No. 1. 2000. P. 55-74). A special database is used as the initial approximation for minimization: a so-called articulatory code book containing numerous configurations of the vocal tract and the corresponding acoustic parameters (J. Schroeter, M. M. Sondhi, op. cit.). The main drawback of this approach is its very significant computing and, as a consequence, time costs: processing speed is on the order of dozens of seconds or even minutes per one second of speech. The key factors that significantly increase processing time are: 1) the necessity of non-linear minimization with non-linear equality and inequality constraints, and 2) the necessity of launching the non-linear minimization from different initial approximations from the articulatory code book.
CONCEPT OF INVENTION
This invention is aimed at eliminating the drawbacks of the existing solutions. The technical result of this invention is improved accuracy of articulation restoration and faster data processing during restoration of the vocal tract configuration.
The said technical result is achieved through the following means. First, an articulatory code book is used that (unlike in the regularization method) contains not only vocal tract configurations but also the corresponding cross-section area functions; this makes it possible to use different cross-section area functions as initial approximations for different sounds (rather than a single function, as in the sensitivity-function method), which significantly reduces the number of iterations and considerably improves the accuracy of the solution. Second, the algorithm for minimization of the acoustic-parameter discrepancy allows (unlike the standard sensitivity-function method) the use of not only resonance frequencies but any standard voice-technology parameters describing the acoustic spectrum of sounds; unlike the regularization method, it is less computationally intensive and requires less processing time. Third, additional automatic algorithms for pre-processing of the speech signal are used (noise filtration, speech/non-speech detection, identification of the boundaries of vowels and vowel-like sounds, etc.), which ensures that the speech inverse problem is solved in fully automatic mode.
The method of restoration of the vocal tract configuration includes the following steps: preliminary processing of the audio signal; determination of the vector of acoustic characteristics for vowels and vowel-like segments; determination of the most similar vectors of acoustic characteristics using an articulatory code book; determination of the functions of areas and lengths of the cylinder tubes from the vectors of acoustic characteristics; determination of the vocal tract configuration based on the functions of the areas and lengths of the cylinder tubes approximating the vocal tract.
The steps of the vocal tract configuration restoration method can be performed in a cyclic manner. Preliminary processing of the audio signal can include noise filtration, separation of speech segments from pauses, delineation of the boundaries of sounds, and selection of vowels and vowel-like segments.
Any known articulatory code books can be used as code book.
It is also possible to create an own articulatory code book using any of the known methods.
Resonance frequencies of the vocal tract can be used as acoustic characteristics for determination of the functions of the areas and lengths from initial approximations.
Parameters describing short-time amplitude-frequency spectrum of the speech signal can be used as acoustic characteristics for determination of the functions of the areas and lengths based on initial approximations.
For determination of the configuration of the vocal tract any known algorithm of conversion of the functions of the areas and lengths into corresponding vocal tract configuration based on initial approximations can be used.
This invention can be realized in the form of a vocal tract configuration restoration system including: one or more command processing devices, one or more data storage devices, and one or more programs, where the one or more programs are stored in the one or more data storage devices and executed on the one or more processors, and the one or more programs include the following functions: preliminary processing of the audio signal; determination of the vector of acoustic characteristics for vowels and vowel-like segments; determination of the most similar vectors of acoustic characteristics using an articulatory code book; determination of the functions of areas and lengths of the cylinder tubes from the vectors of acoustic characteristics; determination of the vocal tract configuration based on the functions of the areas and lengths of the cylinder tubes approximating the vocal tract. The vocal tract configuration restoration method can be performed in a cyclic manner.
Preliminary processing of the audio signal can include noise filtration, speech/non-speech detection, segmentation of speech into sounds, and selection of vowels and vowel-like segments.
Any known articulatory code books can be used as code book.
It is also possible to create an own articulatory code book using any of the known methods.
Resonance frequencies of the vocal tract can be used as acoustic characteristics for determination of the functions of the areas and lengths from initial approximations.
Parameters describing short-time amplitude-frequency spectrum of the speech signal can be used as acoustic characteristics for determination of the functions of the areas and lengths using initial approximations. For determination of the vocal tract configuration any known algorithm of conversion of the functions of the areas and lengths into corresponding vocal tract configuration based on initial approximations can be used.
BRIEF DESCRIPTION OF DRAWINGS
Fig. 1 - Diagram of one of the options of the method of the vocal tract configuration restoration.
Fig. 2 - Acoustic wave plot of word "seed".
Fig. 3 - Speech wave plot of word "seed" after additive noise filtration.
Fig. 4 - Result of segmentation of the speech wave into "pause-speech-pause" sections. Vertical lines show boundaries between pause and speech.
Fig. 5 - Results of automatic determination of boundaries of vowel sound in word "seed". Start and end of vowel are shown by vertical lines.
Fig. 6 - Plot of dynamic spectrum of vowel in word "seed" with resonance frequency values marked by white asterisks.
Fig. 7 - Configuration of the vocal tract from articulatory code book, corresponding distribution of the area of cross-section and acoustic spectrum.
Fig. 8 - Initial distribution of the areas of cross-sections S_0 (top to bottom) and distribution of the areas S_opt, calculated using the developed algorithm (bottom to top).
Fig. 9 - Initial configuration of the vocal tract (dashed line) and configuration computed by the developed algorithm (solid line). Both configurations correspond to distribution of the cross-section areas shown on Fig. 8.
DETAILED DESCRIPTION OF INVENTION
This invention in its different variants can be implemented as a computer method, in the form of a system or a machine-readable medium containing instructions for using the said method. The invention can be realized as a distributed computer system.
In this invention the system means a computer system, PC (personal computer), CNC (computer numeric control), PLC (programmable logic controller), computerized control systems and any other devices that can perform a defined, clearly determined sequence of operations (actions, instructions). Command processing device means an electronic unit or integrated circuit (microprocessor) that executes machine instructions (programs).
A command processing device reads and executes machine instructions (programs) from one or more data storage devices. Data storage devices include, but are not limited to, hard drives (HDD), flash memory, ROM (read-only memory), solid-state drives (SSD) and optical drives.
Program means a sequence of instructions intended for execution by a computer control device or a command processing device. Some terms used below in the description of the invention are reviewed next.
Articulation is the work of separate articulatory organs in the production of speech sounds. All active pronouncing organs are engaged in the pronunciation of any speech sound. The position of these organs required for the creation of a given sound forms its articulation and determines the separability of sounds and their clearness.
Approximation is a scientific method comprising the substitution of some objects by other objects that are similar to some extent but simpler.
Approximation allows studying the numeric characteristics and qualitative properties of an object by reducing the task to the study of simpler and more convenient objects (e.g. objects whose characteristics are easily calculated or whose properties are already known).
To improve the accuracy of articulation restoration and to reduce the processing time when solving the inverse speech problem, a method of restoration of the vocal tract configuration is proposed that includes the following steps: preliminary processing of the audio signal; determination of the vectors of acoustic characteristics for vowels and vowel-like segments; determination of the most similar vectors of acoustic characteristics using the articulatory code book; determination of the functions of the areas and lengths of the cylinder tubes for the vectors of acoustic characteristics; determination of the vocal tract configuration based on the functions of the areas and lengths of the cylinder tubes approximating the vocal tract.
Fig. 1 shows a diagram of one of the options of the method of the vocal tract configuration restoration.
The input signal is a digitized voice record of a random person speaking any language. The input signal sampling rate should be at least 8,000 Hz and the minimum quantization level should be 8 bit/sample. The person can pronounce random speech material (speech material means separate sounds, combinations of sounds, words, phrases and texts in this language; non-speech sounds like coughing, breathing, chirrup, etc. are not speech material). The digitized record can have any acceptable sound format such as wav, mpeg, mp4, etc. Any sound recording device can be used, e.g. a microphone, voice recorder, telephone, video camera, etc. The record of the English word "seed" pronounced by a male is used here as an example. Fig. 2 shows a chart of the acoustic wave of this word (oscillogram).
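The stated input requirements can be checked programmatically. The following Python sketch (the function name is illustrative, not part of the described method) validates a WAV recording against the stated minima of 8,000 Hz and 8 bit/sample using only the standard library:

```python
import wave

def check_input_format(path):
    """Return True when a WAV recording meets the stated minimum
    requirements: sampling rate >= 8000 Hz and >= 8 bits per sample."""
    with wave.open(path, "rb") as w:
        rate_ok = w.getframerate() >= 8000      # sampling rate in Hz
        depth_ok = w.getsampwidth() * 8 >= 8    # quantization in bits/sample
        return rate_ok and depth_ok
```

Formats other than wav (mpeg, mp4, etc.) would require a decoding library, but the check itself is the same.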
Preliminary processing of audio signal.
At the first stage the digitized human voice record is filtered to remove noise and distortions (additive noise, distortions induced by the communication channel, reverberation, etc.). Any noise and distortion reduction algorithms can be used as filtration algorithms (for example, the algorithms described in S. Vaseghi, Advanced Digital Signal Processing and Noise Reduction, 2nd ed. John Wiley & Sons, Ltd, 2000). Fig. 3 shows a chart of the speech wave of the word "seed" after filtration of external noise using the spectral subtraction algorithm (see S. Vaseghi, Chapter 11, P. 333-352, Advanced Digital Signal Processing and Noise Reduction, 2nd ed. John Wiley & Sons, Ltd, 2000).
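The spectral subtraction idea can be illustrated by a simplified sketch (not Vaseghi's full algorithm; the frame length and the assumption that the leading frames contain only noise are illustrative): the noise magnitude spectrum estimated from the leading frames is subtracted from the magnitude spectrum of every frame, while the phase is kept unchanged.

```python
import numpy as np

def spectral_subtraction(signal, frame_len=256, noise_frames=5):
    """Minimal spectral subtraction: estimate the noise magnitude spectrum
    from the first `noise_frames` frames (assumed non-speech), subtract it
    from each frame's magnitude, floor at zero, and resynthesize with the
    original phase."""
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    spectra = np.fft.rfft(frames, axis=1)
    noise_mag = np.abs(spectra[:noise_frames]).mean(axis=0)
    clean_mag = np.maximum(np.abs(spectra) - noise_mag, 0.0)
    clean = np.fft.irfft(clean_mag * np.exp(1j * np.angle(spectra)),
                         n=frame_len, axis=1)
    return clean.reshape(-1)
```

Practical implementations add overlapping windows, over-subtraction factors and a spectral floor to suppress musical noise.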
Then the digitized voice record filtered from noise is analysed by an automatic speech/non-speech detection algorithm, which determines the boundaries of the beginning and end of all pauses inside the digitized record (a pause means any section of the digitized record where the person is silent). Any speech/non-speech detection algorithm described in the international literature can be used (for example, the algorithm described in Q. Li, J. Zheng, A. Tsai, and Q. Zhou, Robust Endpoint Detection and Energy Normalization for Real-Time Speech and Speaker Recognition, IEEE Transactions on Speech and Audio Process., vol. 10, No. 3, 2002, P. 146-157). Fig. 4 shows as an example the result of the application of the pause detection algorithm of Q. Li et al. (cited above) to the word "seed"; vertical straight lines show the boundaries separating the pauses from the speech segment. Further processing is performed in the sections which do not correspond to pauses (i.e. only in the speech segments). Start and end boundaries are defined for each speech segment in the automatic mode. Any algorithm of automatic speech segmentation can be used for automatic detection of the boundaries of sounds (for example, Dynamic Time Warping, DTW, described in L. Rabiner, A. Rosenberg, J. Wilpon, and T. Zampini, A bootstrapping training technique for obtaining demisyllable reference patterns, J. Acoust. Soc. Amer. 71 (6), June 1982, P. 1588-1595, or an algorithm based on Hidden Markov Models, HMM, described, for example, in F. Brugnara, D. Falavigna, and M. Omologo, Automatic segmentation and labeling of speech based on Hidden Markov Models, Speech Communication 12 (1993), P. 357-370).
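The pause/speech segmentation step can be illustrated with a toy energy-threshold detector. The robust endpoint detector of Li et al. cited above is far more elaborate; the frame length and threshold ratio below are illustrative assumptions, not values from the method.

```python
import numpy as np

def detect_speech(signal, frame_len=160, threshold_ratio=0.1):
    """Label each frame as speech (True) or pause (False): a frame counts
    as speech when its short-time energy exceeds a fixed fraction of the
    maximum frame energy in the record."""
    n = len(signal) // frame_len
    frames = signal[:n * frame_len].reshape(n, frame_len)
    energy = (frames ** 2).sum(axis=1)
    return energy > threshold_ratio * energy.max()
```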
For further analysis the segments of the digitized record are selected, which correspond to vowel and vowel-like sounds (definition of terms "vowel" and "vowel-like sound" is provided, for example, in monograph P. Ladefoged, I. Maddieson, The Sounds of the World's Languages. Wiley-Blackwell. 1996). Fig. 5 shows as an example the boundaries of vowel in the word "seed".
Determination of the vectors of acoustic characteristics for vowels and vowel-like segments.
Acoustic characteristics of each vowel or vowel-like sound are determined in the automatic mode using short-term analysis (see L. Rabiner, R. Schafer, Digital Processing of Speech Signals. Prentice-Hall, Inc. 1976). In the short-term analysis the acoustic characteristics are calculated in a moving time analysis window with a duration from 15 msec to 40 msec and a step of at least 1 msec. The window can have an arbitrary shape (in particular, different popular windows such as the Hamming window can be used). Resonance frequencies of the vocal tract or any other parameters describing the short-term amplitude-frequency spectrum of the speech signal can be used as acoustic characteristics (for example, Fast Fourier transform (FFT) coefficients, linear predictive coding (LPC) coefficients, mel-frequency cepstral coefficients (MFCC), etc.). If resonance frequencies of the vocal tract are used as acoustic characteristics, any algorithm of automatic evaluation of resonance frequencies described in the international literature can be used for their detection (for example, the method based on linear prediction of speech described in J. Markel, A. Gray, Linear Prediction of Speech. Springer-Verlag. 1976). If parameters describing the short-term amplitude-frequency spectrum of the speech signal are used as acoustic characteristics, they can be defined using any algorithm described in the international sources (for example, see X. Huang, A. Acero, H.-W. Hon, Spoken language processing: a guide to theory, algorithm, and system development. Prentice-Hall, Inc. 2001). Fig. 6 shows an example of a chart of the dynamic spectrum (sonogram) of the vowel in the word "seed" with the values of the first three resonance frequencies computed using the algorithms described in J. Markel, A. Gray, Linear Prediction of Speech. Springer-Verlag. 1976. Further, the acoustic characteristics defined in N successive time windows of the analysis will be designated as {a_1, ..., a_N}, where a_i is the set (vector) of acoustic characteristics computed in the i-th time window.
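Resonance-frequency estimation by linear prediction, in the spirit of the Markel & Gray reference, can be sketched for one analysis window as follows: an all-pole model is fitted by the autocorrelation method and the resonances are read off the angles of the complex roots of the prediction polynomial. The function name and the model order are illustrative assumptions.

```python
import numpy as np

def lpc_formants(frame, fs, order=10):
    """Estimate resonance frequencies (Hz) of one analysis frame by LPC:
    Hamming-window the frame, solve the autocorrelation (Yule-Walker)
    normal equations, and convert the angles of the complex poles of the
    prediction polynomial to frequencies."""
    w = frame * np.hamming(len(frame))
    # autocorrelation at lags 0..order
    r = np.correlate(w, w, "full")[len(w) - 1: len(w) + order]
    # normal equations R a = r[1..order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])
    # prediction polynomial A(z) = 1 - sum_k a_k z^-k
    roots = np.roots(np.concatenate(([1.0], -a)))
    roots = roots[np.imag(roots) > 0]            # one root per conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)   # pole angle -> frequency in Hz
    return np.sort(freqs)
```

Production formant trackers add bandwidth filtering and continuity constraints across successive windows.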
The most similar vectors of acoustic characteristics are determined using the articulatory code book.
For further analysis the articulatory code book is loaded from ROM, i.e. a special database containing a large set of pairs: a configuration of the vocal tract model and the acoustic characteristics corresponding to that configuration. The vocal tract model is based on the fact that one of the sound generation sources is the voice source produced by the oscillation of the vocal cords; this source participates in the generation of several groups of sounds, and in terms of the participation of this voice source the sounds are divided into vowels and consonants. Already existing code books can be used as the articulatory code book (for example, see J. Schroeter, M. Sondhi, Techniques for estimating vocal tract shapes from the speech signal. IEEE Trans. On Speech and Audio Processing. 1994. Vol. 2. No. 1, Pt. 2. P. 133-150). It is also possible to develop a specific articulatory code book using the methods described in the literature (e.g. the method of development of articulatory code books described in the same work of J. Schroeter and M. Sondhi).
For further analysis each such configuration of the vocal tract from the articulatory code book should be approximated by a sequence of cylinder tubes of different lengths and variable cross-section areas. Any algorithms described in the international literature can be used for such approximation (e.g. the algorithm developed in P. Badin, I.S. Makarov, V.N. Sorokin, Algorithm for calculating the cross-section areas of the vocal tract // Acoustical Physics. Vol. 51. No. 1. 2005. P. 38-43). Fig. 7 shows as an example some configuration of the vocal tract, the corresponding distribution of the cross-section areas of the approximating cylinder tubes and the corresponding acoustic spectrum. In the further description the articulatory code book will be designated as {a_k^c, S_k^c, l_k^c}, k = 1, ..., M, where M is the total number of vectors in the articulatory code book, S_k^c is the k-th function of distribution of the cross-section areas of the cylinder tubes in the code book, l_k^c is the k-th function of distribution of the lengths of the cylinder tubes in the code book, and a_k^c is the vector of acoustic characteristics corresponding to these functions of distribution of the cross-section areas and lengths of the cylinder tubes.
For each vector of acoustic characteristics a_i calculated for a vowel or vowel-like segment of the digitized speech, the most similar vector of acoustic characteristics a_k^c is selected from the articulatory code book. Any metric can be used as a measure of similarity (e.g. the Euclidean metric). The functions of distribution of the cross-section areas S_k^c and lengths l_k^c of the cylinder tubes corresponding to a_k^c are used as the first approximations for further calculations.
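With the Euclidean metric, the code book lookup reduces to a nearest-neighbour search. A minimal sketch (the array layout, one acoustic vector per row, is an illustrative assumption):

```python
import numpy as np

def nearest_codebook_entry(a_i, codebook_acoustics):
    """Return the index k of the code book acoustic vector closest to the
    measured vector a_i in the Euclidean metric; the associated area and
    length functions S_k, l_k then serve as the first approximation."""
    d = np.linalg.norm(codebook_acoustics - a_i, axis=1)
    return int(np.argmin(d))
```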
Determination of the functions of the cross-section areas and lengths of cylinder tubes of vectors of acoustic characteristics.
Further calculations significantly depend on the acoustic characteristics calculated from the input digitized acoustic signal. If vocal tract resonance frequencies are used as acoustic characteristics, the automatic algorithm, using S_k^c and l_k^c as the first approximations, iteratively changes the cross-section area and length of each cylinder tube so as to reduce the distance in the Euclidean metric between the measured resonance frequencies and the resonance frequencies calculated from the current distribution of the areas and lengths of the cylinder tubes. Any methods described in the international literature can be used as the algorithm of iterative modification of the areas and lengths (e.g., the algorithms described in B. Story, Technique for "tuning" vocal tract area functions based on acoustic sensitivity functions // J. Acoust. Soc. Am. 119 (2), February 2006. P. 715-718; S. Adachi, H. Takemoto, T. Kitamura, P. Mokhtari, and K. Honda, Vocal tract length perturbation and its application to male-female vocal tract shape conversion // J. Acoust. Soc. Am. 121 (6), June 2007. P. 3874-3885). Any algorithms described in the literature can be used as algorithms of generation of the resonance frequencies of the tract from the current distribution of the cross-section areas and lengths of the cylinder tubes (e.g., the algorithm described in I.S. Makarov, Approximating the vocal tract by conical horns // Acoustical Physics, 2009, vol. 55. No 2. P. 261-269). The result of the algorithm is the distribution of the cross-section areas and lengths of the cylinder tubes {Sopt, lopt} which generates the resonance frequencies least different, in the Euclidean metric, from the resonance frequencies evaluated from the digitized voice signal.
If the parameters describing the short-term amplitude-frequency spectrum of the speech signal are used as acoustic characteristics, the algorithm of determination of {Sopt, lopt} is different. The transfer function of the vocal tract approximated by N cylinder tubes (where S_i and l_i are the cross-section area and the length of the i-th cylinder tube) is determined as (see I.S. Makarov, Approximating the vocal tract by conical horns // Acoustical Physics, 2009, vol. 55. No 2. P. 261-269):

T(j2πf) = 1 / (A(j2πf) + C(j2πf)·Z_L(j2πf)).     (1)

Here j = √(−1), f is the frequency (in Hz), Z_L(j2πf) is the radiation acoustic impedance at the lips, and A(j2πf) and C(j2πf) are calculated using the following matrix relations:

[A(j2πf); C(j2πf)]^Tr = K_N·K_{N−1}·...·K_1·[1; 0]^Tr,
K_i = [cosh(ψ_i l_i), (ρc/S_i)·sinh(ψ_i l_i); (S_i/(ρc))·sinh(ψ_i l_i), cosh(ψ_i l_i)],     (2)

where ψ_i is the complex propagation constant in the i-th tube (equal to j2πf/c in the lossless case). Here c is the speed of propagation of sound waves in the vocal tract, ρ is the density of air in the vocal tract, and σ and γ are coefficients entering ψ_i that are introduced to account for the acoustic losses due to viscous friction and heat conductivity in the vocal tract and for the acoustic impedance of the walls of the cylinder tubes (different formulae for these coefficients and specific constant values are provided in M. M. Sondhi, J. Schroeter, A Hybrid Time-Frequency Domain Articulatory Speech Synthesizer // IEEE Trans. Acoust., Speech, and Signal Process. ASSP-35. 1987. P. 955-967; I.S. Makarov, Approximating the vocal tract by conical horns // Acoustical Physics, 2009, vol. 55. No 2. P. 261-269).

From (1) we have the following formulae for the local derivatives of T with respect to S_i and l_i:

∂T/∂S_i = −(∂A/∂S_i + Z_L·∂C/∂S_i) / (A + C·Z_L)²,
∂T/∂l_i = −(∂A/∂l_i + Z_L·∂C/∂l_i) / (A + C·Z_L)².     (3)

According to (2), the local derivatives of A and C with respect to S_i and l_i are determined as follows:

[∂A/∂S_i; ∂C/∂S_i]^Tr = K_N·...·K_{i+1}·(∂K_i/∂S_i)·K_{i−1}·...·K_1·[1; 0]^Tr,     (4a)

[∂A/∂l_i; ∂C/∂l_i]^Tr = K_N·...·K_{i+1}·(∂K_i/∂l_i)·K_{i−1}·...·K_1·[1; 0]^Tr,     (4b)

where the derivatives ∂K_i/∂S_i and ∂K_i/∂l_i are obtained by direct differentiation of the entries of K_i in (2).
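The chain-matrix evaluation of the tract transfer function can be sketched as follows. This is a simplified lossless variant with an ideal open termination at the lips; the model described above additionally includes the loss coefficients and the radiation impedance Z_L, and the function name and default constants are illustrative assumptions.

```python
import numpy as np

def tract_transfer_function(S, l, freqs, c=35000.0, rho=0.00114):
    """Transfer function of a vocal tract approximated by cylinder tubes,
    chained glottis-to-lips with lossless transmission-line (ABCD)
    matrices.  Areas S are in cm^2, lengths l in cm, c in cm/s, rho in
    g/cm^3.  An ideal open end (zero pressure at the lips) is assumed,
    so T = 1 / D with D the lower-right entry of the chained matrix."""
    T = np.empty(len(freqs), dtype=complex)
    for m, f in enumerate(freqs):
        k = 2.0 * np.pi * f / c                 # wavenumber
        K = np.eye(2, dtype=complex)
        for Si, li in zip(S, l):                # chain the tube sections
            Ki = np.array([
                [np.cos(k * li), 1j * rho * c / Si * np.sin(k * li)],
                [1j * Si / (rho * c) * np.sin(k * li), np.cos(k * li)]])
            K = K @ Ki
        T[m] = 1.0 / K[1, 1]
    return T
```

For a uniform 17.5 cm tube this sketch places the first resonance near 500 Hz, the textbook value for a neutral vocal tract.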
Having relations (1)-(4), we obtain the following algorithm for the calculation of {Sopt, lopt}. Let us introduce the following definitions: f = (f_1, ..., f_V)^Tr is the set of frequencies at which the transfer function is calculated, Tr is the transposition symbol, T = (T(j2πf_1), ..., T(j2πf_V)) is the set of values of the transfer function calculated at the frequencies f, J = [∂T/∂S, ∂T/∂l] is the Jacobi matrix, Iter is the number of the current iteration, Iter_max is the maximum number of iterations of the algorithm, and Thr is the desired value of the similarity between the acoustic characteristics.

Step 0: the algorithm input data are: 1) a_i, the set (vector) of acoustic characteristics calculated in the i-th time window; 2) the functions of distribution of cross-section areas S^c and lengths l^c of the cylinder tubes from the articulatory code book. Assume that Iter = 0, S_Iter = S^c, l_Iter = l^c. The target transfer function T is calculated from a_i using the relations from X. Huang, A. Acero, H.-W. Hon, Spoken language processing: a guide to theory, algorithm, and system development. Prentice-Hall, Inc. 2001. Using S_Iter and l_Iter and equations (1)-(2), calculate T_Iter.

Step 1: Assume that Iter = Iter + 1.

Step 2: Calculate (S_Iter, l_Iter) = (S_{Iter−1}, l_{Iter−1}) + [J_{Iter−1}]^+ · (T − T_{Iter−1}). Here the symbol "+" means the operation of calculation of the pseudo-inverse matrix.

Step 3: Using S_Iter and l_Iter and equations (1)-(2), calculate T_Iter.

Step 4: Calculate the measure of similarity d between T_Iter and T.

Step 5: If d < Thr or Iter > Iter_max, move to Step 6. Otherwise move to Step 1.

Step 6: Assume that Sopt = S_Iter, lopt = l_Iter. Fig. 8 shows as an example S_0 (top) and S_opt (bottom), calculated using the described algorithm.
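Steps 0-6 amount to a Gauss-Newton iteration with a pseudo-inverse update. The following generic sketch abstracts the forward model (the transfer-function calculation) as a callable and estimates the Jacobian numerically — an illustrative simplification of the analytic derivatives above; the function name and tolerances are assumptions.

```python
import numpy as np

def fit_tube_parameters(forward, x0, target, thr=1e-6, iter_max=50, eps=1e-6):
    """Starting from the code book first approximation x0 (concatenated
    areas and lengths), repeatedly update the parameters with the
    Moore-Penrose pseudo-inverse of a forward-difference Jacobian until
    the model output matches the target acoustic characteristics within
    thr, or iter_max iterations are exhausted."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iter_max):
        y = forward(x)
        if np.linalg.norm(target - y) < thr:
            break                                 # similarity threshold reached
        J = np.empty((len(y), len(x)))
        for i in range(len(x)):                   # forward-difference Jacobian
            dx = x.copy()
            dx[i] += eps
            J[:, i] = (forward(dx) - y) / eps
        x = x + np.linalg.pinv(J) @ (target - y)  # pseudo-inverse update (Step 2)
    return x
```

For a linear forward model the pseudo-inverse step converges in a single iteration; for the nonlinear tube model the loop runs until d < Thr or Iter_max, as in Step 5.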
The vocal tract configuration is determined based on the functions of the cross-section areas and lengths of cylinder tubes approximating the vocal tract.
{Sopt, lopt} is recalculated into the respective configuration of the vocal tract. Any algorithm described in the literature can be used for this recalculation, e.g., B. Story, On the ability of a physiologically constrained area function model of the vocal tract to produce normal formant patterns under perturbed conditions // J. Acoust. Soc. Amer. 115 (4), April 2004. P. 1760-1770. Fig. 9 shows as an example the original configuration of the vocal tract (corresponding to S_0, dashed line) and the calculated configuration of the vocal tract (corresponding to Sopt, solid line).
It is evident for a specialist in this field that the specific options of implementing the method and system of vocal tract configuration restoration were described here for illustrative purposes; different modifications are acceptable within the framework, concept and scope of the invention.

Claims

1. Method of restoration of the vocal tract configuration characterized by the following:
• Preliminary processing of audio signal.
• Determination of the vectors of acoustic characteristics for vowels and vowel-like segments.
• Determination of the most similar vectors of acoustic characteristics using the articulatory code book.
• Determination of the functions of the cross-section areas and lengths of cylinder tubes of vectors of acoustic characteristics.
• Determination of the configuration of the vocal tract based on the functions of the cross-section areas and lengths of cylinder tubes approximating the vocal tract.
2. Method as per item 1 characterized by the fact that the preliminary processing of the audio signal includes noise filtering, and/or separation of the segments of speech from pauses, and/or determination of the boundaries of the sounds, and/or selection of vowels and vowel-like sounds.
3. Method as per item 1 characterized by the fact that configuration of the vocal tract is restored in cyclic manner.
4. Method as per item 1 characterized by the fact that any known articulatory code book is used as a code book.
5. Method as per item 1 characterized by the fact that an own articulatory code book is developed using any of the known methods.
6. Method as per item 1 characterized by the fact that, when the functions of the cross-section areas and lengths are calculated using the first approximations, the resonance frequencies of the vocal tract are used as acoustic characteristics.
7. Method as per item 1 characterized by the fact that, when the functions of the cross-section areas and lengths are calculated using the first approximations, the parameters describing the short-term amplitude-frequency spectrum of the speech signal are used as acoustic characteristics.
8. Method as per item 1 characterized by the fact that any known algorithm of conversion of the functions of the cross-section areas and lengths based on the first approximations into respective configuration of the vocal tract is used for determination of the configuration of the vocal tract.
9. The system of restoration of the vocal tract configuration contains:
• at least one command processing device;
• at least one data storage device;
• one or more computer programs loaded into at least one of the said data storage devices and executed in at least one of the said command processing devices, while one or more computer programs contain instructions for the usage of the method described in item 1.
10. Machine-readable media containing machine-readable instructions executable by one or more processors, which during their execution implement the method of the vocal tract configuration restoration as described in any of the items 1-8.
PCT/RU2015/000198 2014-12-30 2015-03-30 Method to restore the vocal tract configuration WO2016108722A1 (en)

Priority application: RU2014154164, filed 2014-12-30. Published as WO2016108722A1 on 2016-07-07.
