CN111415680A - Method for generating anxiety prediction model based on voice and anxiety prediction system - Google Patents


Info

Publication number
CN111415680A
CN111415680A (application CN202010220713.4A)
Authority
CN
China
Prior art keywords
anxiety
voice
window
sub
prediction model
Prior art date
Legal status
Granted
Application number
CN202010220713.4A
Other languages
Chinese (zh)
Other versions
CN111415680B (en)
Inventor
冯甄陶
Current Assignee
Xintu Entropy Technology Suzhou Co ltd
Original Assignee
Xintu Entropy Technology Suzhou Co ltd
Priority date
Filing date
Publication date
Application filed by Xintu Entropy Technology Suzhou Co ltd
Priority to CN202010220713.4A
Publication of CN111415680A
Application granted
Publication of CN111415680B
Status: Active

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 — characterised by the type of extracted parameters
    • G10L25/27 — characterised by the analysis technique
    • G10L25/30 — characterised by the analysis technique using neural networks
    • G10L25/45 — characterised by the type of analysis window
    • G10L25/48 — specially adapted for particular use
    • G10L25/51 — specially adapted for particular use for comparison or discrimination
    • G10L25/63 — specially adapted for particular use for comparison or discrimination for estimating an emotional state

Abstract

The application provides a method for generating a speech-based anxiety prediction model, and an anxiety prediction system, comprising the following steps: step 1: collecting speech of a user reading a text together with the user's SAS scale score, and labeling the speech with the score; step 2: extracting speech features from the speech and constructing an anxiety prediction model using a neural network. Through machine learning, the method automatically identifies a subject's anxiety state from read-aloud speech; the system is easy to operate and deploy, improving the convenience of anxiety identification and prediction.

Description

Method for generating anxiety prediction model based on voice and anxiety prediction system
Technical Field
The invention relates to the field of psychology and artificial intelligence, in particular to a method for generating an anxiety prediction model based on voice and an anxiety prediction system.
Background
Anxiety disorder, also known as anxiety neurosis, is a chronic disease characterized by uncontrolled, excessive, generalized, persistent anxiety, whose core feature is an anxious emotional experience. Its main manifestations are worry and restlessness without a clear, objective cause, together with autonomic dysfunction symptoms such as palpitations, hand tremors, sweating and frequent urination. The anxiety is not provoked by an actual threat, or its degree of tension and panic is grossly out of proportion to the real situation. Drug therapy, such as anxiolytics, and psychotherapy are the main treatments for anxiety disorder.
Anxiety disorders are arguably the most common mood disorders in the population. Not long ago, The Lancet Psychiatry published an epidemiological survey of the prevalence of mental disorders in China, which indicated that among the various psychological and mental diseases, anxiety disorders have the highest prevalence, with a lifetime prevalence of 7.57%; it is estimated that there are more than 50 million anxiety patients nationwide. The World Health Organization notes that 90% of anxiety patients develop the disorder before age 35, and women are affected more often than men. In recent years, the number of anxiety patients has been rising; according to WHO estimates, about 41 million people in China suffer from anxiety disorders. The identification and treatment of anxiety disorders therefore deserve significant attention. Studies have found that although anxiety disorders are treatable, only 36.9% of patients receive treatment, and the greatest bottleneck is the identification of anxiety.
To date, there is no specific laboratory test for anxiety. Current diagnostic methods are: (1) screening by self-report scale and self-diagnosis, such as assessment with the Self-Rating Anxiety Scale (SAS); (2) diagnosis by a specialist based on medical history, family history, clinical symptoms, disease course and physical examination. Assessment of anxiety symptoms is currently made mainly through self-report scales. However, self-report evaluation takes a long time and depends on the subjective cooperation of the respondent, while diagnosis by a physician, who must synthesize many kinds of patient information, demands considerable effort and time, and misdiagnosis can occur. Moreover, where long-term monitoring of the anxiety state is required, asking the user to answer the same questions repeatedly and frequently is not feasible. The need for a more convenient, objective and real-time assessment of a user's anxiety state is therefore increasingly urgent.
The Self-Rating Anxiety Scale (SAS) was compiled by Professor Zung (1971). In both construction and assessment format it is quite similar to the Self-Rating Depression Scale (SDS), and it is a relatively simple clinical tool for analyzing a patient's subjective symptoms. Since anxiety is a common mood disorder seen in psychological counseling clinics, the SAS has in recent years become a common scale for assessing anxiety symptoms in counseling practice.
The SAS uses a 4-point rating, mainly assessing how frequently each symptom appears, with the following criteria: "1" means never or rarely; "2" means sometimes; "3" means most of the time; "4" means most or all of the time. Of the 20 items, 15 are worded negatively and scored from 1 to 4 as above; the other 5 items (5, 9, 13, 17, 19) are worded positively and scored in reverse order, from 4 to 1.
The main statistical indicator of the SAS is the total score. The scores of the 20 items are summed to give a raw score; the raw score is multiplied by 1.25 and the integer part is taken as the standard score, or the same conversion can be obtained by table lookup.
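As a worked illustration of this conversion, the following minimal Python sketch computes the SAS standard score from the 20 item ratings; the function name and interface are illustrative only and are not part of the scale.

    # Minimal sketch of SAS scoring as described above. The reverse-scored
    # items (5, 9, 13, 17, 19) follow the text; names are illustrative.
    REVERSE_ITEMS = {5, 9, 13, 17, 19}  # positively worded items, scored 4..1

    def sas_standard_score(responses):
        """responses: dict mapping item number (1..20) to a rating of 1..4."""
        raw = sum((5 - r) if item in REVERSE_ITEMS else r
                  for item, r in responses.items())  # raw (rough) score
        return int(raw * 1.25)  # integer part of 1.25 x raw score

For example, a respondent rating every negatively worded item "1" and every positively worded item "4" has a raw score of 15 × 1 + 5 × 1 = 20 and a standard score of int(20 × 1.25) = 25.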
Disclosure of Invention
To overcome the defects of the prior art, the method collects speech of users reading the same text, labels the audio data with each user's SAS scale score, and then constructs an anxiety prediction model based on a neural network. An automatic speech-based anxiety-state prediction system is built from the anxiety prediction model.
According to an aspect of the present invention, a method for generating a speech-based anxiety prediction model is provided, comprising: step 1: collecting speech of a user reading a text and the user's SAS scale score, and labeling the speech with the score; step 2: extracting speech features from the speech and constructing an anxiety prediction model using a neural network.
Preferably, step 2 comprises the following steps:
S21: setting the sub-speech length N and the window x, and cutting the speech into sub-speech of length N;
S22: performing windowed segmentation of the sub-speech under window x to generate the speech features of the sub-speech under window x;
S23: dividing the sub-speech into training sub-speech and test sub-speech;
S24: taking the speech features of the training sub-speech under window x as input and the SAS scale score of the training sub-speech as output, and constructing the anxiety prediction model under window x using a neural network algorithm.
Preferably, the speech features include basic features, derivative features of the basic features, and statistics of the basic and derivative features over the time-window length, wherein the basic features include intensity, loudness, zero-crossing rate, unvoiced ratio, fundamental frequency envelope, 8 line spectral pairs and 12 mel-frequency cepstral coefficients, and the statistics include mean, standard deviation, kurtosis, skewness and slope.
Preferably, the method further comprises:
S25: inputting the speech features of the test sub-speech under window x into the anxiety prediction model under window x, and calculating the difference between the output result and the SAS score of the test sub-speech;
S26: calculating the average difference of the anxiety prediction model under window x by the formula: average difference = (sum of the differences of the test sub-speech under window x) / (number of test sub-speech under window x);
S27: traversing x from 1 to N-1, repeating steps S22 to S26, and taking the anxiety prediction model under the window x with the smallest average difference as the final anxiety prediction model, whose window length is the optimal window length.
Alternatively, x may take several manually set values smaller than N.
According to another aspect of the present invention, a speech-based anxiety prediction system is provided, comprising: a data acquisition module, a speech feature extraction module, a training sample construction module, a neural network training module, an anxiety prediction model generation module and a prediction module, wherein
the data acquisition module is used for acquiring the subject's speech;
the speech feature extraction module is used for receiving speech, the sub-speech length N and the window length x, and for extracting and returning the speech features under window x;
the training sample construction module is used for collecting users' speech and SAS scale scores and labeling the speech with the SAS scale scores; it also transmits the user's speech, the sub-speech length and the window length to the speech feature extraction module, and divides the returned sub-speech into training sub-speech and test sub-speech in a set proportion;
the neural network training module is used for constructing the anxiety prediction model under window x with a neural network algorithm based on the training sub-speech;
the anxiety prediction model generation module is used for generating the anxiety prediction model and the optimal window length;
and the anxiety prediction module is used for receiving the subject's speech, inputting it together with the optimal window length into the speech feature extraction module, transmitting the returned sub-speech features to the anxiety prediction model, and judging the subject's anxiety state from the anxiety-state score the model returns.
Preferably, the speech feature extraction module cuts the speech into sub-speech of length N and then performs windowed segmentation to generate the speech features of the sub-speech under window x.
Preferably, the speech features include basic features, derivative features of the basic features, and statistics of the basic and derivative features over the time-window length, wherein the basic features include intensity, loudness, zero-crossing rate, unvoiced ratio, fundamental frequency envelope, 8 line spectral pairs and 12 mel-frequency cepstral coefficients, and the statistics include mean, standard deviation, kurtosis, skewness and slope.
Preferably, the neural network training module receives the window length x and the training sub-speech, takes the speech features of the training sub-speech under window x as input and the SAS scale score of the training sub-speech as output, and constructs the anxiety prediction model under window x with a neural network algorithm.
Preferably, the anxiety prediction model generation module receives the test sub-speech under window x and the anxiety prediction model under window x, calculates the difference between the model's output for the speech features of the test sub-speech and the SAS score of the test sub-speech, and then calculates the average difference of the anxiety prediction model under window x; x is traversed from 1 to N-1 to obtain anxiety prediction models under N-1 windows, and the one with the smallest average difference is selected as the final anxiety prediction model; its window length is the optimal window length.
The prediction model obtained from read-aloud speech can automatically and effectively recognize the user's anxiety at the current moment, with recognition accuracy exceeding 70% under a high/low grouping, making it a convenient way to provide early warning of psychological state.
Drawings
FIG. 1 is a flow diagram illustrating a method for generating a speech-based anxiety prediction model according to an embodiment of the present invention;
FIG. 2 is a schematic flow diagram of a method of constructing an anxiety prediction model according to one embodiment of the present invention;
FIG. 3 is a schematic diagram of a speech-based anxiety prediction system according to an embodiment of the present invention.
To clearly illustrate the structure of embodiments of the present invention, certain dimensions, structures and devices are shown in the drawings for illustrative purposes only; they are not intended to limit the invention to particular dimensions, structures, devices and environments, which one of ordinary skill in the art may adjust or modify according to particular needs while remaining within the scope of the appended claims.
Detailed Description
The anxiety recognition method and warning system based on reading speech of a specific text according to the present invention are described in detail below with reference to the accompanying drawings and specific embodiments.
In the following description, various aspects of the invention are described; however, it will be apparent to those skilled in the art that the invention may be practiced with only some or all of its structures or processes. Specific numbers, configurations and sequences are set forth for clarity of explanation, but it will be apparent that the invention may be practiced without these specific details. In other instances, well-known features are not set forth in detail so as not to obscure the invention.
In the present invention, the subject refers to the person to be tested, and a user refers to a person whose speech and SAS scale score have been collected.
The invention provides a method for generating a speech-based anxiety prediction model, comprising the following steps, as shown in FIG. 1: step 1: collecting speech of a user reading a text and the user's SAS scale score, and labeling the speech with the score; step 2: extracting speech features from the speech and constructing an anxiety prediction model using a neural network.
In step 1, all collected data must be of the same caliber so as to be comparable. If one user is asked to read a specific text, the other users must read the same text; the text may be a short neutral passage, e.g., a scenic-spot introduction of 300-500 characters. The collection environment should be kept as quiet as possible so that the speech is free of noise.
In speech-based anxiety recognition and prediction, many kinds of speech can be used, e.g., reading the same neutral text, giving a self-introduction along a specified outline, or describing the same picture. Whichever is used, the same mode must be kept throughout collection to ensure the same caliber and comparability.
After the reading is recorded, the user also fills in an SAS scale; the scale score corresponds to the recorded speech, and when the speech is cut into sub-speech, the scale score is attached to each sub-speech.
Step 2 comprises four steps, as shown in FIG. 2, which are described in detail below.
S21: setting the sub-speech length N and the window x, the speech is cut into sub-speech of length N; the units of N and x may be milliseconds. Since the speech of many users is collected and each user's speech is divided into several parts, there are multiple sub-speech segments.
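As an illustration of S21, a minimal Python sketch of the interception step follows; the 16 kHz sampling rate and the helper name are assumptions of the example, not requirements of the invention.

    import numpy as np

    def split_into_subspeech(samples, sr=16000, sub_len_ms=6000):
        """Cut a 1-D speech signal into consecutive sub-speech segments of N ms."""
        n = int(sr * sub_len_ms / 1000)         # samples per sub-speech segment
        usable = (len(samples) // n) * n        # drop the trailing remainder
        return samples[:usable].reshape(-1, n)  # one row per sub-speech segment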
S22: performing windowed segmentation of the sub-speech under window x to generate the speech features of the sub-speech under window x.
In one embodiment, at feature extraction, 25 basic speech features are extracted first (intensity, loudness, zero-crossing rate, unvoiced ratio, fundamental frequency envelope, 8 line spectral pairs and 12 mel-frequency cepstral coefficients). To express the dynamic change of the speech features, a derivative (Δ) feature is computed for each of them, and 5 statistics (mean, standard deviation, kurtosis, skewness, slope) of the basic and derivative features are then computed over the segmentation windows, giving (25 + 25) × 5 = 250 features in total.
Windowing is prior art: a number of windows of length x are cut from the sub-speech of length N, and the speech features of the sub-speech are then generated within these windows. The features differ for different x; the values x takes, and how the prediction models formed for different x are compared and selected, are explained later.
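A minimal sketch of the S22 statistics computation follows. It assumes a helper basic_features(frame) that returns the 25 basic features of one window (e.g., computed with an audio toolkit); that helper, the sampling rate, and the non-overlapping windows are assumptions of the sketch, since the patent does not fix them.

    import numpy as np
    from scipy.stats import kurtosis, skew

    def window_features(subspeech, sr=16000, win_ms=30):
        """250-dimensional feature vector of one sub-speech under window x = win_ms."""
        win = int(sr * win_ms / 1000)
        frames = subspeech[: len(subspeech) // win * win].reshape(-1, win)
        base = np.stack([basic_features(f) for f in frames])  # (num_windows, 25)
        delta = np.diff(base, axis=0, prepend=base[:1])       # derivative (Δ) features
        feats = np.concatenate([base, delta], axis=1)         # (num_windows, 50)
        t = np.arange(len(feats))
        slope = np.polyfit(t, feats, 1)[0]  # per-feature linear slope (needs >= 2 windows)
        stats = [feats.mean(axis=0), feats.std(axis=0),
                 kurtosis(feats, axis=0), skew(feats, axis=0), slope]
        return np.concatenate(stats)                          # 50 features x 5 stats = 250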
S23: dividing the sub-voices into training sub-voices and testing sub-voices;
in the previous step, the sub-voices are subjected to voice feature extraction under the window x, so that the training sub-voices and the testing sub-voices both contain voice features under the window x, and meanwhile, the sub-voices also carry the SAS scale scores of the voices. In one embodiment, a set percentage (e.g., 80%) of the training samples (i.e., the collected samples, one sample including the voice of a user, the SAS scale score of the user) are randomly selected as training data, and the remaining samples are used as test data.
S24: taking the speech features of the training sub-speech under window x as input and the SAS scale score of the training sub-speech as output, the anxiety prediction model under window x is constructed using a neural network algorithm.
That is, the training data sharing the same sampling time window x are fed to the neural network to train it, i.e., to fit the network's parameters, yielding the anxiety prediction model under window x. Techniques for training neural networks from input and output data are well known to those skilled in the art, and mature programming frameworks exist. Different values of x thus yield different anxiety prediction models.
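A minimal sketch of S23-S24 using scikit-learn follows; the patent fixes neither the framework nor the network architecture, so the MLP layout and the 80/20 split below are assumptions.

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPRegressor

    # X: (num_sub_speech, 250) feature matrix under window x, from S21-S22;
    # y: the SAS scale scores labeling each sub-speech
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=0.8, random_state=0)    # S23: 80% train, 20% test

    model = MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=2000)
    model.fit(X_train, y_train)                  # S24: anxiety model under window x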
In one embodiment, x is traversed over [1, N-1], and N-1 anxiety prediction models, one per window x, are obtained through S22-S24 as described above. For the anxiety prediction model of each window x, the following is performed:
S25: inputting the speech features of the test sub-speech under window x into the anxiety prediction model under window x, and calculating the difference between the output result and the SAS score of the test sub-speech;
S26: calculating the average difference of the anxiety prediction model under window x by the formula: average difference = (sum of the differences of the test sub-speech under window x) / (number of test sub-speech under window x).
The anxiety prediction model under the window x with the smallest average difference is the final anxiety prediction model, and its window length is the optimal window length. As noted above, x may instead take several preset values, which speeds up the computation.
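Steps S25-S27 amount to a model-selection loop over candidate window lengths; a minimal sketch follows, where train_model_for_window and test_data_for_window are hypothetical helpers wrapping S22-S24 above.

    import numpy as np

    best_x, best_model, best_err = None, None, float("inf")
    for x in candidate_windows:                   # e.g. range(1, N) or preset values
        model = train_model_for_window(x)         # S22-S24 (hypothetical helper)
        X_test, y_test = test_data_for_window(x)  # features, SAS scores (hypothetical)
        err = np.mean(np.abs(model.predict(X_test) - y_test))  # S25-S26 average difference
        if err < best_err:
            best_x, best_model, best_err = x, model, err
    # best_model: final anxiety prediction model; best_x: optimal window length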
In use, only the subject's read-aloud speech needs to be collected. The speech is divided into sub-speech according to the N and x of the model, speech features under window x are generated and input into the anxiety prediction model, and an anxiety-state score is obtained. Whether the predicted value lies within the safe range is then judged against a rule base: if it does, the user's psychological state is good; if not, the psychological state is abnormal. The rule base adopts the SAS standard.
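The rule-base check can be as simple as comparing the predicted score with the SAS norm cutoffs; the thresholds in the following sketch (standard score below 50 normal; 50-59 mild; 60-69 moderate; 70 and above severe) are the commonly cited SAS norms and are an assumption here, since the text only states that the rule base adopts the SAS standard.

    def judge_anxiety_state(standard_score):
        """Map a predicted SAS standard score to an anxiety level (assumed cutoffs)."""
        if standard_score < 50:
            return "normal"
        elif standard_score < 60:
            return "mild anxiety"
        elif standard_score < 70:
            return "moderate anxiety"
        return "severe anxiety"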
In one embodiment, to speed up the computation, the window length x is set manually instead of being traversed. With neutral-text reading audio of length 6000 ms, three groups of sample data are constructed with x = 15 ms, x = 30 ms and x = 45 ms. For each group, some samples are randomly selected as training data, e.g., 80% of the 10 samples, i.e., 8 samples, with the remaining 2 samples as test data. After a neural-network anxiety prediction model is built from the training data, the test data are input to obtain prediction results, the error between the predictions and the actual SAS results is computed, and the mean error serves as the performance evaluation value of the model. Training for the different x yields 3 neural networks with evaluation values of 0.45, 0.23 and 0.30. Comparing the three values of x, the model with the smallest error, x = 30 ms, is selected as the optimal, i.e., final, anxiety prediction model.
According to another aspect of the present invention, a speech-based anxiety prediction system is provided, as shown in FIG. 3, comprising a data acquisition module, a speech feature extraction module, a training sample construction module, a neural network training module, an anxiety prediction model generation module and a prediction module, wherein
the data acquisition module is used for acquiring the subject's speech;
the speech feature extraction module is used for receiving speech, the sub-speech length N and the window length x, and for extracting and returning the speech features under window x;
the training sample construction module is used for collecting users' speech and SAS scale scores and labeling the speech with the SAS scale scores; it also transmits the user's speech, the sub-speech length and the window length to the speech feature extraction module, and divides the returned sub-speech into training sub-speech and test sub-speech in a set proportion;
the neural network training module is used for constructing the anxiety prediction model under window x with a neural network algorithm based on the training sub-speech;
the anxiety prediction model generation module is used for generating the anxiety prediction model and the optimal window length; the output of the anxiety prediction model is an anxiety-state score;
and the anxiety prediction module is used for receiving the subject's speech, inputting it together with the optimal window length into the speech feature extraction module, transmitting the returned sub-speech features to the anxiety prediction model, and judging the subject's anxiety state from the returned anxiety-state score.
In the speech feature extraction module, 25 basic features are extracted (intensity, loudness, zero-crossing rate, unvoiced ratio, fundamental frequency envelope, 8 line spectral pairs and 12 mel-frequency cepstral coefficients); to express the dynamic change of the speech features, a derivative (Δ) feature is computed for each basic feature, and 5 statistics (mean, standard deviation, kurtosis, skewness, slope) are computed over the segmentation windows, giving (25 + 25) × 5 = 250 features in total.
The neural network training module is used for training the anxiety prediction model on the training sub-speech transmitted by the training sample construction module: the training sub-speech sharing the same sampling time window length x serves as the model's input, and the SAS scale score of the speech as its output, yielding the anxiety prediction model under window x. The training sub-speech carries both the SAS scale score and the speech features under window x.
The anxiety prediction model generation module is used for generating the anxiety prediction model. Specifically, it receives the test sub-speech under window x and the anxiety prediction model under window x, calculates the difference between the model's output for the speech features of the test sub-speech and the SAS score of the test sub-speech, and then calculates the average difference of the anxiety prediction model under window x; x is traversed from 1 to N-1 to obtain anxiety prediction models under N-1 windows, and the one with the smallest average difference is selected as the final anxiety prediction model; its window length is the optimal window length, and its output is the anxiety-state score.
To increase the computation speed, x can also be set manually, avoiding the N-1 traversals.
The anxiety prediction module is used for receiving the subject's speech, inputting the speech and the optimal sampling-time-window length into the speech feature extraction module to generate speech features under the optimal window, and inputting these features into the anxiety prediction model to obtain the subject's anxiety-state score, which is then compared with a decision rule to judge whether the psychological state is abnormal. The decision rule may be the SAS standard rule.
Finally, it should be noted that the above embodiments are intended only to describe the technical solutions of the present invention, not to limit them; the invention extends to other modifications, variations, applications and embodiments, all of which are considered to be within its spirit and teaching scope.

Claims (10)

1. A method of generating a speech-based anxiety prediction model, comprising:
step 1: collecting speech of a user reading a text and the user's SAS scale score, and labeling the speech with the score;
step 2: extracting speech features from the speech and constructing an anxiety prediction model using a neural network.
2. The method according to claim 1, wherein step 2 comprises the following steps:
S21: setting the sub-speech length N and the window x, and cutting the speech into sub-speech of length N;
S22: performing windowed segmentation of the sub-speech under window x to generate the speech features of the sub-speech under window x;
S23: dividing the sub-speech into training sub-speech and test sub-speech;
S24: taking the speech features of the training sub-speech under window x as input and the SAS scale score of the training sub-speech as output, and constructing the anxiety prediction model under window x using a neural network algorithm.
3. The method of claim 1, wherein the speech features comprise basic features, derivative features of the basic features, and statistics of the basic and derivative features over the time-window length, wherein the basic features comprise intensity, loudness, zero-crossing rate, unvoiced ratio, fundamental frequency envelope, 8 line spectral pairs and 12 mel-frequency cepstral coefficients, and the statistics comprise mean, standard deviation, kurtosis, skewness and slope.
4. The method of claim 2, further comprising:
S25: inputting the speech features of the test sub-speech under window x into the anxiety prediction model under window x, and calculating the difference between the output result and the SAS score of the test sub-speech;
S26: calculating the average difference of the anxiety prediction model under window x by the formula: average difference = (sum of the differences of the test sub-speech under window x) / (number of test sub-speech under window x);
S27: traversing x from 1 to N-1, repeating steps S22 to S26, and taking the anxiety prediction model under the window x with the smallest average difference as the final anxiety prediction model, whose window length is the optimal window length.
5. The method according to claim 4, wherein in step S27, x is instead manually set to several values smaller than N.
6. A speech-based anxiety prediction system, comprising: a data acquisition module, a speech feature extraction module, a training sample construction module, a neural network training module, an anxiety prediction model generation module and a prediction module, wherein
the data acquisition module is used for acquiring the subject's speech;
the speech feature extraction module is used for receiving speech, the sub-speech length N and the window length x, and for extracting and returning the speech features under window x;
the training sample construction module is used for collecting users' speech and SAS scale scores and labeling the speech with the SAS scale scores; it also transmits the user's speech, the sub-speech length and the window length to the speech feature extraction module, and divides the returned sub-speech into training sub-speech and test sub-speech in a set proportion;
the neural network training module is used for constructing the anxiety prediction model under window x with a neural network algorithm based on the training sub-speech;
the anxiety prediction model generation module is used for generating the anxiety prediction model and the optimal window length;
and the anxiety prediction module is used for receiving the subject's speech, inputting it together with the optimal window length into the speech feature extraction module, transmitting the returned sub-speech features to the anxiety prediction model, and judging the subject's anxiety state from the anxiety-state score the model returns.
7. The anxiety prediction system according to claim 6, wherein the speech feature extraction module cuts the speech into sub-speech of length N and then performs windowed segmentation to generate the speech features of the sub-speech under window x.
8. The anxiety prediction system of claim 6, wherein the speech features comprise basic features, derivative features of the basic features, and statistics of the basic and derivative features over the time-window length, wherein the basic features comprise intensity, loudness, zero-crossing rate, unvoiced ratio, fundamental frequency envelope, 8 line spectral pairs and 12 mel-frequency cepstral coefficients, and the statistics comprise mean, standard deviation, kurtosis, skewness and slope.
9. The anxiety prediction system of claim 7, wherein the neural network training module receives the window length x and the training sub-speech, takes the speech features of the training sub-speech under window x as input and the SAS scale score of the training sub-speech as output, and constructs the anxiety prediction model under window x with a neural network algorithm.
10. The anxiety prediction system according to claim 9, wherein the anxiety prediction model generation module receives the test sub-speech under window x and the anxiety prediction model under window x, calculates the difference between the model's output for the speech features of the test sub-speech and the SAS score of the test sub-speech, and then calculates the average difference of the anxiety prediction model under window x; x is traversed from 1 to N-1 to obtain anxiety prediction models under N-1 windows, and the one with the smallest average difference is selected as the final anxiety prediction model; its window length is the optimal window length, and the output of the anxiety prediction model is the anxiety-state score.
CN202010220713.4A 2020-03-26 2020-03-26 Voice-based anxiety prediction model generation method and anxiety prediction system Active CN111415680B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010220713.4A CN111415680B (en) 2020-03-26 2020-03-26 Voice-based anxiety prediction model generation method and anxiety prediction system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010220713.4A CN111415680B (en) 2020-03-26 2020-03-26 Voice-based anxiety prediction model generation method and anxiety prediction system

Publications (2)

Publication Number Publication Date
CN111415680A true CN111415680A (en) 2020-07-14
CN111415680B CN111415680B (en) 2023-05-23

Family

ID=71494595

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010220713.4A Active CN111415680B (en) 2020-03-26 2020-03-26 Voice-based anxiety prediction model generation method and anxiety prediction system

Country Status (1)

Country Link
CN (1) CN111415680B (en)


Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6691082B1 (en) * 1999-08-03 2004-02-10 Lucent Technologies Inc Method and system for sub-band hybrid coding
TW200823865A (en) * 2006-11-30 2008-06-01 Inst Information Industry Voice detection apparatus, method, application program, and computer readable medium for adjusting a window size dynamically
US20160261961A1 (en) * 2013-11-28 2016-09-08 Widex A/S Method of operating a hearing aid system and a hearing aid system
CN106504772A (en) * 2016-11-04 2017-03-15 东南大学 Speech-emotion recognition method based on weights of importance support vector machine classifier
CN106725532A (en) * 2016-12-13 2017-05-31 兰州大学 Depression automatic evaluation system and method based on phonetic feature and machine learning
CN107633851A (en) * 2017-07-31 2018-01-26 中国科学院自动化研究所 Discrete voice emotion identification method, apparatus and system based on the prediction of emotion dimension
CN108389631A (en) * 2018-02-07 2018-08-10 平安科技(深圳)有限公司 Varicella morbidity method for early warning, server and computer readable storage medium
CN109036466A (en) * 2018-08-01 2018-12-18 太原理工大学 The emotion dimension PAD prediction technique of Emotional Speech identification
CN108806724A (en) * 2018-08-15 2018-11-13 太原理工大学 A kind of emotional speech PAD values prediction technique and system
US20200075040A1 (en) * 2018-08-31 2020-03-05 The Regents Of The University Of Michigan Automatic speech-based longitudinal emotion and mood recognition for mental health treatment
CN109599129A (en) * 2018-11-13 2019-04-09 杭州电子科技大学 Voice depression recognition methods based on attention mechanism and convolutional neural networks

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
张茵; 徐蕾; 赵雅莉; 胡淑霞: "Effect of focused meditation on the anxiety level and sleep quality of middle-aged and elderly hypertensive patients" (聚焦冥想对中老年高血压病人焦虑水平和睡眠质量的影响), 护理研究 (Chinese Nursing Research) *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112507959A (en) * 2020-12-21 2021-03-16 中国科学院心理研究所 Method for establishing emotion perception model based on individual face analysis in video
CN112800908A (en) * 2021-01-19 2021-05-14 中国科学院心理研究所 Method for establishing anxiety perception model based on individual gait analysis in video
CN112800908B (en) * 2021-01-19 2024-03-26 中国科学院心理研究所 Method for establishing anxiety perception model based on individual gait analysis in video

Also Published As

Publication number Publication date
CN111415680B (en) 2023-05-23

Similar Documents

Publication Publication Date Title
Alghowinem et al. Detecting depression: a comparison between spontaneous and read speech
CN106725532B (en) Depression automatic evaluation system and method based on phonetic feature and machine learning
Yancheva et al. Using linguistic features longitudinally to predict clinical scores for Alzheimer’s disease and related dementias
CN106073706B (en) A kind of customized information and audio data analysis method and system towards Mini-mental Status Examination
WO2014062441A1 (en) Screening for neurologial disease using speech articulation characteristics
CN109841231B (en) Early AD (AD) speech auxiliary screening system for Chinese mandarin
Yin et al. Towards automatic cognitive load measurement from speech analysis
Chandler et al. Overcoming the bottleneck in traditional assessments of verbal memory: Modeling human ratings and classifying clinical group membership
CN111415680B (en) Voice-based anxiety prediction model generation method and anxiety prediction system
Weiner et al. Selecting features for automatic screening for dementia based on speech
Kokkinakis et al. Data collection from persons with mild forms of cognitive impairment and healthy controls-infrastructure for classification and prediction of dementia
CN114916921A (en) Rapid speech cognition assessment method and device
Sharma et al. Prediction of specific language impairment in children using speech linear predictive coding coefficients
CN108962397B (en) Pen and voice-based cooperative task nervous system disease auxiliary diagnosis system
CN111028863B (en) Method for diagnosing post-stroke dysarthria tone errors based on neural network and diagnostic device thereof
CN116110578A (en) Screening device for diagnosis of depression symptoms assisted by computer
Milani et al. A real-time application to detect human voice disorders
Cebola et al. Speech-Based Supervised Learning Towards the Diagnosis of Amyotrophic Lateral Sclerosis.
Li et al. The far side of failure: Investigating the impact of speech recognition errors on subsequent dementia classification
Jenei et al. Severity estimation of depression using convolutional neural network
Dutta et al. A Fine-Tuned CatBoost-Based Speech Disorder Detection Model
Kiss et al. Seasonal affective disorder speech detection on the base of acoustic phonetic speech parameters
Aluru et al. Parkinson’s Disease Detection Using Machine Learning Techniques
CN111419249B (en) Depression prediction model generation method and prediction system
Moro Velázquez Towards the differential evaluation of Parkinson’s Disease by means of voice and speech processing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant