CN111415680A - Method for generating anxiety prediction model based on voice and anxiety prediction system - Google Patents


Info

Publication number
CN111415680A
CN111415680A (application CN202010220713.4A)
Authority
CN
China
Prior art keywords
anxiety
voice
window
sub
prediction model
Prior art date
Legal status
Granted
Application number
CN202010220713.4A
Other languages
Chinese (zh)
Other versions
CN111415680B (en)
Inventor
冯甄陶
Current Assignee
Xintu Entropy Technology Suzhou Co ltd
Original Assignee
Xintu Entropy Technology Suzhou Co ltd
Priority date
Filing date
Publication date
Application filed by Xintu Entropy Technology Suzhou Co ltd
Priority to CN202010220713.4A
Publication of CN111415680A
Application granted
Publication of CN111415680B
Status: Active

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 — characterised by the type of extracted parameters
    • G10L25/27 — characterised by the analysis technique
    • G10L25/30 — characterised by the analysis technique using neural networks
    • G10L25/45 — characterised by the type of analysis window
    • G10L25/48 — specially adapted for particular use
    • G10L25/51 — specially adapted for particular use for comparison or discrimination
    • G10L25/63 — specially adapted for particular use for comparison or discrimination for estimating an emotional state

Abstract

The application provides a method for generating a speech-based anxiety prediction model, and an anxiety prediction system, comprising the following steps: step 1: collecting speech of a user reading a text together with the user's SAS scale score, and labeling the speech with the score; step 2: extracting speech features from the speech and constructing an anxiety prediction model using a neural network. Through machine learning, the method automatically identifies a subject's anxiety state from read-aloud speech; the system is easy to operate and deploy, improving the convenience of anxiety identification and prediction.

Description

Method for generating anxiety prediction model based on voice and anxiety prediction system
Technical Field
The invention relates to the field of psychology and artificial intelligence, in particular to a method for generating an anxiety prediction model based on voice and an anxiety prediction system.
Background
Anxiety disorder, also known as anxiety neurosis, is a chronic disease characterized by uncontrolled, excessive, generalized, persistent anxiety, whose core feature is an anxious emotional experience. Its main manifestations are worry and restlessness without a clear, objective cause, together with autonomic dysfunction symptoms such as palpitations, hand tremors, sweating and frequent urination. The anxiety is not provoked by an actual threat, or its degree of tension and panic is grossly out of proportion to the real situation. Drug therapy, such as anxiolytics, and psychotherapy are the main treatments for anxiety disorder.
Anxiety disorders are arguably the most common mood disorders in the population. Not long ago, The Lancet Psychiatry published an epidemiological survey of the prevalence of mental disorders in China, which indicated that among the various psychological and mental diseases, anxiety disorders have the highest prevalence, with a lifetime prevalence of 7.57%; it is estimated that there are more than 50 million anxiety patients nationwide. The World Health Organization notes that 90% of anxiety patients develop the disorder before age 35, and women are affected more often than men. In recent years, the number of anxiety patients has been rising; according to WHO estimates, about 41 million people in China suffer from anxiety disorders. The identification and treatment of anxiety disorders therefore deserve significant attention. Studies have found that although anxiety disorders are treatable, only 36.9% of patients receive treatment, and the greatest bottleneck is the identification of anxiety.
To date, there is no specific laboratory test for anxiety. Current diagnostic methods are: (1) screening by self-report scale and self-diagnosis, such as assessment with the Self-Rating Anxiety Scale (SAS); (2) diagnosis by a specialist based on medical history, family history, clinical symptoms, disease course and physical examination. Assessment of anxiety symptoms is currently made mainly through self-report scales. However, self-report evaluation takes a long time and depends on the subjective cooperation of the respondent, while diagnosis by a physician, who must synthesize many kinds of patient information, demands considerable effort and time, and misdiagnosis can occur. Moreover, where long-term monitoring of the anxiety state is required, asking the user to answer the same questions repeatedly and frequently is not feasible. The need for a more convenient, objective and real-time assessment of a user's anxiety state is therefore increasingly urgent.
The Self-Rating Anxiety Scale (SAS) was compiled by Professor Zung (1971). In both construction and assessment format it is quite similar to the Self-Rating Depression Scale (SDS), and it is a relatively simple clinical tool for analyzing a patient's subjective symptoms. Since anxiety is a common mood disorder seen in psychological counseling clinics, the SAS has in recent years become a common scale for assessing anxiety symptoms in counseling practice.
The SAS uses a 4-point rating, mainly assessing how frequently each symptom appears, with the following criteria: "1" means never or rarely; "2" means sometimes; "3" means most of the time; "4" means most or all of the time. Of the 20 items, 15 are worded negatively and scored from 1 to 4 as above; the other 5 items (5, 9, 13, 17, 19) are worded positively and scored in reverse order, from 4 to 1.
The main statistical indicator of the SAS is the total score. The scores of the 20 items are summed to give a raw score; the raw score is multiplied by 1.25 and the integer part is taken as the standard score, or the same conversion can be obtained by table lookup.
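As a worked illustration of this conversion, the following minimal Python sketch computes the SAS standard score from the 20 item ratings; the function name and interface are illustrative only and are not part of the scale.

    # Minimal sketch of SAS scoring as described above. The reverse-scored
    # items (5, 9, 13, 17, 19) follow the text; names are illustrative.
    REVERSE_ITEMS = {5, 9, 13, 17, 19}  # positively worded items, scored 4..1

    def sas_standard_score(responses):
        """responses: dict mapping item number (1..20) to a rating of 1..4."""
        raw = sum((5 - r) if item in REVERSE_ITEMS else r
                  for item, r in responses.items())  # raw (rough) score
        return int(raw * 1.25)  # integer part of 1.25 x raw score

For example, a respondent rating every negatively worded item "1" and every positively worded item "4" has a raw score of 15 × 1 + 5 × 1 = 20 and a standard score of int(20 × 1.25) = 25.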
Disclosure of Invention
To overcome the defects of the prior art, the method collects speech of users reading the same text, labels the audio data with each user's SAS scale score, and then constructs an anxiety prediction model based on a neural network. An automatic speech-based anxiety-state prediction system is built from the anxiety prediction model.
According to an aspect of the present invention, a method for generating a speech-based anxiety prediction model is provided, comprising: step 1: collecting speech of a user reading a text and the user's SAS scale score, and labeling the speech with the score; step 2: extracting speech features from the speech and constructing an anxiety prediction model using a neural network.
Preferably, step 2 comprises the following steps:
S21: setting the sub-speech length N and the window x, and cutting the speech into sub-speech of length N;
S22: performing windowed segmentation of the sub-speech under window x to generate the speech features of the sub-speech under window x;
S23: dividing the sub-speech into training sub-speech and test sub-speech;
S24: taking the speech features of the training sub-speech under window x as input and the SAS scale score of the training sub-speech as output, and constructing the anxiety prediction model under window x using a neural network algorithm.
Preferably, the speech features include basic features, derivative features of the basic features, and statistics of the basic and derivative features over the time-window length, wherein the basic features include intensity, loudness, zero-crossing rate, unvoiced ratio, fundamental frequency envelope, 8 line spectral pairs and 12 mel-frequency cepstral coefficients, and the statistics include mean, standard deviation, kurtosis, skewness and slope.
Preferably, the method further comprises:
S25: inputting the speech features of the test sub-speech under window x into the anxiety prediction model under window x, and calculating the difference between the output result and the SAS score of the test sub-speech;
S26: calculating the average difference of the anxiety prediction model under window x by the formula: average difference = (sum of the differences of the test sub-speech under window x) / (number of test sub-speech under window x);
S27: traversing x from 1 to N-1, repeating steps S22 to S26, and taking the anxiety prediction model under the window x with the smallest average difference as the final anxiety prediction model, whose window length is the optimal window length.
Alternatively, x may take several manually set values smaller than N.
According to another aspect of the present invention, a speech-based anxiety prediction system is provided, comprising: a data acquisition module, a speech feature extraction module, a training sample construction module, a neural network training module, an anxiety prediction model generation module and a prediction module, wherein
the data acquisition module is used for acquiring the subject's speech;
the speech feature extraction module is used for receiving speech, the sub-speech length N and the window length x, and for extracting and returning the speech features under window x;
the training sample construction module is used for collecting users' speech and SAS scale scores and labeling the speech with the SAS scale scores; it also transmits the user's speech, the sub-speech length and the window length to the speech feature extraction module, and divides the returned sub-speech into training sub-speech and test sub-speech in a set proportion;
the neural network training module is used for constructing the anxiety prediction model under window x with a neural network algorithm based on the training sub-speech;
the anxiety prediction model generation module is used for generating the anxiety prediction model and the optimal window length;
and the anxiety prediction module is used for receiving the subject's speech, inputting it together with the optimal window length into the speech feature extraction module, transmitting the returned sub-speech features to the anxiety prediction model, and judging the subject's anxiety state from the anxiety-state score the model returns.
Preferably, the speech feature extraction module cuts the speech into sub-speech of length N and then performs windowed segmentation to generate the speech features of the sub-speech under window x.
Preferably, the speech features include basic features, derivative features of the basic features, and statistics of the basic and derivative features over the time-window length, wherein the basic features include intensity, loudness, zero-crossing rate, unvoiced ratio, fundamental frequency envelope, 8 line spectral pairs and 12 mel-frequency cepstral coefficients, and the statistics include mean, standard deviation, kurtosis, skewness and slope.
Preferably, the neural network training module receives the window length x and the training sub-speech, takes the speech features of the training sub-speech under window x as input and the SAS scale score of the training sub-speech as output, and constructs the anxiety prediction model under window x with a neural network algorithm.
Preferably, the anxiety prediction model generation module receives the test sub-speech under window x and the anxiety prediction model under window x, calculates the difference between the model's output for the speech features of the test sub-speech and the SAS score of the test sub-speech, and then calculates the average difference of the anxiety prediction model under window x; x is traversed from 1 to N-1 to obtain anxiety prediction models under N-1 windows, and the one with the smallest average difference is selected as the final anxiety prediction model; its window length is the optimal window length.
The prediction model obtained from read-aloud speech can automatically and effectively recognize the user's anxiety at the current moment, with recognition accuracy exceeding 70% under a high/low grouping, making it a convenient way to provide early warning of psychological state.
Drawings
FIG. 1 is a flow diagram illustrating a method for generating a speech-based anxiety prediction model according to an embodiment of the present invention;
FIG. 2 is a schematic flow diagram of a method of constructing an anxiety prediction model according to one embodiment of the present invention;
FIG. 3 is a schematic diagram of a speech-based anxiety prediction system according to an embodiment of the present invention.
To clearly illustrate the structure of embodiments of the present invention, certain dimensions, structures and devices are shown in the drawings for illustrative purposes only; they are not intended to limit the invention to particular dimensions, structures, devices and environments, which one of ordinary skill in the art may adjust or modify according to particular needs while remaining within the scope of the appended claims.
Detailed Description
The anxiety recognition method and warning system based on reading speech of a specific text according to the present invention are described in detail below with reference to the accompanying drawings and specific embodiments.
In the following description, various aspects of the invention are described; however, it will be apparent to those skilled in the art that the invention may be practiced with only some or all of its structures or processes. Specific numbers, configurations and sequences are set forth for clarity of explanation, but it will be apparent that the invention may be practiced without these specific details. In other instances, well-known features are not set forth in detail so as not to obscure the invention.
In the present invention, the subject refers to the person to be tested, and a user refers to a person whose speech and SAS scale score have been collected.
The invention provides a method for generating a speech-based anxiety prediction model, comprising the following steps, as shown in FIG. 1: step 1: collecting speech of a user reading a text and the user's SAS scale score, and labeling the speech with the score; step 2: extracting speech features from the speech and constructing an anxiety prediction model using a neural network.
In step 1, all collected data must be of the same caliber so as to be comparable. If one user is asked to read a specific text, the other users must read the same text; the text may be a short neutral passage, e.g., a scenic-spot introduction of 300-500 characters. The collection environment should be kept as quiet as possible so that the speech is free of noise.
In speech-based anxiety recognition and prediction, many kinds of speech can be used, e.g., reading the same neutral text, giving a self-introduction along a specified outline, or describing the same picture. Whichever is used, the same mode must be kept throughout collection to ensure the same caliber and comparability.
After the reading is recorded, the user also fills in an SAS scale; the scale score corresponds to the recorded speech, and when the speech is cut into sub-speech, the scale score is attached to each sub-speech.
Step 2 comprises four steps, as shown in FIG. 2, which are described in detail below.
S21: setting the sub-speech length N and the window x, the speech is cut into sub-speech of length N; the units of N and x may be milliseconds. Since the speech of many users is collected and each user's speech is divided into several parts, there are multiple sub-speech segments.
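As an illustration of S21, a minimal Python sketch of the interception step follows; the 16 kHz sampling rate and the helper name are assumptions of the example, not requirements of the invention.

    import numpy as np

    def split_into_subspeech(samples, sr=16000, sub_len_ms=6000):
        """Cut a 1-D speech signal into consecutive sub-speech segments of N ms."""
        n = int(sr * sub_len_ms / 1000)         # samples per sub-speech segment
        usable = (len(samples) // n) * n        # drop the trailing remainder
        return samples[:usable].reshape(-1, n)  # one row per sub-speech segment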
S22: performing windowed segmentation of the sub-speech under window x to generate the speech features of the sub-speech under window x.
In one embodiment, at feature extraction, 25 basic speech features are extracted first (intensity, loudness, zero-crossing rate, unvoiced ratio, fundamental frequency envelope, 8 line spectral pairs and 12 mel-frequency cepstral coefficients). To express the dynamic change of the speech features, a derivative (Δ) feature is computed for each of them, and 5 statistics (mean, standard deviation, kurtosis, skewness, slope) of the basic and derivative features are then computed over the segmentation windows, giving (25 + 25) × 5 = 250 features in total.
Windowing is prior art: a number of windows of length x are cut from the sub-speech of length N, and the speech features of the sub-speech are then generated within these windows. The features differ for different x; the values x takes, and how the prediction models formed for different x are compared and selected, are explained later.
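A minimal sketch of the S22 statistics computation follows. It assumes a helper basic_features(frame) that returns the 25 basic features of one window (e.g., computed with an audio toolkit); that helper, the sampling rate, and the non-overlapping windows are assumptions of the sketch, since the patent does not fix them.

    import numpy as np
    from scipy.stats import kurtosis, skew

    def window_features(subspeech, sr=16000, win_ms=30):
        """250-dimensional feature vector of one sub-speech under window x = win_ms."""
        win = int(sr * win_ms / 1000)
        frames = subspeech[: len(subspeech) // win * win].reshape(-1, win)
        base = np.stack([basic_features(f) for f in frames])  # (num_windows, 25)
        delta = np.diff(base, axis=0, prepend=base[:1])       # derivative (Δ) features
        feats = np.concatenate([base, delta], axis=1)         # (num_windows, 50)
        t = np.arange(len(feats))
        slope = np.polyfit(t, feats, 1)[0]  # per-feature linear slope (needs >= 2 windows)
        stats = [feats.mean(axis=0), feats.std(axis=0),
                 kurtosis(feats, axis=0), skew(feats, axis=0), slope]
        return np.concatenate(stats)                          # 50 features x 5 stats = 250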
S23: dividing the sub-voices into training sub-voices and testing sub-voices;
in the previous step, the sub-voices are subjected to voice feature extraction under the window x, so that the training sub-voices and the testing sub-voices both contain voice features under the window x, and meanwhile, the sub-voices also carry the SAS scale scores of the voices. In one embodiment, a set percentage (e.g., 80%) of the training samples (i.e., the collected samples, one sample including the voice of a user, the SAS scale score of the user) are randomly selected as training data, and the remaining samples are used as test data.
S24: taking the speech features of the training sub-speech under window x as input and the SAS scale score of the training sub-speech as output, the anxiety prediction model under window x is constructed using a neural network algorithm.
That is, the training data sharing the same sampling time window x are fed to the neural network to train it, i.e., to fit the network's parameters, yielding the anxiety prediction model under window x. Techniques for training neural networks from input and output data are well known to those skilled in the art, and mature programming frameworks exist. Different values of x thus yield different anxiety prediction models.
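A minimal sketch of S23-S24 using scikit-learn follows; the patent fixes neither the framework nor the network architecture, so the MLP layout and the 80/20 split below are assumptions.

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPRegressor

    # X: (num_sub_speech, 250) feature matrix under window x, from S21-S22;
    # y: the SAS scale scores labeling each sub-speech
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=0.8, random_state=0)    # S23: 80% train, 20% test

    model = MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=2000)
    model.fit(X_train, y_train)                  # S24: anxiety model under window x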
In one embodiment, x is traversed over [1, N-1], and N-1 anxiety prediction models, one per window x, are obtained through S22-S24 as described above. For the anxiety prediction model of each window x, the following is performed:
S25: inputting the speech features of the test sub-speech under window x into the anxiety prediction model under window x, and calculating the difference between the output result and the SAS score of the test sub-speech;
S26: calculating the average difference of the anxiety prediction model under window x by the formula: average difference = (sum of the differences of the test sub-speech under window x) / (number of test sub-speech under window x).
The anxiety prediction model under the window x with the smallest average difference is the final anxiety prediction model, and its window length is the optimal window length. As noted above, x may instead take several preset values, which speeds up the computation.
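Steps S25-S27 amount to a model-selection loop over candidate window lengths; a minimal sketch follows, where train_model_for_window and test_data_for_window are hypothetical helpers wrapping S22-S24 above.

    import numpy as np

    best_x, best_model, best_err = None, None, float("inf")
    for x in candidate_windows:                   # e.g. range(1, N) or preset values
        model = train_model_for_window(x)         # S22-S24 (hypothetical helper)
        X_test, y_test = test_data_for_window(x)  # features, SAS scores (hypothetical)
        err = np.mean(np.abs(model.predict(X_test) - y_test))  # S25-S26 average difference
        if err < best_err:
            best_x, best_model, best_err = x, model, err
    # best_model: final anxiety prediction model; best_x: optimal window length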
In use, only the subject's read-aloud speech needs to be collected. The speech is divided into sub-speech according to the N and x of the model, speech features under window x are generated and input into the anxiety prediction model, and an anxiety-state score is obtained. Whether the predicted value lies within the safe range is then judged against a rule base: if it does, the user's psychological state is good; if not, the psychological state is abnormal. The rule base adopts the SAS standard.
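The rule-base check can be as simple as comparing the predicted score with the SAS norm cutoffs; the thresholds in the following sketch (standard score below 50 normal; 50-59 mild; 60-69 moderate; 70 and above severe) are the commonly cited SAS norms and are an assumption here, since the text only states that the rule base adopts the SAS standard.

    def judge_anxiety_state(standard_score):
        """Map a predicted SAS standard score to an anxiety level (assumed cutoffs)."""
        if standard_score < 50:
            return "normal"
        elif standard_score < 60:
            return "mild anxiety"
        elif standard_score < 70:
            return "moderate anxiety"
        return "severe anxiety"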
In one embodiment, to speed up the computation, the window length x is set manually instead of being traversed. With neutral-text reading audio of length 6000 ms, three groups of sample data are constructed with x = 15 ms, x = 30 ms and x = 45 ms. For each group, some samples are randomly selected as training data, e.g., 80% of the 10 samples, i.e., 8 samples, with the remaining 2 samples as test data. After a neural-network anxiety prediction model is built from the training data, the test data are input to obtain prediction results, the error between the predictions and the actual SAS results is computed, and the mean error serves as the performance evaluation value of the model. Training for the different x yields 3 neural networks with evaluation values of 0.45, 0.23 and 0.30. Comparing the three values of x, the model with the smallest error, x = 30 ms, is selected as the optimal, i.e., final, anxiety prediction model.
According to another aspect of the present invention, a speech-based anxiety prediction system is provided, as shown in FIG. 3, comprising a data acquisition module, a speech feature extraction module, a training sample construction module, a neural network training module, an anxiety prediction model generation module and a prediction module, wherein
the data acquisition module is used for acquiring the subject's speech;
the speech feature extraction module is used for receiving speech, the sub-speech length N and the window length x, and for extracting and returning the speech features under window x;
the training sample construction module is used for collecting users' speech and SAS scale scores and labeling the speech with the SAS scale scores; it also transmits the user's speech, the sub-speech length and the window length to the speech feature extraction module, and divides the returned sub-speech into training sub-speech and test sub-speech in a set proportion;
the neural network training module is used for constructing the anxiety prediction model under window x with a neural network algorithm based on the training sub-speech;
the anxiety prediction model generation module is used for generating the anxiety prediction model and the optimal window length; the output of the anxiety prediction model is an anxiety-state score;
and the anxiety prediction module is used for receiving the subject's speech, inputting it together with the optimal window length into the speech feature extraction module, transmitting the returned sub-speech features to the anxiety prediction model, and judging the subject's anxiety state from the returned anxiety-state score.
In the speech feature extraction module, 25 basic features are extracted (intensity, loudness, zero-crossing rate, unvoiced ratio, fundamental frequency envelope, 8 line spectral pairs and 12 mel-frequency cepstral coefficients); to express the dynamic change of the speech features, a derivative (Δ) feature is computed for each basic feature, and 5 statistics (mean, standard deviation, kurtosis, skewness, slope) are computed over the segmentation windows, giving (25 + 25) × 5 = 250 features in total.
The neural network training module is used for training the anxiety prediction model on the training sub-speech transmitted by the training sample construction module: the training sub-speech sharing the same sampling time window length x serves as the model's input, and the SAS scale score of the speech as its output, yielding the anxiety prediction model under window x. The training sub-speech carries both the SAS scale score and the speech features under window x.
The anxiety prediction model generation module is used for generating the anxiety prediction model. Specifically, it receives the test sub-speech under window x and the anxiety prediction model under window x, calculates the difference between the model's output for the speech features of the test sub-speech and the SAS score of the test sub-speech, and then calculates the average difference of the anxiety prediction model under window x; x is traversed from 1 to N-1 to obtain anxiety prediction models under N-1 windows, and the one with the smallest average difference is selected as the final anxiety prediction model; its window length is the optimal window length, and its output is the anxiety-state score.
To increase the computation speed, x can also be set manually, avoiding the N-1 traversals.
The anxiety prediction module is used for receiving the subject's speech, inputting the speech and the optimal sampling-time-window length into the speech feature extraction module to generate speech features under the optimal window, and inputting these features into the anxiety prediction model to obtain the subject's anxiety-state score, which is then compared with a decision rule to judge whether the psychological state is abnormal. The decision rule may be the SAS standard rule.
Finally, it should be noted that the above embodiments are intended only to describe the technical solutions of the present invention, not to limit them; the invention extends to other modifications, variations, applications and embodiments, all of which are considered to be within its spirit and teaching scope.

Claims (10)

1. A method of generating a speech-based anxiety prediction model, comprising:
step 1: collecting speech of a user reading a text and the user's SAS scale score, and labeling the speech with the score;
step 2: extracting speech features from the speech and constructing an anxiety prediction model using a neural network.
2. The method according to claim 1, wherein step 2 comprises the following steps:
S21: setting the sub-speech length N and the window x, and cutting the speech into sub-speech of length N;
S22: performing windowed segmentation of the sub-speech under window x to generate the speech features of the sub-speech under window x;
S23: dividing the sub-speech into training sub-speech and test sub-speech;
S24: taking the speech features of the training sub-speech under window x as input and the SAS scale score of the training sub-speech as output, and constructing the anxiety prediction model under window x using a neural network algorithm.
3. The method of claim 1, wherein the speech features comprise basic features, derivative features of the basic features, and statistics of the basic and derivative features over the time-window length, wherein the basic features comprise intensity, loudness, zero-crossing rate, unvoiced ratio, fundamental frequency envelope, 8 line spectral pairs and 12 mel-frequency cepstral coefficients, and the statistics comprise mean, standard deviation, kurtosis, skewness and slope.
4. The method of claim 2, further comprising:
S25: inputting the speech features of the test sub-speech under window x into the anxiety prediction model under window x, and calculating the difference between the output result and the SAS score of the test sub-speech;
S26: calculating the average difference of the anxiety prediction model under window x by the formula: average difference = (sum of the differences of the test sub-speech under window x) / (number of test sub-speech under window x);
S27: traversing x from 1 to N-1, repeating steps S22 to S26, and taking the anxiety prediction model under the window x with the smallest average difference as the final anxiety prediction model, whose window length is the optimal window length.
5. The method according to claim 4, wherein in step S27, x is instead manually set to several values smaller than N.
6. A speech-based anxiety prediction system, comprising: a data acquisition module, a speech feature extraction module, a training sample construction module, a neural network training module, an anxiety prediction model generation module and a prediction module, wherein
the data acquisition module is used for acquiring the subject's speech;
the speech feature extraction module is used for receiving speech, the sub-speech length N and the window length x, and for extracting and returning the speech features under window x;
the training sample construction module is used for collecting users' speech and SAS scale scores and labeling the speech with the SAS scale scores; it also transmits the user's speech, the sub-speech length and the window length to the speech feature extraction module, and divides the returned sub-speech into training sub-speech and test sub-speech in a set proportion;
the neural network training module is used for constructing the anxiety prediction model under window x with a neural network algorithm based on the training sub-speech;
the anxiety prediction model generation module is used for generating the anxiety prediction model and the optimal window length;
and the anxiety prediction module is used for receiving the subject's speech, inputting it together with the optimal window length into the speech feature extraction module, transmitting the returned sub-speech features to the anxiety prediction model, and judging the subject's anxiety state from the anxiety-state score the model returns.
7. The anxiety prediction system according to claim 6, wherein the speech feature extraction module cuts the speech into sub-speech of length N and then performs windowed segmentation to generate the speech features of the sub-speech under window x.
8. The anxiety prediction system of claim 6, wherein the speech features comprise basic features, derivative features of the basic features, and statistics of the basic and derivative features over the time-window length, wherein the basic features comprise intensity, loudness, zero-crossing rate, unvoiced ratio, fundamental frequency envelope, 8 line spectral pairs and 12 mel-frequency cepstral coefficients, and the statistics comprise mean, standard deviation, kurtosis, skewness and slope.
9. The anxiety prediction system of claim 7, wherein the neural network training module receives the window length x and the training sub-speech, takes the speech features of the training sub-speech under window x as input and the SAS scale score of the training sub-speech as output, and constructs the anxiety prediction model under window x with a neural network algorithm.
10. The anxiety prediction system according to claim 9, wherein the anxiety prediction model generation module receives the test sub-speech under window x and the anxiety prediction model under window x, calculates the difference between the model's output for the speech features of the test sub-speech and the SAS score of the test sub-speech, and then calculates the average difference of the anxiety prediction model under window x; x is traversed from 1 to N-1 to obtain anxiety prediction models under N-1 windows, and the one with the smallest average difference is selected as the final anxiety prediction model; its window length is the optimal window length, and the output of the anxiety prediction model is the anxiety-state score.
CN202010220713.4A 2020-03-26 2020-03-26 Voice-based anxiety prediction model generation method and anxiety prediction system Active CN111415680B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010220713.4A CN111415680B (en) 2020-03-26 2020-03-26 Voice-based anxiety prediction model generation method and anxiety prediction system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010220713.4A CN111415680B (en) 2020-03-26 2020-03-26 Voice-based anxiety prediction model generation method and anxiety prediction system

Publications (2)

Publication Number Publication Date
CN111415680A true CN111415680A (en) 2020-07-14
CN111415680B CN111415680B (en) 2023-05-23

Family

ID=71494595

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010220713.4A Active CN111415680B (en) 2020-03-26 2020-03-26 Voice-based anxiety prediction model generation method and anxiety prediction system

Country Status (1)

Country Link
CN (1) CN111415680B (en)


Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6691082B1 (en) * 1999-08-03 2004-02-10 Lucent Technologies Inc Method and system for sub-band hybrid coding
TW200823865A (en) * 2006-11-30 2008-06-01 Inst Information Industry Voice detection apparatus, method, application program, and computer readable medium for adjusting a window size dynamically
US20160261961A1 (en) * 2013-11-28 2016-09-08 Widex A/S Method of operating a hearing aid system and a hearing aid system
CN106504772A (en) * 2016-11-04 2017-03-15 东南大学 Speech-emotion recognition method based on weights of importance support vector machine classifier
CN106725532A (en) * 2016-12-13 2017-05-31 兰州大学 Depression automatic evaluation system and method based on phonetic feature and machine learning
CN107633851A (en) * 2017-07-31 2018-01-26 中国科学院自动化研究所 Discrete voice emotion identification method, apparatus and system based on the prediction of emotion dimension
CN108389631A (en) * 2018-02-07 2018-08-10 平安科技(深圳)有限公司 Varicella morbidity method for early warning, server and computer readable storage medium
CN109036466A (en) * 2018-08-01 2018-12-18 太原理工大学 The emotion dimension PAD prediction technique of Emotional Speech identification
CN108806724A (en) * 2018-08-15 2018-11-13 太原理工大学 A kind of emotional speech PAD values prediction technique and system
US20200075040A1 (en) * 2018-08-31 2020-03-05 The Regents Of The University Of Michigan Automatic speech-based longitudinal emotion and mood recognition for mental health treatment
CN109599129A (en) * 2018-11-13 2019-04-09 杭州电子科技大学 Voice depression recognition methods based on attention mechanism and convolutional neural networks

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
张茵; 徐蕾; 赵雅莉; 胡淑霞: "Effect of focused meditation on the anxiety level and sleep quality of middle-aged and elderly hypertensive patients" (聚焦冥想对中老年高血压病人焦虑水平和睡眠质量的影响), 护理研究 (Chinese Nursing Research) *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112507959A (en) * 2020-12-21 2021-03-16 中国科学院心理研究所 Method for establishing emotion perception model based on individual face analysis in video
CN112800908A (en) * 2021-01-19 2021-05-14 中国科学院心理研究所 Method for establishing anxiety perception model based on individual gait analysis in video
CN112800908B (en) * 2021-01-19 2024-03-26 中国科学院心理研究所 Method for establishing anxiety perception model based on individual gait analysis in video

Also Published As

Publication number Publication date
CN111415680B (en) 2023-05-23

Similar Documents

Publication Publication Date Title
Alghowinem et al. Detecting depression: a comparison between spontaneous and read speech
CN106725532B (en) Depression automatic evaluation system and method based on phonetic feature and machine learning
Yancheva et al. Using linguistic features longitudinally to predict clinical scores for Alzheimer’s disease and related dementias
CN106073706B (en) A kind of customized information and audio data analysis method and system towards Mini-mental Status Examination
WO2014062441A1 (en) Screening for neurologial disease using speech articulation characteristics
CN109841231B (en) Early AD (AD) speech auxiliary screening system for Chinese mandarin
Yin et al. Towards automatic cognitive load measurement from speech analysis
Chandler et al. Overcoming the bottleneck in traditional assessments of verbal memory: Modeling human ratings and classifying clinical group membership
CN111415680B (en) Voice-based anxiety prediction model generation method and anxiety prediction system
Weiner et al. Selecting features for automatic screening for dementia based on speech
Kokkinakis et al. Data collection from persons with mild forms of cognitive impairment and healthy controls-infrastructure for classification and prediction of dementia
CN114916921A (en) Rapid speech cognition assessment method and device
Sharma et al. Prediction of specific language impairment in children using speech linear predictive coding coefficients
CN108962397B (en) Pen and voice-based cooperative task nervous system disease auxiliary diagnosis system
CN111028863B (en) Method for diagnosing post-stroke dysarthria tone errors based on neural network and diagnostic device thereof
CN116110578A (en) Screening device for diagnosis of depression symptoms assisted by computer
Milani et al. A real-time application to detect human voice disorders
Cebola et al. Speech-Based Supervised Learning Towards the Diagnosis of Amyotrophic Lateral Sclerosis.
Li et al. The far side of failure: Investigating the impact of speech recognition errors on subsequent dementia classification
Jenei et al. Severity estimation of depression using convolutional neural network
Dutta et al. A Fine-Tuned CatBoost-Based Speech Disorder Detection Model
Kiss et al. Seasonal affective disorder speech detection on the base of acoustic phonetic speech parameters
Aluru et al. Parkinson’s Disease Detection Using Machine Learning Techniques
CN111419249B (en) Depression prediction model generation method and prediction system
Moro Velázquez Towards the differential evaluation of Parkinson’s Disease by means of voice and speech processing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant