CN111415680B - Voice-based anxiety prediction model generation method and anxiety prediction system - Google Patents

Voice-based anxiety prediction model generation method and anxiety prediction system

Info

Publication number
CN111415680B
CN111415680B
Authority
CN
China
Prior art keywords
anxiety
voice
window
prediction model
sub
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010220713.4A
Other languages
Chinese (zh)
Other versions
CN111415680A (en)
Inventor
冯甄陶 (Feng Zhentao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xintu Entropy Technology Suzhou Co ltd
Original Assignee
Xintu Entropy Technology Suzhou Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xintu Entropy Technology Suzhou Co ltd filed Critical Xintu Entropy Technology Suzhou Co ltd
Priority to CN202010220713.4A priority Critical patent/CN111415680B/en
Publication of CN111415680A publication Critical patent/CN111415680A/en
Application granted granted Critical
Publication of CN111415680B publication Critical patent/CN111415680B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03: characterised by the type of extracted parameters
    • G10L25/27: characterised by the analysis technique
    • G10L25/30: characterised by the analysis technique using neural networks
    • G10L25/45: characterised by the type of analysis window
    • G10L25/48: specially adapted for particular use
    • G10L25/51: specially adapted for particular use for comparison or discrimination
    • G10L25/63: specially adapted for particular use for comparison or discrimination for estimating an emotional state

Abstract

The application provides a voice-based anxiety prediction model generation method and an anxiety prediction system, comprising the following steps: step 1: collecting the voice of a user reading a text together with the user's SAS scale score, and labeling the voice with that score; step 2: extracting voice features from the voice and constructing an anxiety prediction model using a neural network. The invention achieves automatic recognition of a subject's anxiety state from read-aloud speech through machine learning; the system is easy to operate and deploy, improving the convenience of anxiety recognition and prediction.

Description

Voice-based anxiety prediction model generation method and anxiety prediction system
Technical Field
The present invention relates to the field of psychology and artificial intelligence, and more particularly to a method and system for generating a speech-based anxiety prediction model.
Background
Anxiety disorder, also known as anxiety neurosis, is a chronic disorder characterized by uncontrollable, excessive, pervasive, and persistent anxiety, with anxious emotional experience as its primary feature. Its main manifestations are worry without a clear object, restlessness, and autonomic symptoms such as palpitations, hand tremors, sweating, and frequent urination. The anxiety is not triggered by a real threat, or its intensity is grossly disproportionate to the actual situation. Medication (e.g., anxiolytics) and psychotherapy are the main treatments for anxiety disorder.
Anxiety disorder is arguably the most common mood disorder in the general population. An epidemiological study of the prevalence of mental disorders in China published in The Lancet Psychiatry reported that among psychological and mental diseases, anxiety disorder has the highest prevalence, with a lifetime prevalence of 7.57%; by this estimate there are more than 50 million anxiety patients nationwide. The World Health Organization notes that 90% of anxiety patients develop the disorder before age 35, with women affected more often than men, and in recent years the number of anxiety patients has risen continuously. According to the World Health Organization, about 41 million people in China suffer from anxiety disorder, so the recognition and treatment of anxiety disorders is of great concern. Studies have found that although anxiety disorder is treatable, only 36.9% of patients receive treatment; the biggest obstacle is recognizing anxiety in the first place.
To date, there is no specific clinical test for anxiety disorders. Current approaches to diagnosing anxiety include: (1) screening by self-report scales and self-assessment, such as the Self-Rating Anxiety Scale (SAS); (2) diagnosis by a specialist based on medical history, family history, clinical symptoms, course of disease, and physical examination. Assessment of anxiety symptoms currently relies mainly on self-report scales. However, self-report assessment takes a long time and depends on the subjective cooperation of the subject, while a physician's diagnosis, which synthesizes many kinds of patient information, costs considerable effort and time and is prone to misdiagnosis. Moreover, where long-term monitoring of the anxiety state is required, asking the user to answer the same questions repeatedly and frequently is not feasible. A more convenient, objective, and real-time way to assess users' anxiety states is therefore urgently needed.
The Self-Rating Anxiety Scale (SAS) was developed by the Chinese-American professor W. W. K. Zung (1971). In both the construction of the scale and the specific assessment procedure it closely resembles the Self-Rating Depression Scale (SDS), and it is a relatively simple clinical tool for analyzing a patient's subjective symptoms. Because anxiety is a common mood disorder in psychological counseling clinics, the SAS has in recent years become a standard scale for assessing anxiety symptoms in such clinics.
The SAS uses a 4-level score that mainly evaluates how frequently symptoms occur, with the following criteria: "1" means none or a little of the time; "2" means some of the time; "3" means a good part of the time; "4" means most or all of the time. Of the 20 items, 15 are negatively worded and scored in the order 1-4; the remaining 5 items (items 5, 9, 13, 17, 19) are positively worded and scored in reverse order, 4-1.
The main statistical index of the SAS is the total score. The scores of the 20 items are added to obtain a raw score; the standard score is obtained by multiplying the raw score by 1.25 and taking the integer part, or the same conversion can be done by looking up a table.
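Purely as an illustration of the scoring rule just described (not part of the patent), the SAS standard score could be computed as follows:

```python
def sas_standard_score(responses):
    """Compute the SAS standard score from 20 item responses.

    responses: dict mapping item number (1-20) to the selected level (1-4).
    Items 5, 9, 13, 17, 19 are positively worded and scored in reverse
    (4-1); the rest are scored as answered (1-4). Per the description
    above, raw score * 1.25, integer part, gives the standard score.
    """
    reverse_items = {5, 9, 13, 17, 19}
    raw = sum(5 - v if i in reverse_items else v
              for i, v in responses.items())
    return int(raw * 1.25)

# Example: a respondent answering "2" to every item.
print(sas_standard_score({i: 2 for i in range(1, 21)}))  # raw 45 -> 56
```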
Disclosure of Invention
To overcome the above shortcomings of the prior art, the invention collects recordings of users reading the same text, labels the audio data with each user's SAS scale score, and then builds an anxiety prediction model based on a neural network. An automatic voice-based anxiety state prediction system is then constructed from the anxiety prediction model.
According to one aspect of the present invention, a method for generating a voice-based anxiety prediction model is provided, comprising: step 1: collecting the voice of a user reading a text together with the user's SAS scale score, and labeling the voice with that score; step 2: extracting voice features from the voice and constructing an anxiety prediction model using a neural network.
Preferably, the step 2 includes the following steps:
S21: setting a sub-voice length N and a window x, and cutting the voice into sub-voices of length N;
S22: performing windowed segmentation on the sub-voices under window x to generate the voice features of the sub-voices under window x;
S23: dividing the sub-voices into training sub-voices and test sub-voices;
S24: taking the voice features of the training sub-voices under window x as input and the SAS scale scores of the training sub-voices as output, and constructing the anxiety prediction model under window x using a neural network algorithm.
Preferably, the voice features include basic features, derivative (delta) features of the basic features, and statistics of the basic and derivative features over the length of a time window; the basic features include intensity, loudness, zero-crossing rate, voicing probability, fundamental frequency envelope, 8 line spectral pairs, and 12 Mel-frequency cepstral coefficients, and the statistics include mean, standard deviation, kurtosis, skewness, and slope.
Preferably, the method further comprises:
S25: computing the voice features of the test sub-voices under window x, inputting them into the anxiety prediction model under window x, and obtaining the difference between the model output and the SAS score of each test sub-voice;
S26: computing the average difference of the anxiety prediction model under window x by the formula: average difference = (sum of the differences) / (number of test sub-voices under window x);
S27: traversing x from 1 to N-1 and repeating steps S22 to S26, then taking the anxiety prediction model under the window x with the smallest average difference as the final anxiety prediction model; the window of that model is the optimal window length.
Alternatively, x may take several manually set values smaller than N.
According to another aspect of the present invention, a voice-based anxiety prediction system is provided, comprising: a data acquisition module, a voice feature extraction module, a training sample construction module, a neural network training module, an anxiety prediction model generation module, and a prediction module, wherein:
the data acquisition module is used to acquire the subject's voice;
the voice feature extraction module is used to receive a voice, the sub-voice length N, and the window length x, and to extract and return the voice features under window x;
the training sample construction module is used to collect users' voices and SAS scale scores and to label each voice with its SAS scale score; it also passes the users' voices, the sub-voice length, and the window length to the voice feature extraction module, and divides the returned sub-voices into training sub-voices and test sub-voices in a set proportion;
the neural network training module is used to construct the anxiety prediction model under window x from the training sub-voices using a neural network algorithm;
the anxiety prediction model generation module is used to generate the final anxiety prediction model and the optimal window length;
the anxiety prediction module is used to receive the subject's voice, input it together with the optimal window length into the voice feature extraction module, pass the returned sub-voice features to the anxiety prediction model, and judge the subject's anxiety state from the anxiety state score returned by the model.
Preferably, in the voice feature extraction module, the voice is cut into sub-voices of length N, and windowed segmentation is then performed to generate the voice features of the sub-voices under window x.
Preferably, the voice features include basic features, derivative (delta) features of the basic features, and statistics of the basic and derivative features over the length of a time window; the basic features include intensity, loudness, zero-crossing rate, voicing probability, fundamental frequency envelope, 8 line spectral pairs, and 12 Mel-frequency cepstral coefficients, and the statistics include mean, standard deviation, kurtosis, skewness, and slope.
Preferably, the neural network training module receives the window length x and the training sub-voices, takes the voice features of the training sub-voices under window x as input and the SAS scale scores of the training sub-voices as output, and constructs the anxiety prediction model under window x using a neural network algorithm.
Preferably, the anxiety prediction model generation module receives the test sub-voices under window x and the anxiety prediction model under window x, computes the difference between the output obtained after inputting the voice features of the test sub-voices into the model and the SAS scores of the test sub-voices, and then computes the average difference of the model under window x; by traversing x from 1 to N-1, anxiety prediction models under N-1 windows are obtained, and the model with the smallest average difference is selected as the final anxiety prediction model, whose window length is the optimal window length.
A prediction model obtained from text-reading speech in this way can automatically and effectively identify the user's anxiety condition at the current moment, with recognition accuracy above 70% when distinguishing high- from low-anxiety groups, making it a convenient means of early warning for psychological state.
Drawings
FIG. 1 is a flow chart of a method for generating a speech-based anxiety prediction model according to one embodiment of the present invention;
FIG. 2 is a flow chart of a method of constructing an anxiety predictive model in accordance with one embodiment of the invention;
fig. 3 is a schematic diagram of the structure of a speech-based anxiety prediction system according to one embodiment of the present invention.
Specific dimensions, structures, and devices are labeled in the drawings in order to clearly present the structure of embodiments of the present invention, but this is for illustration only and is not intended to limit the invention to those specific dimensions, structures, devices, and environments. Those skilled in the art may adjust or modify these devices and environments according to specific needs, and such adjustments or modifications remain within the scope of the appended claims.
Detailed Description
The anxiety recognition method and early warning system based on reading a specific text are described in detail below with reference to the accompanying drawings and specific embodiments.
In the following description, various aspects of the present invention are described; however, it will be apparent to those skilled in the art that the present invention may be practiced with only some or all of its structures or processes. For purposes of explanation, specific numbers, configurations, and orders are set forth, but it is evident that the invention may be practiced without these specific details. In other instances, well-known features are not described in detail so as not to obscure the invention.
In this invention, the subject refers to the person whose anxiety state is to be predicted, and a user refers to a person from whom a voice recording and an SAS scale score are collected for training.
The invention provides a method for generating a voice-based anxiety prediction model, shown in fig. 1, comprising the following steps: step 1: collecting the voice of a user reading a text together with the user's SAS scale score, and labeling the voice with that score; step 2: extracting voice features from the voice and constructing an anxiety prediction model using a neural network.
In step 1, the data collected from different individuals must follow the same protocol so that they are comparable. If users are asked to read a specific text, all of them read the same text, which may be a short neutral passage, for example an introduction to a scenic spot of 300-500 words. The recording environment should be as quiet as possible to keep the speech free of noise.
In voice-based anxiety recognition and prediction, many kinds of speech material can be used, such as reading the same neutral text, giving a self-introduction following a specified outline, or describing the same picture. During collection, the same mode should be used for everyone so that the recordings are consistent and comparable.
After reading, the user fills in the SAS scale; the scale score is associated with the recorded voice, and when the voice is cut into sub-voices, each sub-voice is labeled with that scale score.
Step 2 comprises four steps, shown in fig. 2 and described in detail below.
S21: the sub-voice length N and the window x are set, and the voice is cut into sub-voices of length N; the units of N and x may be milliseconds. Since voices of multiple users are collected, each user's voice is divided into several parts, i.e., several sub-voices.
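A minimal sketch of this cut, assuming a 1-D sample array and a known sample rate (variable names are illustrative, and the patent does not say how a trailing remainder shorter than N is handled, so it is simply dropped here):

```python
import numpy as np

def cut_into_subvoices(signal, sr, n_ms):
    """Cut a 1-D audio signal into consecutive sub-voices of n_ms milliseconds.

    signal: 1-D numpy array of samples; sr: sample rate in Hz.
    A trailing remainder shorter than N is dropped (an assumption).
    """
    n_samples = int(sr * n_ms / 1000)
    n_full = len(signal) // n_samples
    return [signal[i * n_samples:(i + 1) * n_samples] for i in range(n_full)]

# Example: 6 seconds of placeholder audio at 16 kHz, N = 1000 ms -> 6 sub-voices.
audio = np.zeros(16000 * 6)
subvoices_demo = cut_into_subvoices(audio, sr=16000, n_ms=1000)
print(len(subvoices_demo))  # 6
```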
S22: windowed segmentation is performed on the sub-voices under window x to generate the voice features of the sub-voices under window x.
In one embodiment, feature extraction first extracts 25 basic speech features (intensity, loudness, zero-crossing rate, voicing probability, fundamental frequency envelope, 8 line spectral pairs, and 12 Mel-frequency cepstral coefficients); derivative (delta) features are then computed for all basic features to express their dynamic changes, and 5 statistics (mean, standard deviation, kurtosis, skewness, and slope) of the basic and derivative features are computed over the window segmentation, giving (25+25)×5 = 250 features in total.
The windowing process itself is prior art: a number of windows of length x are cut from a sub-voice of length N, and the voice features of the sub-voice are then generated over these windows. Different values of x yield different voice features; how the different values of x are compared to select the prediction model is described later.
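As a hedged sketch only (the library choice, a hop equal to the window length, and the reduced feature subset are all assumptions; line spectral pairs, loudness, F0 envelope, and voicing probability from the 25-feature set are omitted here for brevity), per-window features and their five statistics over one sub-voice could be computed like this:

```python
import numpy as np
import librosa
from scipy.stats import kurtosis, skew

def subvoice_features(sub, sr, x_ms):
    """Compute windowed features of one sub-voice under window x (sketch).

    Uses RMS intensity, zero-crossing rate, and 12 MFCCs as a stand-in for
    the full 25 basic features; deltas are appended, then 5 statistics are
    taken over the window axis: mean, std, kurtosis, skewness, slope.
    """
    frame = int(sr * x_ms / 1000)
    base = np.vstack([
        librosa.feature.rms(y=sub, frame_length=frame, hop_length=frame),
        librosa.feature.zero_crossing_rate(sub, frame_length=frame,
                                           hop_length=frame),
        librosa.feature.mfcc(y=sub, sr=sr, n_mfcc=12, n_fft=frame,
                             hop_length=frame),
    ])                                        # shape: (n_features, n_windows)
    feats = np.vstack([base, librosa.feature.delta(base)])  # add deltas
    t = np.arange(feats.shape[1])
    slope = np.polyfit(t, feats.T, 1)[0]      # per-feature linear slope
    return np.concatenate([feats.mean(axis=1), feats.std(axis=1),
                           kurtosis(feats, axis=1), skew(feats, axis=1),
                           slope])
```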
S23: dividing the sub-voices into training sub-voices and test sub-voices.
In this operation, the sub-voices of all users are divided into training sub-voices and test sub-voices. Since the previous step extracted voice features under window x, both training and test sub-voices carry their voice features under window x, as well as the SAS scale score of the voice they came from. In one embodiment, a set proportion (e.g., 80%) of the collected samples (one sample comprising one user's voice and that user's SAS scale score) is randomly selected as training data, with the remaining samples used as test data.
S24: taking the voice features of the training sub-voices under window x as input and the SAS scale scores of the training sub-voices as output, and constructing the anxiety prediction model under window x using a neural network algorithm.
That is, the training data for the same sampling time window x are fed to a neural network, and the network is trained to obtain its parameters, yielding the anxiety prediction model under window x. Training a neural network from input and output data is a routine technique for those skilled in the art, with mature programming frameworks available. Different values of x thus yield different anxiety prediction models.
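The patent does not fix a network architecture, so the following is only one plausible sketch using scikit-learn's MLPRegressor; the hidden-layer sizes and other hyperparameters are assumptions, and X_train / y_train come from the split sketch above:

```python
from sklearn.neural_network import MLPRegressor

# A small feed-forward network regressing the SAS score from the features.
model_x = MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=1000,
                       random_state=0)
model_x.fit(X_train, y_train)  # anxiety prediction model under window x
```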
In one embodiment, x traverses [1, N-1]; through S22-S24 above, anxiety prediction models under N-1 different windows x are obtained. The anxiety prediction model under each window x is then processed as follows:
S25: computing the voice features of the test sub-voices under window x, inputting them into the anxiety prediction model under window x, and obtaining the difference between the model output and the SAS score of each test sub-voice;
S26: computing the average difference of the anxiety prediction model under window x by the formula: average difference = (sum of the differences) / (number of test sub-voices under window x).
The anxiety prediction model under the window x with the smallest average difference is taken as the final anxiety prediction model, and the window length corresponding to it is the optimal window length. Alternatively, x may take several preset values to speed up the search.
In use, only the subject's read-aloud voice needs to be collected; the voice is then cut into sub-voices according to the N and x of the model, the voice features under window x are generated and input into the anxiety prediction model, and an anxiety state score is obtained. Whether the predicted value lies within the safe range is judged against a rule base: if so, the user's psychological state is good; otherwise, an abnormal psychological state is indicated. The rule base adopts the SAS standard.
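This judgment step can be sketched as follows; the cutoff of 50 is an assumed threshold taken from common SAS practice (the patent only states that the rule base adopts the SAS standard), and model_x and X_test refer to the illustrative objects from the sketches above:

```python
def judge_anxiety(predicted_score, threshold=50):
    """Judge a predicted SAS standard score against a safe range.

    threshold=50 is an assumption based on a commonly used SAS cutoff;
    scores below it are treated as within the safe range.
    """
    return "normal" if predicted_score < threshold else "abnormal"

# Example with the illustrative model and test features from above:
print(judge_anxiety(model_x.predict(X_test[:1])[0]))
```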
In one embodiment, to increase computation speed, the window length x is set manually instead of by traversal. The audio is 6000 ms of neutral-text reading; x = 15 ms, x = 30 ms, and x = 45 ms are set respectively, constructing three groups of sample data. For each group, some samples are randomly selected as training data, for example 80% of the samples (8 of 10), with the remaining 2 used as test data. After a neural network model for anxiety prediction is built from the training data, the test data are input to obtain prediction results, the error between each prediction and the actual SAS result is computed, and the mean of these errors is used as the performance evaluation value of the model. Three neural networks were trained for the different x, with performance evaluation values of 0.45, 0.23, and 0.30 respectively. Comparing the three values of x, the x = 30 ms model with the smallest error is selected as the optimal anxiety prediction model, i.e., the final anxiety prediction model.
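The selection procedure of this embodiment can be tied together in one sketch, reusing the illustrative subvoice_features helper defined earlier; the placeholder data, network architecture, and mean-absolute-difference metric are assumptions consistent with steps S25-S27:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Placeholder data: 40 one-second sub-voices at 16 kHz with SAS-score labels.
rng = np.random.default_rng(0)
subvoices = [rng.standard_normal(16000) for _ in range(40)]
scores = rng.integers(25, 80, size=40)

def evaluate_window(x_ms, sr=16000):
    """Train a model under window x and return (model, average difference)."""
    X = np.array([subvoice_features(s, sr, x_ms) for s in subvoices])
    X_tr, X_te, y_tr, y_te = train_test_split(X, scores, train_size=0.8,
                                              random_state=0)
    model = MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=1000,
                         random_state=0).fit(X_tr, y_tr)
    # S26: average difference = sum of differences / number of test sub-voices
    return model, np.mean(np.abs(model.predict(X_te) - y_te))

# Manually set window lengths, as in this embodiment (x = 15, 30, 45 ms).
results = {x: evaluate_window(x) for x in (15, 30, 45)}
best_x = min(results, key=lambda x: results[x][1])
final_model = results[best_x][0]   # best_x is the optimal window length
```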
According to another aspect of the present invention, a voice-based anxiety prediction system is provided, as shown in fig. 3, comprising a data acquisition module, a voice feature extraction module, a training sample construction module, a neural network training module, an anxiety prediction model generation module, and a prediction module, wherein:
the data acquisition module is used to acquire the subject's voice;
the voice feature extraction module is used to receive a voice, the sub-voice length N, and the window length x, and to extract and return the voice features under window x;
the training sample construction module is used to collect users' voices and SAS scale scores and to label each voice with its SAS scale score; it also passes the users' voices, the sub-voice length, and the window length to the voice feature extraction module, and divides the returned sub-voices into training sub-voices and test sub-voices in a set proportion;
the neural network training module is used to construct the anxiety prediction model under window x from the training sub-voices using a neural network algorithm;
the anxiety prediction model generation module is used to generate the final anxiety prediction model and the optimal window length; the output of the anxiety prediction model is an anxiety state score;
the anxiety prediction module is used to receive the subject's voice, input it together with the optimal window length into the voice feature extraction module, pass the returned sub-voice features to the anxiety prediction model, and judge the subject's anxiety state from the returned anxiety state score.
In the voice feature extraction module, 25 basic features (intensity, loudness, zero-crossing rate, voicing probability, fundamental frequency envelope, 8 line spectral pairs, and 12 Mel-frequency cepstral coefficients) are first extracted; derivative (delta) features are computed for all basic features to express the dynamic changes of the voice features, and 5 statistics (mean, standard deviation, kurtosis, skewness, and slope) are computed over the window segmentation. This gives (25+25)×5 = 250 features in total.
The neural network training module trains the anxiety prediction model with the training sub-voices passed from the training sample construction module: training sub-voices with the same sampling time window length x serve as the input of the anxiety prediction model, and the SAS scale scores of their voices as the output, yielding the anxiety prediction model under window x. Each training sub-voice carries an SAS scale score and its voice features under window x.
The anxiety prediction model generation module is used to generate the final anxiety prediction model. Specifically, it receives the test sub-voices under window x and the anxiety prediction model under window x, computes the difference between the output obtained after inputting the voice features of the test sub-voices into the model and the SAS scores of the test sub-voices, and then computes the average difference of the model under window x. By traversing x from 1 to N-1, anxiety prediction models under N-1 windows are obtained, and the model with the smallest average difference is selected as the final anxiety prediction model; the corresponding window length is the optimal window length, and the model's output is the anxiety state score.
To speed up computation, x can also be set manually rather than traversed N-1 times.
The anxiety prediction module receives the subject's voice, inputs the voice and the optimal sampling time window length into the voice feature extraction module to generate the voice features under the optimal window, inputs these features into the anxiety prediction model to obtain the subject's anxiety state score, and compares the score against a judgment rule to determine whether the psychological state is abnormal. The judgment rule may be the SAS standard rule.
Finally, it should be noted that the above embodiments are intended only to describe the technical solution of the present invention, not to limit it; the invention extends to other modifications, variations, applications, and embodiments, and all such modifications, variations, applications, and embodiments are considered within the spirit and scope of the teachings of the present invention.

Claims (9)

1. A method of generating a voice-based anxiety prediction model, comprising:
step 1: collecting the voice of a user reading a text together with the user's SAS scale score, and labeling the voice with that score;
step 2: extracting voice features from the voice and constructing an anxiety prediction model using a neural network, comprising:
S25: computing the difference between the output obtained after inputting the voice features of the test sub-voices under window x into the anxiety prediction model under window x and the SAS scores of the test sub-voices;
S26: computing the average difference of the anxiety prediction model under window x;
S27: traversing x from 1 to N-1 and repeating steps S25 and S26, then taking the anxiety prediction model under the window x with the smallest average difference as the anxiety prediction model, the window of which is the optimal window length;
where N is the sub-voice length.
2. The method according to claim 1, wherein in step 2, constructing the anxiety prediction model under window x comprises the steps of:
S21: setting a sub-voice length N and a window x, and cutting the voice into sub-voices of length N;
S22: performing windowed segmentation on the sub-voices under window x to generate the voice features of the sub-voices under window x;
S23: dividing the sub-voices into training sub-voices and test sub-voices;
S24: taking the voice features of the training sub-voices under window x as input and the SAS scale scores of the training sub-voices as output, and constructing the anxiety prediction model under window x using a neural network algorithm.
3. The method of claim 1, wherein the voice features include basic features, derivative features of the basic features, and statistics of the basic and derivative features over the length of a time window, wherein the basic features include intensity, loudness, zero-crossing rate, voicing probability, fundamental frequency envelope, 8 line spectral pairs, and 12 Mel-frequency cepstral coefficients, and the statistics include mean, standard deviation, kurtosis, skewness, and slope.
4. The method according to claim 1, wherein in step S26, the average difference of the anxiety prediction model under window x is computed by the formula: average difference = (sum of the differences) / (number of test sub-voices under window x).
5. The method according to claim 1, wherein in step S27, x takes a number of manually set values smaller than N.
6. A voice-based anxiety prediction system, comprising: a data acquisition module, a voice feature extraction module, a training sample construction module, a neural network training module, an anxiety prediction model generation module, and a prediction module, wherein:
the data acquisition module is used to acquire the subject's voice;
the voice feature extraction module is used to receive a voice, the sub-voice length N, and the window length x, and to extract and return the voice features under window x;
the training sample construction module is used to collect users' voices and SAS scale scores and to label each voice with its SAS scale score; it also passes the users' voices, the sub-voice length, and the window length to the voice feature extraction module, and divides the returned sub-voices into training sub-voices and test sub-voices in a set proportion;
the neural network training module is used to construct the anxiety prediction model under window x from the training sub-voices using a neural network algorithm;
the anxiety prediction model generation module is used to generate the final anxiety prediction model and the optimal window length;
the anxiety prediction module is used to receive the subject's voice, input it together with the optimal window length into the voice feature extraction module, pass the returned sub-voice features to the anxiety prediction model, and judge the subject's anxiety state from the anxiety state score returned by the anxiety prediction model;
the anxiety prediction model generation module receives the test sub-voices under window x and the anxiety prediction model under window x, computes the difference between the output obtained after inputting the voice features of the test sub-voices into the model and the SAS scores of the test sub-voices, and then computes the average difference of the model under window x; by traversing x from 1 to N-1, anxiety prediction models under N-1 windows are obtained, and the model with the smallest average difference is selected as the final anxiety prediction model; the corresponding window length is the optimal window length, and the output of the anxiety prediction model is the anxiety state score.
7. The anxiety prediction system of claim 6, wherein the voice feature extraction module cuts the voice into sub-voices of length N and then performs windowed segmentation to generate the voice features of the sub-voices under window x.
8. The anxiety prediction system of claim 6, wherein the voice features comprise basic features, derivative features of the basic features, and statistics of the basic and derivative features over the length of a time window, the basic features including intensity, loudness, zero-crossing rate, voicing probability, fundamental frequency envelope, 8 line spectral pairs, and 12 Mel-frequency cepstral coefficients, and the statistics including mean, standard deviation, kurtosis, skewness, and slope.
9. The anxiety prediction system of claim 7, wherein the neural network training module receives the window length x and the training sub-voices, takes the voice features of the training sub-voices under window x as input and the SAS scale scores of the training sub-voices as output, and constructs the anxiety prediction model under window x using a neural network algorithm.
CN202010220713.4A 2020-03-26 2020-03-26 Voice-based anxiety prediction model generation method and anxiety prediction system Active CN111415680B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010220713.4A CN111415680B (en) 2020-03-26 2020-03-26 Voice-based anxiety prediction model generation method and anxiety prediction system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010220713.4A CN111415680B (en) 2020-03-26 2020-03-26 Voice-based anxiety prediction model generation method and anxiety prediction system

Publications (2)

Publication Number Publication Date
CN111415680A CN111415680A (en) 2020-07-14
CN111415680B true CN111415680B (en) 2023-05-23

Family

ID=71494595

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010220713.4A Active CN111415680B (en) 2020-03-26 2020-03-26 Voice-based anxiety prediction model generation method and anxiety prediction system

Country Status (1)

Country Link
CN (1) CN111415680B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112507959A (en) * 2020-12-21 2021-03-16 中国科学院心理研究所 Method for establishing emotion perception model based on individual face analysis in video
CN112800908B (en) * 2021-01-19 2024-03-26 中国科学院心理研究所 Method for establishing anxiety perception model based on individual gait analysis in video

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6691082B1 (en) * 1999-08-03 2004-02-10 Lucent Technologies Inc Method and system for sub-band hybrid coding

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI312981B (en) * 2006-11-30 2009-08-01 Inst Information Industr Voice detection apparatus, method, computer program product, and computer readable medium for adjusting a window size dynamically
WO2015078501A1 (en) * 2013-11-28 2015-06-04 Widex A/S Method of operating a hearing aid system and a hearing aid system
CN106504772B (en) * 2016-11-04 2019-08-20 东南大学 Speech-emotion recognition method based on weights of importance support vector machine classifier
CN106725532B (en) * 2016-12-13 2018-04-24 兰州大学 Depression automatic evaluation system and method based on phonetic feature and machine learning
CN107633851B (en) * 2017-07-31 2020-07-28 极限元(杭州)智能科技股份有限公司 Discrete speech emotion recognition method, device and system based on emotion dimension prediction
CN108389631A (en) * 2018-02-07 2018-08-10 平安科技(深圳)有限公司 Varicella morbidity method for early warning, server and computer readable storage medium
CN109036466B (en) * 2018-08-01 2022-11-29 太原理工大学 Emotion dimension PAD prediction method for emotion voice recognition
CN108806724B (en) * 2018-08-15 2020-08-25 太原理工大学 Method and system for predicting sentiment voice PAD value
US11545173B2 (en) * 2018-08-31 2023-01-03 The Regents Of The University Of Michigan Automatic speech-based longitudinal emotion and mood recognition for mental health treatment
CN109599129B (en) * 2018-11-13 2021-09-14 杭州电子科技大学 Voice depression recognition system based on attention mechanism and convolutional neural network

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6691082B1 (en) * 1999-08-03 2004-02-10 Lucent Technologies Inc Method and system for sub-band hybrid coding

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Effects of focused meditation on anxiety level and sleep quality in middle-aged and elderly patients with hypertension; Zhang Yin; Xu Lei; Zhao Yali; Hu Shuxia; Nursing Research (护理研究), Issue 13 *

Also Published As

Publication number Publication date
CN111415680A (en) 2020-07-14

Similar Documents

Publication Publication Date Title
Alghowinem et al. Detecting depression: a comparison between spontaneous and read speech
CN106725532A (en) Depression automatic evaluation system and method based on phonetic feature and machine learning
CN106073706B (en) A kind of customized information and audio data analysis method and system towards Mini-mental Status Examination
CN110570941B (en) System and device for assessing psychological state based on text semantic vector model
CN111415680B (en) Voice-based anxiety prediction model generation method and anxiety prediction system
CN112006697A (en) Gradient boosting decision tree depression recognition method based on voice signals
Ravi et al. Fraug: A frame rate based data augmentation method for depression detection from speech signals
Weiner et al. Selecting features for automatic screening for dementia based on speech
Bonin et al. Determinants of naming latencies, object comprehension times, and new norms for the Russian standardized set of the colorized version of the Snodgrass and Vanderwart pictures
Dumpala et al. Estimating severity of depression from acoustic features and embeddings of natural speech
CN111145903A (en) Method and device for acquiring vertigo inquiry text, electronic equipment and inquiry system
CN113111151A (en) Cross-modal depression detection method based on intelligent voice question answering
Berisha et al. Are reported accuracies in the clinical speech machine learning literature overoptimistic?
An et al. Mental health detection from speech signal: A convolution neural networks approach
CN116110578A (en) Screening device for diagnosis of depression symptoms assisted by computer
Benba et al. Using RASTA-PLP for discriminating between different neurological diseases
CN114038562A (en) Psychological development assessment method, device and system and electronic equipment
Danner et al. Advancing Mental Health Diagnostics: GPT-Based Method for Depression Detection
CN111341346A (en) Language expression capability evaluation method and system for fusion depth language generation model
Wang et al. MFCC-based deep convolutional neural network for audio depression recognition
Jenei et al. Severity estimation of depression using convolutional neural network
Li et al. The far side of failure: Investigating the impact of speech recognition errors on subsequent dementia classification
Singh et al. Analyzing machine learning algorithms for speech impairment related issues
Gillespie et al. Exploratory analysis of speech features related to depression in adults with Aphasia
TW201828216A (en) Automated language evaluation method including a preparation step, a pronunciation step, and a grading step

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant