CN112820319A - Human snore recognition method and device - Google Patents


Info

Publication number
CN112820319A
Authority
CN
China
Prior art keywords
snore
human
sound signal
sound
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011642951.0A
Other languages
Chinese (zh)
Inventor
单华锋
赵晓磊
曹辉
张建炜
任宇翔
陈学刚
程兴港
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Keeson Technology Corp Ltd
Original Assignee
Keeson Technology Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Keeson Technology Corp Ltd filed Critical Keeson Technology Corp Ltd
Priority to CN202011642951.0A
Publication of CN112820319A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03: Speech or voice analysis techniques characterised by the type of extracted parameters
    • G10L25/24: Speech or voice analysis techniques characterised by the type of extracted parameters, the extracted parameters being the cepstrum
    • G10L25/48: Speech or voice analysis techniques specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques specially adapted for particular use, for comparison or discrimination
    • G10L25/66: Speech or voice analysis techniques specially adapted for particular use, for comparison or discrimination, for extracting parameters related to health condition
    • G10L25/78: Detection of presence or absence of voice signals
    • G10L25/87: Detection of discrete points within a voice signal

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)

Abstract

The invention provides a human snore recognition method and a corresponding device. The method comprises the following steps: collecting a sound signal; performing autocorrelation endpoint detection on the sound signal to determine its sound source; if the sound source contains human voice, extracting the acoustic features of the sound signal; matching the acoustic features with a preset acoustic model; if the matching succeeds, preliminarily judging that the sound signal is human snore and recording the time point of that preliminary judgment; and if the number of such time points within a preset time length exceeds a preset number threshold, finally judging that the sound signal is human snore. The human snore recognition method and device judge more accurately whether snoring occurs, improve triggering accuracy, and improve the user experience.

Description

Human snore recognition method and device
Technical Field
The invention relates to the technical field of sound recognition, and in particular to a human snore recognition method and a human snore recognition device.
Background
Snoring is the sound produced when, during sleep, airflow passes through a pharyngeal passage that has narrowed for any of a variety of reasons. Most people rely on the subjective description of someone sharing the room, or measure their snoring with a decibel meter; people who live alone have difficulty learning about their own snoring and may remain unaware of it even as it affects their health.
At present, most electric beds detect snoring with a decibel meter and intervene accordingly; because the snore is identified only by sound intensity, detection accuracy is low and false triggering occurs easily.
Disclosure of Invention
The invention provides a human snore recognition method, which aims to solve the technical problem of inaccurate snore recognition in the prior art.
The invention provides a human snore identification method which comprises the following steps:
collecting a sound signal;
performing autocorrelation endpoint detection on the sound signal to determine a sound source of the sound signal;
if the sound source of the sound signal contains human sound, extracting the acoustic features of the sound signal;
matching the acoustic features with a preset acoustic model; if the matching is successful, preliminarily judging that the sound signal is the human snore, and recording the time point of preliminarily judging that the sound signal is the human snore;
and finally judging that the sound signal is human snore if the number of the time points exceeds a preset number threshold value within a preset time length.
Further, if the number of the time points exceeds a preset number threshold within a preset time period, the step of finally determining that the sound signal is the human snore includes:
the accumulated time is calculated according to the following formula (1):
CT = t1 + t2 + ... + tk (1);
wherein CT represents the accumulated time, tn represents the difference between two adjacent time points at which snore is detected, k is a natural number, and n is a positive integer.
Further, the step of performing autocorrelation endpoint detection on the sound signal to determine the sound source of the sound signal comprises:
framing the sound signal, and calculating a main peak ratio and logarithmic energy of an autocorrelation function of each frame;
respectively normalizing the ratio of the main peak and the logarithmic energy;
taking the mean of the autocorrelation function main peak ratio divided by the logarithmic energy to obtain a mean value;
respectively calculating a first threshold value and a second threshold value according to the following formulas (2) and (3):
T1=α*mean (2);
T2=β*mean (3);
wherein T1 represents a first threshold, T2 represents a second threshold, mean represents a mean of the autocorrelation function main peak ratio divided by the logarithmic energy, α represents a first threshold coefficient, β represents a second threshold coefficient, and 0< α < β < 1;
calculating a waveform value of the sound signal;
and if the waveform value of the sound signal is larger than a second threshold value, judging that the sound source contains human sound.
Further, before acquiring the sound signal, the method further comprises:
establishing an acoustic model;
training the acoustic model.
Further, the step of establishing an acoustic model comprises:
collecting a snoring corpus sample;
pre-emphasis, framing and windowing are carried out on the snore corpus sample to obtain snore corpus data;
and extracting the acoustic features of the snore corpus data by executing a Mel-cepstrum coefficient method.
Further, before matching the acoustic features with a preset acoustic model, the method further includes: and if the sound source of the sound signal does not contain human sound, continuing to collect the sound signal.
Further, the step of matching the acoustic features with a preset acoustic model includes: and if the matching fails, discarding the acoustic features.
Further, after the final determination that the sound signal is human snore, the method further includes: and controlling the electric bed to execute preset actions.
The invention also provides a corresponding human snore recognition device, which comprises:
the acquisition module is used for acquiring sound signals;
the analysis module is used for carrying out autocorrelation endpoint detection on the sound signal so as to determine the sound source of the sound signal;
the feature extraction module is used for extracting the acoustic features of the sound signals if the sound sources of the sound signals contain human sounds;
the matching module is used for matching the acoustic characteristics with a preset acoustic model so as to preliminarily judge whether the sound signal is human snore or not and record a time point for preliminarily judging that the sound signal is human snore;
and the comparison module is used for determining whether the number of the time points within the preset time length exceeds the preset number threshold.
Specifically, the comparison module calculates the accumulated time according to the following formula (1):
CT = t1 + t2 + ... + tk (1);
wherein CT represents the accumulated time, tn represents the difference between two adjacent time points at which snore is detected, k is a natural number, and n is a positive integer.
Specifically, the analysis module includes:
the first calculation module is used for framing the sound signal and calculating the main peak ratio and logarithmic energy of the autocorrelation function of each frame;
the second calculation module is used for respectively normalizing the main peak ratio and the logarithmic energy;
the third calculation module is used for taking the mean of the autocorrelation function main peak ratio divided by the logarithmic energy;
a fourth calculating module, configured to calculate the first threshold and the second threshold according to the following formulas (2) and (3), respectively:
T1=α*mean (2);
T2=β*mean (3);
wherein T1 represents a first threshold, T2 represents a second threshold, mean represents a mean of the autocorrelation function main peak ratio divided by the logarithmic energy, α represents a first threshold coefficient, β represents a second threshold coefficient, and 0< α < β < 1;
the fifth calculation module is used for calculating a waveform numerical value of the sound signal;
and the sixth calculating module is used for judging whether the waveform numerical value of the sound signal is larger than a second threshold value so as to judge whether the sound source contains human sound.
Further, still include:
the model calculation module is used for establishing an acoustic model;
and the training module is used for training the acoustic model.
Further, still include:
the sample acquisition module is used for acquiring snore corpus samples;
the corpus processing module is used for carrying out pre-emphasis, framing and windowing processing on the snore corpus sample to obtain snore corpus data;
and the corpus feature extraction module is used for executing a Mel-cepstrum coefficient method to extract the acoustic features of the snore corpus data.
Further, still include: and the instruction module is used for controlling the electric bed to execute preset actions.
The human snore recognition method improves the accuracy of snore recognition and of triggering, and thereby improves the user experience.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments made with reference to the following drawings:
fig. 1 is a schematic flow chart of a human snore identification method according to a first embodiment of the invention;
FIG. 2 is a schematic diagram of a human snore statistical method according to a first embodiment of the invention;
FIG. 3 is a schematic flow chart of a human snore identifying method according to a second embodiment of the invention;
fig. 4 is a schematic structural diagram of a human snore recognition device according to an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Fig. 1 is a schematic flow chart of a human snore identifying method according to an embodiment of the present invention, and as shown in fig. 1, an embodiment of the present invention provides a human snore identifying method, which includes:
step S101, collecting sound signals.
Sound signals in the surrounding environment are collected, packed into frames, and buffered in time order, providing the data for the subsequent recognition calculations.
Step S102, performing autocorrelation endpoint detection on the sound signal (the frame-packed, time-buffered data) to determine the sound source of the sound signal.
Autocorrelation endpoint detection judges the beginning and end of a syllable and is the preprocessing stage of the subsequent data calculations. The pitch period is one of the most important parameters of a speech signal; it describes an important feature of the speech excitation source, reflecting the time interval between two adjacent glottal closures or, equivalently, the frequency of glottal vibration.
Autocorrelation endpoint detection uses the short-time autocorrelation function to check the pitch period. The short-time autocorrelation function has the property that when the original signal is periodic, its autocorrelation function is also periodic with the same period, and a peak appears whenever the time delay is an integer multiple of that period; the first maximum peak point is usually taken as the pitch period point. Autocorrelation pitch detection exploits this property to check the pitch period, and the detected points are taken as endpoints, i.e., the starting points of the calculations on the data to be processed.
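As an illustration of this property (a minimal sketch, not code from the patent; the sampling rate, frame length, and pitch search range are assumed example values), one frame's pitch period can be estimated from its short-time autocorrelation as follows:

    import numpy as np

    def pitch_period_autocorr(frame, fs, f_lo=60.0, f_hi=500.0):
        """Estimate the pitch period of one frame from its short-time
        autocorrelation; fs, f_lo and f_hi are assumed example values."""
        frame = frame - np.mean(frame)
        # Short-time autocorrelation, non-negative lags only.
        r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
        # Search only lags corresponding to plausible pitch frequencies.
        lag_min, lag_max = int(fs / f_hi), int(fs / f_lo)
        peak_lag = lag_min + int(np.argmax(r[lag_min:lag_max]))
        return peak_lag / fs, r[peak_lag] / r[0]  # period (s), main peak ratio

    # Usage on a synthetic 200 Hz "voiced" frame sampled at 16 kHz:
    fs = 16000
    t = np.arange(int(0.025 * fs)) / fs              # one 25 ms frame
    period, peak_ratio = pitch_period_autocorr(np.sin(2 * np.pi * 200 * t), fs)
    print(period, peak_ratio)                        # ~0.005 s, ratio near 1

The normalized main peak ratio r[peak]/r[0] is also the kind of per-frame feature the threshold comparison in the later steps operates on.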
Snore differs markedly from other sounds in timbre, loudness, frequency, and tone. These differences allow snoring to be captured accurately while sound signals other than human sounds are filtered out, reducing interference from other sound sources; autocorrelation endpoint detection thus improves the recognition rate and reduces the amount of data the central processing unit (CPU) must compute.
Step S103, if the sound source of the sound signal contains human voice, extracting the acoustic feature of the sound signal.
The acoustic features are Mel-frequency cepstral coefficients (MFCC). To extract them, a fast Fourier transform (FFT) is applied to each frame of the preprocessed data to obtain its spectrum, from which the magnitude spectrum is computed; a Mel filter bank is applied to the magnitude spectrum, the logarithm of all filter outputs is taken, and a discrete cosine transform (DCT) then yields the MFCC. A person produces sound through the vocal tract, whose shape, including the tongue and teeth, determines what sound is made; if this shape were known exactly, the phoneme being produced could be described accurately. The shape of the vocal tract shows up in the envelope of the short-time power spectrum of speech, and because the MFCC is a feature that describes this envelope accurately, it is very important as an acoustic feature; compared with a traditional decibel meter or frequency-domain analysis, it is more effective and reliable for sound recognition.
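The FFT → Mel filter bank → log → DCT pipeline just described can be sketched as follows (a minimal sketch; the FFT size, filter count, and number of coefficients are assumed example values, not parameters disclosed in the patent):

    import numpy as np
    from scipy.fftpack import dct

    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    def mfcc_frame(frame, fs, n_fft=512, n_filters=26, n_coeffs=13):
        """MFCCs of one pre-emphasized, windowed frame (illustrative)."""
        power = np.abs(np.fft.rfft(frame, n_fft)) ** 2 / n_fft  # |FFT|^2
        # Triangular Mel filter bank, equally spaced on the Mel scale.
        mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
        bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
        fbank = np.zeros((n_filters, n_fft // 2 + 1))
        for i in range(1, n_filters + 1):
            l, c, r = bins[i - 1], bins[i], bins[i + 1]
            fbank[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
            fbank[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
        energies = np.maximum(fbank @ power, 1e-10)              # filter outputs
        return dct(np.log(energies), norm="ortho")[:n_coeffs]   # log + DCT

    # Usage: 13 coefficients from one 25 ms frame at 16 kHz.
    print(mfcc_frame(np.random.randn(400), 16000).shape)        # (13,)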
Step S104, matching the acoustic characteristics with a preset acoustic model; and if the matching is successful, preliminarily judging that the sound signal is the human snore, and recording the time point of preliminarily judging that the sound signal is the human snore.
The extracted acoustic features are matched with a preset acoustic model trained on a large amount of data. The matching coefficient is set between 0 and 1; the higher the coefficient, the higher the recognition rate, but the misrecognition rate also rises, so a matching coefficient of 0.5 to 0.8 is most suitable. If the matching succeeds, the sound signal is preliminarily judged to be human snore, and the time point is recorded so that the snoring situation can be established.
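For illustration only (the patent does not disclose the matcher's internals), a minimal sketch of the threshold decision and time-point recording, assuming the acoustic model returns a matching score in [0, 1]; the 0.6 threshold is an arbitrary value inside the recommended 0.5 to 0.8 range:

    import time

    detections = []   # time points at which snore was preliminarily judged

    def on_model_score(score, threshold=0.6):
        """Record the time point when the matching score clears the threshold.
        threshold=0.6 is an assumed value within the 0.5-0.8 range recommended
        above; the scoring model itself is not shown."""
        if score >= threshold:
            detections.append(time.monotonic())   # record the detection time
            return True                           # preliminary snore judgment
        return False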
And S105, if the number of the time points exceeds a preset number threshold value within a preset time length, finally judging that the sound signal is the human snore.
The identified snore time points are counted: if the number of snore identifications within the latest time T exceeds N, where both N and T are settable values, the sound is finally judged to be human snore. Counting the possible snore occurrences over a period of time reduces the influence of a single erroneous judgment on the overall decision and improves the accuracy of snore judgment.
Specifically, the statistics can be computed as follows. Fig. 2 is a schematic diagram of a human snore statistical method according to the first embodiment of the present invention. As shown in Fig. 2, P is a time point at which snore is detected, and the accumulated time is calculated according to the following formula (1):
CT = t1 + t2 + ... + tk (1);
wherein CT represents the accumulated time, tn represents the difference between two adjacent time points at which snore is detected, k is a natural number, and n is a positive integer.
If CT is smaller than T, the sound signal is finally judged to be human snore. This helps improve the accuracy of human snore recognition and reduce the misrecognition rate.
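Under these definitions, formula (1) and the final decision can be sketched as follows (N and T are the settable values mentioned above; the sketch takes k = N, i.e., it sums the intervals between the most recent N + 1 detections, and the figures in the usage line are arbitrary examples):

    def confirm_snore(time_points, N=5, T=60.0):
        """Final snore decision from preliminary detection time points.
        time_points: ascending detection times in seconds; N and T are assumed
        example values. CT (formula (1)) sums the gaps tn between adjacent
        detections; snoring is confirmed when more than N detections fall
        within an accumulated time CT shorter than T."""
        if len(time_points) <= N:                  # need more than N detections
            return False
        recent = time_points[-(N + 1):]            # N intervals span N+1 points
        ct = sum(b - a for a, b in zip(recent, recent[1:]))
        return ct < T

    # Usage: 6 detections about 8 s apart -> CT = 40 s < 60 s -> confirmed.
    print(confirm_snore([0.0, 8.0, 16.0, 24.0, 33.0, 40.0]))   # True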
Fig. 3 is a schematic flow chart of a human snore identifying method according to a second embodiment of the present invention, and as shown in fig. 3, the second embodiment of the present invention provides a human snore identifying method, which includes:
step S201, a snoring corpus sample is collected.
Snoring corpus samples are collected because the voice each person produces is different: the same sound differs in timbre from person to person, and the voice frequencies of men, women, and children all differ. Collecting a large corpus therefore allows more speech feature parameters to be extracted, improving the recognition rate.
Step S202, pre-emphasis, framing and windowing are carried out on the snore corpus sample to obtain snore corpus data.
Signal pre-processing (pre-emphasis, framing, windowing): the sound signal contains a great deal of clutter, so the data needs to be processed accordingly to facilitate the subsequent calculations.
(1) Pre-emphasis passes the speech signal through a high-pass filter:
H(z) = 1 - μz^(-1);
wherein μ takes a value between 0.9 and 1.0, with 0.96 recommended, and z is the z-transform variable. The purpose of pre-emphasis is to boost the high-frequency part, flatten the signal spectrum, and remove the spectral tilt, compensating for the high-frequency components of the speech signal that are suppressed by the vocal production system. At the same time, to eliminate the effects of the vocal cords and lips in the production process, lip radiation can be treated as equivalent to a first-order zero model.
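In the time domain, H(z) = 1 - μz^(-1) is simply y[n] = x[n] - μ*x[n-1]; a minimal sketch with the recommended μ = 0.96:

    import numpy as np

    def pre_emphasis(x, mu=0.96):
        """Apply H(z) = 1 - mu*z^(-1), i.e. y[n] = x[n] - mu*x[n-1];
        mu = 0.96 is the value recommended in the description."""
        return np.append(x[0], x[1:] - mu * x[:-1])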
(2) Framing: the speech signal is stationary only over short intervals, so it is divided into frames and each frame is processed as a stationary signal. To reduce the variation from frame to frame, adjacent frames overlap. The frame length is generally 25 ms, with a frame shift of half the frame length.
(3) Windowing: the FFT algorithm in effect performs periodic continuation, because a computer processes data over a finite time span while the Fourier transform requires integration over time from negative infinity to positive infinity. This raises the problem of spectral leakage: under the Fourier transform, discontinuities in the time domain have a significant effect on the spectrum. To suppress this spectral leakage, a windowing step is introduced. How strongly the leakage distorts the spectrogram depends on the degree of discontinuity at the boundaries of the time-domain segment, and windowing minimizes such discontinuities.
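The framing and windowing just described can be sketched as follows, with 25 ms frames and a half-frame shift as stated above; the Hamming window is an assumed choice, since the description does not name a specific window:

    import numpy as np

    def frame_and_window(x, fs, frame_ms=25.0):
        """Split a signal into half-overlapping frames and apply a window.
        25 ms frames with a shift of half the frame length follow the
        description; the Hamming window itself is an assumed choice."""
        frame_len = int(fs * frame_ms / 1000.0)
        hop = frame_len // 2                        # frame shift = half frame
        n_frames = max(1 + (len(x) - frame_len) // hop, 0)
        window = np.hamming(frame_len)
        return np.stack([x[i * hop : i * hop + frame_len] * window
                         for i in range(n_frames)])

    # Usage: 1 s of audio at 16 kHz -> 79 frames of 400 samples each.
    print(frame_and_window(np.random.randn(16000), 16000).shape)   # (79, 400)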
Step S203, executing a Mel-cepstrum coefficient method to extract the acoustic characteristics of the snore corpus data, and generating an acoustic model.
The acoustic model is the database used for snore recognition. The above steps yield a large number of acoustic features from which the acoustic model is built, so that snore data can be recognized from all kinds of mixed sounds quickly and accurately.
And step S204, training the acoustic model.
Training the acoustic model: more corpus information is collected to continuously improve the acoustic model, gradually raising the recognition rate and reducing false triggering. This is an ongoing process of refinement, and the product's performance improves in later stages through continued upgrading of the model, so the product can keep being updated.
Step S205, collecting the sound signal.
Step S206, frame-dividing the sound signal, and calculating a main peak ratio and a logarithmic energy of an autocorrelation function of each frame.
Step S207, normalizing the main peak ratio and the logarithmic energy respectively.
Step S208, taking the mean of the autocorrelation function main peak ratio divided by the logarithmic energy.
Step S209, respectively calculating a first threshold and a second threshold according to the following formulas (2) and (3):
T1=α*mean (2);
T2=β*mean (3);
wherein T1 represents a first threshold value, T2 represents a second threshold value, mean represents a mean of the autocorrelation function main peak ratio divided by the logarithmic energy, α represents a first threshold coefficient, β represents a second threshold coefficient, and 0< α < β < 1.
T1 is the critical point for noise: a waveform value below T1 is noise. T2 is the critical point for human voice: a waveform value above T2 indicates human voice (the starting point). The voice ends (the end point) when the waveform value falls from above T2 to below T1.
Step S210, calculating a waveform value of the sound signal.
Step S211, if the waveform value of the sound signal is greater than the second threshold, it is determined that the sound source contains human voice, i.e., this is the starting point of the endpoint detection; when the waveform value falls below T1, the signal is considered to no longer contain human voice, i.e., this is the end point of the endpoint detection.
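The start/end logic of steps S209 to S211 amounts to a small dual-threshold state machine, sketched below; the per-frame waveform values and both thresholds in the usage line are assumed illustrative numbers:

    def detect_voice_segments(values, t1, t2):
        """Dual-threshold endpoint detection over per-frame waveform values.
        A segment starts when a value rises above t2 and ends when it falls
        below t1 (noise threshold, t1 < t2)."""
        segments, start, in_voice = [], None, False
        for i, v in enumerate(values):
            if not in_voice and v > t2:        # jump above T2: voice begins
                in_voice, start = True, i
            elif in_voice and v < t1:          # fall below T1: voice ends
                segments.append((start, i))
                in_voice = False
        if in_voice:                           # still voiced at the end
            segments.append((start, len(values) - 1))
        return segments

    # Usage with illustrative values: voice spans frames 2 through 6.
    print(detect_voice_segments([0.1, 0.2, 0.7, 0.8, 0.6, 0.3, 0.1, 0.1],
                                t1=0.25, t2=0.5))   # [(2, 6)]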
By performing autocorrelation endpoint detection on the sound signal, it is determined whether the sound source, i.e., the sound, comes from a human being.
In step S212, if the sound source of the sound signal contains human voice, the acoustic feature of the sound signal is extracted.
Step S213, if the sound source of the sound signal does not contain human voice, continuing to collect the sound signal, and re-executing step S205.
Repeated data collection, analysis, and judgment allow the system to monitor snoring in real time, 24 hours a day, so that snoring is detected promptly.
And step S214, matching the acoustic features with a preset acoustic model.
Step S215, if the matching is successful, preliminarily determining that the sound signal is the human snore, and recording the time point of preliminarily determining that the sound signal is the human snore.
And step S216, if the matching fails, discarding the acoustic features.
Discarding the relevant data frees storage space, ensuring that continuous data collection and analysis have sufficient storage, which keeps the whole system running smoothly and improves its overall operating speed.
Step S217, if the number of the time points exceeds a preset number threshold value within a preset time length, finally determining that the sound signal is the human snore.
In step S218, the electric bed is controlled to perform a predetermined operation.
The electric bed comprises at least one bed board that can be turned upward; the bed board turns according to the preset action, through an angle of 0 to 60 degrees. By changing the bed-board angle, the bed supports part of the body and conforms to the body's physiological curve, changing the state of the throat muscles and relieving local pressure on the body, thereby achieving the snore-relieving effect, fully relaxing the body, and giving the user a more comfortable experience.
Fig. 4 is a schematic structural diagram of a human snore recognition device according to an embodiment of the present invention. As shown in Fig. 4, the human snore recognition device according to the embodiment of the present invention includes:
and the acquisition module 401 is used for acquiring a sound signal.
Sound signals in the surrounding environment are collected for subsequent recognition, ultimately realizing the function of the device.
An analysis module 402, configured to perform autocorrelation endpoint detection on the sound signal to determine the sound source of the sound signal.
Autocorrelation endpoint detection judges the beginning and end of a syllable and is the preprocessing stage of the subsequent data calculations. The pitch period is one of the most important parameters of a speech signal; it describes an important feature of the speech excitation source, reflecting the time interval between two adjacent glottal closures or, equivalently, the frequency of glottal vibration.
Autocorrelation endpoint detection uses the short-time autocorrelation function to check the pitch period. The short-time autocorrelation function has the property that when the original signal is periodic, its autocorrelation function is also periodic with the same period, and a peak appears whenever the time delay is an integer multiple of that period; the first maximum peak point is usually taken as the pitch period point. Autocorrelation pitch detection exploits this property to check the pitch period, and the detected points are taken as endpoints, i.e., the starting points of the calculations on the data to be processed.
Snore differs markedly from other sounds in timbre, loudness, frequency, and tone. These differences allow snoring to be captured accurately while sound signals other than human sounds are filtered out, reducing interference from other sound sources; autocorrelation endpoint detection thus improves the recognition rate and reduces the amount of data the central processing unit (CPU) must compute.
A feature extraction module 403, configured to extract acoustic features of a sound signal including human voice.
The acoustic features are Mel-frequency cepstral coefficients (MFCC). To extract them, a fast Fourier transform (FFT) is applied to each frame of the preprocessed data to obtain its spectrum, from which the magnitude spectrum is computed; a Mel filter bank is applied to the magnitude spectrum, the logarithm of all filter outputs is taken, and a discrete cosine transform (DCT) then yields the MFCC. A person produces sound through the vocal tract, whose shape, including the tongue and teeth, determines what sound is made; if this shape were known exactly, the phoneme being produced could be described accurately. The shape of the vocal tract shows up in the envelope of the short-time power spectrum of speech, and because the MFCC is a feature that describes this envelope accurately, it is very important as an acoustic feature; compared with a traditional decibel meter or frequency-domain analysis, it is more effective and reliable for sound recognition.
A matching module 404, configured to match the acoustic features with a preset acoustic model, so as to preliminarily determine whether the sound signal is human snore, and record a time point at which the sound signal is preliminarily determined to be human snore.
A comparing module 405, configured to compare that the number of the time points in the preset time duration exceeds a preset number threshold.
Specifically, the comparing module 405 calculates the accumulated time according to the following formula (1):
CT = t1 + t2 + ... + tk (1);
wherein CT represents the accumulated time, tn represents the difference between two adjacent time points at which snore is detected, k is a natural number, and n is a positive integer.
Specifically, the analysis module 402 includes:
a first calculating module 4021, configured to perform framing on the sound signal, and calculate a main peak ratio and logarithmic energy of an autocorrelation function of each frame;
the speech is not stationary macroscopically, but microscopically, the speech signal can be viewed as stationary in a short time, which we call a short stationary signal, and is framed to process each frame of the sound signal as a stationary signal. And in order to reduce the variation from frame to frame, the adjacent frames are overlapped. The frame length is 25ms in general, and the frame shift is half of the frame length. Therefore, the demand of subsequent calculation is met, data distortion caused by unstable sound signals in the subsequent calculation is avoided, and accuracy of snore identification is improved.
A second calculation module 4022, configured to normalize the main peak ratio and the logarithmic energy respectively;
Normalization limits the processed data to the range required by the user, which facilitates subsequent data processing and ensures accelerated convergence when the program runs; it speeds up the gradient-descent search for the optimal solution and improves accuracy.
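As a small illustration (an assumed min-max scheme; the description does not specify the normalization formula), the two per-frame features can be scaled to [0, 1] as follows:

    import numpy as np

    def min_max_normalize(v):
        """Scale a feature vector to [0, 1]; min-max normalization is an
        assumed choice, as the description does not name a formula."""
        v = np.asarray(v, dtype=float)
        span = v.max() - v.min()
        return (v - v.min()) / span if span > 0 else np.zeros_like(v)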
A third calculating module 4023, configured to take the mean of the autocorrelation function main peak ratio divided by the logarithmic energy;
a fourth calculating module 4024, configured to calculate the first threshold and the second threshold according to the following formulas (2) and (3), respectively:
T1=α*mean (2);
T2=β*mean (3);
wherein T1 represents a first threshold, T2 represents a second threshold, mean represents a mean of the autocorrelation function main peak ratio divided by the logarithmic energy, α represents a first threshold coefficient, β represents a second threshold coefficient, and 0< α < β < 1;
T1 is the critical point for noise: a waveform value below T1 is noise. T2 is the critical point for human voice: a waveform value above T2 indicates human voice (the starting point). The voice ends (the end point) when the waveform value falls from above T2 to below T1.
A fifth calculation module 4025, configured to calculate a waveform value of the sound signal;
the voice signal is quantized by utilizing the waveform numerical value so as to distinguish human voice from the mixed voice signal, and the method is small in calculation amount, simple and fast.
A sixth calculating module 4026, configured to determine whether a waveform value of the sound signal is greater than a second threshold value, so as to determine whether the sound source includes human sound.
T2 is the critical point for the presence of human voice: a waveform value above T2 indicates human voice, and a signal below T1 is regarded as noise. When the signal rises above T1 and then above T2, voice is judged to be present from the time point at which it exceeds T2 (that is, the jump from below T2 to above T2 during the rise is taken as the start of the voice, i.e., the starting point); when the signal falls from T2 toward T1 and finally drops below T1, the voice is judged to have ended (that is, the jump from above T1 to below T1 during the fall is taken as the end of the voice, i.e., the end point).
Further, the apparatus 400 for human snore recognition further comprises:
a model calculation module 406 for establishing an acoustic model;
The acoustic model is the database used for snore recognition. The above steps yield a large number of acoustic features from which the acoustic model is built, making it convenient to recognize snore data from all kinds of mixed sounds quickly and accurately.
A training module 407, configured to train the acoustic model.
More corpus information is collected to continuously improve the acoustic model, gradually raising the recognition rate and reducing false triggering; through this ongoing refinement and continued upgrading of the model, the product's performance, recognition speed, and accuracy all improve.
Further, the apparatus 400 for human snore recognition further comprises:
a sample collection module 408, configured to collect a snoring corpus sample;
the fact that voice and snoring corpus samples are collected is that because the sound that everyone sent is different, everyone has different tone qualities like, and the frequency of man's woman's child sound is all different, so collect a large amount of corpus and can extract more speech characteristic parameters, guarantee the feasibility of subsequent processing, improve the recognition rate.
A corpus processing module 409, configured to perform pre-emphasis, framing, and windowing on the snore corpus sample to obtain snore corpus data;
The collected snore corpus samples contain a great deal of clutter; processing them accordingly facilitates the subsequent calculation, reduces the amount of computation, and improves recognition speed and accuracy.
And the corpus feature extraction module 410 is configured to perform a mel-cepstrum coefficient method to extract acoustic features of the snore corpus data.
The Mel-frequency cepstrum is a linear transformation of the logarithmic energy spectrum based on the nonlinear Mel scale of sound frequency. Its frequency bands are divided equally on the Mel scale, which approximates the human auditory system more closely than the linearly spaced frequency bands used in the normal logarithmic cepstrum.
The Mel-cepstral coefficient method has good recognition performance in both pure-voice and noisy environments and recognizes quickly, which helps reduce the energy consumption and heat dissipation requirements of the human snore recognition device 400 and improves its overall performance.
Further, the apparatus 400 for human snore recognition further comprises: and the instruction module 411 is configured to control the electric bed to perform a preset action.
The electric bed comprises at least one bed board that can be turned upward. After receiving an instruction from the instruction module 411, the bed board turns according to the preset action, through an angle of 0 to 60 degrees. By changing the bed-board angle, the bed supports part of the body and conforms to the body's physiological curve, relieving local pressure, fully relaxing the body, and giving the user a more comfortable experience.
The specific principle of the human snore recognition device of the embodiment of the invention is as described in the method embodiments above and is not repeated here.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof.

Claims (14)

1. A method for identifying human snoring, comprising:
collecting a sound signal;
performing autocorrelation endpoint detection on the sound signal to determine a sound source of the sound signal;
if the sound source of the sound signal contains human sound, extracting the acoustic features of the sound signal;
matching the acoustic features with a preset acoustic model; if the matching is successful, preliminarily judging that the sound signal is the human snore, and recording the time point of preliminarily judging that the sound signal is the human snore;
and finally judging that the sound signal is human snore if the number of the time points exceeds a preset number threshold value within a preset time length.
2. The method of claim 1, wherein the step of finally determining that the sound signal is the human snore if the number of the time points exceeds a preset number threshold within a preset time period comprises:
the accumulated time is calculated according to the following formula (1):
CT = t1 + t2 + ... + tk (1);
wherein CT represents the accumulated time, tn represents the difference between two adjacent time points at which snore is detected, k is a natural number, and n is a positive integer.
3. The method of claim 1, wherein the step of performing autocorrelation endpoint detection on the sound signal to determine the sound source of the sound signal comprises:
framing the sound signal, and calculating a main peak ratio and logarithmic energy of an autocorrelation function of each frame;
respectively normalizing the ratio of the main peak and the logarithmic energy;
dividing the ratio of the main peak of the autocorrelation function by the logarithmic energy to obtain a mean value;
respectively calculating a first threshold value and a second threshold value according to the following formulas (2) and (3):
T1=α*mean (2);
T2=β*mean (3);
wherein T1 represents a first threshold, T2 represents a second threshold, mean represents a mean of the autocorrelation function main peak ratio divided by the logarithmic energy, α represents a first threshold coefficient, β represents a second threshold coefficient, and 0< α < β < 1;
calculating a waveform value of the sound signal;
and if the waveform value of the sound signal is larger than a second threshold value, judging that the sound source contains human sound.
4. The method of identifying human snoring as claimed in claim 1, further comprising, prior to acquiring the sound signal:
establishing an acoustic model;
training the acoustic model.
5. The method of human snore recognition as recited in claim 4, wherein said step of establishing an acoustic model includes:
collecting a snoring corpus sample;
pre-emphasis, framing and windowing are carried out on the snore corpus sample to obtain snore corpus data;
and extracting the acoustic characteristics of the snore corpus data by executing a Mel-cepstrum coefficient method.
6. The method of human snore recognition as recited in claim 1, further comprising, prior to matching the acoustic features to a preset acoustic model: and if the sound source of the sound signal does not contain human sound, continuing to collect the sound signal.
7. The method of claim 1, wherein the step of matching the acoustic features to a predetermined acoustic model comprises: and if the matching fails, discarding the acoustic features.
8. The method for identifying human snore as recited in any one of claims 1 to 7, further comprising, after said final determination that said sound signal is human snore: controlling the electric bed to execute preset actions.
9. An apparatus for human snore identification, comprising:
the acquisition module is used for acquiring sound signals;
the analysis module is used for carrying out autocorrelation endpoint detection on the sound signal so as to determine the sound source of the sound signal;
the feature extraction module is used for extracting the acoustic features of the sound signals if the sound sources of the sound signals contain human sounds;
the matching module is used for matching the acoustic characteristics with a preset acoustic model so as to preliminarily judge whether the sound signal is human snore or not and record a time point for preliminarily judging that the sound signal is human snore;
and the comparison module is used for determining whether the number of the time points within the preset time length exceeds the preset number threshold.
10. The apparatus of claim 9, wherein the comparing module calculates the accumulated time according to the following equation (1):
CT = t1 + t2 + ... + tk (1);
wherein CT represents the accumulated time, tn represents the difference between two adjacent time points at which snore is detected, k is a natural number, and n is a positive integer.
11. The apparatus of human snore recognition as recited in claim 9, wherein the analysis module comprises:
the first calculation module is used for framing the sound signal and calculating the main peak ratio and logarithmic energy of the autocorrelation function of each frame;
the second calculation module is used for respectively normalizing the main peak ratio and the logarithmic energy;
the third calculation module is used for dividing the ratio of the main peak of the autocorrelation function by the logarithmic energy to obtain a mean value;
a fourth calculating module, configured to calculate the first threshold and the second threshold according to the following formulas (2) and (3), respectively:
T1=α*mean (2);
T2=β*mean (3);
wherein T1 represents a first threshold, T2 represents a second threshold, mean represents a mean of the autocorrelation function main peak ratio divided by the logarithmic energy, α represents a first threshold coefficient, β represents a second threshold coefficient, and 0< α < β < 1;
the fifth calculation module is used for calculating a waveform numerical value of the sound signal;
and the sixth calculating module is used for judging whether the waveform numerical value of the sound signal is larger than a second threshold value so as to judge whether the sound source contains human sound.
12. The apparatus for human snore recognition as recited in claim 9, further comprising:
the model calculation module is used for establishing an acoustic model;
and the training module is used for training the acoustic model.
13. The apparatus for human snore recognition as recited in claim 12, further comprising:
the sample acquisition module is used for acquiring snore corpus samples;
the corpus processing module is used for carrying out pre-emphasis, framing and windowing processing on the snore corpus sample to obtain snore corpus data;
and the corpus feature extraction module is used for executing a Mel-cepstrum coefficient method to extract the acoustic features of the snore corpus data.
14. The apparatus for human snore identification as recited in any one of claims 9 to 13, further comprising: and the instruction module is used for controlling the electric bed to execute preset actions.
CN202011642951.0A 2020-12-30 2020-12-30 Human snore recognition method and device Pending CN112820319A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011642951.0A CN112820319A (en) 2020-12-30 2020-12-30 Human snore recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011642951.0A CN112820319A (en) 2020-12-30 2020-12-30 Human snore recognition method and device

Publications (1)

Publication Number Publication Date
CN112820319A true CN112820319A (en) 2021-05-18

Family

ID=75856519

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011642951.0A Pending CN112820319A (en) 2020-12-30 2020-12-30 Human snore recognition method and device

Country Status (1)

Country Link
CN (1) CN112820319A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113486964A (en) * 2021-07-13 2021-10-08 盛景智能科技(嘉兴)有限公司 Voice activity detection method and device, electronic equipment and storage medium
CN113599052A (en) * 2021-07-15 2021-11-05 麒盛科技股份有限公司 Snore monitoring method and system based on deep learning algorithm and corresponding electric bed control method and system
WO2023284814A1 (en) * 2021-07-15 2023-01-19 麒盛科技股份有限公司 Electric bed control method and system based on deep learning algorithm, and computer program

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103236260A (en) * 2013-03-29 2013-08-07 京东方科技集团股份有限公司 Voice recognition system
CN106539560A (en) * 2016-11-25 2017-03-29 美的集团股份有限公司 Sound of snoring detection method, sound of snoring detecting system, snoring system and Easy pillow
CN108369813A (en) * 2017-07-31 2018-08-03 深圳和而泰智能家居科技有限公司 Specific sound recognition methods, equipment and storage medium
CN109350075A (en) * 2018-09-18 2019-02-19 深圳和而泰数据资源与云技术有限公司 A kind of sound of snoring detection method, device and readable storage medium storing program for executing
CN211533543U (en) * 2019-05-05 2020-09-22 麒盛科技股份有限公司 Electric bed for snoring intervention

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103236260A (en) * 2013-03-29 2013-08-07 京东方科技集团股份有限公司 Voice recognition system
CN106539560A (en) * 2016-11-25 2017-03-29 美的集团股份有限公司 Sound of snoring detection method, sound of snoring detecting system, snoring system and Easy pillow
CN108369813A (en) * 2017-07-31 2018-08-03 深圳和而泰智能家居科技有限公司 Specific sound recognition methods, equipment and storage medium
WO2019023877A1 (en) * 2017-07-31 2019-02-07 深圳和而泰智能家居科技有限公司 Specific sound recognition method and device, and storage medium
CN109350075A (en) * 2018-09-18 2019-02-19 深圳和而泰数据资源与云技术有限公司 A kind of sound of snoring detection method, device and readable storage medium storing program for executing
CN211533543U (en) * 2019-05-05 2020-09-22 麒盛科技股份有限公司 Electric bed for snoring intervention

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Chen Zewei et al., "Voice endpoint detection method based on autocorrelation function" (基于自相关函数的语音端点检测方法), Computer Engineering and Applications (计算机工程与应用), vol. 54, no. 06, 11 April 2018, pages 216-221 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113486964A (en) * 2021-07-13 2021-10-08 盛景智能科技(嘉兴)有限公司 Voice activity detection method and device, electronic equipment and storage medium
CN113599052A (en) * 2021-07-15 2021-11-05 麒盛科技股份有限公司 Snore monitoring method and system based on deep learning algorithm and corresponding electric bed control method and system
WO2023284814A1 (en) * 2021-07-15 2023-01-19 麒盛科技股份有限公司 Electric bed control method and system based on deep learning algorithm, and computer program

Similar Documents

Publication Publication Date Title
CN107610715B (en) Similarity calculation method based on multiple sound characteristics
CN106935248B (en) Voice similarity detection method and device
Dhingra et al. Isolated speech recognition using MFCC and DTW
CN112820319A (en) Human snore recognition method and device
CN108896878B (en) Partial discharge detection method based on ultrasonic waves
CN109147796B (en) Speech recognition method, device, computer equipment and computer readable storage medium
US7660718B2 (en) Pitch detection of speech signals
Kumar et al. Design of an automatic speaker recognition system using MFCC, vector quantization and LBG algorithm
CN110880329B (en) Audio identification method and equipment and storage medium
WO2019023877A1 (en) Specific sound recognition method and device, and storage medium
CN109256127B (en) Robust voice feature extraction method based on nonlinear power transformation Gamma chirp filter
US11672472B2 (en) Methods and systems for estimation of obstructive sleep apnea severity in wake subjects by multiple speech analyses
WO2019023879A1 (en) Cough sound recognition method and device, and storage medium
CN110570880A (en) Snore signal identification method
Kapoor et al. Parkinson’s disease diagnosis using Mel-frequency cepstral coefficients and vector quantization
CN108682432B (en) Speech emotion recognition device
CN109036437A (en) Accents recognition method, apparatus, computer installation and computer readable storage medium
CN110942784A (en) Snore classification system based on support vector machine
WO2013187826A2 (en) Cepstral separation difference
Murugappan et al. DWT and MFCC based human emotional speech classification using LDA
CN110299141A (en) The acoustic feature extracting method of recording replay attack detection in a kind of Application on Voiceprint Recognition
CN112397074A (en) Voiceprint recognition method based on MFCC (Mel frequency cepstrum coefficient) and vector element learning
Li et al. A comparative study on physical and perceptual features for deepfake audio detection
CN113782032B (en) Voiceprint recognition method and related device
Katsir et al. Evaluation of a speech bandwidth extension algorithm based on vocal tract shape estimation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination