CN113314143A - Apnea judgment method and device and electronic equipment - Google Patents

Apnea judgment method and device and electronic equipment

Info

Publication number
CN113314143A
CN113314143A
Authority
CN
China
Prior art keywords
snore
loudness
frame
determining
fragment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110629654.0A
Other languages
Chinese (zh)
Other versions
CN113314143B (en)
Inventor
竹东翔
程齐明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Youbo Yichuang Intelligent Technology Co ltd
Original Assignee
Nanjing Youbo Yichuang Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Youbo Yichuang Intelligent Technology Co ltd filed Critical Nanjing Youbo Yichuang Intelligent Technology Co ltd
Priority to CN202110629654.0A
Publication of CN113314143A
Application granted
Publication of CN113314143B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/08 Detecting, measuring or recording devices for evaluating the respiratory organs
    • A61B5/0826 Detecting or evaluating apnoea events
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/66 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition

Abstract

The application discloses an apnea judgment method and device and electronic equipment, wherein the method comprises the following steps: acquiring an audio signal; determining the background noise loudness according to the loudness of each frame, and determining voiced segments according to the loudness of each frame and the background noise loudness; identifying snore segments from the voiced segments based on a neural-network snore prediction model; identifying consecutive snore segments and snore-segment intervals, and extracting normal snore-interval features from them; determining snore-segment intervals longer than an apnea threshold as suspected apnea segments; and determining the validity of each suspected apnea segment according to the loudness of each of its frames, the linear spectrum energy, and a preset rule. The method determines whether the user is breathing smoothly by monitoring the snore intervals; it is logically rigorous, computationally light, and highly accurate.

Description

Apnea judgment method and device and electronic equipment
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a method and a device for judging apnea and electronic equipment.
Background
Apnea syndrome is a sleep disorder that seriously threatens health. Medically it is detected by measuring respiratory airflow, but the wearable equipment affects the user's sleep state, and users face a psychological barrier to being monitored, which greatly limits the uptake of screening.
Respiration monitors based on the piezoelectric principle or on electromagnetic/radar waves already exist on the market; their problem is that, for users with non-central apnea, chest movement continuing during an apnea event degrades identification accuracy. Smart watches detect apnea by monitoring blood oxygen, but the accuracy of continuous blood-oxygen monitoring conflicts with power-consumption constraints, and wearing a watch at night is uncomfortable. A further significant problem shared by these smart devices is that the purchase cost of the hardware itself is prohibitive for most users, while the apnea-syndrome pre-screening market requires a convenient and inexpensive solution.
Using sound for preliminary screening of apnea syndrome is a feasible and efficient approach: on the one hand, the user has no wearing discomfort or psychological burden; on the other hand, the ubiquity of smartphones removes most concerns about monitoring cost. The key to the technique, however, is accurately identifying snores and breathing sounds.
At present, many related products and patents focus on identifying spectral features before and after an apnea or hypopnea event, but they generally suffer from heavy computation and low identification accuracy.
Disclosure of Invention
The embodiments of the present application provide an apnea judgment method, an apnea judgment device, and electronic equipment to solve, or at least partially solve, the above problems.
According to a first aspect of the present application, there is provided a method for determining apnea, comprising:
acquiring an audio signal, and determining the loudness, linear spectrum energy, and Mel spectrum energy of each frame in the audio signal;
determining the background noise loudness according to the loudness of each frame, and determining voiced segments according to the loudness of each frame and the background noise loudness;
inputting the Mel spectrum energy of each frame into a neural-network snore prediction model, and identifying snore segments from the voiced segments;
identifying consecutive snore segments and snore-segment intervals, and extracting normal snore-interval features from the consecutive snore segments and the snore-segment intervals, wherein the normal snore-interval features comprise the average snore interval, the minimum snore interval, and the maximum snore interval;
determining an apnea threshold according to the medical definition of apnea, and determining snore-segment intervals longer than the apnea threshold as suspected apnea segments;
and determining the validity of each suspected apnea segment according to the loudness of each of its frames, the linear spectrum energy, and a preset rule.
According to a second aspect of the present application, there is provided an apnea judgment apparatus, comprising:
an acquisition unit for acquiring an audio signal and determining the loudness, linear spectrum energy, and Mel spectrum energy of each frame in the audio signal;
a first identification unit for determining the background noise loudness according to the loudness of each frame and determining voiced segments according to the loudness of each frame and the background noise loudness;
a second identification unit for inputting the Mel spectrum energy of each frame into a neural-network snore prediction model and identifying snore segments from the voiced segments;
a third identification unit for identifying consecutive snore segments and snore-segment intervals and extracting normal snore-interval features from them, wherein the normal snore-interval features comprise the average snore interval, the minimum snore interval, and the maximum snore interval;
a fourth identification unit for determining an apnea threshold according to the medical definition of apnea, and determining snore-segment intervals longer than the apnea threshold as suspected apnea segments;
and a judging unit for determining the validity of each suspected apnea segment according to the loudness of each of its frames, the linear spectrum energy, and a preset rule.
According to another aspect of the present application, there is provided an electronic device including: a processor; and a memory arranged to store computer executable instructions that, when executed, cause the processor to perform any of the methods described above.
The embodiment of the application adopts at least one technical scheme which can achieve the following beneficial effects:
the method comprises the steps of firstly determining background noise loudness of the current environment according to loudness of frames forming an audio signal, marking segments above the background noise loudness as sound segments, further identifying the snore segments from the sound segments based on a neural network snore prediction model, identifying segments with snore segment intervals exceeding a certain threshold value as suspected breath pause segments in the snore segments, and judging whether breathing stops or not according to the loudness of each frame of the interval segments, linear spectrum energy and a preset rule. According to the method, the snore fragments are accurately identified through the neural network snore forecasting model, the snore intervals are further identified in the snore fragments, whether the user breathes smoothly is determined through monitoring the snore intervals, a solid foundation is provided for monitoring the health state of the user, and the method is strict in logic, small in calculated amount and high in identification accuracy.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
fig. 1 is a schematic flow chart of a method for determining apnea according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of an apnea judging device according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of an electronic device in an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The technical solutions provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings.
Fig. 1 shows a schematic flow chart of a method for determining apnea according to an embodiment of the present application, and as can be seen from fig. 1, the method at least includes steps S110 to S160:
it should be noted that the present application mainly determines apnea occurring during snoring, because people usually suffer from respiratory disturbance during snoring, and because the airway is obstructed during snoring, apnea is easily generated.
Step S110: acquiring an audio signal, and determining linear spectral energy, Mel spectral energy and loudness of each frame in the audio signal.
The audio signal may be, but is not limited to, a sound signal collected by a smart terminal while the user is sleeping. For example, the terminal's microphone is sampled at 16000 Hz, 16-bit, single channel, yielding a time series of sample values such as (2, 4, 100, 120, 140, 60, -60, -130, …), with an interval of 1/16000 second between consecutive samples.
The linear spectrum energy and Mel spectrum energy of each frame in the audio signal may be obtained as follows: collect the sound pressure amplitudes of the sampling points at a preset frequency; then perform framing, frame shifting, Fourier transformation, and Mel frequency transformation on the audio signal, determining the linear frequency bins constituting each frame, their corresponding Mel frequencies, and the linear and Mel spectrum energies at those frequencies.
Framing means taking N consecutive samples of the audio time series as one group, called a frame of data; for example, 512 consecutive samples are taken each time. The number of samples per frame can be set according to the available computation, and is usually 512 or 1024. Which frequency bins a frame yields is determined by the frequency resolution, which follows from the sampling rate: with 16000 Hz sampling and 512 samples per frame, 16000/512 = 31.25 Hz, so in the 0 to 8000 Hz band only the bins at 31.25 × N Hz are available, with N an integer from 1 to 256.
In this application, to improve detection accuracy, each new frame in the framing process starts from the middle of the previous frame rather than from its end: in this embodiment the second frame starts at the middle of the first frame, i.e. at sample 257, and again takes 512 samples (a 50% frame shift). For convenience of processing, each frame of the audio signal is given a frame number, increasing sequentially.
A short-time Fourier transform is applied to the sound pressure amplitudes of the samples in each frame, and the results are assembled in time order to form the frame's spectral energy. That is, the spectral energy of each frame can be represented by a one-dimensional array (a1, a2, a3, …, a256) corresponding to the energy amplitudes at 31.25 Hz, 62.5 Hz, 93.75 Hz, …, 8000 Hz. The Mel spectrum energy is then obtained with a Mel filter bank; for the relationship between the Mel and linear spectra, refer to the prior art.
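The framing and per-frame spectral energy computation described above can be sketched as follows. This is a minimal illustration, not the patent's implementation; the Hann window is an assumption (the text does not name a window), and the Mel filter-bank step is omitted for brevity.

```python
import numpy as np

def frame_spectra(x, frame_len=512, hop=256):
    """Split a 16 kHz signal into 50%-overlapped frames (hop = frame_len/2,
    matching the 'start from the middle of the previous frame' rule) and
    return the per-frame linear spectrum energy: 257 bins spaced
    16000/512 = 31.25 Hz apart over 0-8000 Hz."""
    x = np.asarray(x, dtype=float)
    n_frames = 1 + max(0, len(x) - frame_len) // hop
    win = np.hanning(frame_len)  # window choice is an assumption
    spectra = np.empty((n_frames, frame_len // 2 + 1))
    for i in range(n_frames):
        frame = x[i * hop : i * hop + frame_len] * win
        spectra[i] = np.abs(np.fft.rfft(frame)) ** 2  # energy per linear bin
    return spectra
```

One second of 16 kHz audio then yields 61 overlapped frames of 257 bins each.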
Furthermore, the loudness of each frame can be determined from the sound pressure amplitudes of its samples; for the specific calculation formula, refer to the prior art.
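Since the text defers the loudness formula to the prior art, a common stand-in is the frame's RMS sound pressure level in decibels; the sketch below uses that assumption (the reference pressure `ref` is likewise an illustrative parameter, not from the patent).

```python
import numpy as np

def frame_loudness_db(frame, ref=1.0):
    """Per-frame loudness approximated as RMS sound pressure level in dB.
    This is an assumed proxy for the loudness formula the text leaves
    to the prior art."""
    rms = np.sqrt(np.mean(np.square(np.asarray(frame, dtype=float))))
    return 20.0 * np.log10(max(rms, 1e-12) / ref)
```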
Step S120: determining the loudness of background noise according to the loudness of each frame; and determining the voiced segments according to the loudness of each frame and the loudness of the background noise.
Background noise is usually present in a sleeping environment; it is typically continuous with smooth loudness, so its influence must be eliminated first. Since different users are in different environments with different noise levels, the background noise loudness is calculated first so that it can be removed accurately.
Specifically, the background noise loudness can be calculated from the loudness of each frame: for example, arrange the loudness of all frames within 5 consecutive seconds of the audio signal as a one-dimensional array and compute its mean and variance; if the variance is below a preset stability upper threshold, the segment is treated as background noise, and the loudness mean is taken as the background noise loudness.
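The mean-and-variance stability test just described can be sketched as below; the numeric value of the stability threshold is an illustrative assumption, since the patent only calls it a preset upper threshold.

```python
import numpy as np

def background_noise_loudness(frame_loudness, var_upper=2.0):
    """Estimate background-noise loudness from ~5 s of per-frame loudness.
    If the variance is below the preset stability upper threshold, the
    stretch is treated as pure background noise and its mean loudness is
    returned; otherwise None (the stretch contains other sounds)."""
    arr = np.asarray(frame_loudness, dtype=float)
    if arr.var() < var_upper:
        return float(arr.mean())
    return None
```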
Thus the background noise loudness can be determined from the loudness of each frame: a frame louder than the background noise loudness is marked as a voiced frame, a frame at or below it is marked as a background frame, and a voiced frame or a run of consecutive voiced frames forms a voiced segment.
It should be noted that the above description is only an example, and the detection rule of the voiced segments may be made stricter to improve the detection accuracy, and the present application is not limited thereto.
Step S130: input the Mel spectrum energy of each frame into a neural-network snore prediction model, and identify snore segments among the voiced segments.
A voiced segment may be sleep talking, snoring, or something else, so the snore segments must be identified among the voiced segments.
Snores can be identified with a neural-network snore prediction model. The model is a binary classifier: the Mel spectrum energy of each frame is fed in as the input, and the output indicates whether the voiced segment is a snore segment or a non-snore segment.
In this application, the neural-network snore prediction model is built on a multi-layer neural network, for example one fully connected layer, one Long Short-Term Memory layer (LSTM), another fully connected layer, and one softmax (logistic regression) output layer.
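The FC, LSTM, FC, softmax stack can be sketched as a forward pass in plain numpy. All layer sizes and weight initializations below are illustrative assumptions; the patent specifies only the layer types, not their dimensions or training.

```python
import numpy as np

def _sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinySnoreNet:
    """Forward-pass sketch of the FC -> LSTM -> FC -> softmax classifier.
    Sizes (40 Mel bands, 32 hidden units) are assumptions for illustration."""
    def __init__(self, n_mel=40, n_hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        s = 0.1
        self.W_in = rng.normal(0, s, (n_mel, n_hidden))    # input FC layer
        self.b_in = np.zeros(n_hidden)
        # LSTM gate weights: input, forget, cell, output gates stacked
        self.W_x = rng.normal(0, s, (n_hidden, 4 * n_hidden))
        self.W_h = rng.normal(0, s, (n_hidden, 4 * n_hidden))
        self.b_g = np.zeros(4 * n_hidden)
        self.W_out = rng.normal(0, s, (n_hidden, 2))       # FC to 2 classes
        self.b_out = np.zeros(2)
        self.n_hidden = n_hidden

    def forward(self, mel_frames):
        """mel_frames: (T, n_mel) Mel spectrum energies.
        Returns (T, 2) per-frame softmax probabilities (snore / non-snore)."""
        H = self.n_hidden
        h, c = np.zeros(H), np.zeros(H)
        probs = []
        for frame in np.asarray(mel_frames, dtype=float):
            x = np.tanh(frame @ self.W_in + self.b_in)     # FC layer
            g = x @ self.W_x + h @ self.W_h + self.b_g     # LSTM step
            i, f = _sigmoid(g[:H]), _sigmoid(g[H:2 * H])
            u, o = np.tanh(g[2 * H:3 * H]), _sigmoid(g[3 * H:])
            c = f * c + i * u
            h = o * np.tanh(c)
            z = h @ self.W_out + self.b_out                # output FC
            e = np.exp(z - z.max())                        # softmax
            probs.append(e / e.sum())
        return np.array(probs)
```

In practice such a model would be trained with a standard deep-learning framework; this sketch only shows the data flow the patent names.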
Step S140: identify consecutive snore segments and snore-segment intervals, and extract normal snore-interval features from them, the features comprising the average, minimum, and maximum snore intervals. Consecutive snore segments are determined within the voiced segments; the stretches between non-contiguous snore segments are the snore-segment intervals. Each person's sleeping habits differ, but snoring has a certain regularity, so feature extraction over the consecutive snore segments and the intervals between them yields the user's normal snore-interval features: the average snore interval, the minimum snore interval, and the maximum snore interval.
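The interval-feature extraction above amounts to computing gap statistics over consecutive snore segments. A minimal sketch, with segments represented as assumed (start, end) time pairs in seconds:

```python
def snore_interval_features(snore_segments):
    """Given (start_s, end_s) times of consecutive snore segments, return
    (average, minimum, maximum) inter-snore interval, where an interval is
    the gap between one snore's end and the next snore's start."""
    gaps = [b[0] - a[1] for a, b in zip(snore_segments, snore_segments[1:])]
    return sum(gaps) / len(gaps), min(gaps), max(gaps)
```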
Step S150: determine an apnea threshold according to the medical definition of apnea, and determine snore-segment intervals longer than the apnea threshold as suspected apnea segments.
According to the medical definition of apnea, the apnea threshold is set at 10 s.
Any snore-segment interval longer than the apnea threshold can then be determined to be a suspected apnea segment; that is, a stretch in which the interval between snore segments exceeds 10 seconds is taken as a suspected apnea segment.
While a person snores continuously, no respiratory disturbance occurs; apnea can only have occurred when the interval between one snore and the next exceeds 10 seconds. The snore data are therefore divided into consecutive snore frames and stretches where the snore interval exceeds 10 seconds, judged from the loudness and frame numbers of the frames making up the snore segments. With a preset snore threshold, a frame whose loudness exceeds the threshold is considered a snore frame, and multiple snore frames with consecutive frame numbers are determined to be a consecutive snore segment. One or more consecutive frames with loudness below the preset snore threshold form a snore interval, i.e. the gap between one snore and the next; an interval longer than 10 seconds is a suspected apnea segment.
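Steps S140 and S150 together reduce to scanning the gaps between snore segments against the 10 s threshold. A sketch, again with segments as assumed (start, end) second pairs:

```python
def find_suspected_apneas(snore_segments, apnea_threshold_s=10.0):
    """Return (gap_start, gap_end) for every inter-snore gap longer than
    the apnea threshold (10 s per the medical definition cited above);
    these gaps are the suspected apnea segments."""
    suspected = []
    for a, b in zip(snore_segments, snore_segments[1:]):
        if b[0] - a[1] > apnea_threshold_s:
            suspected.append((a[1], b[0]))
    return suspected
```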
Step S160: determine the validity of each suspected apnea segment according to the loudness of each of its frames, the linear spectrum energy, and a preset rule.
Once a suspected apnea segment is determined, its validity is judged: if the segment is invalid, no apnea occurred; if it is valid, apnea occurred.
In some embodiments of the present application, the validity of a suspected apnea segment may be judged with the following three conditions, although it is not limited to them:
As a preliminary step, voiced segments with small fluctuation in loudness or in 0-1 kHz spectral energy are located inside the gap, using two conditions, one on loudness and one on the spectral energy of the 0-1 kHz linear bins. The background noise loudness plus a small loudness margin is set as the loudness fluctuation threshold; the summed 0-1 kHz spectral energy whose variance over 5 seconds is below a preset threshold is taken as the background 0-1 kHz energy, and a preset margin is added to it as the upper stability threshold. A frame satisfying either of the two conditions is marked as a suspected voiced frame, and 3 consecutive suspected voiced frames form a voiced segment.
First, a suspected apnea segment is invalid if the sound of the body moving in bed is found within it. In this method, the linear-bin spectral energies of the voiced segment's initial frame form a one-dimensional array; the energies of the subsequent voiced frames are accumulated onto it, and the sum is finally divided by the number of voiced frames to give the segment's average single-frame spectral energy per linear bin. If the fluctuation across the 0-8 kHz linear bins is smaller than a preset white-noise-fluctuation threshold and the segment lasts more than 2 seconds, the sound is recognized as the body moving in bed.
Second, the suspected apnea segment is invalid if the timing of a voiced segment within it matches the breathing characteristics obtained from the consecutive-snore statistics, i.e. its interval lies between the minimum and maximum breathing intervals and its duration lies within the preset breath-length range, since breathing has then resumed.
Third, the preset rule: the suspected apnea must last 10 seconds or more; otherwise it is invalid.
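The three validity conditions can be sketched as a single check. The representation of the in-gap voiced segments (a dict per segment with `duration`, `prev_gap`, and an `is_body_movement` flag assumed to be pre-computed by the rules above) and the breath-length range are illustrative assumptions:

```python
def suspected_apnea_valid(gap_len_s, gap_voiced_segments,
                          min_breath_gap, max_breath_gap,
                          breath_len_range=(0.2, 2.0)):
    """Apply the three validity rules to one suspected apnea segment.
    gap_voiced_segments: dicts with 'duration' (s), 'prev_gap' (s since the
    previous voiced segment) and 'is_body_movement' (bool), assumed to be
    produced upstream by the loudness/0-1 kHz stability checks."""
    if gap_len_s < 10.0:                               # rule 3: >= 10 s
        return False
    for seg in gap_voiced_segments:
        if seg.get('is_body_movement'):                # rule 1: body movement
            return False
        if (min_breath_gap <= seg.get('prev_gap', -1.0) <= max_breath_gap
                and breath_len_range[0] <= seg['duration'] <= breath_len_range[1]):
            return False                               # rule 2: breathing resumed
    return True
```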
It should be noted that the above description is only an example, and in order to improve the detection accuracy, the detection rule of the validity of the suspected apnea fragment may be made stricter, and the present application is not limited thereto.
As can be seen from the method of fig. 1, the present application determines the background noise loudness of the current environment from the loudness of the frames constituting the audio signal, marks segments above it as voiced segments, identifies snore segments among the voiced segments with a neural-network snore prediction model, identifies suspected apnea segments among the snore-segment intervals, and judges whether breathing has stopped from the per-frame loudness, the linear spectrum energy, and a preset rule. The method accurately identifies snore segments with the snore prediction model, further identifies the snore intervals among them, and determines whether the user is breathing smoothly by monitoring those intervals, providing a solid foundation for monitoring the user's health; it is logically rigorous, computationally light, and highly accurate.
In some embodiments, determining the background noise loudness from the loudness of each frame comprises: intercepting a background-noise sample segment from the audio signal according to a preset duration; determining the loudness of each frame constituting the segment; determining the segment's loudness mean and loudness variance from those per-frame values; and comparing the variance with a preset background-noise variance upper threshold, taking the segment's loudness mean as the background noise loudness when the variance is below that threshold. If the whole audio signal were used as the sample, the computation would be very large; and since sleep talking or snoring does not usually continue throughout the night, data of a preset duration immediately preceding the current moment can be intercepted as the background-noise sample segment. This also lets the method adapt to continually changing background noise, neighbourhood noise at dawn being common compared with the quiet of late night.
Specifically, the most recent audio portion of the preset duration is intercepted as the background-noise sample segment, for example the latest 5 consecutive seconds of the audio signal.
The background noise may be determined by the prior art or by the method recommended here: determine the loudness of each frame constituting the sample segment, then compute the segment's overall loudness mean and loudness variance from those values. The statistical variance, the mean of the squared differences between each sample value and the sample mean, characterises smoothness; the segment's overall loudness variance follows directly from this definition and the per-frame loudness.
When the loudness variance is below the preset background-noise variance upper threshold, the sample segment is considered to contain only background noise, with no snoring, sleep talking, or other voiced segments, and its loudness mean is taken as the background noise loudness. The mean and variance are used in their ordinary sense; for the calculation formulas, refer to the prior art.
When the loudness variance exceeds the preset background-noise variance upper threshold, the sample segment is considered to contain sounds other than background noise; it is discarded and a new segment is intercepted.
In some embodiments, determining the voiced segments from the per-frame loudness and the background noise loudness comprises: determining a frame to be the start frame of a voiced segment when its loudness exceeds the sum of the background noise loudness and a fluctuation loudness; determining the first of a preset number of consecutive frames to be the cut-off frame of the voiced segment when each of those frames is quieter than that sum; and taking every frame between the start frame and the cut-off frame as the voiced segment.
A voiced segment is thus identified by its start and cut-off frames. To improve detection accuracy, this embodiment adds a fluctuation loudness, which may be 4-6 dB, on top of the background noise loudness. When the preceding frame or frames do not satisfy the voiced-segment rule and a frame's loudness exceeds the sum of the background noise loudness and the fluctuation loudness, that frame is the start frame of a voiced segment. For the cut-off frame, when a preset number of consecutive frames are all quieter than that sum, the first of them is the cut-off frame: for example, if 3 consecutive frames are each quieter than the sum, the voiced segment is considered ended and the first of the 3 is its cut-off frame. Finally, all frames between the start frame and the cut-off frame constitute the voiced segment.
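The start-frame/cut-off-frame logic can be sketched as a single scan over the per-frame loudness; the 5 dB fluctuation loudness is one point in the 4-6 dB range the text gives, and the 3-frame cut-off run follows the example above.

```python
def voiced_segments(loudness, noise_loudness, fluct_db=5.0, end_run=3):
    """Return [start, end) frame-index pairs of voiced segments.
    A frame louder than background + fluctuation loudness opens a segment;
    `end_run` consecutive quiet frames close it at the first quiet frame."""
    thr = noise_loudness + fluct_db
    segs, start, quiet = [], None, 0
    for i, l in enumerate(loudness):
        if l > thr:
            if start is None:
                start = i                     # start frame of a voiced segment
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet == end_run:              # cut-off: first of the quiet run
                segs.append((start, i - end_run + 1))
                start, quiet = None, 0
    if start is not None:                     # segment still open at the end
        segs.append((start, len(loudness) - quiet))
    return segs
```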
In some embodiments of the present application, in the method, inputting the Mel spectrum energy of each frame into a neural network snore prediction model and identifying snore fragments from the voiced fragments comprises: inputting the Mel spectrum energy of each frame constituting a voiced fragment into the neural network snore prediction model as the input value to obtain the snore probability value of each frame; and, in the case that the snore probability value is greater than or equal to a preset probability threshold, marking the frame as a snore frame, wherein a voiced fragment containing a snore frame is marked as a snore fragment.
When snore fragments are identified, the Mel spectrum energy of each frame forming a voiced fragment is used as the input value to the neural network snore prediction model, and the output value is the snore probability value of each frame of that fragment.
The snore probability value is then compared with a preset probability threshold, for example 0.35. In the case that the snore probability value is greater than or equal to the preset probability threshold, the frame is marked as a snore frame, and the voiced fragment containing the snore frame is marked as a suspected snore fragment.
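The per-frame thresholding and segment marking can be sketched as below; the function name is illustrative and the 0.35 default is the example threshold given in this embodiment.

```python
def mark_suspected_snore(segments, frame_probs, prob_threshold=0.35):
    """Given voiced segments as (start, stop) frame pairs and per-frame
    snore probabilities from the model, return the segments containing
    at least one frame with probability >= prob_threshold."""
    suspected = []
    for start, stop in segments:
        if any(p >= prob_threshold for p in frame_probs[start:stop + 1]):
            suspected.append((start, stop))
    return suspected
```

A segment is kept as soon as any one of its frames crosses the threshold; segments with uniformly low probabilities fall through to the rule-based checks described next.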
For the voiced fragments that are not marked as suspected snore fragments, that is, the voiced fragments containing no snore frame, although the neural network snore prediction model did not mark them as snore fragments, this embodiment further detects and extracts snores by combining the rhythm characteristics of snoring with the mean linear spectrum energy similarity characteristics, in order to improve the detection accuracy. Firstly, continuous suspected snore fragments are determined from the suspected snore fragments; specifically, identification can be performed according to whether adjacent snores conform to breathing characteristics: if the interval between the start frame of the previous snore fragment and the start frame of the next snore fragment does not exceed a preset threshold, for example 10 seconds, the fragments can be considered continuous suspected snore fragments.
In the case that the continuous suspected snore fragments meet a preset snore rule, the continuous suspected snore fragments are marked as snore fragments, wherein the preset snore rule is: the interval between the start frames of adjacent suspected snores is within a preset breathing threshold, such as 2-7 seconds.
For the voiced segments not identified as suspected snores, in order to improve detection accuracy, the preset snore rule is used to further detect and extract snores.
The preset snore rule can be understood as two sub-rules, recorded as snore rule I and snore rule II. Snore rule I represents the rhythm characteristic of snoring: if the interval between the start frames of consecutive voiced segments falls within a preset time range corresponding to normal human breathing, which can be but is not limited to 2-7 s, and 3 consecutive intervals remain stable, for example with a variance of less than 1 second, snore rule I is satisfied; otherwise it is not.
Snore rule II represents the mean linear spectrum energy similarity characteristic of snoring: if the characteristic frequency points of 4 consecutive voiced segments are consistent, snore rule II is satisfied; otherwise it is not.
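The two sub-rules can be sketched as follows. This is a hedged illustration: the function names, the interpretation of "stable" as population variance below 1, and the representation of characteristic frequency points as a single dominant bin per segment are our assumptions, not the patent's exact formulation.

```python
import statistics

def snore_rule_one(start_times_s, min_gap=2.0, max_gap=7.0, max_var=1.0):
    """Rhythm rule: onsets of consecutive segments must be 2-7 s apart,
    with at least 3 intervals whose variance stays below max_var."""
    gaps = [b - a for a, b in zip(start_times_s, start_times_s[1:])]
    if len(gaps) < 3:
        return False
    return all(min_gap <= g <= max_gap for g in gaps) and \
        statistics.pvariance(gaps) < max_var

def snore_rule_two(peak_bins):
    """Spectral-similarity rule: the dominant frequency bin of the mean
    single-frame spectrum must coincide across 4 consecutive segments."""
    return len(peak_bins) >= 4 and len(set(peak_bins[:4])) == 1
```

Segments passing both rules would be promoted to snore fragments even when the network alone did not flag them.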
In some embodiments of the present application, the determining validity of the suspected apnea fragment according to the loudness of each frame of the suspected apnea fragment, the linear spectrum energy, and a preset rule includes:
determining a suspected action segment according to the loudness and linear spectrum energy of each frame of the suspected apnea fragment, wherein the loudness of each frame forming the suspected action segment is greater than a first preset loudness threshold while the single-frame mean linear spectrum energy remains stable over the full frequency band; and, in the case that the duration of the suspected action segment is greater than a first preset duration threshold, determining that the suspected apnea fragment is marked as invalid.
When people snore, actions such as turning over and moving may occur, and snoring stops while the action occurs, so the influence of such actions is preferably eliminated when checking the validity of a suspected apnea fragment. Specifically, the spectral energies of the linear frequency points of the starting frame of the voiced segment are formed into a 1-dimensional array, the spectral energies of the linear frequency points of all frames in the segment are accumulated, and the accumulated energies are divided by the number of frames to obtain the average single-frame linear spectral energy of the segment. If the spectral energy fluctuation of each linear frequency point in 0-8 kHz is smaller than a preset white-noise fluctuation threshold and the duration of the segment exceeds 2 seconds, the segment is identified as the body moving on the bed rather than an apnea.
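A minimal sketch of this body-movement check follows, under stated assumptions: the function name is ours, "fluctuation" is approximated as the max-min spread of the mean single-frame spectrum, and the flatness threshold is an illustrative value (the patent only says it should be below a preset white-noise fluctuation threshold).

```python
def looks_like_body_movement(frame_spectra, frame_step_s,
                             flatness_threshold=3.0, min_duration_s=2.0):
    """frame_spectra: per-frame linear spectra (equal-length lists of bin
    energies covering 0-8 kHz).  Flag the segment as body movement when
    the mean single-frame spectrum is flat across the band (spread below
    flatness_threshold) and the segment lasts longer than min_duration_s."""
    n = len(frame_spectra)
    if n * frame_step_s <= min_duration_s:
        return False
    bins = len(frame_spectra[0])
    # average single-frame spectrum: accumulate all frames, divide by count
    mean_spectrum = [sum(f[b] for f in frame_spectra) / n for b in range(bins)]
    return max(mean_spectrum) - min(mean_spectrum) < flatness_threshold
```

A long, spectrally flat (noise-like) segment is classified as movement and the corresponding suspected apnea is invalidated; a short or spectrally peaked segment is not.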
In some embodiments of the present application, determining the validity of the suspected apnea fragment according to the loudness of each frame of the suspected apnea fragment and a preset rule further includes: determining a suspected breathing segment according to the loudness of each frame of the suspected apnea fragment, wherein the loudness of each frame constituting the suspected breathing segment is greater than a second preset loudness threshold, the time interval of the voiced segments meets the breathing characteristics obtained from continuous snore statistics, that is, the interval falls within the minimum and maximum breathing interval range, and meanwhile the duration of the voiced segment is within the preset breathing duration range. If these conditions are satisfied, the suspected apnea fragment is marked as invalid.
Further, in the process of snoring, people may suddenly stop snoring due to actions, environmental changes and the like and enter a normal breathing state. In this case breathing is even and the breathing sound is small, so the influence of entering a normal breathing state also needs to be eliminated.
Similarly, this can be implemented by setting a suitable second preset loudness threshold. It should be noted that the value of the second preset loudness threshold is relatively small, such as 4 dB, and is smaller than the first preset loudness threshold. Frames with loudness greater than the second preset loudness threshold form a suspected breathing segment, and the suspected apnea fragment is further marked as invalid when the duration of the suspected breathing segment is less than the second duration threshold.
The second duration threshold may be calculated according to the user's snoring pattern: for example, from the statistics of the snore fragments, the durations of the snore intervals (the spans of non-snore frames between snores) yield a minimum duration and a maximum duration, and the threshold is obtained as a weighted average of the two.
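The personalized threshold described above can be sketched as follows; the equal 0.5/0.5 weighting is an assumption, since the patent specifies only "weighted average" without giving the weights.

```python
def breath_stats(snore_start_times_s, w_min=0.5):
    """From chronologically ordered snore onset times, derive the
    minimum and maximum snore interval and a weighted-average duration
    threshold (weights are illustrative)."""
    gaps = [b - a for a, b in zip(snore_start_times_s, snore_start_times_s[1:])]
    lo, hi = min(gaps), max(gaps)
    return lo, hi, w_min * lo + (1.0 - w_min) * hi
```

For onsets at 0 s, 3 s and 8 s, the intervals are 3 s and 5 s, giving a threshold of 4 s with equal weights.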
Fig. 2 shows an apnea determining apparatus according to an embodiment of the present application, and as can be seen from fig. 2, the apparatus 200 includes:
an obtaining unit 210, configured to obtain an audio signal, and determine loudness, linear spectral energy, and mel spectral energy of each frame in the audio signal;
a first identifying unit 220, configured to determine a background noise loudness according to the loudness of each frame, and determine a voiced segment according to the loudness of each frame and the background noise loudness;
a second identifying unit 230, configured to input the Mel spectrum energy of each frame into the neural network snore prediction model, and identify snore fragments from the voiced fragments;
a third identifying unit 240, configured to identify consecutive snore fragments and snore fragment intervals, and extract normal snore interval features according to the consecutive snore fragments and the snore fragment intervals, where the normal snore interval features include an average snore interval, a minimum snore interval, and a maximum snore interval;
a fourth identification unit 250 for determining an apnea threshold according to the definition of medical apnea; determining snore fragment intervals larger than an apnea threshold value as suspected apnea fragments;
the determining unit 260 is configured to determine the validity of the suspected apnea fragment according to the loudness of each frame of the suspected apnea fragment, the linear spectrum energy, and a preset rule.
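The interval check performed by the fourth identification unit can be sketched as below. The function name is illustrative; the 10-second default follows the common medical definition of apnea (cessation of breathing for at least 10 seconds), which the source invokes but does not spell out.

```python
def find_suspected_apnea(snore_segments, frame_step_s, apnea_threshold_s=10.0):
    """Flag gaps between consecutive snore segments that exceed the
    apnea threshold.

    snore_segments: chronologically ordered (start_frame, stop_frame)
    pairs; frame_step_s: seconds between consecutive frame starts."""
    suspects = []
    for (_, stop_a), (start_b, _) in zip(snore_segments, snore_segments[1:]):
        if (start_b - stop_a) * frame_step_s > apnea_threshold_s:
            # record the silent interval as a suspected apnea fragment
            suspects.append((stop_a, start_b))
    return suspects
```

Each flagged interval is then handed to the determining unit, which applies the body-movement and normal-breathing rules before declaring it a valid apnea.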
In some embodiments of the present application, in the above apparatus, the obtaining unit 210 is configured to collect an audio signal at a preset frequency; perform framing, frame shifting and short-time Fourier transform on the audio signal to obtain the linear spectrum energy of each frame, and perform Mel spectrum transform on the linear spectrum energy to obtain the Mel spectrum energy; and determine the loudness of each frame according to the sampled microphone sound pressure signal data.
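The framing and linear-spectrum steps can be sketched in plain Python as below. This is a didactic illustration only: a real implementation would use a windowed FFT (and a Mel filter bank for the Mel transform, omitted here), and all names and frame sizes are ours.

```python
import math

def frame_signal(x, frame_len, hop):
    """Split signal x into overlapping frames (framing + frame shift)."""
    return [x[i:i + frame_len]
            for i in range(0, len(x) - frame_len + 1, hop)]

def linear_spectrum_energy(frame):
    """Magnitude-squared DFT of one frame (non-redundant bins only)."""
    n = len(frame)
    energies = []
    for k in range(n // 2 + 1):
        re = sum(s * math.cos(2 * math.pi * k * t / n)
                 for t, s in enumerate(frame))
        im = -sum(s * math.sin(2 * math.pi * k * t / n)
                  for t, s in enumerate(frame))
        energies.append(re * re + im * im)
    return energies
```

A constant (DC) frame concentrates all its energy in bin 0, which is a quick sanity check on the transform.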
In some embodiments of the present application, in the above apparatus, the first identifying unit 220 is configured to intercept a background noise sample segment from the audio signal according to a preset time length; determine the loudness of the frames that make up the background noise sample segment; determine the loudness mean and the loudness variance of the background noise sample segment according to the loudness of each frame forming it; and compare the loudness variance with a preset stationary noise upper limit threshold, taking the loudness mean of the background noise sample segment as the background noise loudness when the loudness variance is smaller than that threshold.
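The variance-gated noise estimate can be sketched as below; the variance threshold value is illustrative, since the patent only requires the variance to stay below a preset stationary-noise upper limit.

```python
import statistics

def background_noise_loudness(frame_loudness, var_threshold=2.0):
    """Mean loudness of a candidate noise sample segment, accepted only
    when the loudness variance stays below var_threshold (i.e. the
    segment is stationary enough to serve as background noise)."""
    if statistics.pvariance(frame_loudness) < var_threshold:
        return statistics.fmean(frame_loudness)
    return None  # segment too volatile; try another sample window
```

A steady segment returns its mean loudness; a segment containing transient sounds is rejected so that a quieter sample window can be chosen instead.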
In some embodiments of the present application, in the above apparatus, the first identifying unit 220 is configured to determine a frame as the start frame of the voiced segment if the loudness of the frame is greater than the sum of the background noise loudness and the fluctuation loudness; determine the first frame of a preset number of consecutive frames as the cut-off frame of the voiced segment in the case that the loudness of those consecutive frames is less than the sum of the background noise loudness and the fluctuation loudness; and take each frame between the start frame and the cut-off frame as a voiced segment.
In some embodiments of the present application, in the above apparatus, the second identifying unit 230 is configured to input mel spectrum energy of each frame constituting the voiced sound segment as an input value to the neural network snore predicting model, so as to obtain a snore probability value of each voiced frame; and under the condition that the snore probability value is greater than or equal to a preset probability threshold value, marking the sound frame as a snore frame, and marking sound fragments containing the snore frame as snore fragments.
In some embodiments of the present application, in the above apparatus, the second identifying unit 230 is further configured to determine consecutive suspected snore fragments from the snore fragments; under the condition that the continuous suspected snore fragments meet a preset snore rule, marking the continuous suspected snore fragments as snore fragments, wherein the preset snore rule is as follows: the time of the initial frame interval of the adjacent snore fragments in the continuous suspected snore fragments is in a preset time length range; in addition, in order to improve the detection accuracy, the snore in the continuous sound segments is further detected and extracted by combining the rhythm characteristics of the snore and the mean linear spectrum energy similarity characteristics, except for the preset snore rule, the local maximum values of the single-frame mean linear spectrum energy of all the sound segments forming the continuous sound segments are completely consistent, namely the mean linear spectrum energy similarity characteristics are met, and the sound segments are marked as the snore segments.
In some embodiments of the present application, in the above apparatus, the determining unit 260 is configured to determine the suspected apnea fragment as a suspected body-movement-in-bed fragment if the loudness and linear spectrum energy of each frame of the suspected apnea fragment satisfy a first preset rule, where the first preset rule is: the loudness of each frame forming the suspected body-movement fragment is greater than the sum of the background noise loudness and a first preset loudness threshold, and the linear spectrum energy is distributed uniformly over the whole frequency band; and to determine that the suspected apnea fragment is marked invalid if the duration of the suspected body-movement fragment is greater than a first preset duration threshold.
In some embodiments of the present application, in the above apparatus, the determining unit 260 is configured to determine the suspected apnea fragment as a suspected breathing segment if the loudness and the linear spectrum energy of each frame of the suspected apnea fragment satisfy a second preset rule, where the second preset rule is: the loudness of each frame forming the suspected breathing segment is greater than the sum of the background noise loudness and a second preset loudness threshold, or the sum of the spectral energies of the 0-1 kHz linear frequency points is greater than the sum of the 0-1 kHz linear frequency point spectral energies of the background noise plus a first preset 0-1 kHz spectral energy threshold;
and to determine that the suspected apnea fragment is marked invalid in the case that the duration of the suspected breathing segment is smaller than a second duration threshold and the interval between the start frames of adjacent suspected breathing segments is within a second preset duration range.
It can be understood that the above-mentioned apparatus can implement the steps of the method provided in the foregoing embodiments, and the explanations regarding the method are applicable to the apparatus, and are not described herein again.
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application. Referring to fig. 3, at the hardware level, the electronic device includes a processor, and optionally further includes an internal bus, a network interface, and a memory. The memory may include an internal memory, such as a random-access memory (RAM), and may further include non-volatile memory, such as at least one disk memory. Of course, the electronic device may also include hardware required for other services.
The processor, the network interface, and the memory may be connected to each other via an internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 3, but this does not indicate only one bus or one type of bus.
And the memory is used for storing programs. In particular, the program may include program code comprising computer operating instructions. The memory may include both memory and non-volatile storage and provides instructions and data to the processor.
The processor reads a corresponding computer program from the nonvolatile memory into the memory and then runs the computer program to form the apnea judging device on a logic level. And a processor for executing the program stored in the memory.
The method performed by the apnea determining apparatus disclosed in the embodiment of fig. 3 may be applied to, or implemented by, a processor. The processor may be an integrated circuit chip having signal processing capability. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components, which can implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, and so on. The steps of the method disclosed in connection with the embodiments of the present application may be directly embodied as being performed by a hardware decoding processor, or performed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM or EPROM, or registers. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the method in combination with its hardware.
The electronic device may further execute the method executed by the apnea determining apparatus in fig. 3, and implement the function of the apnea determining apparatus in the embodiment shown in fig. 3, which is not described herein again.
The present application also provides a computer-readable storage medium storing one or more programs, where the one or more programs include instructions, which when executed by an electronic device including a plurality of application programs, enable the electronic device to perform the method performed by the apnea determining apparatus in the embodiment shown in fig. 3.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random-access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media does not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (10)

1. A method for determining apnea, comprising:
acquiring an audio signal, and determining the loudness, linear spectrum energy and Mel spectrum energy of each frame in the audio signal;
determining the loudness of background noise according to the loudness of each frame, and determining a voiced segment according to the loudness of each frame and the loudness of the background noise;
inputting the Mel spectrum energy of each frame into a neural network snore prediction model, and identifying snore fragments from the voiced fragments;
identifying continuous snore fragments and snore fragment intervals, and extracting normal snore interval characteristics according to the continuous snore fragments and the snore fragment intervals, wherein the normal snore interval characteristics comprise average snore intervals, minimum snore intervals and maximum snore intervals;
determining an apnea threshold according to a medical definition of apnea; determining snore fragment intervals larger than an apnea threshold value as suspected apnea fragments;
and determining the effectiveness of the suspected apnea fragment according to the loudness of each frame of the suspected apnea fragment, the linear spectrum energy and a preset rule.
2. The method of claim 1, wherein obtaining the audio signal and determining the loudness, the linear spectral energy, and the mel-frequency spectral energy of each frame in the audio signal comprises:
collecting audio signals at a preset frequency;
performing framing, frame shifting and short-time Fourier transform on the audio signal to obtain linear spectrum energy of each frame, and performing Mel spectrum transform on the linear spectrum energy to obtain Mel spectrum energy;
and the combination of (a) and (b),
and determining the loudness of each frame according to the microphone sound pressure signal data obtained by sampling.
3. The method of claim 1, wherein determining the background noise loudness from the loudness of the frames comprises:
intercepting a background noise sample segment from the audio signal according to a preset time length;
determining the loudness of frames that make up the background noise sample segment;
determining the loudness mean and the loudness variance of the background noise sample segment according to the loudness of each frame forming the background noise sample segment;
and comparing the loudness variance with a preset stationary noise upper limit threshold, and taking the loudness mean value of the background noise sample segment as the background noise loudness when the loudness variance is smaller than the preset stationary noise upper limit threshold.
4. The method of claim 1, wherein determining the voiced segments based on the loudness of each frame and the background noise loudness comprises:
under the condition that the loudness of a frame is greater than the sum of the background noise loudness and the fluctuation loudness, determining the frame as a starting frame of the voiced segment;
determining a first frame of the consecutive frames as a cut-off frame of the voiced segment under the condition that the loudness of a preset number of consecutive frames is less than the sum of the background noise loudness and the fluctuation loudness;
and taking each frame between the starting frame and the ending frame as a sound segment.
5. The method of claim 1, wherein inputting the Mel spectrum energy of each frame into a neural network snore prediction model, and identifying a snore fragment from the voiced fragments comprises:
inputting Mel frequency spectrum energy of each frame constituting the sound fragment as input value into the neural network snore prediction model to obtain snore probability value of each frame, and marking the frame higher than the snore probability threshold value as snore frame;
the voiced segments containing the snore frames are labeled as snore segments.
6. The method of claim 5, wherein said inputting the Mel spectrum energy of each frame into a neural network snore prediction model, and identifying a snore fragment from the voiced fragments further comprises:
determining continuous suspected snore fragments from the voiced fragments;
under the condition that the continuous suspected snore fragments meet a preset snore rule, marking the continuous suspected snore fragments as snore fragments, wherein the preset snore rule is as follows: the interval time between the starting frames of the adjacent continuous suspected snore fragments is in a first preset time length range.
7. The method of claim 6, wherein said inputting the Mel spectrum energy of each frame into a neural network snore prediction model, and identifying a snore fragment from the voiced fragments further comprises:
determining normally adjacent continuous suspected snore fragments according to the sound fragments and the snore fragments;
and determining the average value, the minimum value and the maximum value of the interval between the starting frames of the normally adjacent continuous suspected snore fragments, and storing the average value, the minimum value and the maximum value as personalized breathing characteristic data.
8. The method of claim 1, wherein determining the validity of the suspected apnea fragment according to the loudness of each frame of the suspected apnea fragment, the linear spectral energy, and a predetermined rule comprises:
under the condition that the loudness and linear spectrum energy of each frame of the suspected apnea fragment meet a first preset rule, determining that the suspected apnea fragment is a suspected body moving fragment on a bed, wherein the first preset rule is as follows: the loudness of each frame forming the suspected body moving segment on the bed is greater than the sum of the background noise loudness and a first preset loudness threshold, and the energy distribution of the linear frequency spectrum energy in the whole frequency band is uniform;
determining that the flag of the suspected apnea fragment is invalid if the length of time that the suspected body moves the fragment in bed is greater than a first preset length threshold.
9. The method of claim 1, wherein determining the validity of the suspected apnea fragment according to the loudness of each frame of the suspected apnea fragment, the linear spectral energy, and a predetermined rule further comprises:
under the condition that the loudness and the linear spectrum energy of each frame of the suspected apnea fragment meet a second preset rule, determining a suspected breathing segment, wherein the second preset rule is: the loudness of each frame forming the suspected breathing segment is greater than the sum of the background noise loudness and a second preset loudness threshold, or the sum of the spectral energies of the 0-1 kHz linear frequency points is greater than the sum of the 0-1 kHz linear frequency point spectral energies of the background noise plus a first preset 0-1 kHz spectral energy threshold;
and under the condition that the duration of the suspected breathing segment is smaller than a second duration threshold, and the interval between the start frames of adjacent suspected breathing segments is within a second preset duration range, determining that the suspected apnea fragment is marked as invalid.
10. An apnea judgment device, comprising:
an acquisition unit, configured to acquire an audio signal and determine the loudness, linear spectral energy and Mel spectral energy of each frame in the audio signal;
a first identification unit, configured to determine the background noise loudness from the loudness of each frame, and to determine voiced segments from the per-frame loudness and the background noise loudness;
a second identification unit, configured to input the Mel spectral energy of each frame into a neural-network snore prediction model and identify snore segments from the voiced segments;
a third identification unit, configured to identify consecutive snore segments and the intervals between them, and to extract normal snore-interval features therefrom, the features comprising the average snore interval, the minimum snore interval and the maximum snore interval;
a fourth identification unit, configured to determine an apnea threshold according to the medical definition of apnea, and to determine snore intervals greater than the apnea threshold as suspected apnea segments;
and a judging unit, configured to determine the validity of the suspected apnea segment according to the loudness and linear spectral energy of each frame of the suspected apnea segment and a preset rule.
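The third and fourth identification units can be sketched together. This is a minimal illustration under assumptions: the claim does not specify how the medical definition is combined with the subject's own intervals, so the `max(...)` combination rule below is hypothetical; only the 10-second medical minimum for apnea is standard.

```python
import numpy as np

def snore_interval_features(snore_onsets):
    """Third unit (sketch): from onset times (seconds) of consecutive snore
    segments, derive the normal snore-interval features the claim lists:
    average, minimum and maximum snore interval."""
    intervals = np.diff(np.asarray(snore_onsets, dtype=float))
    return {"avg": float(np.mean(intervals)),
            "min": float(np.min(intervals)),
            "max": float(np.max(intervals))}

def apnea_threshold(avg_interval, medical_min_pause=10.0):
    """Fourth unit (sketch): an apnea is medically a breathing pause of at
    least 10 s, so take the larger of the medical minimum and the subject's
    typical snore interval (hypothetical combination rule). A snore-segment
    gap exceeding this threshold is flagged as a suspected apnea segment."""
    return max(medical_min_pause, avg_interval)
```

Personalizing the threshold this way keeps a slow but regular snorer's normal gaps from being flagged as apnea.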
CN202110629654.0A 2021-06-07 2021-06-07 Method and device for judging apnea and electronic equipment Active CN113314143B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110629654.0A CN113314143B (en) 2021-06-07 2021-06-07 Method and device for judging apnea and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110629654.0A CN113314143B (en) 2021-06-07 2021-06-07 Method and device for judging apnea and electronic equipment

Publications (2)

Publication Number Publication Date
CN113314143A true CN113314143A (en) 2021-08-27
CN113314143B CN113314143B (en) 2024-01-30

Family

ID=77377478

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110629654.0A Active CN113314143B (en) 2021-06-07 2021-06-07 Method and device for judging apnea and electronic equipment

Country Status (1)

Country Link
CN (1) CN113314143B (en)


Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120004749A1 (en) * 2008-12-10 2012-01-05 The University Of Queensland Multi-parametric analysis of snore sounds for the community screening of sleep apnea with non-gaussianity index
CN102499637A (en) * 2011-09-26 2012-06-20 大连理工大学 Obstructive sleep apnea-hypopnea syndrome screening method and device thereof
CN104688229A (en) * 2015-01-28 2015-06-10 中国人民解放军理工大学 Method for monitoring sleep respiration based on snore signals
CN104739412A (en) * 2013-12-29 2015-07-01 ***通信集团公司 Method and equipment for monitoring sleep apnea
CN106691382A (en) * 2016-12-26 2017-05-24 赛博龙科技(北京)有限公司 Snore detection method and device based on time frequency similarity
US20180256069A1 (en) * 2015-08-17 2018-09-13 Resmed Sensor Technologies Limited Screener for sleep disordered breathing
CN109431470A (en) * 2018-12-20 2019-03-08 西安交通大学医学院第二附属医院 Sleep breath monitoring method and device
CN111166297A (en) * 2020-02-19 2020-05-19 赛博龙科技(北京)有限公司 Method and device for evaluating sleep quality based on user sleep audio
CN111345782A (en) * 2020-03-12 2020-06-30 深圳数联天下智能科技有限公司 Snore type identification method and electronic equipment
CN111696575A (en) * 2020-06-19 2020-09-22 杭州电子科技大学 Low ventilation and apnea detection and identification system based on hybrid neural network model
CN111938649A (en) * 2019-05-16 2020-11-17 医疗财团法人徐元智先生医药基金会亚东纪念医院 Method for predicting sleep apnea from snore by using neural network
CN111938650A (en) * 2020-07-03 2020-11-17 上海诺斯清生物科技有限公司 Method and device for monitoring sleep apnea
US20200383633A1 (en) * 2019-06-04 2020-12-10 Fitbit, Inc. Detecting and measuring snoring
CN112472066A (en) * 2020-11-25 2021-03-12 陈向军 Breathing disorder monitoring terminal, monitor and system


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Sun Jingpeng; Hu Xiyuan; Peng Silong; Ma Yan: "A Review of Snore Detection Research", World Journal of Sleep Medicine, no. 03 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115206329A (en) * 2022-09-15 2022-10-18 深圳市倍轻松科技股份有限公司 Method, device, electronic equipment and storage medium for determining snore signal
CN115206329B (en) * 2022-09-15 2023-01-24 深圳市倍轻松科技股份有限公司 Method, device, electronic equipment and storage medium for determining snore signals

Also Published As

Publication number Publication date
CN113314143B (en) 2024-01-30

Similar Documents

Publication Publication Date Title
Sun et al. SymDetector: detecting sound-related respiratory symptoms using smartphones
CN106691382B (en) Snore detection method and device based on time-frequency similarity
Bi et al. Familylog: A mobile system for monitoring family mealtime activities
CN113421586A (en) Sleeptalking recognition method, device and electronic equipment
CN108937866B (en) Sleep state monitoring method and device
CN109065043B (en) Command word recognition method and computer storage medium
CN109767784B (en) Snore identification method and device, storage medium and processor
CN110853672B (en) Data expansion method and device for audio scene classification
Reggiannini et al. A flexible analysis tool for the quantitative acoustic assessment of infant cry
CN110970042A (en) Artificial intelligent real-time classification method, system and device for pulmonary rales of electronic stethoscope and readable storage medium
Lu et al. Research on sports video detection technology motion 3D reconstruction based on hidden Markov model
CN113314143B (en) Method and device for judging apnea and electronic equipment
Camcı et al. Sleep apnea detection via smart phones
CN110942784A (en) Snore classification system based on support vector machine
CN107951470B (en) Sleep signal processing method and device
CN112820319A (en) Human snore recognition method and device
CN110689887B (en) Audio verification method and device, storage medium and electronic equipment
CN109377982B (en) Effective voice obtaining method
CN112967733B (en) Method and device for intelligently identifying crying type of baby
Fischer et al. Classification of breath and snore sounds using audio data recorded with smartphones in the home environment
Boateng et al. VADLite: an open-source lightweight system for real-time voice activity detection on smartwatches
Bi et al. FamilyLog: monitoring family mealtime activities by mobile devices
CN116612894A (en) Method and device for identifying sleep disorder and wearable equipment
CN111862978A (en) Voice awakening method and system based on improved MFCC (Mel frequency cepstrum coefficient)
CN111091849B Snore identification method and device, storage medium, snore-stopping equipment and processor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant