CN111631688B

CN111631688B - Algorithm for automatic sleep staging

Info

Publication number: CN111631688B
Application number: CN202010591697.XA
Authority: CN
Inventors: 刘铁军; 王林; 吕彬; 范宇熊; 宋晓宇; 郜东瑞; 尧德中
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2020-06-24
Filing date: 2020-06-24
Publication date: 2021-10-29
Anticipated expiration: 2040-06-24
Also published as: CN111631688A

Abstract

The invention discloses an algorithm for automatic sleep staging, comprising the following steps: a. in the feature layer, abstract features are extracted by a multilayer perceptron and combined with traditional hand-crafted features based on expert experience to form the information representation of sleep; b. in the machine model layer, a bidirectional gated recurrent unit (Bi-GRU) is used as the network model; c. in the correction layer, a conditional random field (CRF) is used to correct for temporal continuity. The method addresses the poor stability, generality and practicality of existing automatic sleep staging algorithms.

Description

Algorithm for automatic sleep staging

Technical Field

The invention relates to the field of sleep algorithms, in particular to an algorithm for automatically staging sleep.

Background

Using physiological information recorded while a person sleeps, the sleep period can be divided, according to international sleep medicine standards, into five normal categories: WAKE, REM, N1, N2 and N3.

In recent years, methods that use computer equipment and software to monitor a person's sleep state and classify it automatically have come into wide use. Although modern electronic information technology, machine learning theory and biomedical engineering have developed rapidly, automatic sleep staging based on machine learning still has no internationally recognized standard in scientific research, medical applications or consumer electronics. The main reasons include: the absence of a thorough and complete theory of the underlying mechanisms of human sleep; insufficient understanding of, and trust in, the international standards for manual sleep staging among clinicians and researchers; low consistency between sleep state classifications; lack of expert experience among developers of sleep monitoring systems; and the intermingling of abnormal and normal sleep patterns.

All automatic sleep staging algorithms involve feature engineering. Developers find it difficult to acquire an expert-level understanding of the physiology of sleep; the available expert knowledge comes mainly from the scoring manual of the American Academy of Sleep Medicine (AASM) and from manual staging criteria worked out through practice.

Automatic sleep staging algorithms that rely on neural networks often have difficulty fitting real data; the resulting under-fitted models yield staging accuracy far below expectations.

In general, existing automatic sleep staging systems do not distinguish normal sleep states accurately enough. Existing machine learning algorithms, whether based on feature engineering or on neural networks, reach accuracies of 60% to 90% on small data sets. Existing algorithms also treat the data segments of a whole night's physiological signals as mutually independent and thus ignore temporal continuity. The stability, generality and practicality of automatic staging technology as a whole urgently need improvement.

Disclosure of Invention

The invention aims to provide an automatic sleep staging algorithm that addresses the poor stability, generality and practicality of conventional automatic sleep staging algorithms.

In order to solve the technical problems, the invention adopts the following technical scheme:

An algorithm for automatic sleep staging, comprising the following steps: a. in the feature layer, abstract features are extracted by a multilayer perceptron and combined with traditional hand-crafted features based on expert experience to form the information representation of sleep; b. in the machine model layer, a bidirectional gated recurrent unit (Bi-GRU) is used as the network model; c. in the correction layer, a conditional random field (CRF) is used to correct for temporal continuity.

As a further preferred aspect of the present invention, step a, in which abstract features are extracted by a multilayer perceptron and combined with traditional hand-crafted features based on expert experience as the information representation of sleep, comprises the following steps: S1, making a data set C = {C₁, C₂, ..., C_N} from physiological data acquired during sleep, including electroencephalogram, electrooculogram and mandibular electromyogram signals, where C represents the data set of all persons and N represents the number of persons (or the number of completed overnight recordings); S2, dividing each person's continuous overnight data into S segments in time order, each segment representing different sleep stage information; because the sleep data must be fitted by different algorithm models as the night progresses, each segment is made into its own data set, giving S segment data sets; each segment data set contains N × M sample points, M being the number of sample points in each segment; one sample point represents one sleep epoch, typically 30 seconds long, and contains L signal samples, so the total size of data set C is S × N × M × L; and S3, performing feature engineering on each of the S segment data sets, comprising abstract feature extraction with an autoencoder and traditional feature extraction of time-domain, frequency-domain and nonlinear dynamics features; all features are concatenated into vectors, and the M vectors generated from each segment of data are arranged in order to form a sequence, namely a segment sequence.

Step b, in which the bidirectional gated recurrent unit (Bi-GRU) is used as the network model in the machine model layer, comprises the following steps: S4, in the machine model layer, training the Bi-GRU network model using the segment sequences as samples, the S segment data sets each containing N × M segment sequences; each segment data set is divided into a training set and a test set in a certain proportion, and the model is saved after training on each segment data set is completed; S5, inputting the sample points used for training into the trained Bi-GRU network model to obtain the sleep stage classification labels; S6, concatenating the label sequences of the S segments of one night into one long sequence, so that each night of data corresponds to one complete overnight label sequence; the overnight label sequence is a one-dimensional vector of dimension T = S × M; the set of N overnight label sequences is divided into three sets following the earlier division, the training set containing C_t sequences, the validation set C_v sequences and the test set C_c sequences.

Step c, in which a conditional random field (CRF) is used for temporal continuity correction in the correction layer, comprises step S7: the training set of label sequences obtained in step S6 is input into the correction layer; exploiting the strength of the CRF at extracting contextual transition information, the sequence is modeled with a linear-chain CRF and the optimal label sequence path is decoded with the Viterbi algorithm, so that the overnight label sequence is corrected and its temporal continuity agrees with the label sequence produced by expert manual sleep staging.

As a further preferred aspect of the present invention, step S1 further comprises the following sub-steps: S11, monitoring, recording and storing human physiological signal data during sleep with a polysomnography device having a physiological signal acquisition function, in accordance with the standards of the American Academy of Sleep Medicine (AASM); S12, sampling and digitizing the raw data of each type of physiological signal, then applying zero-phase digital filtering to each signal so that the non-stationary physiological signals are not phase-distorted, and removing very-low-frequency baseline drift, power-line noise and high-frequency noise, which completes signal preprocessing; S13, extracting the electrooculogram, mandibular electromyogram and electroencephalogram signals from the physiological signals and using these three leads to build the original data set C = {C₁, C₂, ..., C_N}, which holds N overnight sleep recordings from N persons, with one subset per person.

As a further preferred aspect of the present invention, step S2 further comprises the following sub-steps: S21, setting the segmentation scheme sensibly: a single night of sleep data is continuous in time, the algorithm divides this continuous process into S stages, and the physiological signals of each stage carry different information about the sleep process; in doing so the algorithm imitates the way experts, when staging sleep manually, form a prior judgment that the overnight signal data pass through several trend stages; the length of each stage is fixed by setting the number M of sample points per stage; S22, as preparation for the algorithm model, dividing S segment data sets out of the original data set; the dimensions of the data must be exact: each data set contains N × M sample points from N persons, each sample point is the minimum unit of sleep staging, spans 30 seconds and corresponds to one category label, and there are five category labels: WAKE, REM, N1, N2 and N3; and S23, aligning the labels with the data, checking the segment data sets and saving them to file. This data set production is a key step of the invention and strongly influences the results and performance of the whole algorithm.

As a further preferred aspect of the present invention, step S3 further comprises the following sub-steps: S31, extracting traditional features: a feature vector x_i of dimension k is computed from the signals. The m features of x_i comprise time-domain, frequency-domain and nonlinear dynamics feature quantities. The time-domain feature quantities comprise statistical and geometric feature quantities; the frequency-domain feature quantities comprise power spectral density and time-frequency feature quantities; the nonlinear dynamics feature quantities comprise fractal dimension and complexity feature quantities. Each feature quantity is determined by its own parameters and method of calculation; S32, extracting abstract features: abstract features are extracted by an autoencoder, a type of artificial neural network. Exploiting its unsupervised learning property, a neural network is fitted that can represent the input data efficiently. No additional manual work is required, and the input signal data are represented efficiently by fixed low-dimensional vectors; this is the self-encoding. The output dimension of the autoencoder is generally smaller than the input signal dimension, which is its dimensionality-reduction property. The invention adopts a stacked autoencoder with multiple encoding layers; the complexity of the encoding depends on the number of stacked layers, and suitably increasing the number of layers allows the input data to be compressed and represented effectively; S33, performing feature engineering on each sample point and concatenating the results into a feature vector ξ_i (ξ_i ∈ R^k); the sample points in each segment must be arranged in time order into a segment sequence, and the corresponding feature vectors are likewise arranged into a segment sequence seq, so that the whole feature space built from the segment sequences is a three-dimensional tensor X ∈ R^(N×M×k).

As a further preferred aspect of the present invention, step S4 further comprises the following sub-steps: S41, dividing the feature-space tensor X generated from seq into the final training, validation and test data sets; S42, inputting the time-ordered segment sequence seq into the bidirectional gated recurrent unit (Bi-GRU) network model, where the feature-space tensor generated from seq is X ∈ R^(N×M×k); setting the structure, training mode and initial parameters of the network model sensibly, and then loading the data set to start training; and S43, saving the Bi-GRU network model once it reaches a preset termination condition.

As a further preferred aspect of the present invention, step S5 further comprises the following sub-steps: S51, propagating all data forward through the network model to obtain the sleep stage labels corresponding to the data samples, where the data samples are the training and validation sets generated from the data sets; and S52, comparing the classification labels produced by the machine model with the experts' manual classification labels and recording evaluation metrics such as accuracy, recall and F1 score for each data set, which completes construction of the network model.

As a further preferred aspect of the present invention, step S7 further comprises the following sub-steps: S71, constructing the conditional random field (CRF) model: the training set of label sequences from step S6 and the corresponding expert manual staging label sequences are input into the CRF model, the number K of feature functions is set, and the optimal parameters are obtained by iterative training, yielding the conditional probability P(y | x) of the conditional random field; in this way the model learns the contextual information of time-dependent transitions between sleep stages. Stage transitions are closely related to sleep time, and exploiting the transition probabilities of the CRF model for this characteristic of sleep is a key step of the invention; S72, testing the correction result of the CRF model: the optimal label sequence y* is computed from the conditional probability P(y | x) and the label sequence x_s in the validation set; finally, evaluation metrics such as accuracy are computed, and the results of the CRF correction model and of the network model are tabulated and compared.

Compared with the prior art, the invention can at least achieve one of the following beneficial effects:

1. suitable segmentation of the overnight data makes the algorithm more adaptable to each sleep stage;

2. fusing traditional features with abstract features strengthens the expressive power of the features, so that the algorithm is more accurate;

3. the time-aware network and the probabilistic correction effectively strengthen the algorithm's handling of temporal continuity.

Drawings

FIG. 1 is an overall block diagram of the algorithm of the present invention.

FIG. 2 is a schematic view of a data set generation process according to the present invention.

FIG. 3 is a schematic diagram of data set generation according to the present invention.

FIG. 4 is a sample structure of data according to the present invention.

FIG. 5 is a block diagram of the stacked autoencoder according to the present invention.

FIG. 6 is a schematic diagram of the first layer of the stacked autoencoder according to the present invention.

FIG. 7 is a diagram of the second layer of the stacked autoencoder according to the present invention.

FIG. 8 is a diagram of the output layer of the stacked autoencoder according to the present invention.

FIG. 9 is an overall schematic diagram of the stacked autoencoder according to the present invention.

FIG. 10 is a diagram of the Bi-GRU network model according to the present invention.

FIG. 11 shows the internal structure of a GRU node according to the present invention.

FIG. 12 is a schematic view of an overall machine model layer incorporating CRF corrections in accordance with the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

Specific example 1:

FIGS. 1 to 12 show an algorithm for automatic sleep staging. As shown in FIG. 1, the overall idea of the invention is to build the automatic sleep staging model step by step through three processes: data set production, feature engineering and model layering.

As shown in FIG. 2, the data set production flow is divided into three parts: data acquisition, data preprocessing and overnight data segmentation. First, a data set is created from physiological data collected during human sleep, including conventional physiological signals such as the electroencephalogram (EEG), electrooculogram (EOG) and mandibular electromyogram (EMG). Let C = {C₁, C₂, ..., C_N}, where C represents the data set of all persons and N represents the number of persons (or the number of completed overnight recordings).

Human physiological signal data are monitored, recorded and stored during sleep by a polysomnography device with a physiological signal acquisition function, in accordance with the standards of the American Academy of Sleep Medicine (AASM).

The raw data of each type of physiological signal are sampled and digitized, and zero-phase digital filtering is then applied to each signal so that the non-stationary physiological signals are not phase-distorted. Very-low-frequency baseline drift, power-line noise and high-frequency noise are removed from the signals, which completes signal preprocessing.
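The idea behind zero-phase filtering can be illustrated with a minimal sketch. The patent does not name a filter design, so the single-pole low-pass filter and the smoothing factor `alpha` below are assumptions chosen only for brevity; the point is the forward-backward pass, which is also the approach taken by library routines such as `scipy.signal.filtfilt`.

```python
def single_pole_lowpass(x, alpha):
    """Causal first-order low-pass: y[n] = alpha*x[n] + (1-alpha)*y[n-1]."""
    y, prev = [], 0.0
    for v in x:
        prev = alpha * v + (1.0 - alpha) * prev
        y.append(prev)
    return y


def zero_phase_filter(x, alpha):
    """Filter forward, then filter the time-reversed result and reverse it
    back, so that the phase delays of the two passes cancel."""
    forward = single_pole_lowpass(x, alpha)
    backward = single_pole_lowpass(forward[::-1], alpha)
    return backward[::-1]


# An impulse in the middle of the record: the causal filter smears it to the
# right only, while the forward-backward result stays centred on the impulse.
impulse = [0.0] * 201
impulse[100] = 1.0
causal = single_pole_lowpass(impulse, alpha=0.5)
zero_phase = zero_phase_filter(impulse, alpha=0.5)
```

A practical pipeline would replace the single-pole filter with band-pass and notch designs suited to the baseline, power-line and high-frequency components named above.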

The electrooculogram, mandibular electromyogram and electroencephalogram signals are extracted from the physiological signals, and these three leads are used to build the original data set C = {C₁, C₂, ..., C_N}, which holds N overnight sleep recordings from N persons, with one subset per person.

The structure of the data is visualized in FIG. 3 and FIG. 4. Each person's continuous overnight data are divided into S segments in time order; each segment represents different sleep stage information, and as the night progresses the sleep data must be fitted by different algorithm models. Each segment is therefore made into its own data set, giving S segment data sets, each containing N × M sample points, where M is the number of sample points in each segment. One sample point represents one sleep epoch, typically 30 seconds long, and contains L signal samples. The total size of data set C is therefore S × N × M × L.
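The segmentation described above can be sketched as follows. This is an illustrative sketch only: the function names and the toy sizes are hypothetical, a single signal channel is assumed, and every segment is given the same number M of epochs.

```python
def segment_night(samples, num_segments, epochs_per_segment, samples_per_epoch):
    """Cut one night's signal (a flat list) into S segments of M epochs of L samples."""
    S, M, L = num_segments, epochs_per_segment, samples_per_epoch
    if len(samples) < S * M * L:
        raise ValueError("recording shorter than S*M*L samples")
    segments = []
    for s in range(S):
        epochs = []
        for m in range(M):
            start = (s * M + m) * L
            epochs.append(samples[start:start + L])
        segments.append(epochs)
    return segments


def build_segment_datasets(nights, S, M, L):
    """Return S segment data sets; data set s holds the s-th segment of every
    night, i.e. N x M epochs of L samples each."""
    per_night = [segment_night(night, S, M, L) for night in nights]
    return [[night_segments[s] for night_segments in per_night] for s in range(S)]


# Toy sizes (hypothetical): N=3 nights, S=4 segments, M=5 epochs, L=6 samples.
N, S, M, L = 3, 4, 5, 6
nights = [list(range(S * M * L)) for _ in range(N)]
datasets = build_segment_datasets(nights, S, M, L)
total_samples = sum(len(epoch) for ds in datasets for night in ds for epoch in night)
```

With real recordings, L follows from the sampling rate: a 30-second epoch sampled at, say, 100 Hz would give L = 3000.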

The segmentation scheme must be set sensibly. A single night of sleep data is continuous in time; the algorithm divides this continuous process into S stages, and the physiological signals of each stage carry different information about the sleep process. In doing so, the algorithm imitates the way experts, when staging sleep manually, form a prior judgment that the overnight signal data pass through several trend stages. The length of each stage is fixed by setting the number M of sample points per stage.

As preparation for the algorithm model, S segment data sets are divided out of the original data set. The dimensions of the data must be exact: each data set contains N × M sample points from N persons; each sample point is the minimum unit of sleep staging, spans 30 seconds, and corresponds to one category label. There are five category labels: WAKE, REM, N1, N2 and N3.

The labels are aligned with the data, and the segment data sets are checked and saved to file. This data set production is a key step of the invention and strongly influences the results and performance of the whole algorithm.

Feature engineering is performed on each of the S segment data sets. Abstract features are extracted with an autoencoder; traditional features comprise time-domain, frequency-domain and nonlinear dynamics features. All features are concatenated into vectors; each segment of data yields M vectors, which are arranged in order to form a sequence, namely a segment sequence.

Traditional feature extraction proceeds as follows. A feature vector x_i of dimension k is computed from the signals. The m features of x_i comprise time-domain, frequency-domain and nonlinear dynamics feature quantities. The time-domain feature quantities comprise statistical and geometric feature quantities. The frequency-domain feature quantities comprise power spectral density and time-frequency feature quantities. The nonlinear dynamics feature quantities comprise fractal dimension and complexity feature quantities. Each feature quantity is determined by its own parameters and method of calculation.
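As a hedged illustration of traditional feature extraction for one epoch, the sketch below computes a few time-domain statistics and band powers from a naive DFT periodogram. The particular features, the band edges in `BANDS` and the sampling rate are assumptions, since the patent does not list them, and the nonlinear dynamics features (fractal dimension, complexity) are omitted for brevity.

```python
import cmath
import math
import statistics


def time_domain_features(epoch):
    """Mean, variance and number of zero crossings of the mean-centred epoch."""
    mean = statistics.fmean(epoch)
    var = statistics.pvariance(epoch)
    centred = [v - mean for v in epoch]
    zero_crossings = sum(1 for a, b in zip(centred, centred[1:]) if a * b < 0)
    return [mean, var, zero_crossings]


def band_power(epoch, fs, low_hz, high_hz):
    """Power in [low_hz, high_hz) from a naive DFT periodogram."""
    n = len(epoch)
    power = 0.0
    for k in range(1, n // 2):
        freq = k * fs / n
        if low_hz <= freq < high_hz:
            coeff = sum(v * cmath.exp(-2j * math.pi * k * i / n)
                        for i, v in enumerate(epoch))
            power += abs(coeff) ** 2 / n
    return power


# Assumed EEG bands in Hz; the patent does not specify band edges.
BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}


def traditional_features(epoch, fs):
    """Concatenate time-domain and frequency-domain features into one vector."""
    spectral = [band_power(epoch, fs, lo, hi) for lo, hi in BANDS.values()]
    return time_domain_features(epoch) + spectral


# A 2-second, 10 Hz sine sampled at 64 Hz: its power should fall in the alpha band.
fs = 64
epoch = [math.sin(2 * math.pi * 10 * i / fs) for i in range(2 * fs)]
x_i = traditional_features(epoch, fs)
```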

Abstract feature extraction proceeds as follows. Abstract features are extracted by an autoencoder, a type of artificial neural network. Exploiting its unsupervised learning property, a neural network is fitted that can represent the input data efficiently. No additional manual work is required, and the input signal data are represented efficiently by fixed low-dimensional vectors; this is the self-encoding. The output dimension is generally smaller than the input signal dimension, which is the dimensionality-reduction property of the autoencoder. The invention adopts a stacked autoencoder (SA) with multiple encoding layers; the complexity of the encoding depends on the number of stacked layers, and suitably increasing the number of layers allows the input data to be compressed and represented effectively.
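The dimensionality-reduction property can be demonstrated with a deliberately small sketch. It is not the stacked autoencoder of the invention: it assumes a single linear encoding layer without biases, plain stochastic gradient descent and toy three-dimensional data, all chosen only to keep the example short. The learning rate, epoch count and seed are likewise arbitrary.

```python
import random


def matvec(W, v):
    """Matrix-vector product for nested lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def reconstruction_loss(W_enc, W_dec, data):
    """Mean of 0.5 * ||decode(encode(x)) - x||^2 over the data."""
    total = 0.0
    for x in data:
        x_hat = matvec(W_dec, matvec(W_enc, x))
        total += 0.5 * sum((a - b) ** 2 for a, b in zip(x_hat, x))
    return total / len(data)


def train_autoencoder(data, hidden_dim, epochs=200, lr=0.05, seed=0):
    """Train encoder and decoder weights without labels (unsupervised)."""
    rng = random.Random(seed)
    d = len(data[0])
    W_enc = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(hidden_dim)]
    W_dec = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)] for _ in range(d)]
    for _ in range(epochs):
        for x in data:
            z = matvec(W_enc, x)                    # low-dimensional code
            x_hat = matvec(W_dec, z)                # reconstruction
            r = [a - b for a, b in zip(x_hat, x)]   # residual
            # Gradient w.r.t. the code, taken before W_dec is updated.
            dz = [sum(W_dec[i][j] * r[i] for i in range(d)) for j in range(hidden_dim)]
            for i in range(d):
                for j in range(hidden_dim):
                    W_dec[i][j] -= lr * r[i] * z[j]
            for j in range(hidden_dim):
                for i in range(d):
                    W_enc[j][i] -= lr * dz[j] * x[i]
    return W_enc, W_dec


# Toy data: 3-D points lying on a line, so a 1-D code can represent them.
data = [[0.4 * t, 0.8 * t, -0.4 * t] for t in (i / 10 - 1 for i in range(21))]
loss_before = reconstruction_loss(*train_autoencoder(data, 1, epochs=0), data)
W_enc, W_dec = train_autoencoder(data, 1)
loss_after = reconstruction_loss(W_enc, W_dec, data)
code = matvec(W_enc, data[0])   # the abstract feature for one sample
```

A stacked version would chain several such encoders, each one compressing the code produced by the previous layer, typically with nonlinear activations.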

The network structure of the stacked autoencoder used to extract abstract features is shown in FIGS. 5 to 9.

Feature engineering is performed on each sample point, and the results are concatenated into a feature vector ξ_i (ξ_i ∈ R^k). The sample points in each segment must be arranged in time order into a segment sequence, and the corresponding feature vectors are likewise arranged into a segment sequence seq. The whole feature space built from the segment sequences is thus a three-dimensional tensor X ∈ R^(N×M×k).
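The assembly of the feature tensor for one segment data set can be sketched as below. The helper names and the toy feature values are hypothetical; the sketch only shows how concatenating the two kinds of features per sample point yields a nested structure of shape N × M × k.

```python
def fuse_features(traditional, abstract):
    """Concatenate the two feature vectors of one sample point into xi_i."""
    return list(traditional) + list(abstract)


def build_feature_tensor(traditional_feats, abstract_feats):
    """Inputs are nested lists indexed [night][epoch]; the output X has shape
    N x M x k, with epochs kept in time order."""
    X = []
    for trad_night, abs_night in zip(traditional_feats, abstract_feats):
        seq = [fuse_features(t, a) for t, a in zip(trad_night, abs_night)]
        X.append(seq)   # one segment sequence per night
    return X


def shape(X):
    return (len(X), len(X[0]), len(X[0][0]))


# Toy values: 3 hand-crafted features and 2 encoder outputs per sample point.
N, M = 2, 4
trad = [[[float(n), float(m), 0.0] for m in range(M)] for n in range(N)]
abst = [[[0.1 * m, 0.2 * n] for m in range(M)] for n in range(N)]
X = build_feature_tensor(trad, abst)
```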

Through the data set production process and feature engineering, the data set structure realizes the design idea of the invention and fully reflects both feature fusion and the temporal dependence of the data.

As shown in FIG. 10, in the machine model layer the bidirectional gated recurrent unit (Bi-GRU) network model is trained using the above segment sequences as samples; each of the S segment data sets contains N × M segment sequences.

The feature-space tensor X generated from seq is divided into the final training, validation and test data sets.
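One way to perform such a division is to split along the first axis of X, so that all epochs of a given night fall in the same set. The sketch below assumes this night-level split and a 60/20/20 ratio; the patent states only that the sets are divided in a certain proportion, so both choices are illustrative.

```python
import random


def split_by_night(num_nights, train_frac=0.6, val_frac=0.2, seed=0):
    """Return disjoint lists of night indices for training, validation and test."""
    indices = list(range(num_nights))
    random.Random(seed).shuffle(indices)
    n_train = round(num_nights * train_frac)
    n_val = round(num_nights * val_frac)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


train_idx, val_idx, test_idx = split_by_night(20)
```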

After cross-validation, training on each segment data set is completed and the model is saved.

The time-ordered segment sequence seq is input into the Bi-GRU network model. The feature-space tensor X ∈ R^(N×M×k) generated from seq should meet the requirements of the Bi-GRU input layer. The structure, training mode and initial parameters of the network model are set sensibly, and the data set is then loaded to start training.
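To make the bidirectional recurrence concrete, the following sketch runs an untrained Bi-GRU forward pass over one segment sequence using only the standard library. The gate equations follow one common GRU convention, which is an assumption because the patent gives no formulas; the weight initialization, hidden size and input are arbitrary, and training and the final classification layer over the five stages are omitted.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(W, x, U, h, b):
    """Compute W @ x + U @ h + b for plain Python lists."""
    return [sum(w * xi for w, xi in zip(W[j], x))
            + sum(u * hi for u, hi in zip(U[j], h)) + b[j]
            for j in range(len(b))]


def init_gru(input_dim, hidden_dim, rng):
    """Random (W, U, b) triples for the update (z), reset (r) and candidate (n) parts."""
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {g: (mat(hidden_dim, input_dim), mat(hidden_dim, hidden_dim), [0.0] * hidden_dim)
            for g in ("z", "r", "n")}


def gru_step(params, x, h):
    Wz, Uz, bz = params["z"]
    Wr, Ur, br = params["r"]
    Wn, Un, bn = params["n"]
    z = [sigmoid(v) for v in affine(Wz, x, Uz, h, bz)]          # update gate
    r = [sigmoid(v) for v in affine(Wr, x, Ur, h, br)]          # reset gate
    gated_h = [ri * hi for ri, hi in zip(r, h)]
    n = [math.tanh(v) for v in affine(Wn, x, Un, gated_h, bn)]  # candidate state
    return [(1.0 - zi) * hi + zi * ni for zi, hi, ni in zip(z, h, n)]


def run_gru(params, seq, hidden_dim):
    h, outputs = [0.0] * hidden_dim, []
    for x in seq:
        h = gru_step(params, x, h)
        outputs.append(h)
    return outputs


def bi_gru(fwd, bwd, seq, hidden_dim):
    """Concatenate, per time step, the states of a forward and a backward GRU."""
    forward = run_gru(fwd, seq, hidden_dim)
    backward = run_gru(bwd, seq[::-1], hidden_dim)[::-1]
    return [f + b for f, b in zip(forward, backward)]


rng = random.Random(0)
k, hidden, M = 5, 3, 7                      # toy sizes
seq = [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(M)]
states = bi_gru(init_gru(k, hidden, rng), init_gru(k, hidden, rng), seq, hidden)
```

Each epoch thus receives a state that depends on both earlier and later epochs in the segment, which is what lets the model use context in both time directions.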

The Bi-GRU network model is saved once it reaches a preset termination condition.

All data are propagated forward through the network model to obtain the sleep stage labels corresponding to the data samples, where the data samples are the training and validation sets generated from the data sets.

The classification labels produced by the machine model are compared with the experts' manual classification labels, and evaluation metrics such as accuracy, recall and F1 score are recorded for each data set, which completes construction of the network model.
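The metrics named above can be computed per stage as in this small sketch. The label sequences are made-up toy data, and scoring a stage as 0.0 when it has no true or predicted epochs is a convention adopted here, not something the patent specifies.

```python
STAGES = ["WAKE", "REM", "N1", "N2", "N3"]


def evaluate(expert, predicted):
    """Overall accuracy plus per-stage precision, recall and F1 score."""
    accuracy = sum(e == p for e, p in zip(expert, predicted)) / len(expert)
    per_stage = {}
    for stage in STAGES:
        tp = sum(e == stage and p == stage for e, p in zip(expert, predicted))
        fp = sum(e != stage and p == stage for e, p in zip(expert, predicted))
        fn = sum(e == stage and p != stage for e, p in zip(expert, predicted))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_stage[stage] = {"precision": precision, "recall": recall, "f1": f1}
    return accuracy, per_stage


expert    = ["WAKE", "N1", "N2", "N2", "N3", "N3", "REM", "REM"]
predicted = ["WAKE", "N2", "N2", "N2", "N3", "N2", "REM", "REM"]
accuracy, per_stage = evaluate(expert, predicted)
```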

The sample points used for training are input into the trained Bi-GRU network model to obtain the sleep stage classification labels.

The label sequences of the S segments of one night are concatenated into one long sequence, so that each night of data corresponds to one complete overnight label sequence. The overnight label sequence is a one-dimensional vector of dimension T = S × M. The set of N overnight label sequences is divided into three sets following the earlier division: the training set contains C_t sequences, the validation set C_v sequences and the test set C_c sequences.
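The concatenation step is simple, and a short sketch makes the resulting length T = S × M explicit. The function name and the toy labels are hypothetical.

```python
def concatenate_segment_labels(segment_labels):
    """segment_labels[s] is the list of M labels predicted for segment s of one
    night; the result is the overnight sequence of length T = S * M."""
    overnight = []
    for labels in segment_labels:
        overnight.extend(labels)
    return overnight


S, M = 3, 4
segment_labels = [["WAKE", "WAKE", "N1", "N2"],
                  ["N2", "N3", "N3", "N2"],
                  ["REM", "REM", "N1", "WAKE"]]
overnight = concatenate_segment_labels(segment_labels)
```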

The training set of label sequences is input into the correction layer. Exploiting the strength of the conditional random field (CRF) at extracting contextual transition information, the sequence is modeled with a linear-chain CRF and the optimal label sequence path is decoded with the Viterbi algorithm. The overnight label sequence is thereby corrected so that its temporal continuity agrees with the label sequence produced by expert manual sleep staging.
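Viterbi decoding over a linear chain can be sketched as follows. This is a simplification under stated assumptions: instead of a trained CRF with K learned feature functions, the scores are set by hand from two hypothetical parameters, `p_stay` (tendency to remain in the same stage) and `p_agree` (trust in the network's label), so the example shows only how the decoding removes an isolated label that breaks temporal continuity.

```python
import math

STAGES = ["WAKE", "REM", "N1", "N2", "N3"]


def viterbi_smooth(observed, p_stay=0.9, p_agree=0.8):
    """Return the highest-scoring stage sequence given the network's labels."""
    K = len(STAGES)
    log_stay, log_move = math.log(p_stay), math.log((1 - p_stay) / (K - 1))
    log_agree, log_differ = math.log(p_agree), math.log((1 - p_agree) / (K - 1))

    def emit(state, obs):
        return log_agree if state == obs else log_differ

    score = [emit(s, observed[0]) for s in STAGES]
    back = []   # back[t][j]: best predecessor of state j at step t + 1
    for obs in observed[1:]:
        new_score, pointers = [], []
        for j, s in enumerate(STAGES):
            best_i = max(range(K),
                         key=lambda i: score[i] + (log_stay if i == j else log_move))
            trans = log_stay if best_i == j else log_move
            new_score.append(score[best_i] + trans + emit(s, obs))
            pointers.append(best_i)
        score = new_score
        back.append(pointers)

    # Trace the best path backwards from the best final state.
    state = max(range(K), key=lambda i: score[i])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return [STAGES[i] for i in reversed(path)]


spurious = ["N2", "N2", "N2", "WAKE", "N2", "N2", "N2"]
sustained = ["N2", "N2", "N2", "N3", "N3", "N3"]
corrected = viterbi_smooth(spurious)
unchanged = viterbi_smooth(sustained)
```

With these hand-set scores the single WAKE epoch inside a run of N2 is relabeled N2, while the sustained change from N2 to N3 is preserved.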

The conditional random field (CRF) model is constructed as follows. The training set of label sequences from step S6 and the corresponding expert manual staging label sequences are input into the CRF model, the number K of feature functions is set, and the optimal parameters are obtained by iterative training, yielding the conditional probability P(y | x) of the conditional random field. In this way the model learns the contextual information of time-dependent transitions between sleep stages. Stage transitions are closely related to sleep time, and exploiting the transition probabilities of the CRF model for this characteristic of sleep is a key step of the invention.
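For reference, the standard textbook form of a linear-chain CRF with K feature functions is given below. The patent does not write out this formula, so the weights \lambda_k, the feature functions f_k and the normalizer Z(x) are notation introduced here for illustration; x is the label sequence produced by the network, y a candidate corrected sequence, and T the sequence length.

```latex
P(y \mid x) = \frac{1}{Z(x)} \exp\!\Big( \sum_{t=1}^{T} \sum_{k=1}^{K} \lambda_k \, f_k(y_{t-1}, y_t, x, t) \Big),
\qquad
Z(x) = \sum_{y'} \exp\!\Big( \sum_{t=1}^{T} \sum_{k=1}^{K} \lambda_k \, f_k(y'_{t-1}, y'_t, x, t) \Big)

y^{*} = \arg\max_{y} P(y \mid x)
```

Training adjusts the weights \lambda_k to maximize the likelihood of the expert label sequences, and the Viterbi algorithm computes y^{*} without enumerating all candidate sequences.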

The correction result of the CRF model is then tested. The optimal label sequence y* is computed from the conditional probability P(y | x) and the label sequence x_s in the validation set. Finally, evaluation metrics such as accuracy are computed, and the results of the CRF correction model and of the network model are tabulated and compared.

Although the invention has been described herein with reference to a number of illustrative embodiments thereof, it should be understood that numerous other modifications and embodiments can be devised by those skilled in the art that will fall within the spirit and scope of the principles of this disclosure. More specifically, various variations and modifications are possible in the component parts and/or arrangements of the subject combination arrangement within the scope of the disclosure, the drawings and the appended claims. In addition to variations and modifications in the component parts and/or arrangements, other uses will also be apparent to those skilled in the art.

Claims

1. An algorithm for automatic sleep staging, characterized by comprising the following steps: a. in the feature layer, abstract features are extracted by a multilayer perceptron and combined with traditional hand-crafted features based on expert experience to form the information representation of sleep; b. in the machine model layer, a bidirectional gated recurrent unit (Bi-GRU) is used as the network model; c. in the correction layer, a conditional random field (CRF) is used to correct for temporal continuity;

step a, in which abstract features are extracted by a multilayer perceptron and combined with traditional hand-crafted features based on expert experience as the information representation of sleep, comprises the following steps: S1, making a data set C = {C₁, C₂, ..., C_N} from physiological data acquired during sleep, including electroencephalogram, electrooculogram and mandibular electromyogram signals, where C represents the data set of all persons and N represents the number of persons; S2, dividing each person's continuous overnight data into S segments in time order, each segment representing different sleep stage information; because the sleep data must be fitted by different algorithm models as the night progresses, each segment is made into its own data set, giving S segment data sets; each segment data set contains N × M sample points, M being the number of sample points in each segment; one sample point represents one sleep epoch, typically 30 seconds long, and contains L signal samples, so the total size of data set C is S × N × M × L; and S3, performing feature engineering on each of the S segment data sets, comprising abstract feature extraction with an autoencoder and traditional feature extraction of time-domain, frequency-domain and nonlinear dynamics features; all features are concatenated into vectors, and the M vectors generated from each segment of data are arranged in order to form a sequence, namely a segment sequence;

step b, in which the bidirectional gated recurrent unit (Bi-GRU) is used as the network model in the machine model layer, comprises the following steps: S4, in the machine model layer, training the Bi-GRU network model using the segment sequences as samples, the S segment data sets each containing N × M segment sequences; each segment data set is divided into a training set and a test set in a certain proportion, and the model is saved after training on each segment data set is completed; S5, inputting the sample points used for training into the trained Bi-GRU network model to obtain the sleep stage classification labels; S6, concatenating the label sequences of the S segments of one night into one long sequence, so that each night of data corresponds to one complete overnight label sequence; the overnight label sequence is a one-dimensional vector of dimension T = S × M; the set of N overnight label sequences is divided into three sets following the earlier division, the training set containing C_t sequences, the validation set C_v sequences and the test set C_c sequences;

step c, in which a conditional random field (CRF) is used for temporal continuity correction in the correction layer, comprises step S7: the training set of label sequences obtained in step S6 is input into the correction layer; exploiting the strength of the CRF at extracting contextual transition information, the sequence is modeled with a linear-chain CRF and the optimal label sequence path is decoded with the Viterbi algorithm, so that the overnight label sequence is corrected and its temporal continuity agrees with the label sequence produced by expert manual sleep staging.

2. The algorithm for automatic sleep staging according to claim 1, characterized in that step S1 further comprises the following sub-steps:
S11, during human sleep, monitoring, recording and storing human physiological data with a polysomnography device having a physiological signal acquisition function, in accordance with the American Academy of Sleep Medicine (AASM) standard;
S12, after the raw signal data of the different types of physiological signals have been sampled and digitized, applying zero-phase digital filtering to each signal so that the non-stationary physiological signals suffer no phase distortion, and removing the very-low-frequency baseline, power-line noise and high-frequency noise from the signals, thereby completing signal preprocessing;
S13, extracting the electro-oculogram, mandibular electromyogram and electroencephalogram signals from the physiological signals, building the original data set C = {C1, C2, ..., CN} from these three-lead signals, and dividing the N overnight sleep recordings from the N individuals into subsets by individual.
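
Zero-phase filtering as in S12 is commonly obtained by running a filter forward over the signal and then backward over the result, so that the phase delays cancel. The sketch below shows this idea with a first-order low-pass filter in plain Python. The filter type and the coefficient `alpha` are assumptions made for illustration; the claim does not specify a filter design, and a practical pipeline would also need a high-pass and a notch stage (for example via `scipy.signal.filtfilt`) to remove baseline drift and power-line noise.

```python
def one_pole_lowpass(x, alpha):
    """Causal first-order low-pass: y[n] = alpha * x[n] + (1 - alpha) * y[n-1]."""
    y, prev = [], x[0]  # start from the first sample to limit the edge transient
    for sample in x:
        prev = alpha * sample + (1.0 - alpha) * prev
        y.append(prev)
    return y


def zero_phase_lowpass(x, alpha=0.2):
    """Forward-backward filtering: the two passes cancel each other's phase delay.

    alpha is an illustrative smoothing coefficient, not a value from the patent.
    """
    forward = one_pole_lowpass(x, alpha)
    backward = one_pole_lowpass(forward[::-1], alpha)
    return backward[::-1]


# An impulse in the middle of a silent signal: a zero-phase filter must
# smear it symmetrically and leave the peak where it was.
impulse = [0.0] * 201
impulse[100] = 1.0
smoothed = zero_phase_lowpass(impulse)
```

A single causal pass would shift the peak of the response to later samples; the forward-backward pass keeps it at index 100.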

3. The algorithm for automatic sleep staging according to claim 1, characterized in that step S2 further comprises the following sub-steps:
S21, setting a suitable segmentation scheme: since a single overnight sleep recording is continuous in time, the algorithm divides the whole continuous process into S stages, the physiological signals of each stage representing different information about the sleep process; in doing so the algorithm imitates the preliminary judgment that experts apply to overnight signal data when staging sleep manually, and the number M of sample points in each stage is set according to the duration of that stage;
S22, preparing the specific data sets needed by the algorithm models: S data sets are derived from the original data set, and the data sizes must be exact; each data set contains N × M sample points from N individuals; each sample point is the minimum unit of sleep staging and spans 30 seconds; each sample point corresponds to one category label, and there are five category labels: WAKE, REM, N1, N2 and N3;
S23, aligning the labels with the data, verifying the segment data sets, and creating and storing them in files.
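
The segmentation in S21 to S23 can be sketched as follows. The function names are hypothetical, and the sketch assumes S equal-length segments for simplicity, whereas the claim lets M be set per stage according to its duration.

```python
EPOCH_SECONDS = 30  # one sample point is one 30-second sleep epoch


def split_into_epochs(signal, fs):
    """Cut a 1-D signal into consecutive epochs of L = 30 * fs samples each."""
    L = EPOCH_SECONDS * fs
    n_epochs = len(signal) // L  # an incomplete trailing epoch is dropped
    return [signal[i * L:(i + 1) * L] for i in range(n_epochs)]


def split_into_segments(epochs, labels, S):
    """Divide time-ordered epochs and their labels into S segments of M epochs.

    Equal-length segments are an assumption made for this sketch.
    """
    if len(epochs) != len(labels):
        raise ValueError("labels must align one-to-one with epochs")  # S23 check
    M = len(epochs) // S
    return [(epochs[s * M:(s + 1) * M], labels[s * M:(s + 1) * M]) for s in range(S)]


# Toy recording: 8 full epochs at a hypothetical 2 Hz sampling rate plus 7 stray samples.
signal = list(range(2 * 30 * 8 + 7))
epochs = split_into_epochs(signal, fs=2)
segments = split_into_segments(epochs, ["N2"] * len(epochs), S=4)
```

Here each of the 4 segments holds M = 2 epochs of L = 60 samples together with their aligned labels.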

4. The algorithm for automatic sleep staging according to claim 1, characterized in that step S3 further comprises the following sub-steps:
S31, extracting traditional features: a feature vector x_i of dimension k is computed from the signals, wherein the m features in x_i comprise time-domain features, frequency-domain features and nonlinear dynamics features; the time-domain features comprise statistical features and geometric features, the frequency-domain features comprise power spectral density features and time-frequency features, and the nonlinear dynamics features comprise fractal dimension features and complexity features, each feature being determined by its own parameters and calculation method;
S32, extracting abstract features with an autoencoder from the field of artificial neural networks: exploiting its unsupervised learning property, a neural network is fitted that represents the input data efficiently without any additional manual work, so that the input signal data is represented efficiently by a fixed low-dimensional vector, namely the encoding; the output dimension of the autoencoder is generally smaller than the input signal dimension, which is the dimensionality-reduction property of the autoencoder; an autoencoder with multiple encoding layers is adopted, whose encoding complexity depends on the number of stacked network layers, and increasing the number of stacked layers appropriately allows the input data to be compressed and represented effectively;
S33, performing feature engineering on the sample points and concatenating the results into feature vectors x_i (x_i ∈ R^k); the sample points in each segment must be arranged into a segment sequence in time order, and the corresponding feature vectors are likewise arranged into a segment sequence seq; the whole feature space constructed from the segment sequences is in fact a three-dimensional tensor X ∈ R^{N×M×k}.
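
To make S31 concrete, the sketch below computes one small feature vector per epoch. The specific features (variance, zero-crossing count, power in four conventional EEG bands, Petrosian fractal dimension) and the band edges are assumptions chosen for illustration; the claim names only the feature families, not the individual features.

```python
import cmath
import math


def band_power(x, fs, lo, hi):
    """Power in [lo, hi) Hz from a naive DFT (adequate for a short demo)."""
    n, power = len(x), 0.0
    for k in range(1, n // 2):
        if lo <= k * fs / n < hi:
            coef = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            power += abs(coef) ** 2 / n
    return power


def petrosian_fd(x):
    """Petrosian fractal dimension, based on sign changes of the first difference."""
    diff = [b - a for a, b in zip(x, x[1:])]
    n_delta = sum(1 for a, b in zip(diff, diff[1:]) if a * b < 0)
    n = len(x)
    return math.log10(n) / (math.log10(n) + math.log10(n / (n + 0.4 * n_delta)))


def epoch_features(x, fs):
    """Concatenate time-domain, frequency-domain and nonlinear features (k = 7)."""
    mean = sum(x) / len(x)
    variance = sum((v - mean) ** 2 for v in x) / len(x)                # statistical
    zero_crossings = float(sum(1 for a, b in zip(x, x[1:]) if a * b < 0))
    bands = [(0.5, 4), (4, 8), (8, 13), (13, 30)]                      # delta..beta
    powers = [band_power(x, fs, lo, hi) for lo, hi in bands]
    return [variance, zero_crossings] + powers + [petrosian_fd(x)]


# A pure 10 Hz tone sampled at a hypothetical 64 Hz should load the 8-13 Hz band.
fs = 64
tone = [math.sin(2 * math.pi * 10 * t / fs) for t in range(2 * fs)]
features = epoch_features(tone, fs)
```

Stacking such vectors for the M epochs of a segment, and then over the N persons, gives the tensor of shape N × M × k described in S33 (before any autoencoder features are appended).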

5. The algorithm for automatic sleep staging according to claim 1, characterized in that step S4 further comprises the following sub-steps:
S41, dividing the three-dimensional feature-space tensor X generated from seq into the final training data set, validation data set and test data set;
S42, inputting the time-ordered sequence seq into the bidirectional gated recurrent unit (Bi-GRU) network model, the feature-space tensor X ∈ R^{N×M×k} generated from seq meeting the requirements of the Bi-GRU input layer; setting the structure, training mode and initial parameters of the network model appropriately, and then loading the data set to start training the model;
S43, saving the Bi-GRU network model once it reaches a preset termination condition.
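
The way a Bi-GRU consumes one segment sequence of shape (M, k) can be sketched as a forward pass in NumPy. This is only an illustration of the data flow: the weights are random and untrained, the hidden size and the linear output layer are assumptions, and all function names are hypothetical. A real implementation would use a deep-learning framework and train the parameters as S42 describes.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(k, H, rng):
    """Random (W, U, b) triples for the update gate, reset gate and candidate state."""
    params = []
    for _ in range(3):
        params += [rng.normal(0, 0.1, (H, k)), rng.normal(0, 0.1, (H, H)), np.zeros(H)]
    return params


def gru_pass(X, params, reverse=False):
    """Run one GRU direction over X of shape (M, k); returns hidden states (M, H)."""
    Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh = params
    h = np.zeros(bz.shape[0])
    out = np.zeros((X.shape[0], bz.shape[0]))
    steps = range(X.shape[0] - 1, -1, -1) if reverse else range(X.shape[0])
    for t in steps:
        z = sigmoid(Wz @ X[t] + Uz @ h + bz)               # update gate
        r = sigmoid(Wr @ X[t] + Ur @ h + br)               # reset gate
        h_tilde = np.tanh(Wh @ X[t] + Uh @ (r * h) + bh)   # candidate state
        h = (1.0 - z) * h + z * h_tilde
        out[t] = h
    return out


def bigru_classify(X, fwd, bwd, W_out, b_out):
    """Concatenate both directions per epoch and pick one of five stage indices."""
    states = np.concatenate([gru_pass(X, fwd), gru_pass(X, bwd, reverse=True)], axis=1)
    return (states @ W_out.T + b_out).argmax(axis=1)


rng = np.random.default_rng(0)
M, k, H = 6, 7, 4                      # illustrative sizes only
X = rng.normal(size=(M, k))            # one segment sequence
fwd, bwd = init_params(k, H, rng), init_params(k, H, rng)
W_out, b_out = rng.normal(0, 0.1, (5, 2 * H)), np.zeros(5)
labels = bigru_classify(X, fwd, bwd, W_out, b_out)  # untrained, so not meaningful
```

The backward direction is what lets the label of an epoch depend on later epochs as well as earlier ones, which matches how a human scorer looks at the surrounding context.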

6. The algorithm for automatic sleep staging according to claim 1, characterized in that step S5 further comprises the following sub-steps:
S51, forward-propagating all data through the network model to obtain the sleep-stage classification label for each data sample, the data samples being the training set and validation set generated from the data set;
S52, comparing the classification labels produced by the machine model with the labels from expert manual staging, recording the evaluation metrics for each segment data set, including accuracy, recall and F1 score, and thereby completing construction of the network model.
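
The metrics named in S52 can be computed as in the following sketch. The function name, the per-class reporting and the macro-averaged F1 are assumptions; the claim does not say how the per-class scores are aggregated.

```python
STAGES = ["WAKE", "REM", "N1", "N2", "N3"]


def evaluate(y_true, y_pred):
    """Return overall accuracy, per-stage recall and F1, and macro-averaged F1."""
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    per_stage = {}
    for stage in STAGES:
        tp = sum(t == stage and p == stage for t, p in zip(y_true, y_pred))
        fp = sum(t != stage and p == stage for t, p in zip(y_true, y_pred))
        fn = sum(t == stage and p != stage for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_stage[stage] = {"recall": recall, "f1": f1}
    macro_f1 = sum(v["f1"] for v in per_stage.values()) / len(STAGES)
    return accuracy, per_stage, macro_f1


# Toy comparison of expert labels against model labels (made-up values).
expert = ["WAKE", "N1", "N2", "N2", "N3", "REM"]
model = ["WAKE", "N2", "N2", "N2", "N3", "REM"]
accuracy, per_stage, macro_f1 = evaluate(expert, model)
```

In this toy case the single N1 epoch is misread as N2, a confusion that overall accuracy partly hides but the per-stage recall exposes.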

7. The algorithm for automatic sleep staging according to claim 1, characterized in that step S7 further comprises the following sub-steps:
S71, constructing a conditional random field (CRF) model, inputting into it the label-sequence training set from step S6 together with the corresponding label sequences from expert manual staging, setting the number K of feature functions, and iteratively training the optimal parameters to obtain the conditional probability P(y | x) of the conditional random field;
S72, testing the correction result of the CRF model: the optimal label sequence y is computed from the conditional probability P(y | x) and the label sequences x in the validation set, the evaluation metrics, including accuracy, are then calculated, and the results of the CRF correction model and the network model are compared statistically.
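
The decoding half of S72 can be sketched with a plain Viterbi search over a linear-chain model. The emission and transition scores below are hand-set placeholders rather than parameters learned as in S71, and the function name is hypothetical; the sketch only shows how sequence-level decoding removes isolated label flips while keeping sustained stage changes.

```python
STAGES = ["WAKE", "REM", "N1", "N2", "N3"]


def viterbi(obs, states, emit, trans):
    """Return the highest-scoring state path for a linear-chain model.

    emit[y][x]: score of corrected stage y given network label x
    trans[a][b]: score of moving from stage a to stage b
    """
    best = [{y: emit[y][obs[0]] for y in states}]
    back = []
    for x in obs[1:]:
        scores, pointers = {}, {}
        for y in states:
            prev = max(states, key=lambda a: best[-1][a] + trans[a][y])
            scores[y] = best[-1][prev] + trans[prev][y] + emit[y][x]
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    path = [max(states, key=lambda y: best[-1][y])]
    for pointers in reversed(back):   # follow back-pointers from the last epoch
        path.append(pointers[path[-1]])
    return path[::-1]


# Placeholder scores (assumptions): trust the network label, reward staying put.
emit = {y: {x: 2.0 if x == y else 0.0 for x in STAGES} for y in STAGES}
trans = {a: {b: 1.5 if a == b else 0.0 for b in STAGES} for a in STAGES}

noisy = ["N2", "N2", "N2", "WAKE", "N2", "N2", "N2"]   # one isolated flip
corrected = viterbi(noisy, STAGES, emit, trans)
```

With these scores the lone WAKE epoch is overruled by its N2 neighbours, whereas a run such as three N2 epochs followed by three REM epochs is left unchanged.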