CN114386454B - Medical time sequence signal data processing method based on signal mixing strategy - Google Patents

Medical time sequence signal data processing method based on signal mixing strategy

Info

Publication number
CN114386454B
Authority
CN
China
Prior art keywords
data
medical
sample
training
sample set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111498955.0A
Other languages
Chinese (zh)
Other versions
CN114386454A (en)
Inventor
王振常
郑伟
任鹏玲
罗德红
蔡林坤
赵二伟
刘雅文
张婷婷
吕晗
刘冬
尹红霞
赵鹏飞
李静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Friendship Hospital
Original Assignee
Beijing Friendship Hospital
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Friendship Hospital
Priority to CN202111498955.0A
Publication of CN114386454A
Application granted
Publication of CN114386454B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00 Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/08 Feature extraction
    • G06F2218/10 Feature extraction by analysing the shape of a waveform, e.g. extracting parameters relating to peaks
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/72 Signal processing specially adapted for physiological signals or for diagnostic purposes
    • A61B5/7225 Details of analog processing, e.g. isolation amplifier, gain or sensitivity adjustment, filtering, baseline or drift compensation
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/72 Signal processing specially adapted for physiological signals or for diagnostic purposes
    • A61B5/7235 Details of waveform analysis
    • A61B5/7264 Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
    • A61B5/7267 Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems involving training the classification device
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/10 Machine learning using kernel methods, e.g. support vector machines [SVM]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00 Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/02 Preprocessing
    • G06F2218/04 Denoising
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00 Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/12 Classification; Matching

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Public Health (AREA)
  • Biophysics (AREA)
  • Signal Processing (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Pathology (AREA)
  • Computing Systems (AREA)
  • Animal Behavior & Ethology (AREA)
  • Evolutionary Biology (AREA)
  • Computational Linguistics (AREA)
  • Veterinary Medicine (AREA)
  • Physiology (AREA)
  • Psychiatry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Surgery (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Power Engineering (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Fuzzy Systems (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

The specification discloses a medical time-series signal data processing method based on a signal mixing strategy, which can at least partially solve the problem in the related art that a medical prediction model learns knowledge from medical data inefficiently. In this method, the training sample set is constructed from both periodic data and aperiodic data. Periodic data and aperiodic data express knowledge in different ways, so a medical prediction model trained on the training sample set constructed by this method can learn the knowledge expressed by the medical data more efficiently; the trained medical prediction model therefore performs better and yields more accurate prediction results from the acquired medical data.

Description

Medical time sequence signal data processing method based on signal mixing strategy
Technical Field
The application relates to the technical field of data processing, in particular to a medical time sequence signal data processing method based on a signal mixing strategy.
Background
AI (Artificial Intelligence) techniques have considerable potential for increasing the speed and accuracy of medical data processing, but an artificial intelligence model must be trained extensively before it acquires a certain processing capability for medical data. Compared with data in other fields, medical data is not of a single type. For example, in a medical diagnosis scenario, heart rate data and blood pressure data are of different types. When processing medical data from different sources, on the one hand, a medical data user wants to acquire the knowledge expressed in the medical data; on the other hand, different types of medical data often express knowledge in different ways, which makes acquiring that knowledge difficult.
Therefore, how to process medical data so that a medical prediction model can efficiently learn the knowledge expressed in it, and so that the trained medical prediction model has good data processing capability, has become a problem to be solved urgently.
Disclosure of Invention
The embodiment of the specification provides a medical time series signal data processing method based on a signal mixing strategy, so as to partially solve the above problems in the prior art.
The embodiment of the specification adopts the following technical scheme:
In a first aspect, the present application provides a medical time-series signal data processing method based on a signal mixing strategy, including: constructing a training sample set according to periodic data and aperiodic data in a medical data set; training a medical prediction model using the training sample set; acquiring medical data; and taking the medical data as the input of the trained medical prediction model, executing the medical prediction model, and outputting a medical prediction result.
In an alternative embodiment of the present description, the method further comprises: constructing a first sample set according to the periodic data; acquiring aperiodic data from a medical dataset; constructing a second sample set according to the aperiodic data; obtaining a training sample set based on the first sample set and the second sample set; and training the medical prediction model by adopting a training sample set.
In an alternative embodiment of the present specification, constructing a first sample set based on the periodic data comprises: processing the periodic data according to an empirical mode decomposition method to obtain first samples; and constructing the first sample set from the first samples.
In an optional embodiment of the present specification, constructing a second sample set from the aperiodic data comprises: processing the aperiodic data according to a variational modal decomposition method to obtain second samples; and constructing the second sample set from the second samples.
In an alternative embodiment of the present specification, obtaining a training sample set based on the first sample set and the second sample set includes: constructing an intermediate sample set based on the first sample set and the second sample set; and amplifying the intermediate sample set to obtain a training sample set.
In an optional embodiment of the present specification, amplifying the intermediate sample set to obtain a training sample set includes: taking the first samples or second samples in the intermediate sample set that were obtained by processing the same data as the samples of one sample subset; and performing a specified number of amplifications on the sample subset to obtain the specified number of amplified sample subsets, the training sample set being obtained from the amplified sample subsets; wherein the specified number is the number of samples in the sample subset, and each amplification comprises: weighting the samples in the sample subset respectively according to a preset weighting strategy to obtain one amplified sample subset.
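The amplification described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the specification does not define the preset weighting strategy, so uniform random weights in a hypothetical range [0.5, 1.5] stand in for it, and the names `amplify_subset` and `amplify` are invented for this example.

```python
import random


def amplify_subset(subset, rng, low=0.5, high=1.5):
    """Weight every sample of one sample subset once, giving one amplified subset.

    Assumption: the "preset weighting strategy" is a random scalar weight per
    sample, drawn uniformly from [low, high]. The source does not specify it.
    """
    amplified = []
    for sample in subset:
        weight = rng.uniform(low, high)
        amplified.append([weight * value for value in sample])
    return amplified


def amplify(subset, times=None, seed=0):
    """Repeat the amplification; by default as many times as the subset has samples."""
    rng = random.Random(seed)
    if times is None:
        times = len(subset)
    return [amplify_subset(subset, rng) for _ in range(times)]


# One sample subset: components decomposed from the same piece of data (toy values).
subset = [
    [0.0, 1.0, 0.0, -1.0],
    [0.5, 0.5, 0.4, 0.4],
    [2.0, 2.1, 2.2, 2.3],
]
amplified_subsets = amplify(subset)
print(len(amplified_subsets), "amplified subsets of", len(amplified_subsets[0]), "samples each")
```

Each amplified subset keeps the shape of the original subset, so the amplified subsets can be pooled directly into the training sample set.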
In an optional embodiment of the present description, the periodic data comprises at least one of: heart rate data, pulse wave data, respiratory frequency data and electrocardiogram data; the aperiodic data comprises at least one of: blood pressure data, blood glucose data, blood oxygen data, body temperature data, blood cell data.
In an alternative embodiment of the present description, the medical prediction model is any one of: a support vector machine, a logistic regression model, a decision tree model, a convolutional neural network model, an LSTM model.
In an alternative embodiment of the present description, the medical data is preprocessed; wherein the preprocessing comprises at least one of: denoising, missing value supplementation, and smoothing.
In an optional embodiment of the present description, the method further comprises: training the model based on at least one sample selected from the training sample set to obtain an intermediate model; determining a performance index of the intermediate model using a test sample set, wherein the performance index comprises at least one of accuracy, recall, and F1-score; and, if the value of the performance index is lower than a performance threshold, continuing to train the intermediate model based on samples selected from the training sample set until the performance index of the intermediate model is not lower than the performance threshold.
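The threshold-controlled training described above can be sketched as follows. This is a minimal sketch under stated assumptions: `TinyLogistic` is a hypothetical one-feature stand-in for the medical prediction model, the data are synthetic, and the `max_rounds` cap is added here only so that the loop always terminates; none of these come from the specification.

```python
import math
import random


def metrics(y_true, y_pred):
    """Accuracy, recall and F1-score for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(y_true), "recall": recall, "f1": f1}


class TinyLogistic:
    """Hypothetical stand-in model: logistic regression on a single feature."""

    def __init__(self):
        self.w, self.b = 0.0, 0.0

    def predict(self, x):
        return 1 if self.w * x + self.b > 0 else 0

    def fit_batch(self, batch, lr=0.5):
        for x, y in batch:
            p = 1.0 / (1.0 + math.exp(-(self.w * x + self.b)))
            self.w += lr * (y - p) * x
            self.b += lr * (y - p)


def train_until_threshold(model, train, test, threshold=0.9,
                          batch_size=16, max_rounds=200, seed=0):
    """Keep training on selected samples until every index reaches the threshold."""
    rng = random.Random(seed)
    xs, ys = [x for x, _ in test], [y for _, y in test]
    scores = metrics(ys, [model.predict(x) for x in xs])
    for round_no in range(1, max_rounds + 1):
        model.fit_batch(rng.sample(train, batch_size))   # samples selected from the training set
        scores = metrics(ys, [model.predict(x) for x in xs])
        if min(scores.values()) >= threshold:            # index not lower than the threshold
            return scores, round_no
    return scores, max_rounds                            # cap reached (assumption, not in source)


data_rng = random.Random(42)


def make_samples(n):
    """Synthetic samples: one feature in [-1, 1], positive when it exceeds 0.2."""
    return [(x, 1 if x > 0.2 else 0) for x in (data_rng.uniform(-1.0, 1.0) for _ in range(n))]


train_set, test_set = make_samples(200), make_samples(50)
scores, rounds = train_until_threshold(TinyLogistic(), train_set, test_set)
print(rounds, scores)
```

Here all three indices are compared against one threshold; an embodiment that uses only one index would compare that index alone.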
In a second aspect, the present specification provides a medical time-series signal data processing apparatus based on a signal mixing strategy, comprising:
a training sample set construction module configured to: constructing a training sample set according to periodic data and aperiodic data in the medical data set;
a training module configured to: training the medical prediction model by adopting a training sample set;
a medical data acquisition module configured to: acquiring medical data;
a prediction module configured to: and taking medical data as the input parameter of the trained medical prediction model, executing the medical prediction model and outputting a medical prediction result.
The present specification provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above-described medical time-series signal data processing method based on a signal mixing strategy.
The electronic device provided by the present specification includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the program, the processor implements the above medical time series signal data processing method based on the signal mixing strategy.
The embodiment of the specification adopts at least one technical scheme which can achieve the following beneficial effects:
the medical time-series signal data processing method, device, storage medium and electronic device based on the signal mixing strategy in the embodiments of the present specification can at least partially solve the problem existing in the related art that the efficiency of learning knowledge from medical data by a medical prediction model is low. In the medical time sequence signal data processing method based on the signal mixing strategy in the present specification, when a training sample set is constructed, periodic data is adopted on one hand, and aperiodic data is also adopted on the other hand. The periodic data and the aperiodic data have different expression modes for knowledge, so that the medical prediction model can learn the knowledge expressed by the medical data more efficiently through the training sample set constructed by the method in the specification, and the trained medical prediction model is adopted subsequently, so that the model performance is better, and more accurate prediction results can be obtained based on the acquired medical data.
Drawings
The accompanying drawings, which are included to provide a further understanding of the specification and constitute a part of this specification, illustrate embodiments of the specification and, together with the description, serve to explain the specification without limiting it. In the drawings:
Fig. 1 is a schematic flowchart of a medical time-series signal data processing process based on a signal mixing strategy according to an embodiment of the present disclosure;
Fig. 2 is a schematic diagram illustrating a sample processing flow involved in a medical time-series signal data processing process based on a signal mixing strategy according to an embodiment of the present disclosure;
Fig. 3 is a schematic structural diagram of a medical prediction model provided in an embodiment of the present specification;
Fig. 4 is a schematic structural diagram of a medical time-series signal data processing apparatus based on a signal mixing strategy according to an embodiment of the present specification;
Fig. 5 is a schematic diagram of an electronic device corresponding to Fig. 1 provided in an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the present disclosure more clear, the technical solutions of the present disclosure will be clearly and completely described below with reference to the specific embodiments of the present disclosure and the accompanying drawings. It is to be understood that the embodiments described are only a few embodiments of the present disclosure, and not all embodiments. All other embodiments obtained by a person skilled in the art based on the embodiments in the specification without making any creative effort belong to the protection scope of the specification.
The technical solutions provided by the embodiments of the present description are described in detail below with reference to the accompanying drawings.
The medical data in the present specification comprises at least one of: data collected for a medical event (e.g., an event experienced while treating a patient), data collected for the production and development of a pharmaceutical product (e.g., a protein) that may be used during the medical event.
At present, artificial intelligence technology is applied in fields such as image processing, where it has played a positive role in the development of the technology. However, artificial intelligence technology has rarely been applied successfully in the field of medical data processing.
In an actual medical data processing scenario, the medical data may come from more than one source. If the medical prediction model has to learn the knowledge needed for prediction from medical data of different sources, learning that knowledge is difficult.
In view of this, the present specification provides a medical time-series signal data processing method based on a signal mixing strategy, so as to at least partially solve the technical problem in the related art that the training effect for a medical prediction model is poor, so that the trained medical prediction model lacks good data processing capability. The execution subject of the medical time-series signal data processing process based on the signal mixing strategy is a medical time-series signal data processing apparatus based on the signal mixing strategy.
Fig. 1 shows a medical time-series signal data processing process based on a signal mixing strategy provided in an embodiment of the present specification, which may specifically include one or more of the following steps:
S100: Constructing a training sample set according to the periodic data and the aperiodic data in the medical data set.
The medical data set in this specification contains historically acquired medical data. Because the medical data come from different sources, different medical data in the medical data set may exhibit different periodicities. Medical data on which the way the data source generates data confers periodicity is the periodic data of this specification; medical data on which the way the data source generates data cannot confer periodicity is the aperiodic data of this specification.
Illustratively, the periodic data in this specification includes at least one of: heart rate data, pulse wave data, respiratory frequency data and electrocardiogram data; the aperiodic data comprises at least one of: blood pressure data, blood glucose data, blood oxygen data, body temperature data, blood cell data.
In a further alternative embodiment of the present description, before the training sample set is constructed, the medical data is first preprocessed, and the preprocessed medical data is then used to construct the training sample set. The preprocessing may include at least one of noise reduction, missing value supplementation, and smoothing.
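The three preprocessing modes can be sketched as follows. This is a minimal sketch under stated assumptions: the specification does not name concrete algorithms, so linear interpolation stands in for missing value supplementation, a median filter for noise reduction, and a moving average for smoothing; the heart-rate-like values are invented for illustration.

```python
from statistics import median


def fill_missing(series):
    """Fill None gaps by linear interpolation; gaps at the edges copy the nearest value."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no valid values")
    filled = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            frac = (i - left) / (right - left)
            filled[i] = series[left] + frac * (series[right] - series[left])
    return filled


def median_filter(series, window=3):
    """Noise reduction: replace each point by the median of its neighbourhood."""
    half = window // 2
    return [median(series[max(0, i - half): i + half + 1]) for i in range(len(series))]


def moving_average(series, window=3):
    """Smoothing: replace each point by the mean of its neighbourhood."""
    half = window // 2
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


# Toy series with two gaps and one spike (illustrative values only).
raw = [72.0, None, 74.0, 120.0, 75.0, None, None, 78.0]
filled = fill_missing(raw)
denoised = median_filter(filled)
smoothed = moving_average(denoised)
print(filled)
print(smoothed)
```

The order used here (fill, then denoise, then smooth) is one reasonable choice; an embodiment may apply only some of the modes.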
The number of pieces of medical data in the medical data set is not particularly limited in the present specification. For example, in the foregoing medical diagnosis scenario, the medical data collected for user Zhang San may be used as one piece of medical data in the medical data set, and the medical data collected for user Li Si may be used as another piece of medical data in the medical data set.
In the foregoing scenario of studying protein modification, the medical data acquired for the protein numbered A in a certain experiment may be used as one piece of medical data in the medical data set; the medical data acquired for the protein numbered B in the experiment may be used as another piece of medical data in the medical data set; and the medical data acquired in other experiments may be used as further medical data in the medical data set.
That is, in the present specification, the medical data in the medical data set may be distinguished according to the acquisition object of the medical data (for example, the aforementioned Zhang San or protein A). In other words, one acquisition object corresponds to one piece of medical data in the medical data set.
S102: Training the medical prediction model using the training sample set.
For convenience of description, the present specification refers to each sample in the training sample set as a training sample (for example, the training samples may include the first samples and second samples described hereinafter).
S104: Acquiring medical data.
After the trained medical prediction model with prediction capability is obtained through the foregoing steps, the medical prediction model can be deployed online. The medical data in this step is the data input to the model when prediction is performed online.
It should be noted that there is no required order of execution between this step and the aforementioned steps S100 and S102.
S106: Taking the medical data as the input of the trained medical prediction model, executing the medical prediction model, and outputting a medical prediction result.
It should be noted that in an alternative embodiment of the present disclosure, the medical prediction model may be a model that predicts based on a single type of medical data. Taking the foregoing medical diagnosis scenario as an example, the medical prediction model is used for predicting the probability that the user will suffer from heart disease, and the medical data includes only electrocardiogram data.
In yet another alternative embodiment of the present description, the medical prediction model may be a model that predicts based on several types of medical data. Taking the above medical diagnosis scenario as an example, the medical prediction model is used for predicting the probability that the user will suffer from heart disease, and the medical data includes heart rate data, pulse data, electrocardiogram data and electromyogram data.
In addition, in an alternative embodiment of the present specification, the medical prediction model may predict a single prediction item based on the input medical data. Taking the foregoing medical diagnosis scenario as an example, the medical prediction model is only used for predicting the probability that the user will suffer from heart disease ("predicting that the user will suffer from heart disease" is a prediction item).
In yet another alternative embodiment of the present description, the medical prediction model may predict several prediction items based on the input medical data. Taking the above medical diagnosis scenario as an example, the medical prediction model is used for predicting the probability that the user will suffer from heart disease, the probability that the user suffers from diabetes, and the probability that the user suffers from hypertension.
The medical time-series signal data processing process based on the signal mixing strategy in the embodiment of the specification can at least partially solve the problem that the learning efficiency of the medical prediction model from the medical data is low in the related art. In the medical time-series signal data processing method based on the signal mixing strategy in the present specification, when a training sample set is constructed, on one hand, periodic data is adopted, and on the other hand, aperiodic data is also adopted. The periodic data and the aperiodic data have different expression modes for knowledge, so that the medical prediction model can learn the knowledge expressed by the medical data more efficiently through the training sample set constructed by the method in the specification, and the trained medical prediction model has better model performance and can obtain a more accurate prediction result based on the acquired medical data.
As can be seen from the foregoing, the medical time-series signal data processing process based on the signal mixing strategy in the present specification distinguishes between periodic data and non-periodic data in the medical data set, and considers the knowledge of the respective expressions of both periodic data and non-periodic data during the model training process. Hereinafter, the process of processing the periodic data and the aperiodic data will be described.
1. For periodic data.
In order to enable the medical prediction model to learn the knowledge for prediction from the periodic data in the medical data set more efficiently, in an optional embodiment of the present specification, the periodic data is processed according to an Empirical Mode Decomposition (EMD) method to obtain a first sample; from the first samples, a first set of samples is constructed. Thereafter, the medical prediction model is trained using the samples of the first set of samples and the aperiodic data.
The periodic data may be characterized as a time series, and the empirical mode decomposition method decomposes the signal into a sum of a series of time-series signals (each time-series signal is a first sample), namely the Intrinsic Mode Functions (IMFs). The obtained IMFs make the implicit modes of a complex signal visually distinguishable and make the instantaneous frequency meaningful. Any two IMF components are independent of each other, so that the fluctuations and trends of the signal at different scales can be resolved step by step, generating time series of different feature scales (the feature scale being the time span between adjacent extreme points of the signal), i.e., the IMF components.
On the one hand, different IMF components can, to a certain extent, represent the knowledge in the medical data from different dimensions, so the medical prediction model can effectively learn that knowledge from the first sample set obtained by the empirical mode decomposition method.
On the other hand, the related art suffers to some extent from a shortage of the samples required for model training, caused by a shortage of medical data. The medical data set used by the medical time-series signal data processing process in this specification may likewise contain too little data, leaving the training sample set with too few samples. In this embodiment, one piece of periodic data can be decomposed into a plurality of first samples by the empirical mode decomposition method, which achieves sample amplification to a certain degree and helps avoid overfitting caused by insufficient samples.
In an alternative embodiment of the present disclosure, target periodic data $x(t)$ may be determined from the periodic data (e.g., one piece of periodic data may be randomly selected as the target periodic data). Then, the extreme points of the target periodic data are determined. Then, a cubic spline interpolation function is fitted through the extreme points to obtain the upper envelope $x_{\max}(t)$ and the lower envelope $x_{\min}(t)$ corresponding to the target periodic data. The average value of the envelopes, $m(t) = [x_{\max}(t) + x_{\min}(t)]/2$, is calculated and taken as the average signal.
Then, the difference between the target periodic data and the average signal, $h(t) = x(t) - m(t)$, is taken as the designated signal with the low-frequency trend filtered out. Whether the designated signal $h(t)$ satisfies the specified constraint condition is judged. If the judgment result is yes, $h(t)$ is determined to be the highest-frequency component of the signal at the local time scale. If the judgment result is no, $h(t)$ is taken as the target periodic data again and the above steps are repeated until the judgment result is yes. The first samples obtained by decomposition can be characterized by the following formula (1):
$$x(t) = \sum_{i=1}^{n} c_i(t) + r(t) \tag{1}$$
In the formula, $c_i(t)$ is the $i$-th first sample obtained from the decomposition, $n$ is the number of first samples, and $r(t)$ is the residue term.
The specified constraint condition is that the difference between the number of extreme points and the number of zero-crossing points in one periodic data is less than or equal to 1; and, on one periodic data, the average of the upper envelope and the lower envelope of any point is 0.
Then, after the first samples corresponding to the target periodic data are determined, new target periodic data is determined from the periodic data for which no first samples have yet been obtained, until first samples have been determined for all of the periodic data.
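The sifting procedure described above can be sketched with NumPy and SciPy as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the tolerance `tol` used to test whether the envelope mean is "0", the iteration caps `max_iter` and `max_imfs`, and the pinning of both envelopes to the signal end points are choices made here; the test signal is synthetic. The constraint is also checked before each subtraction, so a signal that already satisfies it is returned unchanged.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema


def envelope_mean(t, x):
    """Return m(t) = [x_max(t) + x_min(t)] / 2, or None if there are too few extrema."""
    maxima = argrelextrema(x, np.greater)[0]
    minima = argrelextrema(x, np.less)[0]
    if len(maxima) < 2 or len(minima) < 2:
        return None
    # Assumption: pin both envelopes to the end points to limit edge swings.
    up_idx = np.concatenate(([0], maxima, [len(x) - 1]))
    lo_idx = np.concatenate(([0], minima, [len(x) - 1]))
    upper = CubicSpline(t[up_idx], x[up_idx])(t)
    lower = CubicSpline(t[lo_idx], x[lo_idx])(t)
    return (upper + lower) / 2.0


def satisfies_constraint(h, mean, tol=0.05):
    """Extrema and zero crossings differ by at most 1, and the envelope mean is near 0."""
    n_ext = len(argrelextrema(h, np.greater)[0]) + len(argrelextrema(h, np.less)[0])
    n_zero = int(np.sum(h[:-1] * h[1:] < 0))
    near_zero_mean = np.mean(np.abs(mean)) <= tol * np.mean(np.abs(h))
    return abs(n_ext - n_zero) <= 1 and bool(near_zero_mean)


def sift(t, x, max_iter=50):
    """Subtract the envelope mean repeatedly until the constraint condition holds."""
    h = x.copy()
    for _ in range(max_iter):
        m = envelope_mean(t, h)
        if m is None or satisfies_constraint(h, m):
            break
        h = h - m          # h(t) = x(t) - m(t), then h is treated as the new target
    return h


def emd(t, x, max_imfs=8):
    """Decompose x into components c_i(t) and a residue r(t), as in formula (1)."""
    imfs, residue = [], x.astype(float)
    for _ in range(max_imfs):
        if envelope_mean(t, residue) is None:
            break          # residue has too few extrema to sift further
        c = sift(t, residue)
        imfs.append(c)
        residue = residue - c
    return imfs, residue


# Synthetic two-tone signal standing in for one piece of periodic data.
t = np.linspace(0.0, 1.0, 1000)
x = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 40 * t)
imfs, residue = emd(t, x)
print(len(imfs), "first samples; reconstruction error:",
      float(np.max(np.abs(np.sum(imfs, axis=0) + residue - x))))
```

By construction the components and the residue add back to the input, which is the relationship stated by formula (1); each returned component would serve as one first sample.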
2. For non-periodic data.
In order to enable the medical prediction model to learn the knowledge for prediction from the non-periodic data in the medical data set more efficiently, in an optional embodiment of the present specification, the non-periodic data is processed according to a Variational Mode Decomposition (VMD) method to obtain a second sample; from the second samples, a second set of samples is constructed. Then, the medical prediction model is trained by using samples composed of the second sample set and the periodic data.
Although aperiodic data does not show strong periodicity in its distribution, it can be ordered by acquisition time so that it exhibits time-dimension characteristics, which yields data represented as a time series.
The variational modal decomposition method is an adaptive, completely non-recursive signal decomposition method that can decompose a signal into a sum of a finite number of IMFs. In an alternative embodiment of the present disclosure, the following steps may be taken to obtain the second sample.
1) Constructing the variational problem.
The aperiodic data $f(t)$ is decomposed into $K$ mode functions $u_k$ with different frequency characteristics, and each decomposed modal component is assumed to have a limited bandwidth around a center frequency. First, a Hilbert transform is applied to each modal component to obtain its analytic signal. Second, each $u_k$ is multiplied by an exponential term tuned to its estimated center frequency, which shifts the spectrum of each mode to baseband. Finally, the bandwidth of each IMF is estimated from the Gaussian smoothness of the demodulated signal, and the resulting constrained variational model can be represented by the following formula (2).
$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = f(t) \tag{2}$$
In the formula, $\{u_k\} = \{u_1, u_2, \ldots, u_K\}$ is the set of IMF components (i.e., second samples), $\{\omega_k\} = \{\omega_1, \omega_2, \ldots, \omega_K\}$ is the set of center frequencies of all components, $\delta(t)$ is the Dirac distribution, and $*$ is the convolution operation.
2) Solving the variational problem.
This specification introduces a quadratic penalty factor $\alpha$ and a Lagrange multiplier $\lambda$ to convert the constrained variational problem into an unconstrained one, which yields the augmented Lagrangian of formula (3):
$$L(\{u_k\},\{\omega_k\},\lambda) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle \tag{3}$$
In the formula, $\langle\cdot,\cdot\rangle$ denotes the inner product. The alternating direction method of multipliers (ADMM) is used to update $u_k^{n+1}$, $\omega_k^{n+1}$ and $\lambda^{n+1}$.
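For reference, the ADMM iteration for variational mode decomposition is usually written in the frequency domain as below. These are the update rules commonly given in the VMD literature, stated here as an assumption about what the missing formula image contains; $\hat{f}$, $\hat{u}_k$ and $\hat{\lambda}$ denote Fourier transforms, and the dual step size $\tau$ is a symbol not named in this specification.

```latex
\hat{u}_k^{n+1}(\omega) =
  \frac{\hat{f}(\omega) - \sum_{i<k} \hat{u}_i^{n+1}(\omega)
        - \sum_{i>k} \hat{u}_i^{n}(\omega) + \hat{\lambda}^{n}(\omega)/2}
       {1 + 2\alpha\,(\omega - \omega_k^{n})^2}

\omega_k^{n+1} =
  \frac{\int_0^{\infty} \omega\,|\hat{u}_k^{n+1}(\omega)|^2\,d\omega}
       {\int_0^{\infty} |\hat{u}_k^{n+1}(\omega)|^2\,d\omega}

\hat{\lambda}^{n+1}(\omega) =
  \hat{\lambda}^{n}(\omega)
  + \tau\left(\hat{f}(\omega) - \sum_{k} \hat{u}_k^{n+1}(\omega)\right)
```

Each mode update acts as a Wiener-type filter centred on the current center frequency, and each center frequency is re-estimated as the centre of gravity of the power spectrum of its mode.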
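Given a discrimination accuracy $\varepsilon > 0$, the iteration terminates when
$$\sum_{k=1}^{K} \frac{\left\| u_k^{n+1} - u_k^{n} \right\|_2^2}{\left\| u_k^{n} \right\|_2^2} < \varepsilon$$
holds with $n < N$; otherwise the iteration stops once $n$ reaches $N$, where $N$ is the maximum number of iterations.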
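The stopping rule can be sketched in Python as follows. This is a minimal illustration that assumes the usual VMD criterion (the summed relative change of the modes between two iterations); the function names and the small constant guarding against division by zero are additions for the example.

```python
def relative_change(prev_modes, new_modes):
    """Sum over modes of ||u_new - u_old||^2 / ||u_old||^2."""
    total = 0.0
    for u_old, u_new in zip(prev_modes, new_modes):
        num = sum((a - b) ** 2 for a, b in zip(u_new, u_old))
        den = sum(b ** 2 for b in u_old)
        total += num / (den + 1e-12)  # guard against an all-zero mode
    return total


def should_stop(prev_modes, new_modes, n, eps, max_iter):
    """Stop when the modes have settled or the iteration budget is used up."""
    return relative_change(prev_modes, new_modes) < eps or n >= max_iter
```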
To improve the training effect of the model, the medical data in the present specification may be augmented by using a data augmentation technique to increase the number of samples used for model training. In the related art, the data amplification technology has been applied to the technical field of image processing, and also plays a certain positive role in image processing realized by means of artificial intelligence. However, data amplification techniques are rarely applied in the field of processing medical data.
This is mainly because medical data is difficult to acquire. For example, in a medical diagnosis scenario, acquiring a user's heart rate requires the user to wear a heart rate acquisition device for a long period of time, which is difficult for most users. As another example, in research on protein modification (pharmaceutical research), the effects of various catalysts and of the environment on the protein need to be collected, and the technical means required for this collection are also complex, which makes medical data collection very difficult.
As can be seen, the periodic data and the non-periodic data are processed separately in different processing manners, so that the features shown in the data can be made clearer, and the number of samples can be increased.
In order to guarantee the training effect of the medical prediction model and further increase the number of samples for model training, in a further alternative embodiment of the present specification, an intermediate sample set is constructed based on the first sample set and the second sample set. Then, within the intermediate sample set, the first samples or second samples obtained by processing the same data are taken as the samples of one sample subset. The sample subset is amplified a specified number of times to obtain the specified number of amplified sample subsets, and the training sample set is obtained from these amplified sample subsets. The specified number is the number of samples in the sample subset, and each amplification comprises the following step: weighting the samples in the sample subset respectively according to a preset weighting strategy to obtain one amplified sample subset.
Illustratively, for any one piece of periodic or aperiodic data $D_1$, the sample set obtained by decomposition (the first sample set or the second sample set) is denoted $D_1 = \{D_{11}, D_{12}, \ldots, D_{1m}\}$, where $D_{1m}$ is the $m$-th signal component (i.e., a first sample or a second sample) and $m$ is the number of signal components. A new signal $D'_1$ can be obtained by weighted fusion of the signal components, $D'_1 = \{\lambda_1 D_{11}, \lambda_2 D_{12}, \ldots, \lambda_j D_{1j}\}$, where
Figure BDA0003401995110000121
After the weights $\lambda_j$ have been initialized and applied $n$ times, the samples obtained for the first piece of data are $D_1 = \{D'_{11}, D'_{12}, \ldots, D'_{1n}\}$. Optionally, $n = m$.
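The amplification step can be sketched in Python as follows. The sketch rests on two assumptions that the text does not confirm: that "weighted fusion" means a weighted sum of the components, and that the weights are drawn at random and normalized to sum to 1 on each amplification. The function name `amplify` and its parameters are hypothetical.

```python
import random


def amplify(components, n_new, rng):
    """Create n_new signals, each a weighted sum of the given components."""
    length = len(components[0])
    amplified = []
    for _ in range(n_new):
        raw = [rng.random() + 1e-9 for _ in components]
        total = sum(raw)
        weights = [w / total for w in raw]  # assumed: weights sum to 1
        fused = [
            sum(w * c[t] for w, c in zip(weights, components))
            for t in range(length)
        ]
        amplified.append(fused)
    return amplified
```

For a subset of m components, calling `amplify(components, len(components), random.Random(0))` corresponds to the optional case n = m.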
To predict an item more accurately, the sample sets may come from more than one source. For example, when predicting cardiac disease, heart rate data is used to construct medical sample set D, blood pressure data is used to construct medical sample set X, blood cell data is used to construct medical sample set Y, and so on. Medical sample sets X and Y are processed by the same process described above, so that a medical prediction model trained on each medical sample set is obtained.
The medical prediction model in the present specification is the model actually employed when prediction is performed online. Where conditions allow, any existing model that can be used for prediction can serve as the medical prediction model in this specification. For example, the medical prediction model in the present specification may be an LSTM (Long Short-Term Memory) model or a Support Vector Machine (SVM).
Illustratively, the medical prediction model is described by taking an LSTM model as an example. The structure of the medical prediction model is shown in fig. 3. As shown in fig. 3, the medical prediction model contains three control switches: the forget gate $f_t$, the input gate $i_t$ and the output gate $o_t$.
During operation of the medical prediction model, the forget gate $f_t$ is obtained from the current input $x_t$ and the previous output $h_{t-1}$, and determines which content of the previous cell state $C_{t-1}$ is discarded. The value of $f_t$ lies between 0 and 1, where 1 means complete retention and 0 means complete deletion. The forget gate $f_t$ can be expressed as the following formula (4):
$$f_t = \sigma(w_f \cdot h_{t-1} + u_f \cdot x_t + b_f) \tag{4}$$
The input gate $i_t$ is used to update important information. $i_t$ is obtained from the current input $x_t$ and the previous output $h_{t-1}$, and determines which new information enters the current cell state $C_t$. The new information is denoted $\tilde{C}_t$.
$$i_t = \sigma(w_i \cdot h_{t-1} + u_i \cdot x_t + b_i) \tag{5}$$
$$\tilde{C}_t = \tanh(w_c \cdot h_{t-1} + u_c \cdot x_t + b_c) \tag{6}$$
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \tag{7}$$
The output gate $o_t$ determines what values the model outputs, i.e., how much information is output to $h_t$.
$$o_t = \sigma(w_o \cdot h_{t-1} + u_o \cdot x_t + b_o) \tag{8}$$
$$h_t = o_t \odot \tanh(C_t) \tag{9}$$
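A single time step of the gate computations can be sketched in plain Python as follows. This is an illustrative sketch, not the specification's implementation: it uses the standard LSTM candidate-state and cell-state updates, stores each gate's parameters as a hypothetical `(w, u, b)` tuple keyed by `"f"`, `"i"`, `"c"` and `"o"`, and represents matrices as lists of rows.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(w, u, b, h_prev, x):
    """Compute w . h_prev + u . x + b, one output per row of w and u."""
    return [
        sum(wi * hj for wi, hj in zip(w_row, h_prev))
        + sum(ui * xj for ui, xj in zip(u_row, x))
        + bi
        for w_row, u_row, bi in zip(w, u, b)
    ]


def lstm_step(params, x, h_prev, c_prev):
    """One LSTM step: returns the new output h_t and cell state C_t."""
    f = [sigmoid(z) for z in affine(*params["f"], h_prev, x)]        # forget gate
    i = [sigmoid(z) for z in affine(*params["i"], h_prev, x)]        # input gate
    c_hat = [math.tanh(z) for z in affine(*params["c"], h_prev, x)]  # new information
    o = [sigmoid(z) for z in affine(*params["o"], h_prev, x)]        # output gate
    c = [ft * cp + it * ch for ft, cp, it, ch in zip(f, c_prev, i, c_hat)]
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c
```

Feeding a sequence means calling `lstm_step` once per time step and passing the returned `h` and `c` into the next call.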
The present specification processes the output of the medical prediction model using the softmax function to obtain the probability $y$ of the prediction category, as shown in the following formula (10).
$$y = \mathrm{softmax}(w \cdot h_t + b) \tag{10}$$
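The softmax step can be sketched in Python as follows. The input is assumed to be the list of per-category scores; subtracting the maximum before exponentiating is a common numerical-stability measure added here and is not mentioned in the specification.

```python
import math


def softmax(scores):
    """Turn a list of scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # shift by the max to avoid overflow
    total = sum(exps)
    return [e / total for e in exps]
```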
In order to improve the training speed of the model, the model is trained by adopting a mini-batch gradient descent method, and parameters of the model are gradually updated by calculating the gradient of the loss function, so that the convergence is finally achieved. This patent uses a cross entropy loss function as shown in equation (11):
$$Loss = -\sum_{i} y'_i \log(y_i) \tag{11}$$
In the formula, $y'_i$ is the true label value and $y_i$ is the predicted value for the category, calculated with the softmax function.
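A cross-entropy loss of this kind can be sketched in Python as follows, assuming one-hot true labels and predicted probabilities for a single sample. The clipping constant `eps` is an addition that avoids taking the logarithm of zero.

```python
import math


def cross_entropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy between a one-hot label and predicted probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))
```

A confident correct prediction gives a loss near 0, and the loss grows as the probability assigned to the true category shrinks.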
In an alternative embodiment of the present disclosure, the medical prediction model may be trained in a supervised manner. Illustratively, the training sample set includes several training samples and labels in one-to-one correspondence with the training samples. The process of training the medical prediction model in this specification may be as follows. At least part of the training samples in the training sample set are determined as target samples. The target samples are input into the medical prediction model to obtain a pending result output by the medical prediction model. The difference between the label corresponding to each target sample and the pending result is taken as the loss. The parameters of the medical prediction model are adjusted with loss minimization as the training target, and the medical prediction model is updated with the adjusted parameters. Target samples are then re-determined from the training sample set and training continues until the obtained loss is less than a loss threshold, at which point the model is considered to have converged.
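The loop of selecting target samples, computing the loss, updating the parameters and stopping at a loss threshold can be sketched as follows. As a stand-in for the medical prediction model, the sketch uses a one-feature logistic regression trained by mini-batch gradient descent; the model, the function name and all default values are assumptions made for illustration only.

```python
import math
import random


def train_until_threshold(samples, labels, loss_threshold,
                          lr=0.5, batch_size=2, max_steps=10000, seed=0):
    """Train p = sigmoid(w * x + b) until the batch loss drops below the threshold."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    loss = float("inf")
    for step in range(1, max_steps + 1):
        batch = rng.sample(range(len(samples)), batch_size)  # target samples
        grad_w = grad_b = loss = 0.0
        for idx in batch:
            p = 1.0 / (1.0 + math.exp(-(w * samples[idx] + b)))  # model output
            p = min(max(p, 1e-12), 1.0 - 1e-12)
            y = labels[idx]
            loss -= y * math.log(p) + (1 - y) * math.log(1 - p)
            grad_w += (p - y) * samples[idx]
            grad_b += p - y
        loss /= batch_size
        if loss < loss_threshold:  # convergence test against the loss threshold
            return w, b, loss, step
        w -= lr * grad_w / batch_size
        b -= lr * grad_b / batch_size
    return w, b, loss, max_steps
```

The `max_steps` cap is a safeguard added for the example, so that the loop ends even if the threshold is never reached.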
In another alternative embodiment of the present specification, the process of training the medical prediction model may be as follows. At least part of the training samples in the training sample set are determined as target samples. The target samples are input into the medical prediction model to obtain a pending result output by the medical prediction model. The difference between the pending result and the label corresponding to each target sample is taken as the loss. The parameters of the medical prediction model are adjusted with loss minimization as the training target, and the medical prediction model is updated with the adjusted parameters. Then, the samples in the target samples are input into the updated medical prediction model, and whether the model performance (e.g., accuracy, recall, F1-score) of the updated medical prediction model is greater than a performance threshold is determined from its output. If not, target samples are determined again and training of the medical prediction model continues; if so, the model is considered to have converged.
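The performance indexes named above can be computed as in the following Python sketch for a binary task. The function names, the dictionary keys and the choice of which metric to compare against the threshold are illustrative assumptions.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, recall and F1-score for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "recall": recall, "f1": f1}


def performance_reached(metrics, threshold, key="f1"):
    """True when the chosen performance index exceeds the threshold."""
    return metrics[key] > threshold
```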
Alternatively, the training sample set and the test sample set may be obtained by dividing the samples in the first sample set and the second sample set according to a specified ratio (for example, the ratio of the training sample to the test sample is 9. The training sample set is used for training the medical prediction model, and the testing sample set is used for evaluating the model performance of the medical prediction model.
In an alternative embodiment of the present description, in order for the medical prediction model to learn the knowledge in the training samples more efficiently, the training samples in the training sample set are sorted in advance to obtain a pending sample sequence. Then, a first sequence step length is determined according to the number of training samples in the training sample set; the first sequence step length is positively correlated with that number. The pending sample sequence is divided according to the first sequence step length into several subsequences arranged in a specified order, such that the length of each subsequence equals the first sequence step length, the similarity between training samples within each subsequence is greater than a first similarity threshold, and the similarity between two training samples belonging respectively to any two adjacent subsequences is less than a second similarity threshold. When the medical prediction model is trained, the samples in the subsequences are input into the medical prediction model sequentially in the specified order.
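The division into equal-length subsequences can be sketched in Python as follows. The specification only requires the step length to grow with the number of samples, so the square-root rule below is a hypothetical choice; the sketch also drops any trailing samples that do not fill a whole subsequence and does not enforce the two similarity thresholds.

```python
import math


def first_sequence_step(num_samples):
    """Illustrative rule: the step length grows with the number of samples."""
    return max(1, math.isqrt(num_samples))


def split_into_subsequences(ordered_samples, step):
    """Cut an ordered sample sequence into consecutive chunks of length step."""
    usable = len(ordered_samples) - len(ordered_samples) % step
    return [ordered_samples[i:i + step] for i in range(0, usable, step)]
```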
As can be seen from the foregoing, medical data differs by source. Processing the training samples in this way lets the medical prediction model learn, from the samples in a given subsequence, the knowledge specific to that subsequence. Because the samples in two adjacent subsequences differ greatly, the contrast between the knowledge they embody is more pronounced, which improves the learning efficiency of the model.
In an alternative embodiment of the present specification, the first sequence step length is also positively correlated with the ratio, in the training sample set, of samples obtained from periodic medical data to samples obtained from aperiodic medical data. That is, the larger the share of the training samples obtained from periodic medical data, the longer the first sequence step length, so that the medical prediction model can learn more fully the knowledge in the training samples obtained from periodic medical data.
As can be seen from the foregoing, whether the medical data is periodic data affects how the data is processed in subsequent steps. It is therefore necessary to determine whether the medical data is periodic data before such processing is performed.
In an alternative embodiment of the present description, the medical data is transformed into frequency domain data using a fourier transform before or after pre-processing the medical data in order to be able to distinguish between periodic data and non-periodic data. This transformation can be implemented using the following equation (12).
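$$F(\omega) = \int_{-\infty}^{+\infty} f(t)\, e^{-i\omega t}\, dt \tag{12}$$
In the formula, $F(\omega)$ represents the transformed frequency-domain data, $f(t)$ represents the medical data in the time domain, $\omega$ represents the frequency, $t$ represents the time, and $e^{-i\omega t}$ is a complex exponential function.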
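For sampled data, the transform is computed in discrete form. The following Python sketch is a direct (quadratic-time) discrete Fourier transform given for illustration; a practical implementation would normally use a fast Fourier transform routine instead.

```python
import cmath


def dft(signal):
    """Discrete Fourier transform of a real or complex sequence."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
```

A strongly periodic signal concentrates its energy in a few frequency bins, which is what the indexes computed from the frequency-domain data are meant to capture.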
Then, three indexes, namely the form factor, the crest factor and the pulse factor, are calculated from the transformed frequency-domain data as features for judging how strongly periodic the data is. The form factor $C_s$ is the ratio of the root mean square to the rectified mean. The root mean square $X_{rms}$, also called the effective value, is obtained by squaring all the values, averaging the squares, and taking the square root. The form factor $C_s$ can be calculated by the following formula (13). The crest factor $C_p$ is the ratio of the signal peak to the root mean square and represents how extreme the peak in the waveform is; it can be calculated by the following formula (14). The pulse factor $C_{if}$ is the ratio of the signal peak to the rectified mean (the mean of the absolute values); it can be calculated by the following formula (15).
$$C_s = \frac{X_{rms}}{\overline{|X|}} \tag{13}$$
$$C_p = \frac{X_{peak}}{X_{rms}} \tag{14}$$
$$C_{if} = \frac{X_{peak}}{\overline{|X|}} \tag{15}$$
In the formulas, $X_{peak}$ is the signal peak and $\overline{|X|}$ is the rectified mean.
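The three indexes can be computed as in the following Python sketch. The input is assumed to be a sequence of real values (for example, the magnitudes of the frequency-domain data), the peak is taken as the largest absolute value, and the function and key names are illustrative.

```python
import math


def waveform_factors(values):
    """Return the form, crest and pulse factors of a real-valued sequence."""
    n = len(values)
    rms = math.sqrt(sum(v * v for v in values) / n)
    rectified_mean = sum(abs(v) for v in values) / n
    peak = max(abs(v) for v in values)  # assumption: peak = largest magnitude
    return {
        "form": rms / rectified_mean,
        "crest": peak / rms,
        "pulse": peak / rectified_mean,
    }
```

By construction the pulse factor equals the product of the form factor and the crest factor, which gives a quick consistency check on computed values.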
After the factors are obtained, each factor is normalized using the transformed frequency-domain data (specifically, each factor is divided by the first component of the transformed frequency-domain data) to obtain the processed factors, i.e., the processed form factor $C'_s$, the processed crest factor $C'_p$ and the processed pulse factor $C'_{if}$.
For a piece of medical data, if its processed form factor $C'_s$, processed crest factor $C'_p$ and processed pulse factor $C'_{if}$ satisfy the periodic data condition, the medical data is periodic data; otherwise, the medical data is not periodic data. The periodic data condition can be characterized by formula (16).
Figure BDA0003401995110000171
In the formula, $a_1$ is the first coefficient, $a_2$ is the second coefficient, and $a_3$ is the third coefficient. The first coefficient is less than or equal to the second coefficient, and the third coefficient is greater than the second coefficient. The sum of the first, second and third coefficients is less than 1. Illustratively, the first coefficient is equal to 0.2, the second coefficient is equal to 0.2, and the third coefficient is equal to 0.5.
In a further alternative embodiment of the present description, the periodic data is further transformed on the basis of the differentiated processing of the periodic data and the non-periodic data. Illustratively, the transformation step occurs after amplification.
Specifically, the compressed samples of the signal can be obtained directly by using the sub-nyquist sampling technique, that is, the signal is subjected to non-adaptive measurement coding at a rate far lower than the nyquist sampling rate, and the periodic data is transformed into non-periodic data. Then, the correlation between the non-periodic data (e.g. the aforementioned medical data sets D, X, Y) of each source and the predicted item is quantified by linear regression, and the correlation parameter of the data set of each source and the predicted item is obtained.
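Quantifying the correlation between one data source and the predicted item by linear regression can be sketched as follows. This is a minimal illustration that fits a one-variable least-squares line and reports the Pearson correlation coefficient as the correlation parameter; the specification does not state which regression output is used, so that choice and the function name are assumptions.

```python
import math


def linear_fit(x, y):
    """Least-squares line y = slope * x + intercept and Pearson correlation r."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r
```

A value of `r` close to 1 or -1 indicates a strong linear relationship between that data source and the predicted item.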
Based on the calculated strongly correlated data, the peak value is extracted as the main feature $Z$. The distribution of $Z$ is assumed to be Gaussian, denoted $Z \sim N(\mu, \sigma^2)$, and the probability density function $f(z)$ is obtained.
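Estimating the Gaussian parameters from extracted peak values and evaluating the density can be sketched as follows. The sketch assumes maximum-likelihood estimates (mean and population standard deviation) and uses the standard Gaussian density; the function names are illustrative.

```python
import math


def fit_gaussian(values):
    """Estimate mu and sigma of a Gaussian from observed feature values."""
    n = len(values)
    mu = sum(values) / n
    variance = sum((v - mu) ** 2 for v in values) / n
    return mu, math.sqrt(variance)


def gaussian_pdf(z, mu, sigma):
    """Probability density of N(mu, sigma^2) at z."""
    return math.exp(-((z - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
```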
Then, a feature distribution probability is calculated for each piece of data in the medical data set:
Figure BDA0003401995110000172
Figure BDA0003401995110000173
Based on the same idea, the embodiments of the present specification further provide a medical time-series signal data processing apparatus based on a signal mixing strategy, corresponding to the process shown in fig. 1. The apparatus is shown in fig. 4.
Fig. 4 is a schematic structural diagram of a medical time-series signal data processing apparatus based on a signal mixing strategy according to an embodiment of the present specification, where the medical time-series signal data processing apparatus based on the signal mixing strategy may include one or more of the following modules:
a training sample set construction module 400 configured to: constructing a training sample set according to periodic data and aperiodic data in the medical data set;
a training module 402 configured to: training the medical prediction model by adopting a training sample set;
a medical data acquisition module 404 configured to: acquiring medical data;
a prediction module 406 configured to: taking the medical data as the input of the trained medical prediction model, executing the medical prediction model and outputting a medical prediction result.
In an alternative embodiment of the present disclosure, the training sample set constructing module 400 is specifically configured to: constructing a first sample set according to the periodic data; acquiring aperiodic data from a medical dataset; constructing a second sample set according to the aperiodic data; and obtaining a training sample set based on the first sample set and the second sample set. The training module 402 is specifically configured to: and training the medical prediction model by adopting a training sample set.
In an alternative embodiment of the present disclosure, the training sample set constructing module 400 is specifically configured to: constructing a first sample set from the periodic data, comprising: processing the periodic data according to an empirical mode decomposition method to obtain a first sample; constructing a first sample set according to the first sample; and/or, constructing a second sample set according to the aperiodic data, comprising: processing the aperiodic data according to a variational modal decomposition method to obtain a second sample; from the second samples, a second set of samples is constructed.
In an alternative embodiment of the present disclosure, the training sample set constructing module 400 is specifically configured to: constructing an intermediate sample set based on the first sample set and the second sample set; and amplifying the intermediate sample set to obtain a training sample set.
In an alternative embodiment of the present disclosure, the training sample set constructing module 400 is specifically configured to: taking the first samples or second samples in the intermediate sample set that were obtained by processing the same data as the samples of one sample subset; and amplifying the sample subset a specified number of times to obtain the specified number of amplified sample subsets, the training sample set being obtained from these amplified sample subsets; wherein the specified number is the number of samples in the sample subset, and each amplification comprises the following step: weighting the samples in the sample subset respectively according to a preset weighting strategy to obtain one amplified sample subset.
In an alternative embodiment of the present description, the periodic data comprises at least one of: heart rate data, pulse wave data, respiratory rate data and electrocardiogram data; the aperiodic data comprises at least one of: blood pressure data, blood glucose data, blood oxygen data, body temperature data, blood cell data.
In an optional embodiment of the present description, the medical prediction model is any one of: a support vector machine, a logistic regression model, a decision tree model, a convolutional neural network model and an LSTM model.
In an alternative embodiment of the present description, the medical data is preprocessed, wherein the preprocessing comprises at least one of the following: denoising, missing value supplementation and smoothing.
In an alternative embodiment of the present disclosure, the training module 402 is specifically configured to: training a model based on at least one sample selected from the training sample set to obtain an intermediate model; determining the performance index of the intermediate model by adopting a test sample set; wherein the performance index comprises at least one of accuracy, recall and F1-score; and if the value of the performance index is lower than a performance threshold, continuing to train the intermediate model based on the samples selected from the training sample set until the performance index of the intermediate model is not lower than the performance threshold.
Embodiments of the present specification also provide a computer-readable storage medium, which stores a computer program, where the computer program can be used to execute the above-mentioned process of medical time-series signal data processing based on the signal mixing strategy provided in fig. 1.
The embodiment of the present specification further provides a schematic structural diagram of the electronic device shown in fig. 5. As shown in fig. 5, at the hardware level, the electronic device may include a processor, an internal bus, a network interface, a memory, and a non-volatile memory, and may also include hardware required for other services. The processor reads a corresponding computer program from the nonvolatile memory into the memory and then runs the computer program to realize any one of the above-mentioned processes of medical time-series signal data processing based on the signal mixing strategy.
Of course, besides the software implementation, the present specification does not exclude other implementations, such as a combination of logic devices or software and hardware, and the like, that is, the execution subject of the following processing flow is not limited to each logic unit, and may also be hardware or a logic device.
In the 1990s, an improvement to a technology could be clearly distinguished as an improvement in hardware (e.g., an improvement to a circuit structure such as a diode, a transistor or a switch) or an improvement in software (an improvement to a method flow). However, as technology has advanced, many of today's improvements to method flows can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming an improved method flow into a hardware circuit. Thus, it cannot be said that an improvement to a method flow cannot be realized by hardware physical modules. For example, a Programmable Logic Device (PLD), such as a Field Programmable Gate Array (FPGA), is an integrated circuit whose logic functions are determined by the user's programming of the device. A designer programs a digital system onto a single PLD without asking a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Furthermore, nowadays, instead of manually making integrated circuit chips, this programming is mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must also be written in a specific programming language, called a Hardware Description Language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are the most commonly used at present.
It will also be apparent to those skilled in the art that hardware circuitry that implements the logical method flows can be readily obtained by merely slightly programming the method flows into an integrated circuit using the hardware description languages described above.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of the memory. Those skilled in the art will also appreciate that, in addition to implementing the controller as pure computer-readable program code, the same functionality can be achieved by logically programming the method steps so that the controller takes the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Such a controller may thus be considered a hardware component, and the means included therein for performing the various functions may also be considered structures within the hardware component. Indeed, the means for performing the functions may be regarded both as software modules implementing the method and as structures within the hardware component.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being divided into various units by function, and are described separately. Of course, the functions of the various elements may be implemented in the same one or more software and/or hardware implementations of the present description.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
As will be appreciated by one skilled in the art, embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the description may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
This description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The embodiments in the present specification are described in a progressive manner. For identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, because the system embodiment is substantially similar to the method embodiment, its description is relatively brief, and the corresponding parts of the method embodiment may be consulted for the relevant points.
The above description is only an example of the present specification, and is not intended to limit the present specification. Various modifications and alterations to this description will become apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present specification should be included in the scope of the claims of the present specification.

Claims (6)

1. A medical time sequence signal data processing method based on a signal mixing strategy is characterized by comprising the following steps:
transforming the medical data in the medical data set into frequency domain data by adopting Fourier transform;
calculating a form factor, a kurtosis factor and a pulse factor based on the transformed frequency domain data;
normalizing the form factor, the kurtosis factor and the pulse factor;
if the processed form factor, kurtosis factor and pulse factor meet the condition of periodic data, determining the medical data as periodic data, otherwise determining the medical data as non-periodic data;
processing periodic data in the medical data set according to an empirical mode decomposition method to obtain a first sample, wherein one piece of periodic data is decomposed into a plurality of first samples, and the method specifically comprises the following steps: determining target periodic data from the periodic data, then determining an extreme point of the target periodic data, then fitting by adopting a cubic spline interpolation function to obtain an upper envelope line and a lower envelope line corresponding to the target periodic data, calculating an average value of the envelope lines as an average signal, then taking a difference value of the target periodic data and the average signal as a designated signal after low-frequency trend filtering, judging whether the designated signal meets a designated constraint condition, if so, determining that the designated signal is a component with the highest local time frequency, if not, re-determining the designated signal as the target periodic data until the judgment result is yes, and then, after determining a first sample corresponding to the target periodic data, re-determining the target periodic data from the periodic data which does not correspond to the first sample until all the target periodic data determine the first sample;
constructing a first sample set according to the first sample;
processing the non-periodic data in the medical data set according to a variational mode decomposition method to obtain second samples, wherein one piece of non-periodic data is decomposed into a plurality of second samples, and the method specifically comprises the following steps: decomposing the non-periodic data into K mode functions with different frequency characteristics, assuming that each decomposed mode component has a limited bandwidth around a central frequency; firstly, applying a Hilbert transform to each mode component to obtain an analytic signal; secondly, adjusting the estimated central frequency of each mode by applying an exponential term to the mode function, thereby shifting the frequency spectrum of each mode to baseband; finally, determining the bandwidth of each IMF according to Gaussian smoothness to obtain a constrained variational model, and then solving the variational problem;
constructing a second sample set according to the second sample;
amplifying an intermediate sample set constructed based on the first sample set and the second sample set to obtain a training sample set, which comprises the following steps: taking the first samples or the second samples obtained by processing the same data in the intermediate sample set as the samples of one sample subset; performing a specified number of amplifications on the sample subset to obtain the specified number of amplified sample subsets, wherein the training sample set is obtained according to the specified number of amplified sample subsets; wherein the specified number is the number of samples in the sample subset, and each amplification comprises the following step: respectively weighting the samples in the sample subset by adopting a preset weighting strategy to obtain an amplified sample subset;
determining a first sequence step length according to the number of training samples in a training sample set, and positively correlating the first sequence step length with the number of the training samples in the training sample set;
dividing the training sample set according to a first sequence step length to obtain a plurality of subsequences, wherein the similarity between the training samples in each subsequence is greater than a first similarity threshold, and the similarity between two training samples in any two adjacent subsequences is less than a second similarity threshold;
sequentially inputting the plurality of subsequences into a medical prediction model for model training;
acquiring medical data;
and taking medical data as the input parameter of the trained medical prediction model, executing the medical prediction model and outputting a medical prediction result.
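As an illustration of the periodicity test at the start of claim 1 (Fourier transform, then form factor, kurtosis factor and pulse factor, then normalization and a periodic data condition), the following is a minimal Python sketch and not the claimed implementation. The claim does not state the factor formulas, the normalization, or the condition, so the sketch assumes the usual definitions (form factor = RMS / mean absolute value, kurtosis factor = fourth moment / RMS^4, pulse factor = peak / mean absolute value), a hypothetical normalization 1 - 1/f, and a hypothetical rule that all three normalized factors must reach a threshold of 0.5. The function names are likewise illustrative.

```python
import cmath
import math


def magnitude_spectrum(signal):
    """One-sided DFT magnitudes (naive O(n^2) transform, adequate for a short sketch)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)))
        for k in range(n // 2 + 1)
    ]


def spectral_factors(spectrum):
    """Return (form factor, kurtosis factor, pulse factor) of a magnitude spectrum."""
    count = len(spectrum)
    mean_abs = sum(spectrum) / count
    if mean_abs == 0:
        # All-zero spectrum: treat as perfectly flat (assumption).
        return 1.0, 1.0, 1.0
    rms = math.sqrt(sum(v * v for v in spectrum) / count)
    fourth_moment = sum(v ** 4 for v in spectrum) / count
    return rms / mean_abs, fourth_moment / rms ** 4, max(spectrum) / mean_abs


def is_periodic(signal, threshold=0.5):
    """Assumed rule: periodic if every normalized factor reaches the threshold."""
    factors = spectral_factors(magnitude_spectrum(signal))
    # Assumed normalization: each factor is >= 1, so 1 - 1/f lies in [0, 1).
    normalized = [1.0 - 1.0 / f for f in factors]
    return all(value >= threshold for value in normalized)


# A sine with 8 whole cycles has one dominant spectral line (peaky spectrum).
sine = [math.sin(2 * math.pi * 8 * t / 64) for t in range(64)]
# A single impulse has a perfectly flat spectrum.
impulse = [1.0] + [0.0] * 63

print(is_periodic(sine))     # True
print(is_periodic(impulse))  # False
```

The intuition behind the assumed rule is that a periodic signal concentrates its energy in a few spectral lines, which drives all three factors up, while a flat spectrum keeps them near 1. In practice the thresholds would have to be tuned on labeled medical data.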
2. The method of claim 1, further comprising at least one of:
the periodic data includes at least one of: heart rate data, pulse wave data, respiratory frequency data and electrocardiogram data; the aperiodic data comprises at least one of: blood pressure data, blood glucose data, blood oxygen data, body temperature data, blood cell data;
the medical prediction model is any one of: a support vector machine, a logistic regression model, a decision tree model, a convolutional neural network model and an LSTM model;
preprocessing the medical data; wherein the pretreatment mode comprises at least one of the following modes: denoising, missing value supplement and smoothing.
3. The method of claim 1, wherein the method further comprises:
training a model based on at least one sample selected from the training sample set to obtain an intermediate model;
determining the performance index of the intermediate model by adopting a test sample set; wherein the performance index comprises at least one of accuracy, recall and F1-score;
and if the value of the performance index is lower than a performance threshold, continuing training the intermediate model based on the samples selected from the training sample set until the performance index of the intermediate model is not lower than the performance threshold.
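The evaluate-and-continue loop of claim 3 can be sketched as follows. This is a hedged illustration only: `classification_metrics` uses the standard binary definitions of accuracy, recall and F1-score, while `train_until_threshold`, its `train_step` and `evaluate` callbacks, and the `max_rounds` safety cap are hypothetical names and additions that do not appear in the claim.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, recall and F1-score for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "recall": recall, "f1": f1}


def train_until_threshold(train_step, evaluate, threshold, max_rounds=100):
    """Call train_step() then evaluate() until the score is not below threshold.

    max_rounds is an added safety cap (assumption) so the loop always terminates.
    Returns (rounds_used, last_score).
    """
    score = None
    rounds_used = 0
    for rounds_used in range(1, max_rounds + 1):
        train_step()          # continue training the intermediate model
        score = evaluate()    # performance index on the test sample set
        if score >= threshold:
            break
    return rounds_used, score


# Toy usage: the "model" is replaced by a fixed sequence of test scores.
scores = iter([0.60, 0.72, 0.85, 0.90])
print(train_until_threshold(lambda: None, lambda: next(scores), threshold=0.8))
# (3, 0.85)
```

Which of the three indices drives the stopping rule, and the value of the performance threshold, are left open by the claim and would be chosen per application.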
4. A medical time-series signal data processing apparatus based on a signal mixing strategy, the apparatus comprising:
a training sample set construction module configured to: transforming the medical data in the medical data set into frequency domain data by using Fourier transform; calculating a form factor, a kurtosis factor and a pulse factor based on the transformed frequency domain data; normalizing the form factor, the kurtosis factor and the pulse factor; if the processed form factor, kurtosis factor and pulse factor meet a periodic data condition, determining the medical data as periodic data, otherwise determining the medical data as non-periodic data; processing periodic data in the medical data set according to an empirical mode decomposition method to obtain a first sample, wherein one piece of periodic data is decomposed into a plurality of first samples, and the method specifically comprises the following steps: determining target periodic data from the periodic data, then determining an extreme point of the target periodic data, then fitting by adopting a cubic spline interpolation function to obtain an upper envelope line and a lower envelope line corresponding to the target periodic data, calculating an average value of the envelope lines as an average signal, then taking a difference value of the target periodic data and the average signal as a designated signal after low-frequency trend filtering, judging whether the designated signal meets a designated constraint condition, if so, determining that the designated signal is a component with the highest local time frequency, if not, re-determining the designated signal as the target periodic data until the judgment result is yes, and then, after determining a first sample corresponding to the target periodic data, re-determining the target periodic data from the periodic data which does not correspond to the first sample until all the target periodic data determine the first sample; constructing a first sample set according to the first sample; processing the non-periodic data in the medical data set according to a variational mode decomposition method to obtain second samples, wherein one piece of non-periodic data is decomposed into a plurality of second samples, and the method specifically comprises the following steps: decomposing the non-periodic data into K mode functions with different frequency characteristics, assuming that each decomposed mode component has a limited bandwidth around a central frequency; firstly, applying a Hilbert transform to each mode component to obtain an analytic signal; secondly, adjusting the estimated central frequency of each mode by applying an exponential term to the mode function, thereby shifting the frequency spectrum of each mode to baseband; finally, determining the bandwidth of each IMF according to Gaussian smoothness to obtain a constrained variational model, and then solving the variational problem; constructing a second sample set according to the second sample; amplifying an intermediate sample set constructed based on the first sample set and the second sample set to obtain a training sample set; taking the first samples or the second samples obtained by processing the same data in the intermediate sample set as the samples of one sample subset; performing a specified number of amplifications on the sample subset to obtain the specified number of amplified sample subsets, wherein the training sample set is obtained according to the specified number of amplified sample subsets; wherein the specified number is the number of samples in the sample subset, and each amplification comprises the following step: respectively weighting the samples in the sample subset by adopting a preset weighting strategy to obtain an amplified sample subset; determining a first sequence step length according to the number of training samples in a training sample set, and positively correlating the first sequence step length with the number of the training samples in the training sample set; dividing the training sample set according to a first sequence step length to obtain a plurality of subsequences, wherein the similarity between the training samples in each subsequence is greater than a first similarity threshold, and the similarity between two training samples in any two adjacent subsequences is less than a second similarity threshold;
a training module configured to: sequentially inputting the plurality of subsequences into a medical prediction model for model training;
a medical data acquisition module configured to: acquiring medical data;
a prediction module configured to: and taking medical data as the input parameter of the trained medical prediction model, executing the medical prediction model and outputting a medical prediction result.
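The amplification step performed by the training sample set construction module (weighting every sample of a subset, repeated as many times as the subset has samples) can be sketched as below. The claims leave the "preset weighting strategy" unspecified, so this sketch assumes a hypothetical strategy in which a base weight vector is rotated by one position per amplification round; `amplify_subset` and `base_weights` are illustrative names, not terms from the patent.

```python
def amplify_subset(subset, base_weights):
    """Return len(subset) amplified copies of a sample subset.

    subset       -- list of K samples (each a list of floats), e.g. the
                    components decomposed from one piece of medical data
    base_weights -- K weights; round r uses them rotated left by r positions
                    (assumed strategy, not specified by the claims)
    """
    k = len(subset)
    if len(base_weights) != k:
        raise ValueError("need exactly one weight per sample in the subset")
    base = list(base_weights)
    amplified_subsets = []
    for r in range(k):
        weights = base[r:] + base[:r]  # rotate so each sample meets each weight once
        amplified_subsets.append(
            [[w * value for value in sample] for w, sample in zip(weights, subset)]
        )
    return amplified_subsets


# Toy usage: a subset of two components, each with two time points.
components = [[1.0, 2.0], [3.0, 4.0]]
for amplified in amplify_subset(components, [1.0, 0.5]):
    print(amplified)
# [[1.0, 2.0], [1.5, 2.0]]
# [[0.5, 1.0], [3.0, 4.0]]
```

With K samples in the subset this yields K amplified subsets, matching the requirement that the number of amplifications equals the number of samples in the subset. How the amplified subsets are merged into the final training sample set is not shown here.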
5. A computer-readable storage medium, characterized in that the storage medium stores a computer program which, when being executed by a processor, carries out the method of any one of the preceding claims 1 to 3.
6. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any of claims 1-3 when executing the program.
CN202111498955.0A 2021-12-09 2021-12-09 Medical time sequence signal data processing method based on signal mixing strategy Active CN114386454B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111498955.0A CN114386454B (en) 2021-12-09 2021-12-09 Medical time sequence signal data processing method based on signal mixing strategy

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111498955.0A CN114386454B (en) 2021-12-09 2021-12-09 Medical time sequence signal data processing method based on signal mixing strategy

Publications (2)

Publication Number Publication Date
CN114386454A CN114386454A (en) 2022-04-22
CN114386454B true CN114386454B (en) 2023-02-03

Family

ID=81196265

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111498955.0A Active CN114386454B (en) 2021-12-09 2021-12-09 Medical time sequence signal data processing method based on signal mixing strategy

Country Status (1)

Country Link
CN (1) CN114386454B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114913982B (en) * 2022-07-18 2022-10-11 之江实验室 End-stage renal disease complication risk prediction system based on contrast learning
CN117204859B (en) * 2023-11-09 2024-02-13 博睿康医疗科技(上海)有限公司 Dry electrode brain electrical system with common mode noise channel and active noise reduction method for signals

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103065202A (en) * 2012-12-24 2013-04-24 电子科技大学 Wind power plant ultrashort term wind speed prediction method based on combination kernel function
CN108108251A (en) * 2017-11-30 2018-06-01 重庆邮电大学 A kind of reference point k nearest neighbor classification method based on MPI parallelizations
CN112257917A (en) * 2020-10-19 2021-01-22 北京工商大学 Time series abnormal mode detection method based on entropy characteristics and neural network
CN113672666A (en) * 2021-08-23 2021-11-19 成都佳华物链云科技有限公司 Machine load prediction method and device, electronic equipment and readable storage medium

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101852871A (en) * 2010-05-25 2010-10-06 南京信息工程大学 Short-term climate forecasting method based on empirical mode decomposition and numerical value set forecasting
CN109192299A (en) * 2018-08-13 2019-01-11 中国科学院计算技术研究所 A kind of medical analysis auxiliary system based on convolutional neural networks
US11158069B2 (en) * 2018-12-11 2021-10-26 Siemens Healthcare Gmbh Unsupervised deformable registration for multi-modal images
EP3879453A1 (en) * 2020-03-12 2021-09-15 Siemens Healthcare GmbH Method and system for detecting landmarks in medical images
CN111476292B (en) * 2020-04-03 2021-02-19 北京全景德康医学影像诊断中心有限公司 Small sample element learning training method for medical image classification processing artificial intelligence
US20230317248A1 (en) * 2020-05-29 2023-10-05 Medtronic, Inc. Presentation of patient information for cardiac blood flow procedures
CN111881973A (en) * 2020-07-24 2020-11-03 北京三快在线科技有限公司 Sample selection method and device, storage medium and electronic equipment
CN112016702B (en) * 2020-09-09 2023-07-28 平安科技(深圳)有限公司 Medical data processing method, device, equipment and medium based on transfer learning
CN111816306B (en) * 2020-09-14 2020-12-22 颐保医疗科技(上海)有限公司 Medical data processing method, and prediction model training method and device
CN113133771B (en) * 2021-03-18 2022-10-28 浙江工业大学 Uterine electromyographic signal analysis and early birth prediction method based on time-frequency domain entropy characteristics
CN113517046B (en) * 2021-04-15 2023-11-07 中南大学 Heterogeneous data feature fusion method in electronic medical record, fusion feature-based prediction method, fusion feature-based prediction system and readable storage medium
CN113297987B (en) * 2021-05-28 2022-07-05 东北林业大学 Variational modal decomposition signal noise reduction method based on dual-objective function optimization
CN113240666B (en) * 2021-06-04 2024-04-16 科大讯飞股份有限公司 Medical image preprocessing method, device, equipment and storage medium
CN113378991A (en) * 2021-07-07 2021-09-10 上海联影医疗科技股份有限公司 Medical data generation method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN114386454A (en) 2022-04-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant