CN110309491B - Transient phase partitioning method and system based on local Gaussian mixture model - Google Patents

Info

Publication number
CN110309491B
Authority
CN
China
Legal status
Active
Application number
CN201910571289.5A
Other languages
Chinese (zh)
Other versions
CN110309491A
Inventor
刘井响
王丹
彭周华
刘陆
Current Assignee
Dalian Maritime University
Original Assignee
Dalian Maritime University
Application filed by Dalian Maritime University
Priority to CN201910571289.5A
Publication of CN110309491A
Application granted
Publication of CN110309491B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis

Abstract

The invention discloses a phase partitioning method and system based on a local Gaussian mixture model. The method comprises the following steps: S1, collecting samples and creating a historical training data set; S2, selecting part of the samples from the historical training data set to create a first Gaussian distribution model and thereby determine a first steady-state phase; S3, based on the previously determined Gaussian model, creating a Gaussian mixture model containing two Gaussian components to determine the next steady-state phase; S4, based on the two adjacent steady-state phase models so determined, determining any transient phase that may exist between the two steady-state phases; and S5, repeating S3 and S4 until all sample data have been partitioned into phases. The invention greatly reduces redundant computation and improves computational efficiency. It adopts a step-by-step updating strategy that determines steady-state and transient phases one after another in sampling-time order, so the number of phases need not be specified in advance and the partitioning result needs no post-processing.

Description

Transient phase partitioning method and system based on local Gaussian mixture model
Technical Field
The invention relates to the technical field of statistical modeling of batch processes, and in particular to a transient phase partitioning method and system for multiphase batch processes.
Background
Batch processes are a very common mode of production in modern industry and are widely used in fine chemicals, pharmaceuticals, metallurgy, semiconductors and other industries. As technology develops and demand diversifies, batch processes are becoming increasingly complex. The most direct manifestation is that a single batch comprises several distinct operating stages, or several distinct reaction/variation stages. Such a process is called a multiphase batch process, and each stage is called a phase. Take penicillin fermentation as an example: if one fermentation run lasts 400 h, the first 45 h are a pre-culture stage and the remaining 355 h are a fed-batch stage, i.e., substrate is fed into the reactor from the 45th hour onward. From the standpoint of the reaction mechanism, a typical penicillin fermentation can be divided into four phases (stages): the lag stage, the exponential growth stage, the stationary stage and the autolysis stage. Because penicillin fermentation is a slowly time-varying process, the transition from one phase to the next is gradual rather than abrupt, so phase boundaries are not sharply defined. Samples lying between two steady-state phases may partly retain the characteristics of the first steady-state phase while already exhibiting those of the next one; the phase corresponding to such samples is called a transient phase. Dividing a multiphase batch process into its phases accurately and reasonably deepens understanding of the process mechanism and improves the accuracy of process modeling.
Some research on multiphase partitioning already exists. One approach, a multiphase principal component analysis method resembling exhaustive search, partitions batches using a repetition-factor index; however, it is unsuitable for processes without obvious inflection-point changes. Clustering methods are widely used for phase partitioning in batch processes; however, methods based on the K-means algorithm require the number of clusters to be specified in advance and ignore the temporal relationship among samples, so the partitioning result is disordered, requires further post-processing, and is difficult to interpret. The temporal relationship among samples within a phase is therefore an important factor that cannot be neglected, and transient phases should be identified as accurately as the multiple steady-state phases. Neither of the above methods adequately accounts for temporal ordering or for transient-phase partitioning.
Disclosure of Invention
In view of these shortcomings of existing phase partitioning methods, namely that they ignore temporal ordering and do not partition transient phases, a phase partitioning method based on a local Gaussian mixture model is provided.
A phase partitioning method based on a local Gaussian mixture model comprises the following steps:
S1, collecting samples and creating a historical training data set;
S2, selecting part of the samples from the historical training data set in sampling-time order to create a first Gaussian distribution model and thereby determine a first steady-state phase;
S3, based on the previously determined Gaussian model, creating a Gaussian mixture model containing two Gaussian components to determine the next steady-state phase;
S4, based on the two adjacent steady-state phase models so determined, determining any transient phase that may exist between the two steady-state phases;
S5, repeating S3 and S4 until all sample data have been partitioned into phases.
Optionally, in one embodiment, selecting part of the samples from the historical training data set to create a first Gaussian distribution model and determine the first steady-state phase includes:
S21, sequentially selecting the first $N_1$ samples of the historical training data set and computing their mean and variance to obtain the corresponding Gaussian distribution model $p(x\mid 1)$, where $p(x\mid 1)$ denotes the probability density function of the first Gaussian distribution model and $x$ denotes the collected sample data;
S22, extracting sample points for steady-state phase verification: starting from the $(N_1/2)$-th sample point, search for three consecutive sample points that satisfy the first verification condition, and record the corresponding sample index. The verification condition is [formula image not recovered], where $\rho$ is a pre-specified threshold;
S23, determining whether $N_1$ equals the recorded index. If so, the iteration has converged, i.e., the first steady-state phase is determined, and the method proceeds to the next step; otherwise, set $N_1$ to the recorded index and return to step S21, iterating until $N_1$ equals the recorded index.
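For reference, the Gaussian model of S21 can be written out explicitly. This assumes that $x$ is a $d$-dimensional sample vector and that "mean and variance" refer to the sample mean vector and the sample covariance matrix; the unbiased $N_1-1$ normalisation is one conventional choice, not something the text specifies:

$$\hat{\mu}_1=\frac{1}{N_1}\sum_{n=1}^{N_1}x_n,\qquad \hat{\Sigma}_1=\frac{1}{N_1-1}\sum_{n=1}^{N_1}(x_n-\hat{\mu}_1)(x_n-\hat{\mu}_1)^{\top}$$

$$p(x\mid 1)=\frac{1}{(2\pi)^{d/2}\,\lvert\hat{\Sigma}_1\rvert^{1/2}}\exp\!\left(-\tfrac{1}{2}(x-\hat{\mu}_1)^{\top}\hat{\Sigma}_1^{-1}(x-\hat{\mu}_1)\right)$$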
Optionally, in one embodiment, creating a Gaussian mixture model containing two Gaussian components based on the previously determined Gaussian model, to determine the next steady-state phase, includes:
S31, based on the previously determined Gaussian model, creating and training a mixture model containing two Gaussian distribution functions. Without loss of generality, assume that the first $c-1$ steady-state phases have been determined, where $c$ is an integer greater than or equal to 2. The mixture model is
$$p(x\mid\theta_c)=\alpha_{c-1}\,p(x\mid c-1)+\alpha_c\,p(x\mid c)$$
where the probability density function $p(x\mid c-1)$ of the $(c-1)$-th Gaussian model has already been determined, and the $(c-1)$-th steady-state phase, containing $N_{c-1}$ samples, is $X_{c-1}$; the probability density function $p(x\mid c)$ of the $c$-th Gaussian model is to be determined, and the $c$-th steady-state phase, assumed to contain $N_c$ samples, is $X_c$. $X_m=\{X_{c-1},X_c\}$ is the training data of the Gaussian mixture model, with training parameters $\theta_c=\{\alpha_{c-1},\alpha_c,\mu_c,\Sigma_c\}$, where $\alpha_{c-1}$ and $\alpha_c$ are the mixing coefficients of the $(c-1)$-th and $c$-th Gaussian components, and $\mu_c$ and $\Sigma_c$ are the mean vector and covariance matrix of the $c$-th Gaussian probability density function $p(x\mid c)$. The mixture model is trained with the expectation-maximization (EM) algorithm;
S32, extracting sample points to perform steady-state phase verification on the mixture model: starting from the $(N_{c-1}/2)$-th sample point, search for three consecutive sample points that satisfy the second verification condition, and record the corresponding sample index. The verification condition is [formula image not recovered], where $\rho$ is a pre-specified threshold;
S33, determining whether the recorded index equals $N_c+N_{c-1}$. If so, the iteration has converged, i.e., the $c$-th steady-state phase is determined; otherwise, update $N_c$ according to the recorded index ([formula image not recovered]) and return to step S31, iterating until the recorded index equals $N_c+N_{c-1}$.
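With $p(x\mid c-1)$ held fixed, one EM iteration for S31 can be written in the standard form below for the $M=N_{c-1}+N_c$ samples $x_n\in X_m$. These are the textbook Gaussian-mixture updates restricted to the free parameters $\theta_c$, offered as a sketch rather than the patent's exact formulation. E-step (responsibility of the $c$-th component):

$$\gamma_n=\frac{\alpha_c\,p(x_n\mid c)}{\alpha_{c-1}\,p(x_n\mid c-1)+\alpha_c\,p(x_n\mid c)},\qquad n=1,\dots,M$$

M-step:

$$\alpha_c=\frac{1}{M}\sum_{n=1}^{M}\gamma_n,\quad \alpha_{c-1}=1-\alpha_c,\quad \mu_c=\frac{\sum_{n=1}^{M}\gamma_n x_n}{\sum_{n=1}^{M}\gamma_n},\quad \Sigma_c=\frac{\sum_{n=1}^{M}\gamma_n(x_n-\mu_c)(x_n-\mu_c)^{\top}}{\sum_{n=1}^{M}\gamma_n}$$

The two steps alternate until $\theta_c$ stops changing appreciably; the mean and covariance of the $(c-1)$-th component are never updated.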
Optionally, in one embodiment, determining any transient phase that may exist between two steady-state phases, based on the two adjacent steady-state phase models so determined, includes:
for the two determined steady-state phases $X_{c-1}$ and $X_c$, starting from the first sample point of the $c$-th steady-state phase, test successive samples and record the consecutive run of sample points satisfying $p(x_n\mid c)<\rho$ as the transient phase $X_{c-1,c}$.
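Stated as a set, the rule selects the longest run of leading samples of $X_c$ that fail the single-component test. Here $s$ denotes the index of the first sample of $X_c$ (notation introduced only for illustration):

$$X_{c-1,c}=\{x_s,\dots,x_{s+m-1}\},\qquad m=\max\{\,k\ge 0:\ p(x_{s+j}\mid c)<\rho\ \text{for all }0\le j<k\,\}$$

If $m=0$, no transient phase lies between $X_{c-1}$ and $X_c$.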
In addition, to overcome the shortcomings of the conventional technology, a phase partitioning system based on a local Gaussian mixture model is also provided.
A phase partitioning system based on a local Gaussian mixture model comprises:
an acquisition unit, configured to collect samples and create a historical training data set;
a first Gaussian distribution creating unit, configured to select part of the samples from the historical training data set in sampling-time order to create a first Gaussian distribution model and thereby determine the first steady-state phase data;
a Gaussian mixture model creating unit, configured to create, based on the previously determined Gaussian model, a Gaussian mixture model containing two Gaussian components to determine the next steady-state phase, and to complete, together with the transient phase acquisition unit, the phase partitioning of all sample data;
a transient phase acquisition unit, configured to determine the corresponding transient phase data based on two adjacent steady-state phase data.
Optionally, in one embodiment, the first Gaussian distribution creating unit includes:
a first data acquisition module, configured to sequentially select the first $N_1$ samples of the historical training data set and compute their mean and variance to obtain the corresponding Gaussian distribution function $p(x\mid 1)$, where $p(x\mid 1)$ denotes the probability density function of the first Gaussian distribution model and $x$ denotes the collected sample data;
a first steady-state phase verification module, configured to extract sample points for steady-state phase verification: starting from the $(N_1/2)$-th sample point, it searches for three consecutive sample points that satisfy the first verification condition and records the corresponding sample index. The verification condition is [formula image not recovered], where $\rho$ is a pre-specified threshold;
a first steady-state phase determination module, configured to determine whether $N_1$ equals the recorded index. If so, the iteration has converged, i.e., the first steady-state phase is determined, and processing proceeds to the next step; otherwise, $N_1$ is set to the recorded index and the first steady-state phase verification module iterates again until $N_1$ equals the recorded index.
Optionally, in one embodiment, the Gaussian mixture model creating unit includes:
a second data acquisition module, configured to create and train, based on the previously determined Gaussian model, a Gaussian mixture model containing two Gaussian components. Assuming that the first $c-1$ steady-state phases have been determined, where $c$ is an integer greater than or equal to 2, the mixture model is
$$p(x\mid\theta_c)=\alpha_{c-1}\,p(x\mid c-1)+\alpha_c\,p(x\mid c)$$
where the probability density function $p(x\mid c-1)$ of the $(c-1)$-th Gaussian model has already been determined and the $(c-1)$-th steady-state phase, containing $N_{c-1}$ samples, is $X_{c-1}$; the probability density function $p(x\mid c)$ of the $c$-th Gaussian model is to be determined and the $c$-th steady-state phase, assumed to contain $N_c$ samples, is $X_c$. $X_m=\{X_{c-1},X_c\}$ is the training data of the Gaussian mixture model, with training parameters $\theta_c=\{\alpha_{c-1},\alpha_c,\mu_c,\Sigma_c\}$, where $\alpha_{c-1}$ and $\alpha_c$ are the mixing coefficients of the $(c-1)$-th and $c$-th Gaussian components and $\mu_c$ and $\Sigma_c$ are the mean vector and covariance matrix of the $c$-th Gaussian probability density function $p(x\mid c)$. The mixture model is trained with the expectation-maximization (EM) algorithm;
a second steady-state phase verification module, configured to extract sample points to perform steady-state phase verification on the mixture model: starting from the $(N_{c-1}/2)$-th sample point, it searches for three consecutive sample points that satisfy the second verification condition and records the corresponding sample index. The verification condition is [formula image not recovered], where $\rho$ is a pre-specified threshold;
a second steady-state phase determination module, configured to determine whether the recorded index equals $N_c+N_{c-1}$. If so, the iteration has converged, i.e., the $c$-th steady-state phase is determined; otherwise, $N_c$ is updated according to the recorded index ([formula image not recovered]) and the second steady-state phase verification module iterates again until the recorded index equals $N_c+N_{c-1}$.
Optionally, in one embodiment, the processing performed by the transient phase acquisition unit includes: for the two determined steady-state phases $X_{c-1}$ and $X_c$, starting from the first sample point of the $c$-th steady-state phase, test successive samples and record the consecutive run of sample points satisfying $p(x_n\mid c)<\rho$ as the transient phase $X_{c-1,c}$.
In addition, to overcome the shortcomings of the conventional technology, a computer-readable storage medium is provided, which comprises computer instructions that, when executed on a computer, cause the computer to perform the above method.
Implementing the embodiments of the invention overcomes the shortcomings of existing phase partitioning methods, which ignore temporal ordering and do not partition transient phases. The invention further has the following beneficial effects: (1) from the perspective of Gaussian distributions, each steady-state phase is described by an independent Gaussian distribution and a transient phase is described by the mixture of the two adjacent Gaussian distributions, so the method partitions steady-state phases effectively and determines transient phases at the same time; (2) each iteration uses only local data for modeling and verification, which greatly reduces redundant computation and improves computational efficiency; (3) the step-by-step updating strategy determines steady-state and transient phases one after another in sampling-time order, so the number of phases need not be specified in advance and the partitioning result needs no post-processing.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Wherein:
FIG. 1a is a diagram illustrating an initial model update in one embodiment;
FIG. 1b is a schematic diagram illustrating update of a mixture model according to an embodiment;
FIG. 2 is a diagram illustrating phase partitioning for a local Gaussian mixture model in one embodiment;
FIG. 3 is a schematic diagram of the penicillin fermentation process in one example;
FIG. 4 is a flow diagram of core steps in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. It will be understood that, as used herein, the terms "first," "second," and the like may be used herein to describe various elements, but these elements are not limited by these terms. These terms are only used to distinguish one element from another. For example, a first element could be termed a second element, and, similarly, a second element could be termed a first element, without departing from the scope of the present application. The first and second elements are both elements, but they are not the same element.
To overcome the shortcomings of conventional phase partitioning methods, namely that they ignore temporal ordering and do not partition transient phases, this embodiment provides a phase partitioning method based on a local Gaussian mixture model. An improved local Gaussian mixture model method performs stepwise probabilistic modeling to partition a multiphase batch process into phases: at each step a local model is built, either an initial model with a single Gaussian component or a mixture model with two Gaussian components, so that the steady-state and transient phases of the process are determined iteratively. Specifically, as shown in FIG. 4, the method includes the following steps:
S1, collecting samples and creating a historical training data set; in some embodiments, batch process data [formula image not recovered] are collected and unfolded into [formula image not recovered] to obtain the historical training data set;
S2, selecting part of the samples from the historical training data set in sampling-time order to create a first Gaussian distribution model and thereby determine the first steady-state phase data. The purpose of this step is to build an initial model containing only one Gaussian component; the updating of the initial model is illustrated in FIG. 1(a). In some specific embodiments, selecting part of the samples from the historical training data set to create the first Gaussian distribution model and determine the first steady-state phase data includes:
S21, sequentially selecting the first $N_1$ samples of the historical training data set and computing their mean and variance to obtain the corresponding Gaussian distribution model $p(x\mid 1)$, where $p(x\mid 1)$ denotes the probability density function of the first Gaussian distribution model and $x$ denotes the collected sample data;
S22, extracting sample points for steady-state phase verification: starting from the $(N_1/2)$-th sample point, search for three consecutive sample points that satisfy the first verification condition, and record the corresponding sample index. The verification condition is [formula image not recovered], where $\rho$ is a pre-specified threshold, e.g., $\rho=0.001$. If $N_1/2$ is not an integer, it may be rounded either down or up; for example, if $N_1/2=16.5$, either 16 or 17 may be used;
S23, determining whether $N_1$ equals the recorded index. If so, the iteration has converged, i.e., the first steady-state phase is determined, and the method proceeds to the next step; otherwise, set $N_1$ to the recorded index and return to step S21, iterating until $N_1$ equals the recorded index.
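The S21 to S23 iteration can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the patented implementation. Because the verification condition survives only as an image, the sketch assumes, by analogy with the transient-phase test in S4, that a sample fails verification when its density $p(x_n\mid 1)$ is below $\rho$. It also assumes that the recorded index is the 0-based position of the first sample in the first run of three consecutive failures, i.e., the number of samples preceding that run. The covariance matrix stands in for "variance", a small ridge term keeps it invertible, and all function names are hypothetical.

```python
import numpy as np


def log_gaussian_pdf(X, mean, cov):
    """Row-wise log-density of a multivariate Gaussian N(mean, cov)."""
    d = X.shape[1]
    diff = X - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
    return -0.5 * (maha + logdet + d * np.log(2.0 * np.pi))


def first_failing_run(dens, start, rho, run=3):
    """0-based index of the first sample of the first run of `run` consecutive
    samples (scanning from `start`) whose density is below rho, else None."""
    count = 0
    for n in range(start, len(dens)):
        count = count + 1 if dens[n] < rho else 0
        if count == run:
            return n - run + 1
    return None


def first_steady_phase(X, n_init, rho=1e-3, ridge=1e-6, max_iter=100):
    """Hypothetical sketch of S21-S23: iterate N_1 until it is a fixed point.

    X is a (samples x variables) array ordered by sampling time.
    Returns (N_1, mean, covariance) of the first steady-state phase.
    """
    d = X.shape[1]
    n1 = n_init
    for _ in range(max_iter):
        # S21: Gaussian model from the first N_1 samples.
        mean = X[:n1].mean(axis=0)
        cov = np.atleast_2d(np.cov(X[:n1], rowvar=False)) + ridge * np.eye(d)
        # S22: scan from position N_1 // 2 for three consecutive failures.
        dens = np.exp(log_gaussian_pdf(X, mean, cov))
        idx = first_failing_run(dens, n1 // 2, rho)
        # No failing run means the phase extends to the end of the data.
        new_n1 = len(X) if idx is None else max(idx, d + 1)
        # S23: converged when the recorded index equals N_1.
        if new_n1 == n1:
            return n1, mean, cov
        n1 = new_n1
    raise RuntimeError("first steady-state phase did not converge")
```

Starting from a deliberately short window (for example `n_init=40`), the first pass scans beyond the window and moves $N_1$ forward to the first run of outlying samples, and a second pass confirms the fixed point. Scanning from `n1 // 2` corresponds to rounding $N_1/2$ down, one of the choices the text permits; the exact off-by-one convention is an assumption.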
S3, based on the previously determined Gaussian model, creating a Gaussian mixture model containing two Gaussian components to determine the next steady-state phase. The purpose of this step is to build a mixture model containing two Gaussian components. Without loss of generality, assume that $c-1$ steady-state phases have been determined and the $c$-th steady-state phase is to be determined; the updating of the mixture model is illustrated in FIG. 1(b). In some specific embodiments, creating the Gaussian mixture model to determine the next steady-state phase data includes:
S31, creating and training a mixture model containing two Gaussian distribution functions, given by
$$p(x\mid\theta_c)=\alpha_{c-1}\,p(x\mid c-1)+\alpha_c\,p(x\mid c)$$
where, assuming the first $c-1$ steady-state phases have been determined, $X_{c-1}$ is the $(c-1)$-th steady-state phase containing $N_{c-1}$ samples, $X_c$ is the $c$-th steady-state phase containing $N_c$ samples, and $c$ is an integer greater than or equal to 2.
The mixture model is trained with the expectation-maximization (EM) algorithm on $X_m=\{X_{c-1},X_c\}$, with training parameters $\theta_c=\{\alpha_{c-1},\alpha_c,\mu_c,\Sigma_c\}$, where $\alpha_{c-1}$ and $\alpha_c$ are the mixing coefficients and $\mu_c$ and $\Sigma_c$ are the mean vector and covariance matrix of the $c$-th component. Because the first $c-1$ steady-state phases have already been determined, the mean and variance of the first component of the mixture, $p(x\mid c-1)$, are already known (e.g., from S2 when $c=2$), so this step only needs to estimate $\theta_c=\{\alpha_{c-1},\alpha_c,\mu_c,\Sigma_c\}$;
S32, extracting sample points to perform steady-state phase verification on the mixture model: starting from the $(N_{c-1}/2)$-th sample point, search for three consecutive sample points that satisfy the second verification condition, and record the corresponding sample index. The verification condition is [formula image not recovered], where $\rho$ is a pre-specified threshold, e.g., $\rho=0.001$;
S33, determining whether the recorded index equals $N_c+N_{c-1}$. If so, the iteration has converged, i.e., the $c$-th steady-state phase is determined; otherwise, update $N_c$ according to the recorded index ([formula image not recovered]) and return to step S31, iterating until the recorded index equals $N_c+N_{c-1}$.
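A minimal NumPy sketch of S31 to S33 follows, again under assumptions. The second verification condition is taken to be mixture density $p(x_n\mid\theta_c)<\rho$, since the original formula is an image. The recorded index is counted from the first sample of $X_{c-1}$, and when the check fails, $N_c$ is reset so that $N_{c-1}+N_c$ equals the recorded index. The EM routine holds $p(x\mid c-1)$ fixed and re-estimates only $\theta_c$, as S31 describes. The initial guess for component $c$ (sample statistics of the current $X_c$ window, with $\alpha_c=0.5$), the fixed EM iteration count, and the ridge term are illustrative choices, and all names are hypothetical.

```python
import numpy as np


def log_gaussian_pdf(X, mean, cov):
    """Row-wise log-density of a multivariate Gaussian N(mean, cov)."""
    d = X.shape[1]
    diff = X - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
    return -0.5 * (maha + logdet + d * np.log(2.0 * np.pi))


def first_failing_run(dens, start, rho, run=3):
    """0-based start of the first run of `run` consecutive densities below rho."""
    count = 0
    for n in range(start, len(dens)):
        count = count + 1 if dens[n] < rho else 0
        if count == run:
            return n - run + 1
    return None


def em_fixed_component(Xm, mean_prev, cov_prev, mean_c, cov_c, n_iter=50, ridge=1e-6):
    """EM for alpha_{c-1} p(x|c-1) + alpha_c p(x|c) with p(x|c-1) frozen."""
    d = Xm.shape[1]
    alpha_c = 0.5
    lp_prev = log_gaussian_pdf(Xm, mean_prev, cov_prev)  # fixed component
    for _ in range(n_iter):
        # E-step: responsibility of component c, computed in the log domain.
        a = np.log(1.0 - alpha_c) + lp_prev
        b = np.log(alpha_c) + log_gaussian_pdf(Xm, mean_c, cov_c)
        gamma = np.exp(b - np.logaddexp(a, b))
        # M-step: only the mixing weights and component c are updated.
        w = max(gamma.sum(), 1e-12)
        alpha_c = float(np.clip(w / len(Xm), 1e-6, 1.0 - 1e-6))
        mean_c = gamma @ Xm / w
        diff = Xm - mean_c
        cov_c = (gamma[:, None] * diff).T @ diff / w + ridge * np.eye(d)
    return 1.0 - alpha_c, alpha_c, mean_c, cov_c


def next_steady_phase(Xr, n_prev, mean_prev, cov_prev, n_init, rho=1e-3, max_iter=100):
    """Hypothetical sketch of S31-S33.

    Xr holds the time-ordered samples from the first sample of X_{c-1} onward.
    Returns (N_c, alpha_{c-1}, alpha_c, mu_c, Sigma_c).
    """
    d = Xr.shape[1]
    n_c = n_init
    for _ in range(max_iter):
        # S31: initialise component c from the current X_c window, run EM on X_m.
        seg = Xr[n_prev:n_prev + n_c]
        cov0 = np.atleast_2d(np.cov(seg, rowvar=False)) + 1e-6 * np.eye(d)
        a_prev, a_c, mean_c, cov_c = em_fixed_component(
            Xr[:n_prev + n_c], mean_prev, cov_prev, seg.mean(axis=0), cov0)
        # S32: mixture density p(x | theta_c) for every sample from X_{c-1} on.
        dens = (a_prev * np.exp(log_gaussian_pdf(Xr, mean_prev, cov_prev))
                + a_c * np.exp(log_gaussian_pdf(Xr, mean_c, cov_c)))
        idx = first_failing_run(dens, n_prev // 2, rho)
        total = len(Xr) if idx is None else idx
        # S33: converged when the recorded index equals N_{c-1} + N_c.
        if total == n_prev + n_c:
            return n_c, a_prev, a_c, mean_c, cov_c
        n_c = max(total - n_prev, d + 1)
    raise RuntimeError("steady-state phase c did not converge")
```

Working in the log domain keeps the responsibilities finite even when a sample is far from both components and both raw densities underflow to zero. Only the window $X_m$ enters EM, which reflects the "local data only" property claimed for the method.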
S4, based on the two adjacent steady-state phase models so determined, determining any transient phase that may exist between the two steady-state phases; the phase partitioning of the method is illustrated in FIG. 2. In some specific embodiments, this includes: for the two determined steady-state phases $X_{c-1}$ and $X_c$, starting from the first sample point of the $c$-th steady-state phase, test successive samples and record the consecutive run of sample points satisfying $p(x_n\mid c)<\rho$ as the transient phase $X_{c-1,c}$.
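The S4 test amounts to counting how many samples at the start of $X_c$ have low density under the $c$-th component alone. The following is a minimal NumPy sketch with hypothetical names. It assumes that $\mu_c$ and $\Sigma_c$ come from the converged mixture of S3, and uses the same density threshold $\rho$.

```python
import numpy as np


def log_gaussian_pdf(X, mean, cov):
    """Row-wise log-density of a multivariate Gaussian N(mean, cov)."""
    d = X.shape[1]
    diff = X - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
    return -0.5 * (maha + logdet + d * np.log(2.0 * np.pi))


def transient_length(Xc, mean_c, cov_c, rho=1e-3):
    """Number m of leading samples of X_c with p(x_n | c) < rho.

    These m samples form the transient phase X_{c-1,c}; the rest of X_c
    remains in the c-th steady-state phase.
    """
    dens = np.exp(log_gaussian_pdf(Xc, mean_c, cov_c))
    m = 0
    while m < len(Xc) and dens[m] < rho:
        m += 1
    return m


# Ten samples drifting toward the new operating point, then a steady segment.
ramp = np.linspace(0.0, 4.0, 10)[:, None] * np.ones(2)
steady = np.full((90, 2), 8.0)
Xc = np.vstack([ramp, steady])
m = transient_length(Xc, np.array([8.0, 8.0]), np.eye(2))  # → 10
```

The drifting samples all lie at least $\sqrt{32}$ away from the component mean, so their density falls far below $\rho=0.001$, while the first steady sample sits exactly at the mean and ends the run.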
S5, repeating S3 and S4 until all sample data have been partitioned into phases.
In addition, to overcome the shortcomings of the conventional technology, a phase partitioning system based on a local Gaussian mixture model is further provided, which includes:
an acquisition unit, configured to collect samples and create a historical training data set; in some embodiments, batch process data [formula image not recovered] are collected and unfolded into [formula image not recovered] to obtain the historical training data set;
a first Gaussian distribution creating unit, configured to select part of the samples from the historical training data set in sampling-time order to create a first Gaussian distribution model and thereby determine the first steady-state phase data; in some specific embodiments, the first Gaussian distribution creating unit includes:
a first data acquisition module, configured to sequentially select the first $N_1$ samples of the historical training data set and compute their mean and variance to obtain the corresponding Gaussian distribution model $p(x\mid 1)$, where $p(x\mid 1)$ denotes the probability density function of the first Gaussian distribution model and $x$ denotes the collected sample data;
a first steady-state phase verification module, configured to extract sample points for steady-state phase verification: starting from the $(N_1/2)$-th sample point, it searches for three consecutive sample points that satisfy the first verification condition and records the corresponding sample index. The verification condition is [formula image not recovered], where $\rho$ is a pre-specified threshold, e.g., $\rho=0.001$;
a first steady-state phase determination module, configured to determine whether $N_1$ equals the recorded index. If so, the iteration has converged, i.e., the first steady-state phase is determined, and processing proceeds to the next step; otherwise, $N_1$ is set to the recorded index and the first steady-state phase verification module iterates again until $N_1$ equals the recorded index.
The Gaussian mixture model creating unit is used for creating a Gaussian mixture model containing two Gaussian components based on the determined Gaussian model to determine the next steady-state phase and complete phase division of all sample data by matching with the transient phase acquiring unit, namely the Gaussian mixture model creating unit determines the steady-state phase, and the transient phase acquiring unit determines the transient phase; in some specific embodiments, the gaussian mixture model creating unit includes:
a second data acquisition module, configured to create a mixture model containing two Gaussian distribution functions based on the previously determined Gaussian model and to train it. Without loss of generality, assume that the first $c-1$ steady-state phases have been determined, where $c$ is an integer greater than or equal to 2. The mixture model is

$$p(x|\theta_c) = \alpha_{c-1}\, p(x|c-1) + \alpha_c\, p(x|c)$$

where the probability density function $p(x|c-1)$ of the $(c-1)$-th Gaussian model is already determined and the $(c-1)$-th steady-state phase, containing $N_{c-1}$ samples, is $X_{c-1}$; the probability density function $p(x|c)$ of the $c$-th Gaussian model is to be determined and the $c$-th steady-state phase, assumed to contain $N_c$ samples, is $X_c$. The training data of the Gaussian mixture model are denoted $X_m = \{X_{c-1}, X_c\}$ and the corresponding parameters are $\theta_c = \{\alpha_{c-1}, \alpha_c, \mu_c, \Sigma_c\}$, where $\alpha_{c-1}$ and $\alpha_c$ are the mixing coefficients of the $(c-1)$-th and $c$-th Gaussian components, and $\mu_c$ and $\Sigma_c$ are the mean vector and covariance matrix of the $c$-th Gaussian probability density function $p(x|c)$. The mixture model is trained with the expectation-maximization (EM) algorithm;
a second steady-state phase verification module, configured to extract sample points to perform steady-state phase verification on the mixture model, i.e., starting from the ($N_{c-1}$ … 2)-th sample point, to search for three consecutive sample points that satisfy the second verification condition and to record the corresponding sample index as [formula image]. The verification condition is [formula image], where $\rho$ is a pre-specified threshold, e.g., $\rho = 0.001$;
a second steady-state phase determination module, configured to determine whether [formula image] equals $N_c + N_{c-1}$. If so, the result has converged, i.e., the $c$-th steady-state phase is determined; otherwise, set [formula image] and let the second steady-state phase verification module iterate again until [formula image] equals $N_c + N_{c-1}$.
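The EM training of the two-component mixture, in which $p(x|c-1)$ stays fixed and only $\theta_c = \{\alpha_{c-1}, \alpha_c, \mu_c, \Sigma_c\}$ is estimated, can be sketched as below. This is a plain EM sketch written for illustration: the initialisation of the new component on the second half of $X_m$, the small ridge term `reg`, the fixed iteration count and the function name `fit_partial_gmm` are assumptions rather than details from the source. The subsequent check against the second verification condition is omitted because that condition is not reproduced in this text.

```python
import numpy as np
from scipy.stats import multivariate_normal


def fit_partial_gmm(X_m, prev_pdf, n_iter=50, reg=1e-6):
    """EM for p(x) = a_{c-1} p(x|c-1) + a_c p(x|c) with p(x|c-1) held fixed.

    X_m: (n, d) training data {X_{c-1}, X_c}; prev_pdf: callable returning
    p(x|c-1) for each row. Returns (alpha, mu_c, sigma_c).
    """
    n, d = X_m.shape
    # Assumed initialisation: new component from the second half of X_m.
    mu = X_m[n // 2:].mean(axis=0)
    sigma = np.cov(X_m[n // 2:], rowvar=False) + reg * np.eye(d)
    alpha = np.array([0.5, 0.5])
    p_prev = prev_pdf(X_m)  # fixed component: evaluated once, never updated
    for _ in range(n_iter):
        p_new = multivariate_normal(mu, sigma, allow_singular=True).pdf(X_m)
        # E-step: posterior responsibility of each component for each sample.
        w = np.column_stack([alpha[0] * p_prev, alpha[1] * p_new])
        w /= w.sum(axis=1, keepdims=True) + 1e-300  # guard against 0/0
        # M-step: both mixing weights, but mean/covariance of component c only.
        alpha = w.mean(axis=0)
        r = w[:, 1]
        mu = (r[:, None] * X_m).sum(axis=0) / r.sum()
        diff = X_m - mu
        sigma = (r[:, None] * diff).T @ diff / r.sum() + reg * np.eye(d)
    return alpha, mu, sigma


# Usage: phase c-1 is already modelled; phase c starts after 100 samples.
rng = np.random.default_rng(1)
A = rng.normal(0.0, 0.1, size=(100, 2))  # samples of phase c-1
B = rng.normal(5.0, 0.1, size=(100, 2))  # samples assumed to belong to phase c
prev = multivariate_normal(A.mean(axis=0), np.cov(A, rowvar=False))
alpha, mu_c, sigma_c = fit_partial_gmm(np.vstack([A, B]), prev.pdf)
print(alpha, mu_c)
```

Holding $p(x|c-1)$ fixed keeps each iteration local: only the newest component and the two mixing weights are re-estimated, rather than refitting every phase found so far.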
A transient phase acquisition unit is configured to determine the corresponding transient phase data based on two adjacent steady-state phases so as to complete the phase partitioning of all sample data. In some specific embodiments, the processing of the transient phase acquisition unit includes: for the two determined steady-state phases $X_{c-1}$ and $X_c$, checking the samples starting from the first sample point of the $c$-th steady-state phase, finding the sample points that consecutively satisfy $p(x_n|c) < \rho$, and recording them as the transient phase $X_{c-1,c}$.
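The transient-phase rule can be read as taking the leading run of samples of $X_c$ that the $c$-th Gaussian explains poorly. A minimal sketch under that reading follows; the function name `transient_run` and the choice to also return the remaining samples of $X_c$ are illustrative, and `pdf_c` stands for any callable returning $p(x|c)$ per row (for example the `.pdf` method of a frozen `scipy.stats.multivariate_normal`).

```python
import numpy as np


def transient_run(X_c, pdf_c, rho=0.001):
    """Split X_c into the transient run X_{c-1,c} and the remaining samples.

    Starting at the first sample of X_c, samples are collected while they
    consecutively satisfy p(x_n | c) < rho.
    """
    dens = pdf_c(X_c)
    k = 0
    while k < len(X_c) and dens[k] < rho:
        k += 1
    return X_c[:k], X_c[k:]


# Usage with a 1-D Gaussian p(x|c) centred at 5.0 (standard deviation 0.1).
def pdf_c(X):
    return np.exp(-0.5 * ((X[:, 0] - 5.0) / 0.1) ** 2) / (0.1 * np.sqrt(2 * np.pi))


X_c = np.array([[0.0], [0.5], [4.9], [5.0], [0.2]])
transient, rest = transient_run(X_c, pdf_c)
print(len(transient), len(rest))  # 2 3
```

Only the leading run counts: the last sample (0.2) also has low density but follows well-fitting samples, so it is not part of the transient phase in this reading.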
Based on the same inventive concept, the present invention also proposes a computer-readable storage medium comprising computer instructions which, when run on a computer, cause the computer to perform the method.
To verify the effectiveness of the above technical solution, a specific experimental example, the penicillin fermentation process, is used; a schematic diagram of the penicillin fermentation process is shown in Fig. 3.
Specifically, the method comprises the following steps:
In the stage of collecting samples and creating the historical training data set: 20 batches of normal-operation data are generated in total for phase partitioning, and white noise distributed as $N(0, 0.04)$ is added to each batch. The reaction time of each batch is set to 400 h with a sampling interval of 1 h, so each batch contains 400 sample points, each consisting of the 11 variables listed in Table 1.
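The data set described above is an array of shape (batches, samples, variables) with additive white noise. A minimal sketch, assuming that $N(0, 0.04)$ specifies the variance (standard deviation 0.2) and using zeros as a hypothetical stand-in for the noise-free simulated trajectories, which the text does not provide; the function name `add_white_noise` is illustrative.

```python
import numpy as np

N_BATCHES, N_SAMPLES, N_VARS = 20, 400, 11  # 400 h at 1 h intervals, 11 variables


def add_white_noise(clean, variance=0.04, seed=0):
    """Add i.i.d. zero-mean Gaussian noise N(0, variance) to every entry."""
    rng = np.random.default_rng(seed)
    return clean + rng.normal(0.0, np.sqrt(variance), size=clean.shape)


# Hypothetical placeholder for the noise-free simulated batches.
clean = np.zeros((N_BATCHES, N_SAMPLES, N_VARS))
noisy = add_white_noise(clean)
print(noisy.shape)  # (20, 400, 11)
```

If 0.04 is instead meant as the standard deviation, call `add_white_noise(clean, variance=0.04**2)`.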
TABLE 1
[Table 1 image: the 11 process variables of the penicillin fermentation process]
In the phase partitioning stage, the first steady-state phase and the subsequent steady-state phases are determined in sampling order and all sample data are partitioned. The number of sample points of the first phase is initialized to three times the number of variables, i.e., $N_1 = 33$ initial modeling samples. The partitioning results for $\rho = 0.001$ are shown in Table 2: the whole process is divided roughly into 10 steady-state phases and three transient phases, with three very short phases lying between the first and fifth steady-state phases. In the actual fermentation process, the initial pre-culture stage is relatively stable and corresponds to the first steady-state phase. The process then enters a vigorous reaction stage, corresponding to the next three short steady-state phases, followed by the fed-batch feeding stage, a stable fermentation stage reached after a period of transition, and finally the autolysis stage. The partitioning results of the method therefore correspond well to the actual process stages.
TABLE 2
[Table 2 image: phase partitioning results for ρ = 0.001]
The embodiment of the invention has the following beneficial effects:
In addition to overcoming the shortcomings of existing phase partitioning methods, which ignore the time order of the samples and do not partition transient phases, the method has the following beneficial effects: (1) from the perspective of Gaussian distributions, each steady-state phase is described by an independent Gaussian distribution and each transient phase by a mixture of the two adjacent Gaussian distributions, so the method can both partition the steady-state phases effectively and determine the transient phases; (2) each iteration uses only local data for modeling and verification, which greatly reduces redundant computation and improves computational efficiency; (3) a step-by-step updating strategy determines the steady-state and transient phases gradually in sampling order, so the number of phases need not be specified in advance and the partitioning result needs no post-processing.
The above embodiments express only several implementations of the present application and are described specifically and in detail, but they shall not be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these all fall within the scope of protection of the present application. Therefore, the scope of protection of this patent shall be subject to the appended claims.

Claims (4)

1. A phase partitioning method based on a local Gaussian mixture model, comprising the following steps:
S1, collecting samples and creating a historical training data set;
S2, selecting a portion of the samples from the historical training data set in sampling order to create a first Gaussian distribution model so as to determine a first steady-state phase;
S3, creating, based on the previously determined Gaussian model, a Gaussian mixture model containing two Gaussian components to determine the next steady-state phase;
S4, determining, based on the two determined adjacent steady-state phase models, a transient phase that may exist between the two steady-state phases;
S5, repeating S3 and S4 to complete the phase partitioning of all sample data;
wherein selecting a portion of the samples from the historical training data set to create the first Gaussian distribution model and determine the first steady-state phase comprises:
S21, selecting the first $N_1$ samples of the historical training data set in sampling order and calculating their mean and variance to obtain the corresponding Gaussian distribution model $p(x|1)$, where $p(x|1)$ denotes the probability density function of the first Gaussian distribution model and $x$ denotes the acquired sample data;
S22, extracting sample points for steady-state phase verification, i.e., starting from the ($N_1$ … 2)-th sample point, searching for three consecutive sample points that satisfy the first verification condition and recording the corresponding sample index as [formula image], the verification condition being [formula image], where $\rho$ is a pre-specified threshold;
S23, judging whether $N_1$ equals [formula image]; if so, the result has converged, i.e., the first steady-state phase is determined, and the method proceeds to the next step; otherwise, setting [formula image] and returning to step S21 to iterate until $N_1$ equals [formula image];
and wherein creating, based on the previously determined Gaussian model, a Gaussian mixture model containing two Gaussian components to determine the next steady-state phase comprises:
S31, creating and training, based on the previously determined Gaussian model, a mixture model containing two Gaussian distribution functions; assuming that the first $c-1$ steady-state phases have been determined, where $c$ is an integer greater than or equal to 2, the mixture model is

$$p(x|\theta_c) = \alpha_{c-1}\, p(x|c-1) + \alpha_c\, p(x|c)$$

wherein the probability density function $p(x|c-1)$ of the $(c-1)$-th Gaussian model is already determined and the $(c-1)$-th steady-state phase, containing $N_{c-1}$ samples, is $X_{c-1}$; the probability density function $p(x|c)$ of the $c$-th Gaussian model is to be determined and the $c$-th steady-state phase, assumed to contain $N_c$ samples, is $X_c$; the training data of the Gaussian mixture model are denoted $X_m = \{X_{c-1}, X_c\}$ and the corresponding parameters are $\theta_c = \{\alpha_{c-1}, \alpha_c, \mu_c, \Sigma_c\}$, where $\alpha_{c-1}$ and $\alpha_c$ are the mixing coefficients of the $(c-1)$-th and $c$-th Gaussian components and $\mu_c$ and $\Sigma_c$ are the mean vector and covariance matrix of the $c$-th Gaussian probability density function $p(x|c)$; the mixture model is trained with the expectation-maximization (EM) algorithm;
S32, extracting sample points to perform steady-state phase verification on the mixture model, i.e., starting from the ($N_{c-1}$ … 2)-th sample point, searching for three consecutive sample points that satisfy the second verification condition and recording the corresponding sample index as [formula image], the verification condition being [formula image], where $\rho$ is a pre-specified threshold;
S33, judging whether [formula image] equals $N_c + N_{c-1}$; if so, the result has converged, i.e., the $c$-th steady-state phase is determined; otherwise, setting [formula image] and returning to step S31 to iterate until [formula image] equals $N_c + N_{c-1}$.
2. The phase partitioning method according to claim 1, wherein determining, based on the two determined adjacent steady-state phase models, a transient phase that may exist between the two steady-state phases comprises: for the two determined steady-state phases $X_{c-1}$ and $X_c$, checking the samples starting from the first sample point of the $c$-th steady-state phase, finding the sample points that consecutively satisfy $p(x_n|c) < \rho$, and recording them as the transient phase $X_{c-1,c}$.
3. A phase partitioning system based on a local gaussian mixture model, comprising:
an acquisition unit for acquiring samples and creating a historical training data set;
a first Gaussian distribution creating unit, configured to select a portion of the samples from the historical training data set in sampling order to create a first Gaussian distribution model so as to determine the first steady-state phase;
a Gaussian mixture model creating unit, configured to create, based on the previously determined Gaussian model, a Gaussian mixture model containing two Gaussian components so as to determine the next steady-state phase and, together with the transient phase acquisition unit, complete the phase partitioning of all sample data;
a transient phase acquisition unit for determining a transient phase that may exist between two steady-state phases based on the determined two adjacent steady-state phase models;
wherein the first gaussian distribution creating unit includes:
a first data acquisition module, configured to select the first $N_1$ samples of the historical training data set in sampling order and calculate their mean and variance to obtain the corresponding Gaussian distribution function $p(x|1)$, where $p(x|1)$ denotes the probability density function of the first Gaussian distribution model and $x$ denotes the acquired sample data;
a first steady-state phase verification module, configured to extract sample points for steady-state phase verification, i.e., starting from the ($N_1$ … 2)-th sample point, to search for three consecutive sample points that satisfy the first verification condition and to record the corresponding sample index as [formula image], the verification condition being [formula image], where $\rho$ is a pre-specified threshold;
a first steady-state phase determination module, configured to determine whether $N_1$ equals [formula image]; if so, the result has converged, i.e., the first steady-state phase is determined and the process proceeds to the next step; otherwise, to set [formula image] and let the first steady-state phase verification module iterate again until $N_1$ equals [formula image];
The Gaussian mixture model creating unit includes:
a second data acquisition module, configured to create and train, based on the previously determined Gaussian model, a Gaussian mixture model containing two Gaussian components; assuming that the first $c-1$ steady-state phases have been determined, where $c$ is an integer greater than or equal to 2, the mixture model is

$$p(x|\theta_c) = \alpha_{c-1}\, p(x|c-1) + \alpha_c\, p(x|c)$$

wherein the probability density function $p(x|c-1)$ of the $(c-1)$-th Gaussian model is already determined and the $(c-1)$-th steady-state phase, containing $N_{c-1}$ samples, is $X_{c-1}$; the probability density function $p(x|c)$ of the $c$-th Gaussian model is to be determined and the $c$-th steady-state phase, assumed to contain $N_c$ samples, is $X_c$; the training data of the Gaussian mixture model are denoted $X_m = \{X_{c-1}, X_c\}$ and the corresponding parameters are $\theta_c = \{\alpha_{c-1}, \alpha_c, \mu_c, \Sigma_c\}$, where $\alpha_{c-1}$ and $\alpha_c$ are the mixing coefficients of the $(c-1)$-th and $c$-th Gaussian components and $\mu_c$ and $\Sigma_c$ are the mean vector and covariance matrix of the $c$-th Gaussian probability density function $p(x|c)$; the mixture model is trained with the expectation-maximization (EM) algorithm;
a second steady-state phase verification module, configured to extract sample points to perform steady-state phase verification on the mixture model, i.e., starting from the ($N_{c-1}$ … 2)-th sample point, to search for three consecutive sample points that satisfy the second verification condition and to record the corresponding sample index as [formula image], the verification condition being [formula image], where $\rho$ is a pre-specified threshold;
a second steady-state phase determination module, configured to determine whether [formula image] equals $N_c + N_{c-1}$; if so, the result has converged, i.e., the $c$-th steady-state phase is determined; otherwise, to set [formula image] and let the second steady-state phase verification module iterate again until [formula image] equals $N_c + N_{c-1}$.
4. The system of claim 3, wherein the processing of the transient phase acquisition unit comprises: for the two determined steady-state phases $X_{c-1}$ and $X_c$, checking the samples starting from the first sample point of the $c$-th steady-state phase, finding the sample points that consecutively satisfy $p(x_n|c) < \rho$, and recording them as the transient phase $X_{c-1,c}$.
CN201910571289.5A 2019-06-26 2019-06-26 Transient phase partitioning method and system based on local Gaussian mixture model Active CN110309491B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910571289.5A CN110309491B (en) 2019-06-26 2019-06-26 Transient phase partitioning method and system based on local Gaussian mixture model

Publications (2)

Publication Number Publication Date
CN110309491A CN110309491A (en) 2019-10-08
CN110309491B CN110309491B (en) 2022-10-14

Family

ID=68077805

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910571289.5A Active CN110309491B (en) 2019-06-26 2019-06-26 Transient phase partitioning method and system based on local Gaussian mixture model

Country Status (1)

Country Link
CN (1) CN110309491B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107451101A (en) * 2017-07-21 2017-12-08 江南大学 It is a kind of to be layered integrated Gaussian process recurrence soft-measuring modeling method
CN108804784A (en) * 2018-05-25 2018-11-13 江南大学 A kind of instant learning soft-measuring modeling method based on Bayes's gauss hybrid models

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
KR101300247B1 (en) * 2011-11-11 2013-08-26 경희대학교 산학협력단 Markov chain hidden conditional random fields model based pattern recognition method

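Non-Patent Citations (2)

Title
Data-driven determination of gas turbine reference values under all operating conditions; Wang Zhong et al.; Process Automation Instrumentation (自动化仪表); 2019-04-20 (No. 04); full text *
Improvement and optimization of the EM algorithm based on Gaussian mixture models; Wang Kainan et al.; Industrial Control Computer (工业控制计算机); 2017-05-25 (No. 05); full text *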

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant