CN112800678B - Multi-task selection model construction method, multi-task selective maintenance method and system - Google Patents

Multi-task selection model construction method, multi-task selective maintenance method and system

Info

Publication number
CN112800678B
CN112800678B
Authority
CN
China
Prior art keywords
maintenance
task
function
component
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110123843.0A
Other languages
Chinese (zh)
Other versions
CN112800678A (en)
Inventor
皮德常
徐悦
陈阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics
Priority to CN202110123843.0A
Publication of CN112800678A
Application granted
Publication of CN112800678B
Legal status: Active
Anticipated expiration

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00 Computer-aided design [CAD]
    • G06F 30/20 Design optimisation, verification or simulation
    • G06F 30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 7/00 Computing arrangements based on specific mathematical models
    • G06N 7/01 Probabilistic graphical models, e.g. probabilistic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00 Administration; Management
    • G06Q 10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00 Administration; Management
    • G06Q 10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q 10/063 Operations research, analysis or management
    • G06Q 10/0635 Risk analysis of enterprise or organisation activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00 Administration; Management
    • G06Q 10/20 Administration of product repair or maintenance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2111/00 Details relating to CAD techniques
    • G06F 2111/08 Probabilistic or stochastic CAD
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2119/00 Details relating to the type or aim of the analysis or the optimisation
    • G06F 2119/02 Reliability analysis or reliability optimisation; Failure analysis, e.g. worst case scenario performance, failure mode and effects analysis [FMEA]

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • General Physics & Mathematics (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Evolutionary Computation (AREA)
  • General Business, Economics & Management (AREA)
  • Quality & Reliability (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Tourism & Hospitality (AREA)
  • Operations Research (AREA)
  • Marketing (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Development Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Algebra (AREA)
  • Probability & Statistics with Applications (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)

Abstract

The invention relates to a multi-task selection model construction method, a multi-task selective maintenance method and a system. A model for multi-task selective maintenance is built by combining an effective-life decay model with a risk-function adjustment model, which overcomes the limitation of prior-art models that support selective maintenance for a single task only, and the model is used to compute the reliability of the system and the maintenance cost with improved accuracy. The multi-task selective maintenance optimization problem is formulated as a discrete Markov decision process and solved within a reinforcement-learning framework with an adjusted neural-network structure, yielding the optimal maintenance strategy for each component during each interruption and the optimal running time of each maintenance task, improving maintenance efficiency, reducing maintenance cost, and effectively overcoming the problems of variable dimensionality and the curse of dimensionality.

Description

Multi-task selection model construction method, multi-task selective maintenance method and system
Technical Field
The invention relates to the interdisciplinary field of engineering applications and information science, and in particular to a multi-task selection model construction method, a multi-task selective maintenance method and a system.
Background
Maintenance activities play an important role in the economics of modern industry, accounting for about 28%, and in some cases up to 40%, of total production costs. In general, components can be maintained during interruptions (breaks between missions) to improve the reliability of subsequent task execution. However, because maintenance resources (e.g., budget, time, labor and repair facilities) are limited, it is not possible to repair all components. To overcome this problem, a decision maker selects several components during an interruption and repairs the selected components at a corresponding repair level; this process is called selective maintenance.
Research on the selective maintenance problem focuses on two aspects, model building and solution methods, and the problem belongs to the class of nonlinear integer programming problems. Existing selective maintenance models are limited to the state changes within a single task: they mainly address selective maintenance for a single task and discuss reliability evaluation and cost calculation of a component or system under multi-state, stochastic, independent, fuzzy, uncertain and similar conditions. In addition, when reliability evaluation is modeled in the prior art, only the effective-life degradation associated with component maintenance is considered, so the reliability evaluation results are inaccurate.
Based on this, there is a need in the art for a method of constructing a maintenance model that selectively maintains multiple tasks to improve the reliability assessment of the system.
Disclosure of Invention
The invention aims to provide a multi-task selection model construction method, a multi-task selective maintenance method and a system that model multiple tasks and, on the basis of the model, accurately evaluate and calculate the reliability of the system and the maintenance cost.
In order to achieve the above object, the present invention provides the following solutions:
a method for constructing a multitasking selection model comprises the following steps:
Calculating a risk rate function of each maintenance task under different maintenance strategies by adopting an effective life attenuation model and a risk function adjustment model, wherein the maintenance strategies comprise a component non-maintenance strategy, a component replacement strategy and an imperfect maintenance strategy;
Calculating a reliability function and a cost function according to the risk rate function;
establishing a multi-task selection model according to the reliability function and the cost function;
evaluating the reliability of a system where the component is located and the maintenance cost of the component according to the multi-task selection model;
and selecting a maintenance strategy for the assembly at each maintenance task according to the reliability and the maintenance cost.
A method of multitasking selective maintenance, comprising:
Constructing a target maintenance model by taking the minimum cost function as a target when the reliability function in the multi-task selection model meets a reliability preset value;
Converting the target maintenance model into a Markov decision model, wherein the Markov decision model is a four-tuple function comprising a state space, an action space, a reward function and a state transfer function, the action space represents a set of all actions, and the actions comprise maintenance strategies of the component in each interruption time and running time of each maintenance task;
Training, optimizing and solving the Markov decision model by adopting a reinforcement learning framework and a neural network;
When the training times meet the predicted training times, outputting an optimal maintenance strategy for all components in each interruption time and the optimal running time of each maintenance task;
and maintaining the assembly according to the optimal maintenance strategy and the optimal running time.
A multitasking selective maintenance system comprising:
the model construction module is used for constructing a target maintenance model by taking the minimum cost function as a target when the reliability function in the multi-task selection model meets the reliability preset value;
a model conversion module for converting the target maintenance model into a Markov decision model, wherein the Markov decision model is a four-tuple function comprising a state space, an action space, a reward function and a state transfer function, and the action space comprises a maintenance strategy for the component during each interruption and a running time of each maintenance task;
The solving module is used for carrying out training optimization solving on the Markov decision model by adopting a reinforcement learning framework and a neural network;
the output module is used for outputting the optimal maintenance strategies of all components and the optimal running time of the maintenance tasks when the training times meet the predicted training times;
and the maintenance module is used for maintaining the assembly according to the optimal maintenance strategy and the optimal running time.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
By constructing a multi-task selection model, the invention overcomes the limitation of prior-art models that only support selective maintenance for a single task, and therefore has broad applicability;
when the multi-task selection model is used to evaluate the reliability of the component system and the maintenance cost, combining the effective-life decay model with the risk-function adjustment model improves the accuracy of the reliability and cost calculations.
The invention adjusts the neural-network structure on the basis of a reinforcement-learning framework to solve the multi-task selection model, effectively overcoming the problems of variable dimensionality and the curse of dimensionality; on the premise of satisfying the given system resources (preset time, required reliability, limited manpower and repair facilities), it obtains the optimal maintenance strategy for each component during each interruption and the optimal running time of each maintenance task, improving maintenance efficiency and reducing maintenance cost.
After extension, the invention can be applied to selective maintenance systems with multiple tasks and of various scales and types.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the drawings that are needed in the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a method for constructing a multi-task selection model according to embodiment 1 of the present invention;
FIG. 2 is a maintenance task sequence chart of applying a risk ratio function corresponding to an imperfect maintenance strategy to maintenance tasks provided in embodiment 1 of the present invention;
fig. 3 is a flowchart of a method for selectively maintaining multiple tasks provided in embodiment 2;
fig. 4 is a block diagram of a multitasking selective maintenance system according to embodiment 3.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention aims to provide a multi-task selection model construction method, a multi-task selective maintenance method and a system that model multiple tasks and, on the basis of the model, accurately evaluate and calculate the reliability of the system and the maintenance cost.
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.
Example 1:
referring to fig. 1, the present invention provides a method for constructing a multitasking selection model, including:
step S1: calculating a risk rate function of each maintenance task under different maintenance strategies by adopting an effective life attenuation model and a risk function adjustment model, wherein the maintenance strategies comprise a component non-maintenance strategy, a component replacement strategy and an imperfect maintenance strategy;
If the decision maker takes no maintenance action on a component, i.e., adopts the non-maintenance strategy, the component state is not improved and the component remains "restored as old" (as bad as old), so the risk rate function stays unchanged; the corresponding formula is:
$h_{i,k}(t_k + x) = h_{i,k-1}(B_{i,k-1} + x)$
where $x$ is the time from the last maintenance of component $i$ to the $k$th maintenance task, $k$ is a positive integer greater than or equal to 1, $h_{i,k}(\cdot)$ is the risk rate function of component $i$ at the $k$th maintenance task, $h_{i,k-1}(\cdot)$ is the risk rate function of component $i$ at the $(k-1)$th maintenance task, $t_k$ is the logical running time of the system containing component $i$, and $B_{i,k-1}$ is the effective life of component $i$ before maintenance;
If the decision maker repairs or replaces the component, i.e., adopts the component replacement strategy, the component starts from the new state, called "restored as new" (as good as new), and the risk rate function is reset to its initial form; the corresponding formula is:
$h_{i,k}(t_k + x) = h_{i,0}(x)$
where $h_{i,0}(\cdot)$ is the risk rate function of component $i$ before any maintenance;
Under the imperfect maintenance strategy (IM), the component state lies between "restored as old" and "restored as new". Fig. 2 shows the maintenance-task sequence for the risk rate function under the imperfect maintenance strategy, where $tm_k$ and $tb_k$ denote the running time of the $k$th maintenance task and the maintenance (interruption) time between tasks, and $t_k$ and $B_k$ are, respectively, the logical running time at the start and the effective life at the end of the $k$th maintenance task.
The risk rate function corresponding to the imperfect maintenance strategy is:
$h_{i,k}(t_k + x) = A_{i,k-1} \cdot h_{i,k-1}(b_{i,k-1} \cdot B_{i,k-1} + x)$
where $A_{i,k-1}$ is the accumulation of the risk rate adjustment parameters $a_{k'}$ of component $i$ over the previous $k-1$ maintenance tasks, $b_{i,k-1}$ is the effective-life decay parameter of component $i$, and $k'$ indexes the $k'$th maintenance interval.
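The three update rules above can be summarized in a short sketch. The following Python fragment is a minimal illustration, not part of the patent: the Weibull baseline hazard and the parameter values a = 1.05 and b = 0.8 are assumptions made only for the example.

```python
# Minimal sketch of the three risk-rate rules above. The Weibull baseline
# h_0(x) = (beta/eta) * (x/eta)**(beta-1) and the default a, b values are
# illustrative assumptions; the patent does not prescribe a baseline form.

def baseline_hazard(x, beta=2.0, eta=1000.0):
    return (beta / eta) * (x / eta) ** (beta - 1)

class Component:
    def __init__(self):
        self.A = 1.0   # accumulated risk-rate adjustment  A_{i,k}
        self.B = 0.0   # effective life (effective age)    B_{i,k}

    def hazard(self, x):
        # h_{i,k}(t_k + x) with the accumulated adjustment A and the decayed
        # effective age B already folded in by `update`
        return self.A * baseline_hazard(self.B + x)

    def update(self, policy, D_k, a=1.05, b=0.8):
        """Apply one maintenance decision, then run the next task for D_k hours."""
        if policy == 1:          # no maintenance: "as bad as old"
            pass                 # A and B unchanged
        elif policy == 2:        # replacement: "as good as new"
            self.A, self.B = 1.0, 0.0
        elif policy == 3:        # imperfect maintenance
            self.A *= a          # risk-rate adjustment accumulates
            self.B *= b          # effective life is shortened (decayed)
        self.B += D_k            # age accumulated during the next task
```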
Step S2: calculating a reliability function and a cost function according to the risk rate function;
the reliability function is expressed as:
where $R_i$ is the reliability of the system containing component $i$, $D_k$ is the running time of the $k$th maintenance task for component $i$, $x$ is the time from the last maintenance of component $i$ to the $k$th maintenance task, $h_{i,k}(\cdot)$ is the risk rate function of component $i$ at the $k$th maintenance task, and $t_k$ is the logical running time of the system containing component $i$;
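The reliability expression itself is rendered as an image in the original publication and does not survive in this text. A standard survival-function form that is consistent with the symbols defined above, offered only as a reconstruction and not as a verbatim copy of the patent's formula, is:

```latex
R_i \;=\; \exp\!\left(-\int_{0}^{D_k} h_{i,k}(t_k + x)\,\mathrm{d}x\right)
```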
The cost function is expressed as:
where $C_i$ is the maintenance cost of component $i$, which comprises a fixed repair cost of component $i$ plus $c_i(l)$, the cost of the maintenance policy adopted for component $i$: the component non-maintenance strategy is selected when $l=1$, the replacement component strategy when $l=2$, and the imperfect maintenance strategy when $l=3$.
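As with the reliability function, the cost expression appears as an image in the original. A plausible decomposition consistent with the description, in which $c_i^{0}$ is a placeholder for the fixed-cost symbol used in the patent (an assumption, not the original notation), is:

```latex
C_i \;=\; c_i^{0} + c_i(l), \qquad l \in \{1, 2, 3\}
```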
Step S3: establishing a multi-task selection model according to the reliability function and the cost function;
step S4: evaluating the reliability of a system where the component is located and the maintenance cost of the component according to the multi-task selection model;
step S5: and selecting a maintenance strategy for the assembly at each maintenance task according to the reliability and the maintenance cost.
The method solves the problem that the model in the prior art can only be selectively maintained for a single task by constructing a multi-task selection model, and has strong applicability; in the embodiment, the risk rate adjustment and the effective life attenuation are considered simultaneously, the risk rate functions corresponding to different maintenance strategies are calculated, and when the risk rate functions are further applied to the reliability evaluation of the system, the accuracy can be greatly improved.
Example 2:
different maintenance strategies can lead to changes in system status. Which repair strategy is to be taken on the component under different system conditions so that minimizing costs under specific system resources (pre-set time, required reliability, defined manpower and repair facilities) is a problem currently being investigated universally in the art.
Because of the large and complex scale of the solution space of this problem, the current main solution methods include evolutionary algorithms and reinforcement learning methods. The evolutionary algorithm, such as genetic algorithm, differential evolutionary algorithm, particle swarm optimization algorithm and the like, can well solve the optimization problem of NP-difficult (NP-Hard problem, namely, non-deterministic problem of polynomial complexity); however, when the problem dimension is variable, the conventional evolutionary algorithm cannot be directly applied to problem solving. The conventional reinforcement learning method has a dimension disaster problem, especially when faced with a large-scale complex system.
In order to overcome the defects in the prior art, referring to fig. 3, the present invention further provides a method for selectively maintaining multiple tasks, including:
Step Sa: constructing a target maintenance model by taking the minimum cost function as a target when the reliability function in the multi-task selection model meets a reliability preset value;
the target maintenance model is expressed as:
where s.t. denotes the constraint, $C$ is the cost function in the target maintenance model, $i$ indexes the components and $n$ is the number of components, $V$ is the number of subsystems in the system containing the components, $v'$ indexes a subsystem and $n_{v'}$ is the number of components in subsystem $v'$, $R_0$ is the lower limit of the reliability preset value, $R$ is the reliability function in the target maintenance model, and $u'$ indexes the $u'$th component in the $v'$th subsystem.
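The optimization model itself is an image in the original publication. A constrained form consistent with the wherein clause above, reconstructed under the assumption that the objective sums the component maintenance costs over all subsystems, is:

```latex
\min \; C = \sum_{v'=1}^{V} \sum_{u'=1}^{n_{v'}} C_{u'v'}
\qquad \text{s.t.} \quad R \geq R_0
```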
Step Sb: the target maintenance model is converted into a Markov decision model, wherein the Markov decision model is a four-element function comprising a state space, an action space, a rewarding function and a state transfer function, the action space represents a set of all actions, and the actions comprise maintenance strategies of the component in each interruption time and running time of each maintenance task.
Since the transition of the system component states (e.g., effective life, risk rate functions, reliability, etc.) is independent of historical values, the target maintenance model can be converted into a Markov decision model. The four-tuple can be expressed as (S, A, P, R), where:
S is the state space, the set of all system states. For a component $i$ at the $k$th maintenance task, its state $state_i(k)$ may include the accumulated risk rate adjustment parameter $A_{i,k}$, the effective life of the component at the end of the maintenance task $B_{i,k}$, the system reliability $R_{i,k}$, and the logical running time $t_k$ of the overall system;
The component state $state_i(k)$ is expressed as:
$state_i(k) = [A_{i,k}, B_{i,k}, R_{i,k}, t_k]$;
The system state is expressed as:
$state(k) = [state_1(k), state_2(k), \ldots, state_n(k)]$;
A is the action space, the set of all actions. The action $action_i(k)$ of component $i$ at the $k$th maintenance task includes all possible maintenance policies $l_{i,k}$ at each interruption together with the run length $D_k$ of the system for the $k$th maintenance task; the whole action space comprises the maintenance policies of all components and the overall running time of the system, i.e., $action(k) = [l_{1,k}, \ldots, l_{n,k}, D_k]$, where $n$ is the number of components (a code sketch of this layout follows the four-tuple description below);
r is a reward function representing a set of all rewares, i.e. the value of the benefit obtained after an action is performed on the current state. In this embodiment, whether the cost function meets the requirement of the system reliability is taken as a reward function, and the calculation formula is as follows:
reward(k)=-(C+(R<R0)×ζ)
Zeta is a punishment coefficient when the reliability requirement is not met, and if the system reliability R is lower than the reliability value R 0 of the requirement, the punishment coefficient is added to the rewarding function;
P is the state transition function, which gives the next state after an action is executed in the current state; it reflects how the different maintenance strategies drive the computation of the component risk rate function, specifically as follows: under the non-maintenance strategy, the risk rate adjustment parameter stays unchanged, i.e., $A_{i,k+1} = A_{i,k}$, and the effective life is increased by the running time of the current maintenance task, i.e., $B_{i,k+1} = B_{i,k} + D_k$; under the component replacement strategy, the risk rate adjustment parameter is reset to its initial value, i.e., $A_{i,k+1} = 1$, and the effective life equals the running time of the current maintenance task, i.e., $B_{i,k+1} = D_k$; under the imperfect maintenance strategy, the risk rate adjustment parameter becomes $A_{i,k+1} = a_{i,k} \times A_{i,k}$ and the effective life becomes $B_{i,k+1} = b_{i,k} \times B_{i,k} + D_k$; the logical running time $t_k$ is increased by the current running time plus the interruption time.
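A compact Python sketch of the state/action layout and of one transition of the Markov decision model is given below. The class and function names and the default values (a, b, zeta, R0, break_time) are illustrative assumptions; `cost_fn` and `reliability_fn` stand in for the cost and reliability functions of Example 1 and are not defined by the patent.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class ComponentState:
    A: float   # accumulated risk-rate adjustment      A_{i,k}
    B: float   # effective life at end of the task     B_{i,k}
    R: float   # reliability                           R_{i,k}
    t: float   # logical running time of the system    t_k

@dataclass
class SystemState:
    components: List[ComponentState]

    def as_vector(self) -> List[float]:
        # state(k) = [state_1(k), ..., state_n(k)] flattened for the networks
        return [v for c in self.components for v in (c.A, c.B, c.R, c.t)]

@dataclass
class Action:
    policies: List[int]   # l_{1,k}, ..., l_{n,k}: 1 = none, 2 = replace, 3 = imperfect
    duration: float       # D_k, run length of the k-th maintenance task

def step(state: SystemState, action: Action,
         a: float = 1.05, b: float = 0.8, zeta: float = 1e4, R0: float = 0.9,
         break_time: float = 10.0,
         cost_fn: Optional[Callable] = None,
         reliability_fn: Optional[Callable] = None):
    """One transition of the Markov decision model sketched above."""
    D_k = action.duration
    for comp, l in zip(state.components, action.policies):
        if l == 1:                        # no maintenance: A, B unchanged, then age
            comp.B += D_k
        elif l == 2:                      # replacement: reset, then age by D_k
            comp.A, comp.B = 1.0, D_k
        elif l == 3:                      # imperfect maintenance
            comp.A *= a                   # risk-rate adjustment accumulates
            comp.B = b * comp.B + D_k     # effective life decays, then ages
        comp.t += D_k + break_time        # logical runtime: run time + interruption

    C = cost_fn(state, action) if cost_fn else 0.0        # placeholder cost
    R = reliability_fn(state) if reliability_fn else 1.0  # placeholder reliability
    reward = -(C + (R < R0) * zeta)       # reward(k) = -(C + (R < R0) * zeta)
    return state, reward
```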
Step Sc: training, optimizing and solving the Markov decision model by adopting a reinforcement learning framework and a neural network, wherein the training, optimizing and solving method specifically comprises the following steps:
Initializing a first maintenance-strategy neural network $Q_{act}$ and its parameters $\theta_{act}$, a target maintenance-strategy neural network $\hat{Q}_{act}$ and its parameters $\hat{\theta}_{act}$, a first maintenance-task operation-duration neural network $Q_{dura}$ and its parameters $\theta_{dura}$, and a target maintenance-task operation-duration neural network $\hat{Q}_{dura}$ and its parameters $\hat{\theta}_{dura}$;
Obtaining, according to the four-tuple function, the inputs of the first maintenance-strategy neural network and the first maintenance-task operation-duration neural network, which specifically comprises: initializing the system state $state(k)$ at the $k$th task according to the state space S; and obtaining, from the current system state $state(k)$ and the action space A, the inputs of the first maintenance-strategy neural network and the first maintenance-task operation-duration neural network, $input_{act}(k)$ and $input_{dura}(k)$, respectively;
storing the four-tuple function into a cache area;
selecting a certain number of samples $\{state(j), action(j), reward(j+1), state(j+1)\}$ from the buffer;
performing gradient-descent updates on the maintenance-strategy neural network $Q_{act}$ and the maintenance-task operation-duration neural network $Q_{dura}$, respectively, according to the maintenance-strategy target value $target_{act}(j)$ and the maintenance-task operation-duration target value $target_{dura}(j)$ of the samples;
Further, the specific calculation of $target_{act}(j)$ and $target_{dura}(j)$ includes:
where $target_{act}(j)$ is the maintenance-strategy target value corresponding to the $j$th maintenance task, $target_{dura}(j)$ is the operation-duration target value corresponding to the $j$th maintenance task, $reward(j+1)$ is the reward corresponding to the $(j+1)$th maintenance task, $\hat{Q}_{act}(\cdot)$ is the state-action function of the target maintenance-strategy neural network, $\hat{Q}_{dura}(\cdot)$ is the state-action function of the target maintenance-task operation-duration neural network, $state(j+1)$ is the state corresponding to the $(j+1)$th maintenance task, $action_{act}(j)$ is the action of the target maintenance-strategy neural network at the $(j+1)$th maintenance task, $action_{dura}(j)$ is the action of the target maintenance-task operation-duration neural network at the $(j+1)$th maintenance task, $\hat{\theta}_{act}$ are the target maintenance-strategy network parameters, $\hat{\theta}_{dura}$ are the target maintenance-task operation-duration network parameters, $\gamma$ is the discount coefficient between the current reward and the long-term maximum return, $\arg\max_{action_{act}} \hat{Q}_{act}(\cdot)$ denotes the maintenance action that maximizes $\hat{Q}_{act}$, and $\arg\max_{action_{dura}} \hat{Q}_{dura}(\cdot)$ denotes the operation-duration action that maximizes $\hat{Q}_{dura}$;
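The two target-value expressions are images in the original publication. A standard temporal-difference target that matches the symbols explained above, given here as a reconstruction rather than the patent's exact formula, is:

```latex
\begin{aligned}
target_{act}(j)  &= reward(j+1) + \gamma \max_{action_{act}} \hat{Q}_{act}\bigl(state(j+1), action_{act}; \hat{\theta}_{act}\bigr)\\
target_{dura}(j) &= reward(j+1) + \gamma \max_{action_{dura}} \hat{Q}_{dura}\bigl(state(j+1), action_{dura}; \hat{\theta}_{dura}\bigr)
\end{aligned}
```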
updating the target maintenance-strategy neural network and the target maintenance-task operation-duration neural network every preset number of iterations, i.e., copying the parameters of the first networks into the corresponding target networks ($\hat{\theta}_{act} \leftarrow \theta_{act}$, $\hat{\theta}_{dura} \leftarrow \theta_{dura}$);
when the logical running time of the system containing the components is greater than the preset system running time T (i.e., state(j+1) is a terminal state), returning to the step of storing the four-tuple function into the buffer; otherwise, when the training times reach the predicted training times, outputting the optimal maintenance strategy for all components during each interruption and the optimal running time of each maintenance task.
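The training-optimization steps above can be assembled into a short training loop. The PyTorch sketch below is a minimal illustration only: the `env` object (with `reset()` and `step()` returning the next state, reward and a done flag), the network sizes, the discretization of the run time into `n_durations` bins, and all hyper-parameters are assumptions for the example and are not specified by the patent.

```python
import random
from collections import deque
import torch
import torch.nn as nn

def make_net(in_dim, out_dim):
    # small fully connected Q-network; size is an illustrative choice
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, out_dim))

def train(env, state_dim, n_policy_actions, n_durations,
          episodes=500, gamma=0.95, eps=0.1, batch=32, sync_every=50):
    q_act, q_dura = make_net(state_dim, n_policy_actions), make_net(state_dim, n_durations)
    tgt_act, tgt_dura = make_net(state_dim, n_policy_actions), make_net(state_dim, n_durations)
    tgt_act.load_state_dict(q_act.state_dict())
    tgt_dura.load_state_dict(q_dura.state_dict())
    opt = torch.optim.Adam(list(q_act.parameters()) + list(q_dura.parameters()), lr=1e-3)
    buffer, step_count = deque(maxlen=10_000), 0

    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            s = torch.as_tensor(state, dtype=torch.float32)
            if random.random() < eps:                       # epsilon-greedy exploration
                a_act = random.randrange(n_policy_actions)
                a_dura = random.randrange(n_durations)
            else:
                a_act = int(q_act(s).argmax())
                a_dura = int(q_dura(s).argmax())
            next_state, reward, done = env.step(a_act, a_dura)
            buffer.append((state, a_act, a_dura, reward, next_state, done))
            state = next_state

            if len(buffer) >= batch:
                sample = random.sample(buffer, batch)
                st, aa, ad, rw, ns, dn = map(list, zip(*sample))
                st = torch.as_tensor(st, dtype=torch.float32)
                ns = torch.as_tensor(ns, dtype=torch.float32)
                rw = torch.as_tensor(rw, dtype=torch.float32)
                dn = torch.as_tensor(dn, dtype=torch.float32)
                # TD targets computed with the *target* networks
                with torch.no_grad():
                    t_act = rw + gamma * (1 - dn) * tgt_act(ns).max(dim=1).values
                    t_dura = rw + gamma * (1 - dn) * tgt_dura(ns).max(dim=1).values
                pred_act = q_act(st).gather(1, torch.tensor(aa).unsqueeze(1)).squeeze(1)
                pred_dura = q_dura(st).gather(1, torch.tensor(ad).unsqueeze(1)).squeeze(1)
                loss = (nn.functional.mse_loss(pred_act, t_act)
                        + nn.functional.mse_loss(pred_dura, t_dura))
                opt.zero_grad(); loss.backward(); opt.step()

            step_count += 1
            if step_count % sync_every == 0:                # periodic target-network sync
                tgt_act.load_state_dict(q_act.state_dict())
                tgt_dura.load_state_dict(q_dura.state_dict())
    return q_act, q_dura
```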
Step Sd: when the training times meet the predicted training times, outputting an optimal maintenance strategy for all components in each interruption time and the optimal running time of each maintenance task;
Step Se: and maintaining the assembly according to the optimal maintenance strategy and the optimal running time.
As an optional implementation manner, storing the four-tuple function in a buffer area specifically includes:
selecting the maintenance action $action_{act}(k)$ and the operation duration $action_{dura}(k)$ corresponding to the $k$th maintenance task according to the epsilon strategy: if a random number is smaller than the value $\epsilon$, the maintenance action and the operation duration are selected randomly; otherwise, they are selected so as to maximize the state-action function, using the calculation formula below (see the reconstructed expression after this procedure);
where argmax returns the argument that maximizes the state-action function, $Q_{act}(\cdot)$ is the state-action function of the first maintenance-strategy neural network, $Q_{dura}(\cdot)$ is the state-action function of the first maintenance-task operation-duration neural network, $input_{act}(k)$ is the input of the first maintenance-strategy neural network, $input_{dura}(k)$ is the input of the first maintenance-task operation-duration neural network, $action_{act}$ is an action in the action space of the first maintenance-strategy neural network, $action_{dura}$ is an action in the action space of the first maintenance-task operation-duration neural network, $\theta_{act}$ are the first maintenance-strategy network parameters, and $\theta_{dura}$ are the first maintenance-task operation-duration network parameters;
The state-action value function $Q(state(k), action(k))$, denoted the Q value, measures the long-term return obtained by selecting actions according to the policy $\pi$ given the current state. That is, applying different maintenance actions and running times in different system states maximizes the long-term benefit (the Q value), i.e., minimizes the long-term consumed cost C, under the given system-resource constraints;
forming the action $action(k) = [action_{act}(k), action_{dura}(k)]$ corresponding to the $k$th maintenance task;
obtaining the state $state(k+1)$ of the next task from the state of the current task, and the reward $reward(k+1)$ of the next task from the reward of the current task;
and storing $\{state(k), action(k), reward(k+1), state(k+1)\}$ into the buffer D, where $state(k)$ is the system state corresponding to the $k$th maintenance task.
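The greedy-selection expression referenced in this procedure appears as an image in the original publication; a form consistent with the wherein clause above, given as a reconstruction rather than the patent's exact rendering, is:

```latex
\begin{aligned}
action_{act}(k)  &= \arg\max_{action_{act}} Q_{act}\bigl(input_{act}(k), action_{act}; \theta_{act}\bigr)\\
action_{dura}(k) &= \arg\max_{action_{dura}} Q_{dura}\bigl(input_{dura}(k), action_{dura}; \theta_{dura}\bigr)
\end{aligned}
```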
The neural-network structure is adjusted on the basis of the reinforcement-learning framework to solve the multi-task selection model, which effectively overcomes the problems of variable dimensionality and the curse of dimensionality and makes the approach suitable for multi-task, multi-type selective maintenance models.
Example 3:
referring to fig. 4, the present invention also provides a system for selectively maintaining multiple tasks, comprising:
the model construction module M1 is used for constructing a target maintenance model by taking the minimum cost function as a target when the reliability function in the multi-task selection model meets the reliability preset value;
A model conversion module M2, configured to convert the target maintenance model into a markov decision model, where the markov decision model is a four-tuple function including a state space, an action space, a reward function, and a state transfer function, and the action space includes a maintenance policy for the component during each interruption and a runtime for each maintenance task;
the solving module M3 is used for carrying out training optimization solving on the Markov decision model by adopting a reinforcement learning framework and a neural network;
The output module M4 is used for outputting the optimal maintenance strategy of all components and the optimal running time of the maintenance task when the training times meet the predicted training times;
and a maintenance module M5, configured to maintain the assembly according to the optimal maintenance strategy and the optimal running time.
In the present specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different point from other embodiments, and identical and similar parts between the embodiments are all enough to refer to each other. For the system disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and the relevant points refer to the description of the method section.
The principles and embodiments of the present invention have been described herein with reference to specific examples, the description of which is intended only to assist in understanding the methods of the present invention and the core ideas thereof; also, it is within the scope of the present invention to be modified by those of ordinary skill in the art in light of the present teachings. In view of the foregoing, this description should not be construed as limiting the invention.

Claims (4)

1. A method of multitasking selective maintenance comprising:
Constructing a target maintenance model by taking the minimum cost function as a target when the reliability function in the multi-task selection model meets a reliability preset value;
Converting the target maintenance model into a Markov decision model, wherein the Markov decision model is a four-tuple function comprising a state space, an action space, a reward function and a state transfer function, the action space represents a set of all actions, and the actions comprise maintenance strategies of the component in each interruption time and running time of each maintenance task;
Training, optimizing and solving the Markov decision model by adopting a reinforcement learning framework and a neural network;
When the training times meet the predicted training times, outputting an optimal maintenance strategy for all components in each interruption time and the optimal running time of each maintenance task;
Repairing the assembly according to the optimal repair strategy and the optimal running time;
The method for constructing the multi-task selection model comprises the following steps:
Calculating a risk rate function of each maintenance task under different maintenance strategies by adopting an effective life attenuation model and a risk function adjustment model, wherein the maintenance strategies comprise a component non-maintenance strategy, a component replacement strategy and an imperfect maintenance strategy;
Calculating a reliability function and a cost function according to the risk rate function;
establishing a multi-task selection model according to the reliability function and the cost function;
evaluating the reliability of a system where the component is located and the maintenance cost of the component according to the multi-task selection model;
Selecting a maintenance strategy for the assembly during each maintenance task according to the reliability and the maintenance cost;
the risk rate function corresponding to the component non-maintenance strategy is as follows:
$h_{i,k}(t_k + x) = h_{i,k-1}(B_{i,k-1} + x)$
where $x$ is the time from the last maintenance of component $i$ to the $k$th maintenance task, $k$ is a positive integer greater than or equal to 1, $h_{i,k}(\cdot)$ is the risk rate function of component $i$ at the $k$th maintenance task, $h_{i,k-1}(\cdot)$ is the risk rate function of component $i$ at the $(k-1)$th maintenance task, $t_k$ is the logical running time of the system containing component $i$, and $B_{i,k-1}$ is the effective life of component $i$ before maintenance;
the risk rate function corresponding to the replacement component strategy is as follows:
$h_{i,k}(t_k + x) = h_{i,0}(x)$
where $h_{i,0}(\cdot)$ is the risk rate function of component $i$ before any maintenance;
the risk rate function corresponding to the imperfect maintenance strategy is as follows:
$h_{i,k}(t_k + x) = A_{i,k-1} \cdot h_{i,k-1}(b_{i,k-1} \cdot B_{i,k-1} + x)$
where $A_{i,k-1}$ is the accumulation of the risk rate adjustment parameters $a_{k'}$ of component $i$ over the previous $k-1$ maintenance tasks, $b_{i,k-1}$ is the effective-life decay parameter of component $i$, and $k'$ indexes the $k'$th maintenance interval;
the reliability function is expressed as:
where $R_i$ is the reliability of the system containing component $i$, $D_k$ is the running time of the $k$th maintenance task for component $i$, $x$ is the time from the last maintenance of component $i$ to the $k$th maintenance task, $h_{i,k}(\cdot)$ is the risk rate function of component $i$ at the $k$th maintenance task, and $t_k$ is the logical running time of the system containing component $i$;
the cost function is expressed as:
where $C_i$ is the maintenance cost of component $i$, which comprises a fixed repair cost of component $i$ plus $c_i(l)$, the cost of the maintenance policy adopted for component $i$: the component non-maintenance strategy is selected when $l=1$, the replacement component strategy when $l=2$, and the imperfect maintenance strategy when $l=3$;
the training optimization solution for the Markov decision model by adopting a reinforcement learning framework and a neural network specifically comprises the following steps:
Initializing a first maintenance-strategy neural network $Q_{act}$ and its parameters $\theta_{act}$, a target maintenance-strategy neural network $\hat{Q}_{act}$ and its parameters $\hat{\theta}_{act}$, a first maintenance-task operation-duration neural network $Q_{dura}$ and its parameters $\theta_{dura}$, and a target maintenance-task operation-duration neural network $\hat{Q}_{dura}$ and its parameters $\hat{\theta}_{dura}$;
Obtaining the input of the first maintenance strategy neural network and the first maintenance task operation time length neural network according to the four-tuple function;
storing the four-tuple function into a cache area;
Selecting a certain number of samples from the buffer area;
according to the maintenance strategy target value and the maintenance task operation time length target value of the sample, respectively carrying out gradient descent solution on the target maintenance strategy neural network and the target maintenance task operation time length neural network;
updating the target maintenance strategy neural network and the target maintenance task operation duration neural network under a preset number of times;
when the logic running time of the system where the component is located is larger than the preset system running time, the step of storing the four-element function into a cache area is carried out; otherwise, when the training times meet the predicted training times, outputting the optimal maintenance strategy of all components in each interruption time and the optimal running time of each maintenance task;
the storing the four-tuple function in the buffer area specifically includes:
Selecting the maintenance action $action_{act}(k)$ and the operation duration $action_{dura}(k)$ corresponding to the $k$th maintenance task according to an epsilon strategy;
forming the action $action(k) = [action_{act}(k), action_{dura}(k)]$ corresponding to the $k$th maintenance task;
obtaining a state $state(k+1)$ of a next task according to the state of the current task, and obtaining a reward $reward(k+1)$ of the next task according to the reward of the current task;
storing $\{state(k), action(k), reward(k+1), state(k+1)\}$ into the buffer D, wherein $state(k)$ represents the system state corresponding to the $k$th maintenance task;
the concrete calculation of the maintenance-strategy target value and the maintenance-task operation-duration target value of the samples comprises:
Wherein $target_{act}(j)$ represents the maintenance-strategy target value corresponding to the $j$th maintenance task, $target_{dura}(j)$ represents the operation-duration target value corresponding to the $j$th maintenance task, $reward(j+1)$ represents the reward corresponding to the $(j+1)$th maintenance task, $\hat{Q}_{act}(\cdot)$ represents the state-action function of the target maintenance-strategy neural network, $\hat{Q}_{dura}(\cdot)$ represents the state-action function of the target maintenance-task operation-duration neural network, $state(j+1)$ represents the state corresponding to the $(j+1)$th maintenance task, $action_{act}(j)$ represents the action of the target maintenance-strategy neural network at the $(j+1)$th maintenance task, $action_{dura}(j)$ represents the action of the target maintenance-task operation-duration neural network at the $(j+1)$th maintenance task, $\hat{\theta}_{act}$ represents the target maintenance-strategy neural network parameters, $\hat{\theta}_{dura}$ represents the target maintenance-task operation-duration neural network parameters, $\gamma$ represents the discount coefficient between the current reward and the long-term maximum return, $\arg\max_{action_{act}} \hat{Q}_{act}(\cdot)$ denotes the maintenance action that maximizes $\hat{Q}_{act}$, and $\arg\max_{action_{dura}} \hat{Q}_{dura}(\cdot)$ denotes the operation-duration action that maximizes $\hat{Q}_{dura}$.
2. The method of claim 1, wherein the target maintenance model is expressed as:
Wherein s.t. denotes the constraint, C denotes the cost function in the target maintenance model, $i$ indexes the components and $n$ is the number of components, V represents the number of subsystems in the system containing the components, $v'$ denotes a subsystem, $n_{v'}$ represents the number of components in subsystem $v'$, $R_0$ represents the lower limit of the reliability preset value, R represents the reliability function in the target maintenance model, and $u'$ is the $u'$th component in the $v'$th subsystem.
3. The method for selectively maintaining multiple tasks according to claim 1, wherein the selecting of the maintenance action $action_{act}(k)$ and the operation duration $action_{dura}(k)$ corresponding to the $k$th maintenance task according to the epsilon policy specifically comprises:
Selecting a random number;
Judging the sizes of the random number and the epsilon value;
if the random number is smaller than the epsilon value, randomly selecting maintenance action and operation duration; otherwise, selecting the maintenance action and the operation duration under the condition of the maximum state-action function;
the calculation formula for selecting the maintenance action and the operation duration under the condition of the maximum state-action function is as follows:
Wherein argmax returns the argument that maximizes the state-action function, $Q_{act}(\cdot)$ represents the state-action function of the first maintenance-strategy neural network, $Q_{dura}(\cdot)$ represents the state-action function of the first maintenance-task operation-duration neural network, $input_{act}(k)$ represents the input of the first maintenance-strategy neural network, $input_{dura}(k)$ represents the input of the first maintenance-task operation-duration neural network, $action_{act}$ represents an action in the action space of the first maintenance-strategy neural network, $action_{dura}$ represents an action in the action space of the first maintenance-task operation-duration neural network, $\theta_{act}$ represents the first maintenance-strategy neural network parameters, and $\theta_{dura}$ represents the first maintenance-task operation-duration neural network parameters.
4. A multi-task selective maintenance system, comprising:
the model construction module is used for constructing a target maintenance model by taking the minimum cost function as a target when the reliability function in the multi-task selection model meets the reliability preset value;
a model conversion module for converting the target maintenance model into a Markov decision model, wherein the Markov decision model is a four-tuple function comprising a state space, an action space, a reward function and a state transfer function, and the action space comprises a maintenance strategy for the component during each interruption and a running time of each maintenance task;
The solving module is used for carrying out training optimization solving on the Markov decision model by adopting a reinforcement learning framework and a neural network;
the output module is used for outputting the optimal maintenance strategies of all components and the optimal running time of the maintenance tasks when the training times meet the predicted training times;
A repair module for repairing the assembly according to the optimal repair strategy and the optimal run time;
The method for constructing the multi-task selection model comprises the following steps:
Calculating a risk rate function of each maintenance task under different maintenance strategies by adopting an effective life attenuation model and a risk function adjustment model, wherein the maintenance strategies comprise a component non-maintenance strategy, a component replacement strategy and an imperfect maintenance strategy;
Calculating a reliability function and a cost function according to the risk rate function;
establishing a multi-task selection model according to the reliability function and the cost function;
evaluating the reliability of a system where the component is located and the maintenance cost of the component according to the multi-task selection model;
Selecting a maintenance strategy for the assembly during each maintenance task according to the reliability and the maintenance cost;
the risk rate function corresponding to the component non-maintenance strategy is as follows:
$h_{i,k}(t_k + x) = h_{i,k-1}(B_{i,k-1} + x)$
where $x$ is the time from the last maintenance of component $i$ to the $k$th maintenance task, $k$ is a positive integer greater than or equal to 1, $h_{i,k}(\cdot)$ is the risk rate function of component $i$ at the $k$th maintenance task, $h_{i,k-1}(\cdot)$ is the risk rate function of component $i$ at the $(k-1)$th maintenance task, $t_k$ is the logical running time of the system containing component $i$, and $B_{i,k-1}$ is the effective life of component $i$ before maintenance;
the risk rate function corresponding to the replacement component strategy is as follows:
$h_{i,k}(t_k + x) = h_{i,0}(x)$
where $h_{i,0}(\cdot)$ is the risk rate function of component $i$ before any maintenance;
the risk rate function corresponding to the imperfect maintenance strategy is as follows:
$h_{i,k}(t_k + x) = A_{i,k-1} \cdot h_{i,k-1}(b_{i,k-1} \cdot B_{i,k-1} + x)$
where $A_{i,k-1}$ is the accumulation of the risk rate adjustment parameters $a_{k'}$ of component $i$ over the previous $k-1$ maintenance tasks, $b_{i,k-1}$ is the effective-life decay parameter of component $i$, and $k'$ indexes the $k'$th maintenance interval;
the reliability function is expressed as:
where $R_i$ is the reliability of the system containing component $i$, $D_k$ is the running time of the $k$th maintenance task for component $i$, $x$ is the time from the last maintenance of component $i$ to the $k$th maintenance task, $h_{i,k}(\cdot)$ is the risk rate function of component $i$ at the $k$th maintenance task, and $t_k$ is the logical running time of the system containing component $i$;
the cost function is expressed as:
where $C_i$ is the maintenance cost of component $i$, which comprises a fixed repair cost of component $i$ plus $c_i(l)$, the cost of the maintenance policy adopted for component $i$: the component non-maintenance strategy is selected when $l=1$, the replacement component strategy when $l=2$, and the imperfect maintenance strategy when $l=3$;
the training optimization solution for the Markov decision model by adopting a reinforcement learning framework and a neural network specifically comprises the following steps:
Initializing a first maintenance-strategy neural network $Q_{act}$ and its parameters $\theta_{act}$, a target maintenance-strategy neural network $\hat{Q}_{act}$ and its parameters $\hat{\theta}_{act}$, a first maintenance-task operation-duration neural network $Q_{dura}$ and its parameters $\theta_{dura}$, and a target maintenance-task operation-duration neural network $\hat{Q}_{dura}$ and its parameters $\hat{\theta}_{dura}$;
Obtaining the input of the first maintenance strategy neural network and the first maintenance task operation time length neural network according to the four-tuple function;
storing the four-tuple function into a cache area;
Selecting a certain number of samples from the buffer area;
according to the maintenance strategy target value and the maintenance task operation time length target value of the sample, respectively carrying out gradient descent solution on the target maintenance strategy neural network and the target maintenance task operation time length neural network;
updating the target maintenance strategy neural network and the target maintenance task operation duration neural network under a preset number of times;
when the logic running time of the system where the component is located is larger than the preset system running time, the step of storing the four-element function into a cache area is carried out; otherwise, when the training times meet the predicted training times, outputting the optimal maintenance strategy of all components in each interruption time and the optimal running time of each maintenance task;
the storing the four-tuple function in the buffer area specifically includes:
Selecting the maintenance action $action_{act}(k)$ and the operation duration $action_{dura}(k)$ corresponding to the $k$th maintenance task according to an epsilon strategy;
forming the action $action(k) = [action_{act}(k), action_{dura}(k)]$ corresponding to the $k$th maintenance task;
obtaining a state $state(k+1)$ of a next task according to the state of the current task, and obtaining a reward $reward(k+1)$ of the next task according to the reward of the current task;
storing $\{state(k), action(k), reward(k+1), state(k+1)\}$ into the buffer D, wherein $state(k)$ represents the system state corresponding to the $k$th maintenance task;
the concrete calculation of the maintenance-strategy target value and the maintenance-task operation-duration target value of the samples comprises:
Wherein $target_{act}(j)$ represents the maintenance-strategy target value corresponding to the $j$th maintenance task, $target_{dura}(j)$ represents the operation-duration target value corresponding to the $j$th maintenance task, $reward(j+1)$ represents the reward corresponding to the $(j+1)$th maintenance task, $\hat{Q}_{act}(\cdot)$ represents the state-action function of the target maintenance-strategy neural network, $\hat{Q}_{dura}(\cdot)$ represents the state-action function of the target maintenance-task operation-duration neural network, $state(j+1)$ represents the state corresponding to the $(j+1)$th maintenance task, $action_{act}(j)$ represents the action of the target maintenance-strategy neural network at the $(j+1)$th maintenance task, $action_{dura}(j)$ represents the action of the target maintenance-task operation-duration neural network at the $(j+1)$th maintenance task, $\hat{\theta}_{act}$ represents the target maintenance-strategy neural network parameters, $\hat{\theta}_{dura}$ represents the target maintenance-task operation-duration neural network parameters, $\gamma$ represents the discount coefficient between the current reward and the long-term maximum return, $\arg\max_{action_{act}} \hat{Q}_{act}(\cdot)$ denotes the maintenance action that maximizes $\hat{Q}_{act}$, and $\arg\max_{action_{dura}} \hat{Q}_{dura}(\cdot)$ denotes the operation-duration action that maximizes $\hat{Q}_{dura}$.
CN202110123843.0A 2021-01-29 2021-01-29 Multi-task selection model construction method, multi-task selective maintenance method and system Active CN112800678B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110123843.0A CN112800678B (en) 2021-01-29 2021-01-29 Multi-task selection model construction method, multi-task selective maintenance method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110123843.0A CN112800678B (en) 2021-01-29 2021-01-29 Multi-task selection model construction method, multi-task selective maintenance method and system

Publications (2)

Publication Number Publication Date
CN112800678A CN112800678A (en) 2021-05-14
CN112800678B true CN112800678B (en) 2024-05-28

Family

ID=75812761

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110123843.0A Active CN112800678B (en) 2021-01-29 2021-01-29 Multi-task selection model construction method, multi-task selective maintenance method and system

Country Status (1)

Country Link
CN (1) CN112800678B (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106600095A (en) * 2016-07-27 2017-04-26 中国特种设备检测研究院 Reliability-based maintenance evaluation method
CN106295897A (en) * 2016-08-15 2017-01-04 南京航空航天大学 Aircaft configuration based on risk with cost analysis checks mission planning method
CN108038349A (en) * 2017-12-18 2018-05-15 北京航天测控技术有限公司 A kind of repair determining method of aircraft system health status
CN108573303A (en) * 2018-04-25 2018-09-25 北京航空航天大学 It is a kind of that recovery policy is improved based on the complex network local failure for improving intensified learning certainly
CN110427712A (en) * 2019-08-07 2019-11-08 广东工业大学 A kind of preventative maintenance method and Shop Floor based on failure effect analysis (FEA)
CN110909465A (en) * 2019-11-20 2020-03-24 北京航空航天大学 Cooperative game cluster visual maintenance method based on intelligent learning

Also Published As

Publication number Publication date
CN112800678A (en) 2021-05-14

Similar Documents

Publication Publication Date Title
Frank et al. Reinforcement learning in the presence of rare events
CN108520152A (en) A kind of the service life distribution determination method and system of engineering equipment
CN114742250A (en) Numerical control equipment operation fault prediction system based on data analysis
CN112800678B (en) Multi-task selection model construction method, multi-task selective maintenance method and system
CN110750455B (en) Intelligent online self-updating fault diagnosis method and system based on system log analysis
CN110210531B (en) Fuzzy multi-state manufacturing system task reliability evaluation method based on extended random flow network
CN117749775A (en) Real-time communication system and method suitable for non-stationary network environment
Djelloul et al. Optimal selective maintenance policy for series-parallel systems operating missions of random durations
CN113723766A (en) Cement production line progress monitoring and early warning method based on buffer area
CN117408672A (en) Intelligent expressway maintenance system
CN109740766B (en) Industrial equipment maintenance service planning method
CN111489027A (en) Hydroelectric generating set waveform data trend prediction method and system
CN111428356A (en) Maintenance method and system for newly developed degraded equipment
CN108134687B (en) Gray model local area network peak flow prediction method based on Markov chain
CN111210361B (en) Power communication network routing planning method based on reliability prediction and particle swarm optimization
CN112488390B (en) Urban water discharge prediction method and system
Zhao et al. Selective maintenance modeling for a multi-state system considering human reliability
CN113821323A (en) Offline job task scheduling algorithm oriented to hybrid deployment data center scene
Wang et al. HARRD: Real-time software rejuvenation decision based on hierarchical analysis under weibull distribution
CN118428674A (en) System and method for analyzing selective maintenance sequence of production demand multitasking equipment
CN110580548A (en) Multi-step traffic speed prediction method based on class integration learning
Hayane et al. Optimal routing and scheduling for unreliable Markovian systems modeled with Timed Petri nets
CN117709806B (en) Cooperative multi-equipment abnormality automatic detection method and detection system
CN115718669A (en) Real-time wind speed index calculation method and system for bridge structure health monitoring
Aboelmagd SMART CRITICAL PATH METHOD AS A MODIFIED DETAILED SCHEDULING TECHNIQUE

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant