CN113283171A - Industrial platform resource optimal allocation device and method - Google Patents

Industrial platform resource optimal allocation device and method

Info

Publication number
CN113283171A
Authority
CN
China
Prior art keywords
resource
robot
appeal
unit
distribution system
Prior art date
Legal status
Pending
Application number
CN202110582489.8A
Other languages
Chinese (zh)
Inventor
吴帆 (Wu Fan)
郭李毅 (Guo Liyi)
郑臻哲 (Zheng Zhenzhe)
Current Assignee
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date
Filing date
Publication date
Application filed by Shanghai Jiaotong University
Priority to CN202110582489.8A
Publication of CN113283171A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0631 Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2111/00 Details relating to CAD techniques
    • G06F2111/04 Constraint-based CAD
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2111/00 Details relating to CAD techniques
    • G06F2111/08 Probabilistic or stochastic CAD

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Evolutionary Computation (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Game Theory and Decision Science (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Development Economics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Geometry (AREA)
  • Computer Hardware Design (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Educational Administration (AREA)
  • Feedback Control In General (AREA)

Abstract

An industrial platform resource optimization allocation device and method, comprising a content distribution system and a resource library. The content distribution system generates a resource prediction request for a robot, outputs it to the resource library, performs optimal resource allocation according to the resource library's feedback, and, while completing the robot service process, updates the neural network model in its appeal prediction unit based on newly added data. The resource library receives the resource prediction request sent by the content distribution system, predicts the potentially allocable optimal resource configuration, receives the resource application from the content distribution system's resource scheduling unit, and allocates resources based on that application. By modeling the robot's appeal and the optimization target, the method recommends to the robot the resources that can be allocated to it, learns the rationality of resource allocation from the robot's feedback, and breaks the information-asymmetry impasse between the server side and the robot.

Description

Industrial platform resource optimal allocation device and method
Technical Field
The invention relates to a technology in the field of industrial mass information processing, in particular to an industrial platform resource optimal allocation device and method.
Background
With the development of informatization, industrial systems are growing ever larger in scale. For example, in a large-scale distributed task or system (e.g., a crowd-sourcing task covering multiple regions, or the content distribution task of a recommendation system on the internet), robots (or agents) driven by intelligent algorithms must each complete their respective tasks. However, when the data scale is large, or when a robot cannot disclose all of its information to the server for some reason, the control server cannot store the information of all robots, and the server cannot simultaneously allocate the required resources and tasks to all of them.
Disclosure of Invention
Aiming at the above defects in the prior art, the invention provides an industrial platform resource optimization allocation device and method.
The invention is realized by the following technical scheme:
The invention relates to an industrial platform resource optimization allocation device, comprising a content distribution system and a resource library, wherein: the content distribution system generates a resource prediction request for the robot and outputs it to the resource library, performs optimal resource allocation according to the resource library's feedback, and, while completing the robot service process, updates the neural network model in its appeal prediction unit based on newly added data; the resource library receives the resource prediction request sent by the content distribution system, predicts the potentially allocable optimal resource configuration, receives the resource application of the content distribution system's resource scheduling unit, and allocates resources based on that application.
The content distribution system comprises an interaction unit, an appeal prediction unit, a feature storage unit, a resource scheduling unit and a network training unit, wherein: the interaction unit receives the robot's resource request and sends the robot ID and budget to the appeal prediction unit; the appeal prediction unit sends the robot ID to the feature storage unit; the feature storage unit sends the robot features to the appeal prediction unit; the neural network in the appeal prediction unit predicts the robot's appeal based on those features and sends the appeal and budget to the resource library; the appeal prediction unit forwards the resource prediction result from the resource library to the interaction unit, which asks the robot whether it adopts the result; when the robot adopts the resource scheduling result, the robot-authorized scheduling result is sent to the resource scheduling unit; the resource scheduling unit sends a resource application request to the resource library; the resource scheduling unit sends the resources to the robot; and after the round of interaction ends, the interaction unit sends the latest round of interaction data to the feature storage unit.
The neural network model is trained in the following way: the network training unit sends a data request to the feature storage unit; the feature storage unit sends the training data to the network training unit; the network training unit trains the neural network model and updates the neural network model in the appeal prediction unit.
Technical effects
The invention as a whole addresses the defect in the prior art that, owing to limits on the robot's communication or expression capacity, or on the storage and computing capacity of the resource distribution system, the robot's appeal is difficult to express clearly and hence difficult to satisfy individually, leading to low system resource allocation efficiency.
Compared with the prior art, the method models the robot's appeal and the optimization target, recommends to the robot the resources that can be allocated to it, and learns the rationality of resource allocation from the robot's feedback. The designed system can collect the robot's demand information for different resources from its adoption behavior, breaking the information-asymmetry impasse between the server and the robot.
Drawings
FIG. 1 is a schematic diagram of the system of the present invention;
FIG. 2 is a schematic diagram of the internal structure of a content distribution system;
FIG. 3 is a graph showing the results of comparative experiments on the models of the examples;
in the figure: a) cumulative expected regret at different Dropout ratios, b) cumulative adoption rate at different Dropout ratios;
FIG. 4 is a diagram illustrating the impact of related information in accordance with an exemplary embodiment;
in the figure: a) impact of appeal-related information on the cumulative expected regret, b) impact of appeal-related information on the cumulative adoption rate.
Detailed Description
As shown in Fig. 1, the present embodiment relates to an industrial platform information optimized allocation apparatus, comprising a content distribution system and a resource library, wherein: the resource library returns resources to the content distribution system according to the resource budget applied for, and the content distribution system predicts the robot's appeal, performs optimal resource allocation for the robot, and updates the neural network model in appeal prediction unit 2 based on newly added data. The content distribution system receives the budget information of the robot's resource application, performs appeal prediction based on the robot's historical data, and allocates the optimal resource configuration to the robot.
As shown in Fig. 2, the content distribution system comprises interaction unit 1, appeal prediction unit 2, feature storage unit 3, resource scheduling unit 4 and network training unit 5, wherein: interaction unit 1 receives the robot's resource request and sends the robot ID and budget to appeal prediction unit 2; appeal prediction unit 2 sends the robot ID to feature storage unit 3; feature storage unit 3 sends the robot features to appeal prediction unit 2; the neural network in appeal prediction unit 2 predicts the robot's appeal based on those features and sends the appeal and budget to the resource library; the resource library calculates the potentially allocable resources and sends them to appeal prediction unit 2; appeal prediction unit 2 sends the resource prediction result to interaction unit 1, which asks the robot whether it adopts the result; when the robot adopts the resource scheduling result, interaction unit 1 sends the robot-authorized result to resource scheduling unit 4; resource scheduling unit 4 sends a resource application request to the resource library; the resource library allocates the corresponding resources to resource scheduling unit 4; resource scheduling unit 4 sends the resources to the robot; and when the round of interaction ends, interaction unit 1 sends the latest round of interaction data to feature storage unit 3.
The training process of the neural network model comprises the following steps: (a) the network training unit 5 sends a data request to the feature storage unit 3; (b) the feature storage unit sends the training data to the network training unit 5; (c) the network training unit trains the neural network model and updates the neural network model in the appeal prediction unit 2.
The embodiment relates to an industrial platform information optimized allocation process using the above apparatus, performed as follows: when a robot initiates a resource application request, the content distribution system parses the robot's relevant information from the request, generates an estimated robot appeal, sends the robot appeal, the budget allocated to the robot and other information to the resource library, and queries the allocable resources; the resource library estimates the obtainable resources according to the appeal and budget provided by the content distribution system and returns the predicted allocable resource v = [v_1, v_2, …, v_n]^T; the content distribution system sends this resource application result to the robot and, according to the robot's adoption feedback signal, distributes the real demand-based resource result to the robot through the resource library.
In conclusion, the system can collect the robot's preference information for different resources from its adoption behavior, break the information-asymmetry impasse between the server and the robot, and better configure the overall resources in the resource library.
The allocable resources refer to the resource result that the robot can obtain under various constraints such as budget, specifically v = [v_1, v_2, …, v_n]^T, where n denotes the number of resource classes and the value v_i denotes the amount of the i-th dimension resource.
The relevant information of the robot comprises: the robot's resource application budget and the robot's preferences for different resources, i.e. the appeal weight vector w = [w_1, w_2, …, w_n]^T, where w_i denotes the robot's preference weight for the i-th dimension resource.
This embodiment defines Π as a resource allocation method. When the robot's appeal vector is w*, the server can configure the robot's optimal resources based on that vector. Let the optimization objective be w*^T · v, i.e. the appeal-weighted sum of the resource result. For a robot with appeal w*, the optimal resource allocation strategy Π_{w*} recommended to it solves the optimization problem:

Π_{w*} = argmax_Π w*^T · v_Π

where the resource application result v_Π is the optimal solution reachable by the resource library under strategy Π, and w*^T · v_Π is the utility the robot can obtain under strategy Π. The weight vector w* can thus help the content distribution platform find the most satisfactory resource application result in the resource library.
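A minimal sketch of this selection step, assuming the resource library exposes a finite set of feasible allocations (the patent leaves the library's internal solver unspecified; the function name, candidate list and numbers below are illustrative):

```python
import numpy as np

def recommend_allocation(w_star: np.ndarray, candidates: list) -> np.ndarray:
    """Return the feasible allocation v maximizing the appeal-weighted
    utility w*^T v, mirroring the optimization problem above."""
    utilities = [float(w_star @ v) for v in candidates]
    return candidates[int(np.argmax(utilities))]

# Example: 3 resource classes; the robot weights the second resource highest.
w_star = np.array([0.2, 0.5, 0.3])
feasible = [np.array([4.0, 1.0, 2.0]), np.array([1.0, 5.0, 1.0])]
best = recommend_allocation(w_star, feasible)   # -> array([1., 5., 1.])
```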
When a recommendation is made, the robot adopts it if it is satisfied with the expected result; otherwise the robot skips the recommendation. Based on this observation, the present embodiment models the problem to be optimized as a contextual bandit problem and designs a concrete algorithm program, specifically comprising:
1) State: the robot-related information the platform can observe, e.g. robot features, the budget that may be allocated to the robot, and the robot's historical queries and adoptions.
2) Action: the estimated robot appeal vector. The action selection space of the algorithm program is a high-dimensional continuous space. The algorithm program sends a request to the resource library according to the constraint information in the state and its selected action, i.e. the estimated robot appeal vector, and obtains the resource recommendation predicted by the resource library.
3) Reward: this embodiment sets the reward as the adoption behavior of the robot.
Based on this modeling, the contextual bandit algorithm can continuously make strategy recommendations for visiting robots (a skeleton of one round is sketched after the following list). In each round of recommendation:
1) the algorithm program observes the state of the robot in the round of recommendations.
2) The algorithm program selects an appeal vector based on the state and transmits the appeal vector together with constraint information such as the budget to the resource library, which estimates the resource result. The algorithm program recommends this result to the robot and obtains the robot's feedback.
3) The algorithm program stores the observations of the round (robot state, appeal vector, resource allocation result, robot feedback) as training data to update its own intelligent recommendation strategy.
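A skeleton of one such interaction round; all four arguments are hypothetical stand-ins for components the patent describes only abstractly, and the method names are illustrative:

```python
def bandit_round(policy, resource_library, robot, replay_buffer):
    """One round of the recommendation loop described above."""
    x = robot.observe_state()                 # 1) state: features, budget, history
    w = policy.select_appeal_vector(x)        # 2) action: estimated appeal vector
    v = resource_library.predict_allocation(w, robot.budget)
    adopted = robot.recommend(v)              # robot feedback: adopt or skip
    replay_buffer.append((x, w, v, adopted))  # 3) store this round's observation
    policy.update(replay_buffer)              # refresh the recommendation strategy
    return adopted
```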
Estimating the action value: in a classic contextual bandit algorithm, the algorithm program pulls an arm based on some policy according to the observed context and learns the expected value of each selectable action. In the problem of this embodiment, the reward of the server side's selected action is whether the robot adopts the recommendation, and maximizing the reward corresponds to recommending the strategy the robot is most likely to adopt.
The action value estimation process in the embodiment includes:
1) action selection is performed based on the observable information and the action selection policy.
2) Establishing the relation among the selected action, the resource application result and the robot adoption rate.
In this embodiment, the relation between robot information and action selection is described first: from the information observable by the server, an appeal vector w = f(x) is obtained under a certain action selection policy, where the function f is a multilayer perceptron representing the mapping from the environment state x to the appeal information w, and the input x of f is the feature representation of the environment. In the problem of this embodiment, the output of the network is w, and the supervision signal of the network (i.e. the value of the action) is the behavior of the robot. Intuitively, let v be the optimal resource allocation result under w; the value w^T · v then reflects the utility that a robot with appeal w can obtain on the platform. The robot's adoption rate is therefore positively correlated with w^T · v to some degree.
In the present embodiment, p(adopt) = σ(w^T · v) represents the relation between the robot adoption rate and w^T · v, where σ is the sigmoid function with range [0, 1], and the optimal resource application result v based on w is also part of the model input.
On this basis, the network's estimate of the action value can be updated by gradient descent. In each round of gradient update, this embodiment updates the model parameters through the loss function L, specifically the cross-entropy between the adoption label and the predicted adoption rate:

L = -(1/N) Σ_{i=1}^{N} [ y_i · log p(x_i, v_i) + (1 - y_i) · log(1 - p(x_i, v_i)) ]

where the set D = {(x_i, v_i, y_i)}_{i=1}^{N} is the data set of size N in this round of updating, the environment feature x and the resource application result v are inputs of the model, p(x, v) is the adoption rate predicted by the model, and the label y is the adoption label. During training, the environment feature x is fed in first to obtain the model's appeal output w, the result v is obtained according to w, and finally the model's estimated adoption rate p(x, v) is obtained.
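A sketch of this model and one gradient step in PyTorch, assuming a small MLP for f and the cross-entropy loss above; the class name `AppealNet`, the layer sizes and the dummy batch are illustrative:

```python
import torch
import torch.nn as nn

class AppealNet(nn.Module):
    """MLP f mapping the environment state x to an appeal vector w;
    adoption probability is modeled as sigma(w^T v)."""
    def __init__(self, state_dim: int, n_resources: int,
                 hidden: int = 64, p_drop: float = 0.4):
        super().__init__()
        self.f = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Dropout(p_drop),                 # also used later for Thompson sampling
            nn.Linear(hidden, n_resources),
        )

    def forward(self, x: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
        w = self.f(x)                           # appeal output w = f(x)
        return torch.sigmoid((w * v).sum(-1))   # p(adopt) = sigma(w^T v)

model = AppealNet(state_dim=16, n_resources=3)
loss_fn = nn.BCELoss()                          # the cross-entropy loss L above
optimizer = torch.optim.Adam(model.parameters())

# One gradient step on a dummy batch of (x, v, y) triples.
x = torch.randn(8, 16)                          # environment features
v = torch.rand(8, 3)                            # stored resource application results
y = torch.randint(0, 2, (8,)).float()           # adoption labels
optimizer.zero_grad()
loss = loss_fn(model(x, v), y)
loss.backward()
optimizer.step()
```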
Action selection algorithm: this embodiment uses Thompson sampling for action selection, a popular means of trading off Exploration and Exploitation. Generally speaking, Thompson sampling requires a Bayesian treatment of the model parameters. At each step, Thompson sampling re-samples a new set of model parameters and then selects actions based on that set. This can be seen as a randomized probe: more likely parameters are sampled more frequently and are therefore rejected or confirmed more quickly.
Thompson sampling comprises the following steps: sampling a new set of model parameters; selecting the action with the highest expected reward under the sampled parameters; and updating the model parameters.
Thompson sampling of a neural network model requires characterizing model uncertainty. Bayesian models provide a mathematical framework for reasoning about model uncertainty, but usually at prohibitive computational cost. Dropout temporarily discards a portion of the neurons of a neural network with a certain probability during training. Gal et al., in "Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning", propose using Dropout as a Bayesian approximation to represent model uncertainty in deep learning, and show that a nonlinear neural network of arbitrary depth with Dropout applied before every weight layer is mathematically equivalent to an approximation of a deep probabilistic Gaussian process. Furthermore, Dropout, as a simple and common technique for preventing neural network overfitting, has been widely used in training neural networks owing to its ease of implementation, efficiency and effectiveness. This embodiment therefore uses Dropout in the neural network for Thompson sampling, which is very simple yet effective.
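A sketch of this Dropout-based Thompson sampling, reusing the hypothetical AppealNet from the previous sketch: keeping Dropout active at inference makes each forward pass an approximate posterior draw, so each call returns one sampled action (appeal vector) rather than a deterministic estimate:

```python
import torch

def sample_appeal_vector(model: "AppealNet", x: torch.Tensor) -> torch.Tensor:
    """One Thompson-sampling action draw via Monte Carlo Dropout."""
    model.train()                 # keep Dropout on (deliberately not eval())
    with torch.no_grad():
        return model.f(x)         # one stochastic draw of w = f(x)

# Repeated calls on the same state can return different appeal vectors; the
# chosen w is sent to the resource library, and the observed feedback is used
# to update the model as in the previous sketch.
w_t = sample_appeal_vector(model, torch.randn(16))
```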
In the experiments, the input features of the model are the appeal-related features and the historical adoption information of the robot. The appeal-related features are concatenated to form one of the model's inputs. During training, this embodiment trains the network model with mini-batch gradient descent. To prevent the ratio of positive to negative samples seen by the model from drifting as training proceeds, which would affect model performance, this embodiment fixes the ratio of positive to negative samples in each training batch at 1:1 (see the sketch below). The optimizer used in model training is Adam, and the experimental results are shown in Table 1.
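A sketch of the 1:1 batch construction just described; the helper name and the index arrays are illustrative:

```python
import numpy as np

def balanced_batch(pos_idx: np.ndarray, neg_idx: np.ndarray,
                   batch_size: int, rng: np.random.Generator) -> np.ndarray:
    """Draw half of each training batch from adopted (positive) interactions
    and half from skipped (negative) ones, keeping the batch label ratio at
    1:1 regardless of how the collected data grows."""
    half = batch_size // 2
    pos = rng.choice(pos_idx, size=half, replace=len(pos_idx) < half)
    neg = rng.choice(neg_idx, size=half, replace=len(neg_idx) < half)
    return np.concatenate([pos, neg])

rng = np.random.default_rng(0)
batch = balanced_batch(np.array([0, 2, 5]), np.array([1, 3, 4, 6]), 4, rng)
```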
The optimization goal of the contextual bandit is to minimize the expected regret over T rounds of recommendation. This embodiment therefore evaluates model performance by the cumulative expected regret

R_T = Σ_{t=1}^{T} (p*_t - p_t)

and the cumulative adoption rate

A_T = (1/T) Σ_{t=1}^{T} p_t

where T denotes that the experiment has run T rounds of interaction, the value p*_t denotes the adoption rate of the recommendation made in round t based on the robot's internal appeal w*_t, and the value p_t denotes the adoption rate of the recommendation made with the w_t output by the action selection algorithm in round t.
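A sketch of how these two metrics could be computed from per-round adoption rates, under the reconstruction above; `p_opt` and `p_alg` are hypothetical arrays holding p*_t and p_t:

```python
import numpy as np

def cumulative_metrics(p_opt: np.ndarray, p_alg: np.ndarray):
    """p_opt[t]: adoption rate of the round-t recommendation based on the
    robot's internal appeal w*_t; p_alg[t]: adoption rate of the round-t
    recommendation based on the algorithm's output w_t."""
    regret = np.cumsum(p_opt - p_alg)                            # R_T for each T
    adoption = np.cumsum(p_alg) / np.arange(1, len(p_alg) + 1)   # A_T for each T
    return regret, adoption
```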
TABLE 1
(Comparative results for the random appeal recommendation baseline and the models with different Dropout ratios; the table is rendered as an image in the original document.)
The comparative experiment results are as follows. In a simulation experiment, this embodiment verifies the effectiveness of the contextual bandit algorithm. The comparison covers the effect of the model without Dropout and with Dropout at different ratios. This embodiment also introduces, as a weak baseline, a random appeal recommendation strategy that applies no appeal estimation algorithm. In each set of experiments, the algorithm program and the environment interact for 2000 rounds, the current cumulative expected regret and cumulative adoption rate are recorded every fixed number of rounds, and the results after the interaction are shown in Table 1. From the results, this embodiment finds that the random appeal recommendation strategy performs much worse on the evaluation indices, which indicates that the robot's appeal must be considered when recommending a strategy. Fig. 3 shows the cumulative expected regret and cumulative adoption rate curves at different Dropout ratios. Since different algorithms were found to converge to different locally optimal solutions in the experiments, and the expected regret grows approximately linearly at some slope after model convergence, this embodiment preprocesses the cumulative expected regret by y = log(x + 1) and normalizes the experimental results before plotting, in order to better expose the performance differences after convergence. From the trends in Fig. 3 and the real-time cumulative expected regret and cumulative adoption rate during the experiments, this embodiment finds that the increment of the cumulative expected regret of all models gradually decreases and converges during training, while the cumulative adoption rate of all models gradually increases and converges. This shows that although different models converge to different locally optimal solutions, all of them learn the robot's appeal to some degree and improve the performance of the recommendation system. For example, in Table 1, even the model without Dropout reduces the cumulative expected regret by 25.71% relative to the random appeal recommendation strategy (which has no learning module).
In the experiments, the action selection algorithm that uses Dropout for exploration outperforms the one without Dropout: action sampling with Dropout can be approximately regarded as Thompson sampling, which balances Exploration and Exploitation and samples the model's action space better, so the model converges to a better locally optimal solution. In the four sets of experiments with Dropout ratios of 20%, 40%, 60% and 80%, model performance first increases and then decreases as the Dropout ratio grows. This may be because, at a low Dropout ratio, the model adopts a more conservative exploration strategy and is more likely to converge to a worse locally optimal solution, while at a high Dropout ratio the model explores too frequently and cannot fully exploit the learned knowledge, degrading performance. With a Dropout ratio of 40%, the model performs better both during training and after convergence than with the other Dropout ratios, which shows that setting an appropriate Dropout ratio to balance exploration and exploitation can optimize model performance.
As shown in Fig. 3, the cumulative adoption rate may decrease in the early stage of interaction, which may be caused by the large uncertainty of the early model. After analyzing the real-time cumulative expected regret and cumulative adoption rate during the experiments, this embodiment finds that the increment of the cumulative expected regret drops markedly during the same period in which the cumulative adoption rate falls, indicating that the model learns the robot's appeal better through exploration.
To verify the generalization ability of the model, this embodiment performed a controlled experiment. The experimental group is the model with a Dropout ratio of 40%; the control group is the same model, but with the appeal-related information in its input randomized. The experimental results, processed in the same way as for Fig. 3, are shown in Fig. 4 and Table 1. They show that the model with appeal-related information in its input outperforms the model without it, which indicates that the model learns the robot's appeal better through the appeal-related information.
Conventional means do not establish an interaction process around the robot's satisfaction with resource allocation, do not model the robot's appeal preferences, do not use the robot's feedback signal on the resource allocation result to learn those preferences, do not use the exploration-and-exploitation paradigm of online learning to optimize personalized appeal recommendations for the robot, and do not achieve large-scale generalized application of a robot appeal recommendation strategy.
Compared with the prior art, the method significantly improves the satisfaction rate of the robot's appeal, the resource allocation efficiency, and the generalization of the recommendation strategy.
The foregoing embodiments may be modified in many different ways by those skilled in the art without departing from the spirit and scope of the invention, which is defined by the appended claims and all changes that come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.

Claims (6)

1. An industrial platform resource optimization allocation device, comprising a content distribution system and a resource library, wherein: the content distribution system generates a resource prediction request for the robot and outputs it to the resource library, performs optimal resource allocation according to the resource library's feedback, and, while completing the robot service process, updates the neural network model in its appeal prediction unit based on newly added data; and the resource library receives the resource prediction request sent by the content distribution system, predicts the potentially allocable optimal resource configuration, receives the resource application of the content distribution system's resource scheduling unit, and allocates resources based on that application.
2. The industrial platform resource optimization allocation device according to claim 1, wherein the content distribution system comprises an interaction unit, an appeal prediction unit, a feature storage unit, a resource scheduling unit and a network training unit, wherein: the interaction unit receives the robot's resource request and sends the robot ID and budget to the appeal prediction unit; the appeal prediction unit sends the robot ID to the feature storage unit; the feature storage unit sends the robot features to the appeal prediction unit; the neural network in the appeal prediction unit predicts the robot's appeal based on those features and sends the appeal and budget to the resource library; the appeal prediction unit forwards the resource prediction result from the resource library to the interaction unit, which asks the robot whether it adopts the result; when the robot adopts the resource scheduling result, the robot-authorized scheduling result is sent to the resource scheduling unit; the resource scheduling unit sends a resource application request to the resource library; the resource scheduling unit sends the resources to the robot; and after the round of interaction ends, the interaction unit sends the latest round of interaction data to the feature storage unit.
3. The device as claimed in claim 2, wherein the neural network model is trained by: the network training unit sends a data request to the feature storage unit; the feature storage unit sends the training data to the network training unit; the network training unit trains the neural network model and updates the neural network model in the appeal prediction unit.
4. A method for optimized distribution of industrial platform information using the device of any one of claims 1 to 3, wherein: when a robot issues a resource application request, the content distribution system analyzes the robot's relevant information from the request, generates an estimated robot appeal, sends the robot appeal, the budget allocated to the robot and other information to the resource library, and queries the allocable resources; the resource library estimates the obtainable resources according to the appeal and budget provided by the content distribution system and returns the predicted allocable resource v = [v_1, v_2, …, v_n]^T; and the content distribution system sends this resource application result to the robot and, according to the robot's adoption feedback signal, distributes the real demand-based resource result to the robot through the resource library.
5. The method for optimized distribution of industrial platform information as claimed in claim 4, wherein the allocable resources are the resource result that the robot can obtain under various constraints such as budget, specifically v = [v_1, v_2, …, v_n]^T, where n denotes the number of resource classes and the value v_i denotes the amount of the i-th dimension resource.
6. The method as claimed in claim 4, wherein the robot-related information comprises: the robot's resource application budget and the robot's preferences for different resources, i.e. the appeal weight vector w = [w_1, w_2, …, w_n]^T, where w_i denotes the robot's preference weight for the i-th dimension resource.
CN202110582489.8A 2021-05-27 2021-05-27 Industrial platform resource optimal allocation device and method Pending CN113283171A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110582489.8A CN113283171A (en) 2021-05-27 2021-05-27 Industrial platform resource optimal allocation device and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110582489.8A CN113283171A (en) 2021-05-27 2021-05-27 Industrial platform resource optimal allocation device and method

Publications (1)

Publication Number Publication Date
CN113283171A true CN113283171A (en) 2021-08-20

Family

ID=77281828

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110582489.8A Pending CN113283171A (en) 2021-05-27 2021-05-27 Industrial platform resource optimal allocation device and method

Country Status (1)

Country Link
CN (1) CN113283171A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100017516A1 (en) * 2008-07-16 2010-01-21 General Instrument Corporation Demand-driven optimization and balancing of transcoding resources
CN101836227A (en) * 2007-08-06 2010-09-15 汤姆森许可贸易公司 Method and system for product services analysis and optimization
CN107888669A (en) * 2017-10-31 2018-04-06 武汉理工大学 A kind of extensive resource scheduling system and method based on deep learning neutral net
CN111126641A (en) * 2019-11-25 2020-05-08 泰康保险集团股份有限公司 Resource allocation method and device
CN111491006A (en) * 2020-03-03 2020-08-04 天津大学 Load-aware cloud computing resource elastic distribution system and method
CN111930524A (en) * 2020-10-10 2020-11-13 上海兴容信息技术有限公司 Method and system for distributing computing resources
CN112291335A (en) * 2020-10-27 2021-01-29 上海交通大学 Optimized task scheduling method in mobile edge calculation
CN112418699A (en) * 2020-11-30 2021-02-26 腾讯科技(深圳)有限公司 Resource allocation method, device, equipment and storage medium
CN112565378A (en) * 2020-11-30 2021-03-26 中国科学院深圳先进技术研究院 Cloud native resource dynamic prediction method and device, computer equipment and storage medium

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101836227A (en) * 2007-08-06 2010-09-15 汤姆森许可贸易公司 Method and system for product services analysis and optimization
US20100017516A1 (en) * 2008-07-16 2010-01-21 General Instrument Corporation Demand-driven optimization and balancing of transcoding resources
CN107888669A (en) * 2017-10-31 2018-04-06 武汉理工大学 A kind of extensive resource scheduling system and method based on deep learning neutral net
CN111126641A (en) * 2019-11-25 2020-05-08 泰康保险集团股份有限公司 Resource allocation method and device
CN111491006A (en) * 2020-03-03 2020-08-04 天津大学 Load-aware cloud computing resource elastic distribution system and method
CN111930524A (en) * 2020-10-10 2020-11-13 上海兴容信息技术有限公司 Method and system for distributing computing resources
CN112291335A (en) * 2020-10-27 2021-01-29 上海交通大学 Optimized task scheduling method in mobile edge calculation
CN112418699A (en) * 2020-11-30 2021-02-26 腾讯科技(深圳)有限公司 Resource allocation method, device, equipment and storage medium
CN112565378A (en) * 2020-11-30 2021-03-26 中国科学院深圳先进技术研究院 Cloud native resource dynamic prediction method and device, computer equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LIYI GUO: "A Deep Prediction Network for Understanding Advertiser Intent and Satisfaction", CIKM '20 *
LIYI GUO: "We Know What You Want: An Advertising Strategy Recommender System for Online Advertising", arXiv *
吴帆 (Fan Wu): "Research on Dynamic Spectrum Management Based on Game Theory", Journal of Computer Research and Development (计算机研究与发展) *

Similar Documents

Publication Publication Date Title
Gronauer et al. Multi-agent deep reinforcement learning: a survey
CN111556461A (en) Vehicle-mounted edge network task distribution and unloading method based on deep Q network
Djigal et al. Machine and deep learning for resource allocation in multi-access edge computing: A survey
CN113568727A (en) Mobile edge calculation task allocation method based on deep reinforcement learning
JP2007317068A (en) Recommending device and recommending system
CN114490057A (en) MEC unloaded task resource allocation method based on deep reinforcement learning
CN115686846B (en) Container cluster online deployment method integrating graph neural network and reinforcement learning in edge calculation
Štula et al. Continuously self-adjusting fuzzy cognitive map with semi-autonomous concepts
Hafez et al. Topological Q-learning with internally guided exploration for mobile robot navigation
Kolomvatsos et al. A proactive statistical model supporting services and tasks management in pervasive applications
Iqbal et al. Intelligent multimedia content delivery in 5G/6G networks: a reinforcement learning approach
CN117436485A (en) Multi-exit point end-edge-cloud cooperative system and method based on trade-off time delay and precision
CN113283171A (en) Industrial platform resource optimal allocation device and method
CN114942799B (en) Workflow scheduling method based on reinforcement learning in cloud edge environment
US12019712B2 (en) Enhanced reinforcement learning algorithms using future state prediction scaled reward values
CN116367190A (en) Digital twin function virtualization method for 6G mobile network
CN115220818A (en) Real-time dependency task unloading method based on deep reinforcement learning
CN112632615B (en) Scientific workflow data layout method based on hybrid cloud environment
CN113157344B (en) DRL-based energy consumption perception task unloading method in mobile edge computing environment
Kim Reinforcement learning
CN115150335A (en) Optimal flow segmentation method and system based on deep reinforcement learning
CN111027709B (en) Information recommendation method and device, server and storage medium
CN114449536A (en) 5G ultra-dense network multi-user access selection method based on deep reinforcement learning
Mishra et al. Model-free reinforcement learning for mean field games
Kumaran et al. Deep Reinforcement Learning algorithms for Low Latency Edge Computing Systems

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210820