CN112052093A - Experimental big data resource allocation management system based on message queue technology - Google Patents
- Publication number
- CN112052093A (application CN202010936736.5A)
- Authority
- CN
- China
- Prior art keywords
- data
- application
- experimental
- message queue
- resource
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogram communication
- G06F9/546—Message passing systems or structures, e.g. queues
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/54—Indexing scheme relating to G06F9/54
- G06F2209/544—Remote
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/54—Indexing scheme relating to G06F9/54
- G06F2209/548—Queue
Abstract
The invention discloses an experimental big data resource allocation management system based on message queue technology. The system comprises an experimental data interaction message queue cluster, an interaction information analysis system and an experimental resource management and control system. The experimental data interaction message queue cluster provides the task data information to be processed by each application group and hands it to the interaction information analysis system for processing. The interaction information analysis system provides a hardware resource allocation scheme and an application group allocation scheme to the experimental resource management and control system. The experimental resource management and control system readjusts and configures the hardware resources required by each application group according to these two schemes and issues application adjustment tasks to the application groups. The invention ensures rapid transmission of experimental data, greatly improves the timeliness of the experimental data, ensures stable operation of the experiment and, on that basis, saves considerable labor and material cost.
Description
Technical Field
The invention belongs to the technical field of big data processing and relates to a hardware resource allocation system based on a message queue, in particular to a data processing system that uses message queue technology for tasks such as repeated cleaning, recalculation, timely transmission, backup and real-time storage of complex experimental data.
Background
Data quality is currently a major concern in many fields. Statistics indicate that the error rate in commercial databases is typically 1-5%, and in some cases as high as 30%. In the United States, poor-quality data contributes each year to total economic losses of about $600 billion across all fields and to as many as 98,000 deaths attributed to poor medical data. To improve data quality, the quality of a data set must first be assessed to decide whether further repair is necessary. In particular, data quality degrades rapidly over time. For example, because customer information changes, about 2% of business data becomes outdated each month and, more seriously, about 50% of data becomes unusable within two years because it is out of date. Research on improving data timeliness is therefore necessary.
Industrial data has similar problems, especially in the efficient communication of data for large, complex devices. If a system must guarantee fast response, low coupling, high concurrency and high stability, every auxiliary subsystem in the platform must operate normally. If the subsystems are too strongly coupled and the program flow is too tightly bound, the system not only consumes a lot of time while running, but the whole system is also paralyzed as soon as a problem occurs in any one step.
The rapid development of computer technology has gradually removed the constraint of computing speed. In the current era in which data is king, the main constraint has shifted back to system architecture: how to keep a system running durably and effectively has become the central question. In the traditional mode, each task to be processed requires a dedicated set of computer hardware resources. When first-hand data is transmitted to a designated series of work platforms, such as operation, monitoring and early warning, each platform directly occupies limited hardware resources to do the same thing, so the problem can only be solved by continuously increasing investment in that hardware.
A "message queue" is a container that holds messages while they are in transit. As a newer Internet application technology, it offers decoupling, asynchronous processing and peak shaving compared with direct transmission. Message queue technology overcomes the resource allocation problems of the traditional mode, so that the same amount of data can be transmitted with far fewer hardware resources; while ensuring the timeliness of data transmission, it saves a large amount of hardware cost and data bandwidth for the whole platform.
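The decoupling and asynchronous behaviour described above can be illustrated with a minimal in-process sketch. It uses Python's standard `queue.Queue` as a stand-in for a real message broker; the function names and the simulated processing delay are assumptions made for illustration and are not part of the patent.

```python
import queue
import threading
import time


def producer(q, n_messages):
    # Publish a burst of messages without waiting for the consumer.
    for i in range(n_messages):
        q.put(f"sample-{i}")
    q.put(None)  # sentinel: no more messages


def consumer(q, processed):
    # Drain the queue at the consumer's own pace.
    while True:
        msg = q.get()
        if msg is None:
            break
        time.sleep(0.001)  # simulated processing cost (assumed value)
        processed.append(msg)


def run_demo(n_messages=20):
    q = queue.Queue()
    processed = []
    worker = threading.Thread(target=consumer, args=(q, processed))
    worker.start()
    producer(q, n_messages)  # returns as soon as the burst is queued
    worker.join()
    return processed
```

The producer finishes as soon as its burst is queued, while the consumer works through the backlog at its own rate, which is the peak-shaving effect in miniature.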
Disclosure of Invention
The invention provides an experimental big data resource allocation management system based on message queue technology. It aims to alleviate the problem that, because the data volume of an experimental platform is very large, the actual hardware usage of later data processing stages cannot be estimated accurately, and further to solve the problem that a fault in any one link of the data transmission process weakens the analysis timeliness of the whole experimental platform.
The purpose of the invention is realized by the following technical scheme:
An experimental big data resource allocation management system based on message queue technology comprises an experimental data interaction message queue cluster, an interaction information analysis system and an experimental resource management and control system, wherein:
the experimental data interaction message queue cluster provides the task data information to be processed by each application group, republishes processed data according to application requirements, and hands the task data information to the interaction information analysis system for processing, the task data information comprising data such as the total number of information tasks, the number of processed tasks and the number of tasks to be processed per second for each application group within a given time;
the interaction information analysis system analyzes the task data provided by the experimental data interaction message queue cluster, the application resource consumption data provided by the experimental resource management and control system, the resource consumption trend data of the servers during operation and the like, and on that basis provides a hardware resource allocation scheme and an application group allocation scheme to the experimental resource management and control system;
and the experimental resource management and control system readjusts and configures the hardware resources required by each application group according to the hardware resource allocation scheme and the application group allocation scheme provided by the interaction information analysis system, and issues application adjustment tasks to the application groups.
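As an illustration of the task data information exchanged between the components, the following sketch models one per-group queue snapshot and derives per-second rates from two snapshots. The class name, field names and the rate calculation are assumptions; the patent lists the quantities but does not define a data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueueSnapshot:
    """Hypothetical record of one application group's queue at one instant."""
    group: str
    timestamp: float      # seconds
    total_tasks: int      # total tasks received so far
    processed_tasks: int  # tasks completed so far

    @property
    def pending_tasks(self) -> int:
        # Tasks received but not yet completed.
        return self.total_tasks - self.processed_tasks


def per_second_rates(earlier, later):
    """Return (arrival_rate, processing_rate) in tasks per second."""
    elapsed = later.timestamp - earlier.timestamp
    if elapsed <= 0:
        raise ValueError("snapshots must be in chronological order")
    arrival = (later.total_tasks - earlier.total_tasks) / elapsed
    processing = (later.processed_tasks - earlier.processed_tasks) / elapsed
    return arrival, processing
```

When the arrival rate exceeds the processing rate, the backlog of that group is growing, which is one signal an analysis component could use when proposing a new allocation.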
Compared with the prior art, the invention has the following advantages:
The message-queue-based hardware resource allocation scheme ensures rapid transmission of experimental data, greatly improves the timeliness of the experimental data, ensures stable operation of the experiment and, on that basis, saves considerable labor and material cost.
Drawings
FIG. 1 is an overall architecture of an experimental big data resource allocation management system based on a message queue technology according to the present invention;
FIG. 2 is a diagram of a message queue cluster architecture;
FIG. 3 is a diagram of an interaction information analysis system architecture;
FIG. 4 is a diagram of an experimental resource management and control system.
Detailed Description
The technical solution of the present invention is further described below with reference to the accompanying drawings, but not limited thereto, and any modification or equivalent replacement of the technical solution of the present invention without departing from the spirit and scope of the technical solution of the present invention shall be covered by the protection scope of the present invention.
The invention provides an experimental big data resource allocation management system based on message queue technology. Its overall architecture, shown in FIG. 1, comprises an experimental data interaction message queue cluster, an interaction information analysis system and an experimental resource management and control system. The data recalculation application group, the data cleaning application group, the real-time data storage application group and the historical data backup application group shown in FIG. 1 are consumers of the message queue cluster (that is, they obtain the task data they need from the message queues) and are managed by the experimental resource management and control system.
The specific management steps are as follows:
First, after receiving the originally acquired data, the message queue cluster publishes it, in the form of data processing tasks, to the message queues of the data cleaning application group and the real-time data storage application group according to the subscription rules of the actual application groups (that is, the consumers).
Second, when the data cleaning application group and the real-time data storage application group find tasks to be processed in the message queues they subscribe to, each gathers the hardware resources assigned to it and creates application instances to process the tasks. When its tasks are complete, the data cleaning application group sends the processed data, as tasks, through the message queue cluster to the data recalculation application group. When the data recalculation application group completes its tasks, it sends the calculated data, as tasks, through the message queue cluster to the real-time data storage application group, which stores the data in the real-time database. The real-time database therefore always holds current data; at regular intervals, the historical data backup application group reads the data for a given period from the real-time database and adds it to the historical database.
Third, during this process, each application group periodically sends its task processing status, resource consumption, acceleration and other data to the experimental resource management and control system. The interaction information analysis system then obtains the task information of the message queue cluster together with the application group data and remaining hardware resource data collected by the experimental resource management and control system, comprehensively analyzes the overall operating health of the system, adjusts in a timely manner the resource ratio of the application groups that still have tasks to complete, and finally generates an allocation scheme.
Fourth, the generated allocation scheme is sent to the experimental resource management and control system, which readjusts and allocates the hardware resources required by each application group and issues application adjustment tasks to the application groups:
1. Send an application state instruction to the application groups according to the application group allocation scheme, close the excess applications of each application group (the excess being the difference between the number of applications currently running and the number required after calculation) without affecting the normal work of the group, and release the hardware resources.
2. Adjust the hardware resource pool of the experimental platform and reconfigure the required hardware resources for each application group.
3. After the hardware resources required by each application group have been configured, the experimental resource management and control system issues application adjustment tasks to each application group, and the application groups increase their number of applications without affecting normal work.
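The three adjustment sub-steps above can be sketched as an ordered plan: release excess applications first, reconfigure the pool, then start additional applications. The function name, phase labels and dictionary layout are assumptions for illustration only; the patent does not specify a data format for the allocation scheme.

```python
def plan_adjustment(current, required):
    """Turn instance counts per group into three ordered phases.

    current  -- running application instances per group (assumed format)
    required -- instances per group requested by the allocation scheme
    """
    close = {}
    start = {}
    for group in sorted(set(current) | set(required)):
        diff = required.get(group, 0) - current.get(group, 0)
        if diff < 0:
            close[group] = -diff   # excess = running minus required
        elif diff > 0:
            start[group] = diff    # shortfall to add after reconfiguration
    return [
        ("close_excess", close),                # sub-step 1: free hardware
        ("reconfigure_pool", dict(required)),   # sub-step 2: new allocation
        ("start_additional", start),            # sub-step 3: grow groups
    ]
```

Closing excess instances before starting new ones means the hardware freed in the first phase is available to the groups that grow in the third.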
In the invention, the experimental data interaction message queue cluster provides the task data information to be processed by each application group, republishes processed data according to application requirements, and hands the task data information to the interaction information analysis system for processing.
The experimental data interaction message queue cluster is deployed as shown in FIG. 2, which ensures that, if one message queue server fails during data transmission, the other servers take over the sending and receiving of data.
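From a client's point of view, the takeover behaviour can be sketched as trying each server in turn until one accepts the message. This is a minimal illustration, not the patent's mechanism: the `send` callable is a hypothetical transport function supplied by the caller, and the server names in the usage note are invented.

```python
def publish_with_failover(servers, message, send):
    """Try each server in order and return the one that accepted the message.

    send(server, message) is a caller-supplied, hypothetical transport
    function that raises ConnectionError when the server is unreachable.
    """
    errors = []
    for server in servers:
        try:
            send(server, message)
            return server
        except ConnectionError as exc:
            errors.append((server, str(exc)))  # remember and try the next one
    raise ConnectionError(f"all {len(servers)} servers failed: {errors}")
```

For example, with servers `["mq-a1", "mq-a2", "mq-a3"]` and a transport in which only `mq-a3` is reachable, the call returns `"mq-a3"` after two failed attempts.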
After comprehensive evaluation, RabbitMQ was selected as the message queue technology, and the cluster runs in multi-active mode. Taking the structure shown in FIG. 2 as an example, the cluster is built as follows:
(1) Prepare 6 servers and install RabbitMQ on each of them.
(2) Divide the servers into 2 groups as shown in FIG. 2, set a different port for each message queue, and connect the servers of the two groups with the Federation plug-in, so that data can be synchronized between any two servers.
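The patent does not give the federation configuration itself. As a hedged sketch, the helper below only builds the command strings that one might run on a node of one group to federate with the nodes of the other group. The commands `rabbitmq-plugins enable rabbitmq_federation`, `rabbitmqctl set_parameter federation-upstream` and `rabbitmqctl set_policy` are standard RabbitMQ tooling; the host names, policy name and exchange pattern are hypothetical.

```python
import json


def federation_commands(remote_hosts, port=5672, exchange_pattern="^experiment\\."):
    """Build shell commands (as strings) that federate with the remote group.

    remote_hosts and exchange_pattern are illustrative assumptions; nothing
    is executed here, the caller decides how and where to run the commands.
    """
    commands = ["rabbitmq-plugins enable rabbitmq_federation"]
    for host in remote_hosts:
        # One upstream per remote server, named after the host.
        upstream = json.dumps({"uri": f"amqp://{host}:{port}"})
        commands.append(
            f"rabbitmqctl set_parameter federation-upstream {host} '{upstream}'"
        )
    # One policy that applies all defined upstreams to matching exchanges.
    policy = json.dumps({"federation-upstream-set": "all"})
    commands.append(
        "rabbitmqctl set_policy --apply-to exchanges federate-experiment "
        + '"' + exchange_pattern + '" '
        + "'" + policy + "'"
    )
    return commands
```

Calling `federation_commands(["mq-b1", "mq-b2", "mq-b3"])` yields five commands: one to enable the plug-in, three upstream definitions and one policy. Running the mirror-image set on the other group would make the link bidirectional.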
In the invention, the interaction information analysis system analyzes data such as the backlog and processing speed of the various data processing tasks in the experimental data interaction message queue cluster. By analyzing this data in real time it obtains the real-time running state of the current experimental platform, and it uses the result to help the platform adjust its hardware resource allocation scheme.
Taking FIG. 3 as an example, the interaction information analysis system is divided into 6 subsystems: the data receiving, cleaning and packaging application; the data analysis model training application; the data analysis model library; the data analysis application; the application processing capacity analysis application; and the application resource allocation scheme generation application, wherein:
the data receiving, cleaning and packaging application connects to the management end of the message queue cluster and periodically collects the analysis data of every queue in the whole message queue cluster, including, for each queue, the total number of tasks received, the number of tasks completed, the number of tasks processed within a given time, and the increase or decrease in tasks;
the data analysis model training application consists mainly of its own training database and a distributed computing server group. When the data receiving, cleaning and packaging application sends data packets, they are stored in the training database; a training program written in Python then trains a new model on the gradually growing training database, and the model is stored in the data analysis model library once training is finished. Model training and data packet storage do not run synchronously: the training program runs on a timer, initially set to 4 times per day, and the actual training schedule can be changed through the system back end;
the data analysis model library stores every version of the trained analysis models and helps to obtain the optimal hardware deployment scheme and application operation scheme;
the application resource allocation scheme generation application generates the final application hardware resource allocation scheme. The application processing capacity analysis application and the data analysis application take the data packets sent by the data receiving, cleaning and packaging application within a given time and substitute them into the analysis model (a mathematical model) to compute optimized usage data for the applications and hardware resources. This optimized data is passed to the application resource allocation scheme generation application, which allocates all resources currently in use and all candidate resources and finally generates a resource allocation scheme. Through this scheme, the various resource adjustment and control commands are issued to the experimental resource management and control system, thereby achieving resource reallocation.
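The patent leaves the mathematical model unspecified. Purely as an illustration of how queue statistics could be turned into an allocation scheme, the sketch below sizes each application group so that it absorbs new arrivals and clears its backlog within a target time, then scales the result down if the resource pool is too small. The formula, parameter names and the 60-second default are assumptions, not the patented model.

```python
import math


def required_instances(pending, arrival_rate, per_instance_rate, drain_seconds):
    """Instances needed to keep up with arrivals and clear the backlog in time."""
    if per_instance_rate <= 0 or drain_seconds <= 0:
        raise ValueError("per_instance_rate and drain_seconds must be positive")
    needed_rate = arrival_rate + pending / drain_seconds  # tasks per second
    return max(1, math.ceil(needed_rate / per_instance_rate))


def allocation_scheme(groups, capacity, drain_seconds=60.0):
    """Map each group name to an instance count within the given capacity.

    groups   -- {name: {"pending", "arrival_rate", "per_instance_rate"}}
    capacity -- total instances the hardware pool can host (assumed unit)
    """
    wanted = {
        name: required_instances(
            g["pending"], g["arrival_rate"], g["per_instance_rate"], drain_seconds
        )
        for name, g in groups.items()
    }
    total = sum(wanted.values())
    if total <= capacity:
        return wanted
    # Scale down proportionally but keep at least one instance per group,
    # so the result can exceed capacity when capacity < number of groups.
    return {name: max(1, math.floor(n * capacity / total)) for name, n in wanted.items()}
```

A trained model from the model library could replace `required_instances` while leaving the surrounding capacity check unchanged.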
In the invention, the experimental resource management and control system stores the actual parameters and occupied shares of the various hardware resources, as well as copies of the application programs that handle the various data processing jobs. After receiving a resource allocation scheme from the interaction information analysis system, it starts or stops application copies according to actual resource usage, thereby allocating hardware resources reasonably and maximizing their use without affecting the experiment.
FIG. 4 is an architecture diagram of the experimental resource management and control system, which is set up as follows:
(1) Install the server software and hardware control application on each server to be controlled and configure information such as the IP address of the corresponding management and control system, ensuring that the server software and hardware control application can connect normally to the experimental resource management and control system.
(2) Load the deployment copy package of each application into the server software and hardware control application, ensuring that the experimental resource management and control system can remotely add processes of each application.
(3) After steps (1) and (2) are completed, verify that the resource overrun protection function of the server operates normally.
(4) After steps (1), (2) and (3) are completed, verify that the remote start-stop function of the server operates normally.
Claims (3)
1. An experimental big data resource allocation management system based on message queue technology, characterized by comprising an experimental data interaction message queue cluster, an interaction information analysis system and an experimental resource management and control system, wherein:
the experimental data interaction message queue cluster is used for providing the task data information to be processed by each application group, republishing processed data according to application requirements, and delivering the task data information to the interaction information analysis system for processing;
the interaction information analysis system is used for analyzing the task data provided by the experimental data interaction message queue cluster, the application resource consumption data provided by the experimental resource management and control system and the resource consumption trend data of the servers during operation, and accordingly providing a hardware resource allocation scheme and an application group allocation scheme to the experimental resource management and control system;
and the experimental resource management and control system readjusts and configures the hardware resources required by each application group according to the hardware resource allocation scheme and the application group allocation scheme provided by the interaction information analysis system, and issues application adjustment tasks to the application groups.
2. The system according to claim 1, wherein the task data information includes a total amount of information tasks, a number of processed tasks, a number of tasks to be processed, and a number of tasks to be processed per second corresponding to each application group within a certain period of time.
3. The message queue technology-based experimental big data resource allocation management system according to claim 1, wherein RabbitMQ is selected as the technology for the experimental data interaction message queue cluster, and the cluster runs in multi-active mode.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010936736.5A CN112052093A (en) | 2020-09-08 | 2020-09-08 | Experimental big data resource allocation management system based on message queue technology |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112052093A (en) | 2020-12-08 |
Family
ID=73610857
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010936736.5A Pending CN112052093A (en) | 2020-09-08 | 2020-09-08 | Experimental big data resource allocation management system based on message queue technology |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112052093A (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102739452A (en) * | 2012-06-28 | 2012-10-17 | 浪潮(北京)电子信息产业有限公司 | Method and system for monitoring resources |
US20170048120A1 (en) * | 2015-08-11 | 2017-02-16 | Txmq, Inc. | Systems and Methods for WebSphere MQ Performance Metrics Analysis |
CN107766147A (en) * | 2016-08-23 | 2018-03-06 | 上海宝信软件股份有限公司 | Distributed data analysis task scheduling system |
CN106445675A (en) * | 2016-10-20 | 2017-02-22 | 焦点科技股份有限公司 | B2B platform distributed application scheduling and resource allocation method |
CN107395729A (en) * | 2017-07-27 | 2017-11-24 | 深圳乐信软件技术有限公司 | A kind of consumption system of message queue, method and device |
CN109408236A (en) * | 2018-10-22 | 2019-03-01 | 福建南威软件有限公司 | A kind of task load equalization methods of ETL on cluster |
Non-Patent Citations (1)
Title |
---|
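Sun Zhenyu; Shi Jingyan; Jiang Xiaowei; Zou Jiaheng; Du Ran: "Evaluation of resource management methods for large-scale high-energy physics computing clusters", Computer Science (计算机科学), no. 10, 15 October 2017 (2017-10-15) * |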
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||