CN112052093A

CN112052093A - Experimental big data resource allocation management system based on message queue technology

Info

Publication number: CN112052093A
Application number: CN202010936736.5A
Authority: CN
Inventors: 万杰; 姚坤; 石家魁; 付俊丰; 曹勇; 金成刚; 鄂鹏
Original assignee: Harbin Institute of Technology; CERNET Corp
Current assignee: Harbin Institute of Technology; CERNET Corp
Priority date: 2020-09-08
Filing date: 2020-09-08
Publication date: 2020-12-08

Abstract

The invention discloses an experimental big data resource allocation management system based on message queue technology, comprising an experimental data interaction message queue cluster, an interaction information analysis system and an experimental resource management and control system, wherein the experimental data interaction message queue cluster provides the task data information to be processed by each application group and hands it to the interaction information analysis system for processing; the interaction information analysis system provides a hardware resource allocation scheme and an application group allocation scheme to the experimental resource management and control system; and the experimental resource management and control system readjusts and configures the hardware resources required by each application group according to the hardware resource allocation scheme and the application group allocation scheme provided by the interaction information analysis system, and issues application adjustment tasks to the application groups. The invention ensures rapid transmission of experimental data, greatly improves the timeliness of the experimental data, ensures stable operation of the experiment, and on this basis saves a large amount of labor and material cost.

Description

Experimental big data resource allocation management system based on message queue technology

Technical Field

The invention belongs to the technical field of big data processing and relates to a message-queue-based hardware resource allocation system, in particular to a system based on message queue technology for data processing tasks such as repeated cleaning, recalculation, time-sensitive transmission, backup and real-time storage of complex experimental data.

Background

Data quality is currently a major concern in many fields. Statistics indicate that the error rate in commercial databases is typically between 1% and 5%, and in some cases as high as 30%. In the United States, poor-quality data is reported to cause a total economic loss of $600 billion per year across all fields, and up to 98,000 deaths are attributed to poor medical data. To improve data quality, the quality of a data set must first be assessed to determine whether further repair is necessary. In particular, data quality degrades rapidly over time. For example, because customer information changes, about 2% of business data becomes outdated each month and, more seriously, about 50% of data becomes unusable within two years because it is out of date. Research on improving data timeliness is therefore necessary.

Industrial data has similar problems, especially where data for large, complex devices must be communicated efficiently. For a system to guarantee fast response, loose coupling, high concurrency and high stability, every auxiliary subsystem in the platform must operate normally. If the subsystems are too tightly coupled and the program flow is too rigid, the system not only takes a long time to run, but a problem in a single step can also paralyze the whole system.

The rapid development of computer technology has gradually removed the constraints on computing speed. In the current era in which data is king, the constraint on further progress has shifted back to system architecture: how to keep a system running durably and efficiently has become the most important issue. In the traditional mode, each task to be processed must be allocated a dedicated set of computer hardware resources, and when first-hand data is transmitted to a designated series of working platforms such as computation, monitoring and early warning, limited hardware resources are directly occupied to do the same thing each time, so the problem can only be solved by continually increasing investment in that hardware.

A "message queue" is a container that holds messages while they are in transit. As a newer Internet application technology, it offers decoupling, asynchronous processing and peak shaving compared with traditional transmission modes. Message queue technology overcomes the resource allocation problems of the traditional mode, so that the same amount of data can be transmitted with far fewer hardware resources; while ensuring the timeliness of data transmission, it saves the whole platform a large amount of hardware cost and data bandwidth.
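The decoupling and peak shaving properties mentioned above can be illustrated with a minimal sketch. It uses Python's standard-library `queue.Queue` as a stand-in for a real message queue; the function name `run_demo`, the burst size and the doubling "processing" step are illustrative assumptions and not part of the invention.

```python
import queue
import threading


def run_demo(burst_size=50, workers=2):
    """Publish a burst of messages at once; consumers drain it at their own pace."""
    tasks = queue.Queue()      # the "container" that holds messages in transit
    results = []
    lock = threading.Lock()

    def consumer():
        while True:
            item = tasks.get()
            if item is None:   # sentinel: no more work for this consumer
                tasks.task_done()
                break
            with lock:
                results.append(item * 2)   # stand-in for real processing
            tasks.task_done()

    threads = [threading.Thread(target=consumer) for _ in range(workers)]
    for t in threads:
        t.start()

    # The producer never waits for a consumer (decoupling, asynchrony);
    # the queue absorbs the burst (peak shaving).
    for i in range(burst_size):
        tasks.put(i)
    for _ in threads:
        tasks.put(None)

    tasks.join()
    for t in threads:
        t.join()
    return sorted(results)
```

The producer and the consumers share only the queue, so either side can be scaled or restarted without changing the other.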

Disclosure of Invention

The invention provides an experimental big data resource allocation management system based on message queue technology. It aims to mitigate the problem that the actual hardware usage of downstream data processing cannot be accurately estimated because the data volume of the experimental platform is too large, and further to solve the problem that a fault in a single link of the data transmission process weakens the analysis timeliness of the whole experimental platform.

The purpose of the invention is realized by the following technical scheme:

An experimental big data resource allocation management system based on message queue technology comprises an experimental data interaction message queue cluster, an interaction information analysis system and an experimental resource management and control system, wherein:

the experimental data interaction message queue cluster provides the task data information to be processed by each application group, republishes the processed data as task data information according to application requirements, and hands the task data information to the interaction information analysis system for processing, wherein the task data information comprises data such as the total amount of information tasks, the number of processed tasks and the number of tasks to be processed per second for each application group within a certain period of time;

the interaction information analysis system analyzes the task data provided by the experimental data interaction message queue cluster, the application resource consumption data provided by the experimental resource management and control system, the resource consumption trend data during server operation and the like, and on that basis provides a hardware resource allocation scheme and an application group allocation scheme to the experimental resource management and control system;

and the experimental resource management and control system readjusts and configures the hardware resources required by each application group according to the hardware resource allocation scheme and the application group allocation scheme provided by the interaction information analysis system, and issues application adjustment tasks to the application groups.
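As an illustration of the task data information described above, the following minimal sketch models one per-group record. The class name `QueueTaskStats`, its field names and the derived helpers are assumptions made for this example; the source only lists the kinds of data involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueueTaskStats:
    """Hypothetical per-application-group snapshot over one sampling window."""
    group: str                 # application group name, e.g. "data_cleaning"
    total_tasks: int           # total information tasks received
    processed_tasks: int       # tasks already completed
    tasks_per_second: float    # observed processing rate (assumed interpretation)

    @property
    def pending_tasks(self) -> int:
        # Backlog still waiting in the queue; never negative.
        return max(self.total_tasks - self.processed_tasks, 0)

    def seconds_to_drain(self) -> float:
        # Time to clear the backlog at the current rate, ignoring new arrivals.
        if self.pending_tasks == 0:
            return 0.0
        if self.tasks_per_second <= 0:
            return float("inf")
        return self.pending_tasks / self.tasks_per_second
```

A record of this kind is what the interaction information analysis system would receive as input for each application group.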

Compared with the prior art, the invention has the following advantages:

The message-queue-based hardware resource allocation scheme ensures rapid transmission of experimental data, greatly improves the timeliness of the experimental data, ensures stable operation of the experiment, and on this basis saves a large amount of labor and material cost.

Drawings

FIG. 1 is an overall architecture of an experimental big data resource allocation management system based on a message queue technology according to the present invention;

FIG. 2 is a diagram of a message queue cluster architecture;

FIG. 3 is a diagram of an interaction information analysis system architecture;

FIG. 4 is a diagram of an experimental resource management and control system.

Detailed Description

The technical solution of the present invention is further described below with reference to the accompanying drawings, but not limited thereto, and any modification or equivalent replacement of the technical solution of the present invention without departing from the spirit and scope of the technical solution of the present invention shall be covered by the protection scope of the present invention.

The invention provides an experimental big data resource allocation management system based on message queue technology. Its overall architecture is shown in FIG. 1 and comprises an experimental data interaction message queue cluster, an interaction information analysis system and an experimental resource management and control system. The data recalculation application group, the data cleaning application group, the real-time data storage application group and the historical data backup application group shown in FIG. 1 are consumers of the message queue cluster (that is, they obtain the task data they need from the message queues) and are managed by the experimental resource management and control system.

The specific management steps are as follows:

First, after receiving the originally acquired data, the message queue cluster publishes the raw data, in the form of data processing tasks, to the message queues of the data cleaning application group and the real-time data storage application group according to the subscription rules of the actual application groups (that is, the consumers).

Second, when the data cleaning application group and the real-time data storage application group find tasks to be processed in the message queues they subscribe to, each draws on the hardware resources under it to create application instances to process the tasks. After completing its tasks, the data cleaning application group sends the processed data, as tasks, to the data recalculation application group through the message queue cluster. After the data recalculation application group completes its tasks, it sends the calculated data, as tasks, to the real-time data storage application group through the message queue cluster. The real-time data storage application group stores the data in the real-time database, so that the data in the real-time database always remains current, and the historical data backup application group periodically fetches the data of a given period from the real-time database and appends it to the historical database.

Third, during this process each application group periodically sends its task processing status, resource consumption, growth rate and other data to the experimental resource management and control system. The interaction information analysis system then obtains the task information data of the message queue cluster together with the application group data and remaining hardware resource data collected by the experimental resource management and control system, comprehensively analyzes the overall operating health of the system, adjusts the resource ratios of the application groups in time, and finally generates an allocation scheme.

Finally, the generated allocation scheme is sent to the experimental resource management and control system, which readjusts and configures the hardware resources required by each application group and issues application adjustment tasks to the application groups, as follows:

1. Send application state instructions to the application groups according to the application group allocation scheme, close the surplus applications of each application group (the surplus is the difference between the number of applications currently running and the number required after calculation) without affecting the normal work of the group, and release the hardware resources;

2. Adjust the hardware resource pool of the experimental platform and reconfigure the required hardware resources for each application group;

3. After the hardware resources required by each application group have been configured, the experimental resource management and control system issues application adjustment tasks to each application group, and the application groups increase their number of applications without affecting normal work.
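The ordering of the three adjustment steps (release first, reconfigure, then expand) can be sketched as follows. The function name `plan_adjustment`, the action tuples and the group names are hypothetical; the sketch only shows the ordering and does not model how instances are actually closed or started.

```python
def plan_adjustment(current, target):
    """Return ordered actions: closures, pool reconfiguration, then starts.

    current and target map an application group name to its number of
    running application instances (hypothetical representation).
    """
    groups = sorted(set(current) | set(target))
    actions = []

    # Step 1: close surplus instances first so their hardware is released.
    for g in groups:
        surplus = current.get(g, 0) - target.get(g, 0)
        if surplus > 0:
            actions.append(("close", g, surplus))

    # Step 2: reconfigure the hardware resource pool once resources are free.
    actions.append(("reconfigure_pool", None, 0))

    # Step 3: start additional instances for groups that need more.
    for g in groups:
        shortfall = target.get(g, 0) - current.get(g, 0)
        if shortfall > 0:
            actions.append(("start", g, shortfall))

    return actions
```

Closing surplus instances before starting new ones means the pool never has to hold both the old and the new configuration at the same time.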

In the invention, the experimental data interaction message queue cluster provides the task data information to be processed by each application group, republishes the processed data as task data information according to application requirements, and hands the task data information to the interaction information analysis system for processing.
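The publish, process and republish flow between the queue cluster and the application groups can be sketched with an in-memory stand-in for the broker. The topic names, the `SUBSCRIPTIONS` table, the `InMemoryBroker` class and the toy cleaning and recalculation steps are all assumptions for illustration; the periodic historical backup is omitted.

```python
from collections import defaultdict, deque

# Hypothetical subscription rules: topic -> application groups that consume it.
SUBSCRIPTIONS = {
    "raw": ["cleaning", "realtime_storage"],
    "cleaned": ["recalculation"],
    "recalculated": ["realtime_storage"],
}


class InMemoryBroker:
    """Toy broker: one FIFO queue per subscribing application group."""

    def __init__(self, subscriptions):
        self.subscriptions = subscriptions
        self.queues = defaultdict(deque)

    def publish(self, topic, payload):
        for group in self.subscriptions.get(topic, []):
            self.queues[group].append((topic, payload))

    def consume(self, group):
        q = self.queues[group]
        return q.popleft() if q else None


def run_pipeline(samples):
    broker = InMemoryBroker(SUBSCRIPTIONS)
    realtime_db = []
    for sample in samples:
        broker.publish("raw", sample)

    # Data cleaning group: drop missing values, republish the rest.
    while (task := broker.consume("cleaning")) is not None:
        _, value = task
        if value is not None:
            broker.publish("cleaned", value)

    # Data recalculation group: a placeholder computation, then republish.
    while (task := broker.consume("recalculation")) is not None:
        _, value = task
        broker.publish("recalculated", value * 2)

    # Real-time storage group: persist everything it subscribed to.
    while (task := broker.consume("realtime_storage")) is not None:
        realtime_db.append(task)
    return realtime_db
```

Each group only knows the topics it subscribes to and publishes, which is what allows the number of instances per group to be changed independently.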

The experimental data interaction message queue cluster is deployed as shown in FIG. 2, which ensures that if one message queue server fails during data transmission, the other servers take over the sending and receiving of data.

After comprehensive consideration, RabbitMQ was selected as the message queue technology, and the cluster uses a multi-active mode. Taking the structure shown in FIG. 2 as an example, the cluster is built in the following steps:

(1) Prepare 6 servers and install RabbitMQ on each of them;

(2) Divide the servers into 2 groups as shown in FIG. 2, set a different port for each message queue, and connect the servers of the two groups through the Federation plugin, so that data can be synchronized between any two servers.
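The takeover behavior expected of this deployment can be sketched from the client side. This is not how the Federation plugin itself works; it is a minimal illustration, under the assumption that a publisher knows several server addresses and tries them in order. The function names, server names and the fake transport are hypothetical.

```python
def send_with_failover(servers, message, send):
    """Try each server in order and return the one that accepted the message.

    send is a callable (server, message) that raises ConnectionError on failure.
    """
    errors = []
    for server in servers:
        try:
            send(server, message)
            return server
        except ConnectionError as exc:
            errors.append(f"{server}: {exc}")
    raise ConnectionError("all message queue servers failed: " + "; ".join(errors))


def make_fake_transport(down):
    """Build a test transport in which the servers listed in down are unreachable."""
    delivered = []

    def send(server, message):
        if server in down:
            raise ConnectionError("unreachable")
        delivered.append((server, message))

    return send, delivered
```

With `mq-a1` marked as down, a call such as `send_with_failover(["mq-a1", "mq-a2", "mq-b1"], "sample", send)` delivers through `mq-a2`.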

In the invention, the interaction information analysis system analyzes data such as the backlog and processing speed of the various data processing tasks in the experimental data interaction message queue cluster, derives the real-time running state of the experimental platform from this real-time analysis, and uses the result to help the platform adjust its hardware resource allocation scheme.
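One simple way to turn backlog and processing speed into a resource requirement is sketched below. The formula, the function name `required_instances` and the drain window parameter are assumptions for illustration; the source does not specify how the analysis is computed.

```python
import math


def required_instances(backlog, arrival_rate, per_instance_rate, drain_seconds):
    """Instances needed to keep up with arrivals and clear the backlog in time.

    backlog           -- tasks currently waiting in the queue
    arrival_rate      -- new tasks per second
    per_instance_rate -- tasks per second one application instance can process
    drain_seconds     -- time allowed to clear the existing backlog
    """
    if per_instance_rate <= 0 or drain_seconds <= 0:
        raise ValueError("per_instance_rate and drain_seconds must be positive")
    needed_rate = arrival_rate + backlog / drain_seconds
    return max(1, math.ceil(needed_rate / per_instance_rate))
```

For example, a backlog of 600 tasks, 20 new tasks per second, 10 tasks per second per instance and a 60-second window give a needed rate of 30 tasks per second, that is, 3 instances.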

Taking FIG. 3 as an example, the interaction information analysis system is divided into 6 subsystems: a data receiving, cleaning and packaging application; a data analysis model training application; a data analysis model library; a data analysis application; an application processing capacity analysis application; and an application resource allocation scheme generation application, wherein:

the data receiving, cleaning and packaging application is mainly used to connect to the management end of the message queue cluster and periodically collect the queue information analysis data of the whole message queue cluster, including, for each queue, the total number of tasks received, the number of tasks completed, the number of tasks processed within a given time, and the increase or decrease in tasks;

the data analysis model training application mainly comprises its own training database and a distributed computing server group. When the data receiving, cleaning and packaging application sends data packets, they are stored in the training database; a training program written in Python then trains a new model on the gradually growing training database, and the trained model is stored in the data analysis model library. Model training and data packet storage are not performed synchronously: the training program runs on a timer, initially set to 4 times per day, and the actual training schedule can be changed through the system back end;

the data analysis model library is mainly used to store all versions of the trained data analysis models, which help to obtain the optimal hardware deployment scheme and application operation scheme;

the application resource allocation scheme generation application is mainly used to generate the final application hardware resource allocation scheme. The application processing capacity analysis application and the data analysis application load a data analysis model and substitute the data packets sent by the data receiving, cleaning and packaging application within a certain period into the mathematical model, obtaining optimized usage data for the corresponding applications and hardware resources. The optimized data is passed to the application resource allocation scheme generation application, which computes a reasonable allocation over all resources currently in use and all candidate resources and finally generates a resource allocation scheme. Based on this scheme, resource adjustment and control commands are issued to the experimental resource management and control system to achieve resource reallocation.
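The last stage, fitting the optimized demand into the available hardware, can be sketched as follows. The source derives the scheme from a trained model; the proportional scaling rule here is a deliberately simple stand-in, and the function name `generate_allocation`, the use of CPU cores as the resource unit and the group names are assumptions.

```python
def generate_allocation(desired, cores_per_instance, pool_cores):
    """Return group -> instance count that fits within pool_cores.

    desired            -- group -> instance count suggested by the analysis
    cores_per_instance -- group -> CPU cores one instance of that group needs
    pool_cores         -- total CPU cores available in the resource pool
    """
    total = sum(desired[g] * cores_per_instance[g] for g in desired)
    if total <= pool_cores:
        return dict(desired)   # everything fits, keep the suggestion as is
    # Otherwise scale every group down by the same factor, rounding down so
    # that the resulting scheme can never exceed the pool.
    return {g: desired[g] * pool_cores // total for g in desired}
```

Rounding down can leave a few cores unused and can reduce a small group to zero instances; a fuller scheme would redistribute the remainder and enforce per-group minimums.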

In the invention, the experimental resource management and control system stores the actual parameters and occupied shares of the various hardware resources, together with copies of the application programs for the various data processing jobs. After receiving a resource allocation scheme sent by the interaction information analysis system, it intelligently starts or stops copies of these applications according to the actual resource occupancy, thereby allocating hardware resources reasonably and maximizing their use without affecting the experimental results.

FIG. 4 is an architecture diagram of the experimental resource management and control system, which is deployed in the following steps:

(1) Install the server software and hardware control application on each server to be controlled, and configure information such as the IP address of the corresponding management and control system, ensuring that the server software and hardware control application can connect normally to the experimental resource management and control system.

(2) Load the deployment copy package of each application into the server software and hardware control application, ensuring that the experimental resource management and control system can remotely add processes of each application.

(3) After steps (1) and (2) are completed, verify that the resource overrun protection function of the server operates normally.

(4) After steps (1), (2) and (3) are completed, verify that the remote start-stop function of the server operates normally.

Claims

1. An experimental big data resource allocation management system based on message queue technology, characterized by comprising an experimental data interaction message queue cluster, an interaction information analysis system and an experimental resource management and control system, wherein:

the experimental data interaction message queue cluster provides the task data information to be processed by each application group, republishes the processed data as task data information according to application requirements, and hands the task data information to the interaction information analysis system for processing;

the interaction information analysis system analyzes the task data provided by the experimental data interaction message queue cluster, the application resource consumption data provided by the experimental resource management and control system and the resource consumption trend data during server operation, and on that basis provides a hardware resource allocation scheme and an application group allocation scheme to the experimental resource management and control system;

and the experimental resource management and control system readjusts and configures the hardware resources required by each application group according to the hardware resource allocation scheme and the application group allocation scheme provided by the interaction information analysis system, and issues application adjustment tasks to the application groups.

2. The system according to claim 1, wherein the task data information includes a total amount of information tasks, a number of processed tasks, a number of tasks to be processed, and a number of tasks to be processed per second corresponding to each application group within a certain period of time.

3. The experimental big data resource allocation management system based on message queue technology according to claim 1, wherein the message queue technology selected for the experimental data interaction message queue cluster is RabbitMQ, and the cluster adopts a multi-active mode.