CN112765121A - Administration and application system based on big data service

Info

Publication number: CN112765121A
Application number: CN202110022976.9A
Authority: CN
Inventors: 孙铭
Original assignee: Beijing Hongxin Wanda Technology Co ltd
Current assignee: Beijing Hongxin Wanda Technology Co ltd
Priority date: 2021-01-08
Filing date: 2021-01-08
Publication date: 2021-05-07

Abstract

The invention relates to the technical field of big data governance and discloses a governance and application system based on big data service. The system comprises: a collection management server CMSbds running data collection management system server software; an application server ASbds running big data query system server software; a collection server cluster CSCrtuda running non-real-time structured data collection system server software and configured with a Hive database; a collection server cluster CSCrtsdc running real-time structured data collection system server software and configured with an HBase database; and a collection server cluster CSCssauda running semi-structured and unstructured data collection system server software and configured with an HDFS database. The invention addresses the technical problem of how to integrate and uniformly manage big data.

Description

Administration and application system based on big data service

Technical Field

The invention relates to the technical field of big data governance, and in particular to a governance and application system based on big data service.

Background

Big data services integrate new-generation information technologies such as big data, cloud computing and the mobile internet. Through interactive cooperation among data service providers, data-based resources are virtualized and offered as services, so that users receive a data ecosystem service spanning basic data resource acquisition, storage, organization, mining, analysis and decision-making through to subsequent service evaluation, management and security. Big data service is thus a new mode of data and information service.

Because big data is multi-source and heterogeneous, a big data service platform that lacks unified planning and data standards will find the collected data difficult to integrate and manage uniformly.

Therefore, how to carry out big data governance has become a key problem that urgently needs to be solved when building a big data service platform.

Disclosure of Invention

(I) Technical problem to be solved

In view of the deficiencies of the prior art, the invention provides a governance and application system based on big data service, which aims to solve the technical problem of how to integrate and uniformly manage big data.

(II) Technical solution

In order to achieve the purpose, the invention provides the following technical scheme:

a governance and application system based on big data service comprises: an acquisition management server CMSbds running data acquisition management system server software, and an application server ASbds running big data query system server software;

the system further comprises: a collection server cluster CSCrtuda, a collection server cluster CSCrtsdc and a collection server cluster CSCssauda, wherein the collection server cluster CSCrtuda runs non-real-time structured data collection system server software and is configured with a Hive database, the collection server cluster CSCrtsdc runs real-time structured data collection system server software and is configured with an HBase database, and the collection server cluster CSCssauda runs semi-structured and unstructured data collection system server software and is configured with an HDFS database;

the acquisition management server CMSbds is in communication connection with the acquisition server cluster CSCrtuda, the acquisition server cluster CSCrtsdc and the acquisition server cluster CSCssauda respectively;

the application server ASbds is in communication connection with the acquisition server cluster CSCrtuda, the acquisition server cluster CSCrtsdc and the acquisition server cluster CSCssauda respectively;

and the application server ASbds performs data interaction with an external service system through a firewall.
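The connections described above can be summarized in a small, purely illustrative Python sketch. The server and cluster names come from the text; the functions `communication_links` and `external_path` are hypothetical helpers introduced only to make the topology explicit, not part of the disclosed system.

```python
# Illustrative model of the topology; identifiers mirror the text,
# the helper functions are assumptions made for this sketch.
CLUSTER_STORES = {
    "CSCrtuda": "Hive",    # non-real-time structured data
    "CSCrtsdc": "HBase",   # real-time structured data
    "CSCssauda": "HDFS",   # semi-structured and unstructured data
}


def communication_links():
    """Return every (server, cluster) link: both servers connect to all three clusters."""
    servers = ("CMSbds", "ASbds")
    return {(server, cluster) for server in servers for cluster in CLUSTER_STORES}


def external_path(service_name):
    """Hops taken by an external service system: it reaches ASbds only through the firewall."""
    return [service_name, "firewall", "ASbds"]
```

In this reading, only the application server ASbds is reachable from outside, while the management server CMSbds talks to the clusters internally.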

Further, the data collection system running on the collection server cluster CSC collects data by the following collection method:

Step 1: the data acquisition system obtains the total number Nt of acquisition channels of the acquisition server cluster CSC and the information on the data source node allocated to each acquisition channel;

Step 2: the data acquisition system determines whether any acquisition channel of the acquisition server cluster CSC has not yet been allocated to a data source node;

if not, i.e., no such channel exists, return to Step 1;

if so, execute Step 3;

Step 3: the data acquisition system obtains the total number Mt of data source nodes from which it can acquire data;

Step 4: the data acquisition system calculates the average number [Nt/Mt] of acquisition channels that can theoretically be allocated to any data source node;

Step 5: the data acquisition system obtains the number Ni of acquisition channels actually running on a data source node DSNi;

Step 6: the data acquisition system determines whether the number Ni of acquisition channels actually running on the data source node DSNi is less than [Nt/Mt];

if not, i.e., Ni is not less than [Nt/Mt], return to Step 5;

if so, i.e., Ni is less than [Nt/Mt], execute Step 7;

Step 7: the data acquisition system allocates acquisition channels to the data source node DSNi until the number Ni of acquisition channels actually running on the data source node DSNi reaches [Nt/Mt];

Step 8: the data acquisition system determines whether any acquisition channel of the acquisition server cluster CSC has not yet been allocated to a data source node;

if not, i.e., no such channel exists, return to Step 1;

if so, return to Step 5.
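The channel-balancing procedure of Step 1 to Step 8 can be illustrated with a minimal Python sketch. It rests on several assumptions that the text does not state: [Nt/Mt] is read as integer (floor) division, channels are modeled as plain counts rather than real connections, nodes are visited in name order, and the sketch performs a single pass instead of looping back to Step 1 indefinitely. The function name `allocate_channels` is hypothetical.

```python
def allocate_channels(total_channels, running):
    """Top up each data source node to [Nt/Mt] acquisition channels.

    total_channels: Nt, all acquisition channels of the cluster (Step 1).
    running: mapping of node name -> Ni, channels already running on that node.
    Returns (updated mapping, number of channels still unallocated).
    """
    if not running:
        return {}, total_channels
    quota = total_channels // len(running)           # Step 4: [Nt/Mt], assumed to be floor
    free = total_channels - sum(running.values())    # Step 2: channels not yet allocated
    result = dict(running)
    for node in sorted(result):                      # Steps 5-8: visit one node DSNi at a time
        if free <= 0:
            break                                    # Step 8: nothing left to allocate
        shortfall = quota - result[node]             # Step 6: is Ni < [Nt/Mt]?
        if shortfall > 0:
            grant = min(shortfall, free)             # Step 7: allocate up to the quota
            result[node] += grant
            free -= grant
    return result, free
```

For example, with Nt = 10 channels and three nodes currently running 0, 1 and 3 channels, the quota is 3, so the first two nodes are topped up to 3 each and one channel remains unallocated.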

Further, the data acquisition management system directionally allocates the acquisition tasks for non-real-time structured data to the non-real-time structured data acquisition system running on the acquisition server cluster CSCrtuda; the acquisition server cluster CSCrtuda acquires only non-real-time structured data and no other data types, and directionally stores the acquired non-real-time structured data in the Hive database.

Further, the data acquisition management system directionally allocates the acquisition tasks for real-time structured data to the real-time structured data acquisition system running on the acquisition server cluster CSCrtsdc; the acquisition server cluster CSCrtsdc acquires only real-time structured data and no other data types, and directionally stores the acquired real-time structured data in the HBase database.

Further, the data collection management system directionally allocates the collection tasks for semi-structured and unstructured data to the semi-structured and unstructured data collection system running on the collection server cluster CSCssauda; the collection server cluster CSCssauda collects only semi-structured and unstructured data and no other data types, and directionally stores the collected semi-structured and unstructured data in the HDFS database.

(III) Advantageous technical effects

Compared with the prior art, the invention has the following beneficial technical effects:

the data acquisition management system directionally allocates data acquisition tasks according to the data structure type on each data source node, and the acquisition server cluster CSC directionally acquires and directionally stores the data according to the allocated tasks, so that the directionally acquired homogeneous (same-structure) data can be integrated more efficiently and managed uniformly.

Drawings

FIG. 1 is a flow chart of the acquisition steps of the data acquisition system of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

the governance and application system based on big data service further includes: a collection server cluster CSCrtuda, a collection server cluster CSCrtsdc and a collection server cluster CSCssauda, wherein the collection server cluster CSCrtuda runs non-real-time structured data collection system server software and is configured with a Hive database, the collection server cluster CSCrtsdc runs real-time structured data collection system server software and is configured with an HBase database, and the collection server cluster CSCssauda runs semi-structured and unstructured data collection system server software and is configured with an HDFS database;

further, the application server ASbds performs data interaction with an external service system through a firewall;

the Hive database is used for storing periodically acquired non-real-time data that has a fixed table structure; the non-real-time structured data is incrementally extracted from the source data nodes into the Hive database on a schedule by a Sqoop script;
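As an illustration of the scheduled incremental extraction, the following Python sketch assembles the argument list for one Sqoop import run. It assumes an append-only source table with a monotonically increasing check column; the function name and all parameter values are hypothetical, and the exact option combination should be checked against the Sqoop version actually deployed.

```python
def sqoop_incremental_import(jdbc_url, table, check_column, last_value, hive_table):
    """Build the command line for one incremental Sqoop import into Hive.

    Only rows whose check_column exceeds last_value are imported (append mode).
    """
    return [
        "sqoop", "import",
        "--connect", jdbc_url,            # JDBC URL of the source data node (assumed)
        "--table", table,                 # source table with a fixed structure
        "--hive-import",                  # load the imported rows into Hive
        "--hive-table", hive_table,
        "--incremental", "append",        # incremental extraction
        "--check-column", check_column,
        "--last-value", str(last_value),  # high-water mark from the previous run
    ]
```

A scheduler such as cron could run the resulting command at fixed intervals and record the new high-water mark after each run.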

the HBase database is used for storing real-time structured data, which is acquired at millisecond-level to second-level intervals and stored as key-value pairs;
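One possible way to form such key-value pairs is sketched below in Python. The row-key layout (source identifier, a separator, and a zero-padded millisecond timestamp) is an assumption made for illustration and is not specified in the text; `make_cell` is a hypothetical helper. HBase stores row keys and values as byte arrays, which is why both parts are encoded.

```python
def make_cell(source_id, timestamp_ms, value):
    """Return a (row_key, value) pair as bytes for one real-time reading.

    The timestamp is zero-padded to 13 digits so that the byte order of the
    row keys matches chronological order within one source.
    """
    row_key = f"{source_id}#{timestamp_ms:013d}".encode("utf-8")
    return row_key, str(value).encode("utf-8")
```

With this layout, all readings from one data source are stored contiguously and in time order, which suits range scans over a time window.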

the HDFS database is used for storing semi-structured data and unstructured data, wherein the semi-structured data includes waveform files and model files, and the unstructured data includes images and videos stored as files; the semi-structured and unstructured data is transferred via a file transfer protocol and stored in directories of the HDFS file system organized by file category and time;
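The category-and-time directory layout can be sketched as follows. The root directory `/data`, the year/month/day granularity and the function name `hdfs_target_path` are assumptions for illustration; the text only states that files are organized by file category and time.

```python
import posixpath
from datetime import datetime


def hdfs_target_path(category, acquired_at, filename, root="/data"):
    """Build a target path partitioned first by file category, then by acquisition date."""
    date_part = acquired_at.strftime("%Y/%m/%d")
    return posixpath.join(root, category, date_part, filename)
```

For instance, a waveform file acquired on 2021-01-08 would be placed under `/data/waveform/2021/01/08/`.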

the data acquisition management system running on the acquisition management server CMSbds firstly acquires metadata information of each data source node and a data structure type on the data source node, and then executes the following operations:

directionally distributing a collection task of non-real-time structured data to a non-real-time structured data collection system running on a collection server cluster CSCrtuda, wherein the collection server cluster CSCrtuda only collects non-real-time structured data types but not other data types, and directionally stores the collected non-real-time structured data into a Hive database;

directionally distributing the acquisition task of the real-time structured data to a real-time structured data acquisition system running on an acquisition server cluster CSCrtsdc, wherein the acquisition server cluster CSCrtsdc only acquires the real-time structured data types but not other data types, and directionally storing the acquired real-time structured data into an HBase database;

directionally distributing the collection task of the semi-structured and unstructured data to a semi-structured and unstructured data collection system running on a collection server cluster CSCssauda, wherein the collection server cluster CSCssauda only collects semi-structured and unstructured data types but not other data types, and directionally stores the collected semi-structured and unstructured data into an HDFS database;
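The directional distribution described above amounts to a lookup from data structure type to a dedicated cluster and store. A minimal Python sketch follows; the enum `DataKind`, the table `ROUTES` and the function `assign_tasks` are hypothetical names, and the input mapping stands in for the metadata that the management system is said to obtain from each data source node.

```python
from enum import Enum


class DataKind(Enum):
    NON_REAL_TIME_STRUCTURED = "non-real-time structured"
    REAL_TIME_STRUCTURED = "real-time structured"
    SEMI_OR_UNSTRUCTURED = "semi-structured or unstructured"


# Each data structure type maps to exactly one cluster and one store.
ROUTES = {
    DataKind.NON_REAL_TIME_STRUCTURED: ("CSCrtuda", "Hive"),
    DataKind.REAL_TIME_STRUCTURED: ("CSCrtsdc", "HBase"),
    DataKind.SEMI_OR_UNSTRUCTURED: ("CSCssauda", "HDFS"),
}


def assign_tasks(node_kinds):
    """Group data source nodes by the cluster that must collect them.

    node_kinds: mapping of node name -> DataKind found on that node.
    Returns a mapping of cluster name -> sorted list of node names.
    """
    tasks = {cluster: [] for cluster, _store in ROUTES.values()}
    for node in sorted(node_kinds):
        cluster, _store = ROUTES[node_kinds[node]]
        tasks[cluster].append(node)
    return tasks
```

Because each type has exactly one route, a cluster never receives data of another type, which matches the "only collects ... but not other data types" constraint.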

further, as shown in fig. 1, the data collection system running on the collection server cluster CSC collects data according to the following collection method:

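Step 1: the data acquisition system obtains the total number Nt of acquisition channels of the acquisition server cluster CSC and the information on the data source node allocated to each acquisition channel;

Step 2: the data acquisition system determines whether any acquisition channel of the acquisition server cluster CSC has not yet been allocated to a data source node;

if not, i.e., no such channel exists, return to Step 1;

if so, execute Step 3;

Step 3: the data acquisition system obtains the total number Mt of data source nodes from which it can acquire data;

Step 4: the data acquisition system calculates the average number [Nt/Mt] of acquisition channels that can theoretically be allocated to any data source node;

Step 5: the data acquisition system obtains the number Ni of acquisition channels actually running on a data source node DSNi;

Step 6: the data acquisition system determines whether the number Ni of acquisition channels actually running on the data source node DSNi is less than [Nt/Mt];

if not, i.e., Ni is not less than [Nt/Mt], return to Step 5;

if so, i.e., Ni is less than [Nt/Mt], execute Step 7;

Step 7: the data acquisition system allocates acquisition channels to the data source node DSNi until the number Ni of acquisition channels actually running on the data source node DSNi reaches [Nt/Mt];

Step 8: the data acquisition system determines whether any acquisition channel of the acquisition server cluster CSC has not yet been allocated to a data source node;

if not, i.e., no such channel exists, return to Step 1;

if so, return to Step 5;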

the data acquisition management system directionally allocates data acquisition tasks according to the data structure type on each data source node, and the acquisition server cluster CSC directionally acquires and directionally stores the data according to the allocated tasks, so that the directionally acquired homogeneous (same-structure) data can be integrated more efficiently and managed uniformly;

further, in the above acquisition method, because [Nt/Mt] acquisition channels run on every data source node DSNi, the acquisition server cluster CSC achieves balanced, distributed operation of the acquisition channels across the data source nodes, which helps ensure that the directional data acquisition tasks are completed effectively and reliably.

Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims

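1. A governance and application system based on big data service, characterized by comprising: an acquisition management server CMSbds running data acquisition management system server software, and an application server ASbds running big data query system server software;

further comprising: a collection server cluster CSCrtuda, a collection server cluster CSCrtsdc and a collection server cluster CSCssauda, wherein the collection server cluster CSCrtuda runs non-real-time structured data collection system server software and is configured with a Hive database, the collection server cluster CSCrtsdc runs real-time structured data collection system server software and is configured with an HBase database, and the collection server cluster CSCssauda runs semi-structured and unstructured data collection system server software and is configured with an HDFS database;

the acquisition management server CMSbds is in communication connection with the acquisition server cluster CSCrtuda, the acquisition server cluster CSCrtsdc and the acquisition server cluster CSCssauda respectively;

the application server ASbds is in communication connection with the acquisition server cluster CSCrtuda, the acquisition server cluster CSCrtsdc and the acquisition server cluster CSCssauda respectively;

and the application server ASbds performs data interaction with an external service system through a firewall.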

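2. The governance and application system based on big data service according to claim 1, wherein the data collection system on the collection server cluster CSC collects data by the following collection method:

Step 1: the data acquisition system obtains the total number Nt of acquisition channels of the acquisition server cluster CSC and the information on the data source node allocated to each acquisition channel;

Step 2: the data acquisition system determines whether any acquisition channel of the acquisition server cluster CSC has not yet been allocated to a data source node;

if not, i.e., no such channel exists, return to Step 1;

if so, execute Step 3;

Step 3: the data acquisition system obtains the total number Mt of data source nodes from which it can acquire data;

Step 4: the data acquisition system calculates the average number [Nt/Mt] of acquisition channels that can theoretically be allocated to any data source node;

Step 5: the data acquisition system obtains the number Ni of acquisition channels actually running on a data source node DSNi;

Step 6: the data acquisition system determines whether the number Ni of acquisition channels actually running on the data source node DSNi is less than [Nt/Mt];

if not, i.e., Ni is not less than [Nt/Mt], return to Step 5;

if so, i.e., Ni is less than [Nt/Mt], execute Step 7;

Step 7: the data acquisition system allocates acquisition channels to the data source node DSNi until the number Ni of acquisition channels actually running on the data source node DSNi reaches [Nt/Mt];

Step 8: the data acquisition system determines whether any acquisition channel of the acquisition server cluster CSC has not yet been allocated to a data source node;

if not, i.e., no such channel exists, return to Step 1;

if so, return to Step 5.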

3. The governance and application system based on big data service according to claim 2, wherein the data collection management system directionally allocates the collection tasks for non-real-time structured data to the non-real-time structured data collection system running on the collection server cluster CSCrtuda, and the collection server cluster CSCrtuda collects only non-real-time structured data and no other data types, and directionally stores the collected non-real-time structured data in the Hive database.

4. The governance and application system based on big data service according to claim 3, wherein the data collection management system directionally allocates the collection tasks for real-time structured data to the real-time structured data collection system running on the collection server cluster CSCrtsdc, and the collection server cluster CSCrtsdc collects only real-time structured data and no other data types, and directionally stores the collected real-time structured data in the HBase database.

5. The governance and application system based on big data service according to claim 4, wherein the data collection management system directionally allocates the collection tasks for semi-structured and unstructured data to the semi-structured and unstructured data collection system running on the collection server cluster CSCssauda, and the collection server cluster CSCssauda collects only semi-structured and unstructured data and no other data types, and directionally stores the collected semi-structured and unstructured data in the HDFS database.