CN112765121A - Administration and application system based on big data service - Google Patents

Administration and application system based on big data service

Info

Publication number
CN112765121A
Authority
CN
China
Prior art keywords
data
acquisition
collection
server cluster
structured
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110022976.9A
Other languages
Chinese (zh)
Inventor
孙铭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Hongxin Wanda Technology Co ltd
Original Assignee
Beijing Hongxin Wanda Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-01-08
Filing date
2021-01-08
Publication date
2021-05-07
Application filed by Beijing Hongxin Wanda Technology Co ltd
Priority to CN202110022976.9A
Publication of CN112765121A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention relates to the technical field of big data governance and discloses a governance and application system based on big data service. The system comprises: a collection management server CMSbds running data collection management system server software; an application server ASbds running big data query system server software; a collection server cluster CSCrtuda running non-real-time structured data collection system server software and configured with a Hive database; a collection server cluster CSCrtsdc running real-time structured data collection system server software and configured with an HBase database; and a collection server cluster CSCssauda running semi-structured and unstructured data collection system server software and configured with an HDFS distributed file system. The invention solves the technical problem of how to integrate and uniformly manage big data.

Description

Administration and application system based on big data service
Technical Field
The invention relates to the technical field of big data governance, and in particular to a governance and application system based on big data service.
Background
Big data service integrates new-generation information technologies such as big data, cloud computing and the mobile internet. Through interaction and cooperation among data service entities, data-based resources are virtualized and offered as services, and users receive an end-to-end data ecosystem service: from the acquisition, storage, organization, mining, analysis and decision-making of basic data resources through to subsequent service evaluation, management and security. Big data service is therefore a brand-new data information service model.
Because big data is multi-source and heterogeneous, a big data service platform that lacks unified planning and data standards will find the collected data difficult to integrate and manage uniformly.
Big data governance has therefore become a key problem that urgently needs to be solved when building a big data service platform.
Disclosure of Invention
(I) Technical problem to be solved
In view of the shortcomings of the prior art, the invention provides a governance and application system based on big data service, which aims to solve the technical problem of how to integrate and uniformly manage big data.
(II) Technical scheme
To achieve the above purpose, the invention provides the following technical scheme:
a governance and application system based on big data service comprises: a collection management server CMSbds running data collection management system server software, and an application server ASbds running big data query system server software;
the system further comprises: a collection server cluster CSCrtuda running non-real-time structured data collection system server software and configured with a Hive database; a collection server cluster CSCrtsdc running real-time structured data collection system server software and configured with an HBase database; and a collection server cluster CSCssauda running semi-structured and unstructured data collection system server software and configured with an HDFS distributed file system;
the collection management server CMSbds is communicatively connected to each of the collection server clusters CSCrtuda, CSCrtsdc and CSCssauda;
the application server ASbds is communicatively connected to each of the collection server clusters CSCrtuda, CSCrtsdc and CSCssauda;
and the application server ASbds exchanges data with external service systems through a firewall.
Further, the data collection system on the collection server cluster CSC collects data by the following collection method:
Step 1, the data collection system obtains the total number Nt of collection channels of the collection server cluster CSC and the data source node to which each collection channel is allocated;
Step 2, the data collection system determines whether any collection channel of the collection server cluster CSC has not yet been allocated to a data source node;
if no, i.e., no such channel exists, return to Step 1;
if yes, execute Step 3;
Step 3, the data collection system obtains the total number Mt of data source nodes available for collection;
Step 4, the data collection system calculates [Nt/Mt], the integer part of Nt/Mt, i.e., the theoretical average number of collection channels that can be allocated to each data source node;
Step 5, the data collection system obtains the number Ni of collection channels actually running on a data source node DSNi;
Step 6, the data collection system determines whether Ni is less than [Nt/Mt];
if no, i.e., Ni is not less than [Nt/Mt], return to Step 5;
if yes, i.e., Ni is less than [Nt/Mt], execute Step 7;
Step 7, the data collection system allocates collection channels to the data source node DSNi until the number Ni of collection channels running on DSNi reaches [Nt/Mt];
Step 8, the data collection system determines whether any collection channel of the collection server cluster CSC has not yet been allocated to a data source node;
if no, i.e., no such channel exists, return to Step 1;
if yes, return to Step 5.
Further, the data collection management system directionally allocates collection tasks for non-real-time structured data to the non-real-time structured data collection system running on the collection server cluster CSCrtuda; the collection server cluster CSCrtuda collects only non-real-time structured data and no other data types, and directionally stores the collected non-real-time structured data in the Hive database.
Further, the data collection management system directionally allocates collection tasks for real-time structured data to the real-time structured data collection system running on the collection server cluster CSCrtsdc; the collection server cluster CSCrtsdc collects only real-time structured data and no other data types, and directionally stores the collected real-time structured data in the HBase database.
Further, the data collection management system directionally allocates collection tasks for semi-structured and unstructured data to the semi-structured and unstructured data collection system running on the collection server cluster CSCssauda; the collection server cluster CSCssauda collects only semi-structured and unstructured data and no other data types, and directionally stores the collected semi-structured and unstructured data in the HDFS distributed file system.
(III) Advantageous technical effects
Compared with the prior art, the invention has the following beneficial technical effects:
the data collection management system directionally allocates data collection tasks according to the data structure types on the data source nodes, and the collection server clusters CSC directionally collect and store data according to the allocated tasks. Because each cluster collects data of only one structure type, the collected homogeneous data can be integrated more efficiently and is easier to manage uniformly.
Drawings
FIG. 1 is a flow chart of the acquisition steps of the data acquisition system of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art from these embodiments without creative effort fall within the protection scope of the present invention.
A governance and application system based on big data service comprises: a collection management server CMSbds running data collection management system server software, and an application server ASbds running big data query system server software;
the governance and application system based on big data service further comprises: a collection server cluster CSCrtuda running non-real-time structured data collection system server software and configured with a Hive database; a collection server cluster CSCrtsdc running real-time structured data collection system server software and configured with an HBase database; and a collection server cluster CSCssauda running semi-structured and unstructured data collection system server software and configured with an HDFS distributed file system;
the collection management server CMSbds is communicatively connected to each of the collection server clusters CSCrtuda, CSCrtsdc and CSCssauda;
the application server ASbds is communicatively connected to each of the collection server clusters CSCrtuda, CSCrtsdc and CSCssauda;
further, the application server ASbds exchanges data with external service systems through a firewall;
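The component layout above can be pictured as a small graph of servers and links. The Python sketch below models it under stated assumptions: the `Component` and `Topology` classes, and the `firewall` and `external` node names, are hypothetical illustrations and are not defined in the patent; only the server and cluster names come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Set


@dataclass
class Component:
    name: str
    software: str
    store: str = ""  # storage configured on the component, if any


@dataclass
class Topology:
    components: Dict[str, Component] = field(default_factory=dict)
    links: Set[FrozenSet[str]] = field(default_factory=set)  # undirected communication links

    def add(self, component: Component) -> None:
        self.components[component.name] = component

    def connect(self, a: str, b: str) -> None:
        self.links.add(frozenset((a, b)))

    def connected(self, a: str, b: str) -> bool:
        return frozenset((a, b)) in self.links


def build_topology() -> Topology:
    t = Topology()
    t.add(Component("CMSbds", "data collection management system"))
    t.add(Component("ASbds", "big data query system"))
    t.add(Component("CSCrtuda", "non-real-time structured data collection system", "Hive"))
    t.add(Component("CSCrtsdc", "real-time structured data collection system", "HBase"))
    t.add(Component("CSCssauda", "semi-structured and unstructured data collection system", "HDFS"))
    t.add(Component("firewall", "firewall"))                   # hypothetical node name
    t.add(Component("external", "external service system"))  # hypothetical node name
    for cluster in ("CSCrtuda", "CSCrtsdc", "CSCssauda"):
        t.connect("CMSbds", cluster)  # management server <-> every collection cluster
        t.connect("ASbds", cluster)   # application server <-> every collection cluster
    t.connect("ASbds", "firewall")    # ASbds reaches external systems only via the firewall
    t.connect("firewall", "external")
    return t
```

In this model `build_topology().connected("ASbds", "external")` is `False`: the application server reaches external service systems only through the firewall node.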
the Hive database stores periodically collected non-real-time data that has a fixed table structure; the non-real-time structured data is incrementally extracted from the source data nodes into the Hive database at regular intervals by Sqoop scripts;
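The patent names Sqoop scripts for the scheduled incremental extraction but gives no command. A minimal sketch of one such import follows, assuming an append-only source table with a monotonically increasing key column. `--incremental append`, `--check-column`, `--last-value`, `--hive-import`, `--hive-table` and `--password-file` are Sqoop 1 import options; the function name and every connection, table and column value in the example are hypothetical.

```python
from typing import List


def sqoop_incremental_import(jdbc_url: str, user: str, password_file: str,
                             table: str, hive_table: str,
                             check_column: str, last_value: str) -> List[str]:
    """Build the argv for one scheduled incremental Sqoop import into Hive."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,             # JDBC URL of the source data node
        "--username", user,
        "--password-file", password_file,  # keeps the password off the command line
        "--table", table,
        "--hive-import",                   # load the imported rows into Hive
        "--hive-table", hive_table,
        "--incremental", "append",         # import only rows with check_column > last_value
        "--check-column", check_column,
        "--last-value", last_value,
    ]
```

A scheduler (for example cron) could run the result with `subprocess.run(cmd, check=True)` and pass the largest `check_column` value seen so far as `last_value` on the next run; storing that value between runs is outside this sketch.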
the HBase database stores real-time structured data, which is collected at millisecond-level to second-level rates and stored as key-value pairs;
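The patent only states that real-time records are stored as key-value pairs. One common HBase row-key design for high-rate readings, assumed here and not taken from the source, prefixes the data source id and appends a reversed, zero-padded millisecond timestamp. HBase orders rows by key bytes, so a scan over one source then returns the newest readings first. The `row_key` function and the `#` separator are hypothetical.

```python
MAX_TS_MS = 2**63 - 1  # Java Long.MAX_VALUE, the usual base for reversed timestamps


def row_key(source_id: str, ts_ms: int) -> bytes:
    """Row key = <source id>#<reversed millisecond timestamp>.

    The reversed timestamp is zero-padded to 19 digits so that byte order
    matches numeric order; larger (newer) timestamps give smaller keys.
    """
    if not 0 <= ts_ms <= MAX_TS_MS:
        raise ValueError("timestamp out of range")
    reversed_ts = MAX_TS_MS - ts_ms
    return f"{source_id}#{reversed_ts:019d}".encode("ascii")
```

Keeping the source id first keeps each source's readings contiguous in the table, which suits per-source queries from the application server.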
the HDFS distributed file system stores semi-structured and unstructured data: the semi-structured data includes waveform files and model files, and the unstructured data includes images and videos stored as files; both are transferred via a file transfer protocol and stored in HDFS directories organized by file category and time;
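The directory layout "by file category and time" is not specified further. The sketch below assumes a hypothetical `<root>/<structure>/<category>/YYYY/MM/DD/<file>` layout; the category keys come from the examples in the text (waveform and model files, images and videos), while the directory names and the `hdfs_target_path` function are illustrative only.

```python
from datetime import datetime
from posixpath import join

# Hypothetical category -> directory mapping based on the file types named in the text.
CATEGORY_DIRS = {
    "waveform": "semi_structured/waveform",
    "model": "semi_structured/model",
    "image": "unstructured/image",
    "video": "unstructured/video",
}


def hdfs_target_path(root: str, category: str, collected_at: datetime, filename: str) -> str:
    """Return <root>/<category dir>/YYYY/MM/DD/<filename> for one collected file."""
    if category not in CATEGORY_DIRS:
        raise ValueError(f"unknown file category: {category}")
    return join(root, CATEGORY_DIRS[category], collected_at.strftime("%Y/%m/%d"), filename)
```

The file could then be written with `hdfs dfs -mkdir -p <dir>` followed by `hdfs dfs -put <local file> <path>`; the patent itself only says that a file transfer protocol is used.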
the data collection management system running on the collection management server CMSbds first obtains the metadata of each data source node and the data structure types on that node, and then performs the following operations:
directionally allocating collection tasks for non-real-time structured data to the non-real-time structured data collection system running on the collection server cluster CSCrtuda, where the collection server cluster CSCrtuda collects only non-real-time structured data and no other data types, and directionally stores the collected non-real-time structured data in the Hive database;
directionally allocating collection tasks for real-time structured data to the real-time structured data collection system running on the collection server cluster CSCrtsdc, where the collection server cluster CSCrtsdc collects only real-time structured data and no other data types, and directionally stores the collected real-time structured data in the HBase database;
directionally allocating collection tasks for semi-structured and unstructured data to the semi-structured and unstructured data collection system running on the collection server cluster CSCssauda, where the collection server cluster CSCssauda collects only semi-structured and unstructured data and no other data types, and directionally stores the collected semi-structured and unstructured data in the HDFS distributed file system;
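The directional allocation above amounts to a fixed routing table from data structure type to one cluster and one store. A minimal Python sketch, assuming the metadata step yields (data source node, structure type) pairs; `DataType`, `ROUTES`, `dispatch` and the example node names are hypothetical.

```python
from enum import Enum
from typing import Dict, Iterable, List, Tuple


class DataType(Enum):
    NON_REAL_TIME_STRUCTURED = "non-real-time structured"
    REAL_TIME_STRUCTURED = "real-time structured"
    SEMI_OR_UNSTRUCTURED = "semi-structured or unstructured"


# Each structure type is routed to exactly one collection cluster and one store.
ROUTES: Dict[DataType, Tuple[str, str]] = {
    DataType.NON_REAL_TIME_STRUCTURED: ("CSCrtuda", "Hive"),
    DataType.REAL_TIME_STRUCTURED: ("CSCrtsdc", "HBase"),
    DataType.SEMI_OR_UNSTRUCTURED: ("CSCssauda", "HDFS"),
}


def dispatch(metadata: Iterable[Tuple[str, DataType]]) -> Dict[str, List[Tuple[str, str]]]:
    """Turn (data source node, structure type) pairs into per-cluster task lists.

    Each task is (source node, target store). A node holding several data
    types yields one task per type, each sent to a different cluster.
    """
    plan: Dict[str, List[Tuple[str, str]]] = {cluster: [] for cluster, _ in ROUTES.values()}
    for source, dtype in metadata:
        cluster, store = ROUTES[dtype]
        plan[cluster].append((source, store))
    return plan
```

Because the table maps every type to a single cluster, no cluster ever receives a task for a data type it does not collect, which is the property the text relies on.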
further, as shown in FIG. 1, the data collection system running on the collection server cluster CSC collects data by the following collection method:
Step 1, the data collection system obtains the total number Nt of collection channels of the collection server cluster CSC and the data source node to which each collection channel is allocated;
Step 2, the data collection system determines whether any collection channel of the collection server cluster CSC has not yet been allocated to a data source node;
if no, i.e., no such channel exists, return to Step 1;
if yes, execute Step 3;
Step 3, the data collection system obtains the total number Mt of data source nodes available for collection;
Step 4, the data collection system calculates [Nt/Mt], the integer part of Nt/Mt, i.e., the theoretical average number of collection channels that can be allocated to each data source node;
Step 5, the data collection system obtains the number Ni of collection channels actually running on a data source node DSNi;
Step 6, the data collection system determines whether Ni is less than [Nt/Mt];
if no, i.e., Ni is not less than [Nt/Mt], return to Step 5;
if yes, i.e., Ni is less than [Nt/Mt], execute Step 7;
Step 7, the data collection system allocates collection channels to the data source node DSNi until the number Ni of collection channels running on DSNi reaches [Nt/Mt];
Step 8, the data collection system determines whether any collection channel of the collection server cluster CSC has not yet been allocated to a data source node;
if no, i.e., no such channel exists, return to Step 1;
if yes, return to Step 5;
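Steps 1 to 8 can be sketched as a single allocation pass in Python. This is an illustration under stated assumptions, not the patent's implementation: [Nt/Mt] is taken as the integer part (floor) of Nt/Mt, data source nodes are visited in a fixed order, free channels are handed out in input order, and the return to Step 1 (re-polling the cluster) is left to the caller. `allocate_channels` and its parameters are hypothetical names.

```python
from collections import Counter
from typing import Dict, List, Optional


def allocate_channels(assignment: Dict[str, Optional[str]],
                      nodes: List[str]) -> Dict[str, Optional[str]]:
    """One pass of the balanced channel allocation (Steps 1-8).

    assignment: channel id -> data source node it serves, or None if the
                channel is not yet allocated (the Step 1 snapshot).
    nodes:      data source nodes available for collection (Step 3).
    Returns a new assignment; the input dict is not modified.
    """
    result = dict(assignment)
    nt = len(result)                                   # Step 1: Nt
    free = [ch for ch, node in result.items() if node is None]
    if not free or not nodes:                          # Step 2: nothing to allocate
        return result
    mt = len(nodes)                                    # Step 3: Mt
    quota = nt // mt                                   # Step 4: [Nt/Mt], integer part
    running = Counter(node for node in result.values() if node is not None)
    for node in nodes:                                 # Step 5: visit each DSNi in turn
        while running[node] < quota and free:          # Step 6: is Ni < [Nt/Mt]?
            result[free.pop(0)] = node                 # Step 7: allocate one channel
            running[node] += 1
        if not free:                                   # Step 8: any channel left?
            break
    return result
```

With Nt = 10 unallocated channels and Mt = 3 nodes, [Nt/Mt] = 3, so each node ends with three channels and one channel remains unallocated for a later pass. Like Steps 6 and 7, the sketch only adds channels; a node already running more than [Nt/Mt] channels keeps them.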
the data collection management system directionally allocates data collection tasks according to the data structure types on the data source nodes, and the collection server clusters CSC directionally collect and store data according to the allocated tasks, so that the collected homogeneous data can be integrated more efficiently and managed uniformly;
further, in the above collection method, [Nt/Mt] collection channels run on each data source node DSNi; the collection server cluster CSC thus spreads its running collection channels evenly across the data source nodes, ensuring that the directional data collection tasks are completed effectively and reliably.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (5)

1. A governance and application system based on big data service, characterized by comprising: a collection management server CMSbds running data collection management system server software, and an application server ASbds running big data query system server software;
further comprising: a collection server cluster CSCrtuda running non-real-time structured data collection system server software and configured with a Hive database; a collection server cluster CSCrtsdc running real-time structured data collection system server software and configured with an HBase database; and a collection server cluster CSCssauda running semi-structured and unstructured data collection system server software and configured with an HDFS distributed file system;
the collection management server CMSbds is communicatively connected to each of the collection server clusters CSCrtuda, CSCrtsdc and CSCssauda;
the application server ASbds is communicatively connected to each of the collection server clusters CSCrtuda, CSCrtsdc and CSCssauda;
and the application server ASbds exchanges data with external service systems through a firewall.
2. The governance and application system based on big data service according to claim 1, wherein the data collection system on the collection server cluster CSC collects data by the following collection method:
Step 1, the data collection system obtains the total number Nt of collection channels of the collection server cluster CSC and the data source node to which each collection channel is allocated;
Step 2, the data collection system determines whether any collection channel of the collection server cluster CSC has not yet been allocated to a data source node;
if no, i.e., no such channel exists, return to Step 1;
if yes, execute Step 3;
Step 3, the data collection system obtains the total number Mt of data source nodes available for collection;
Step 4, the data collection system calculates [Nt/Mt], the integer part of Nt/Mt, i.e., the theoretical average number of collection channels that can be allocated to each data source node;
Step 5, the data collection system obtains the number Ni of collection channels actually running on a data source node DSNi;
Step 6, the data collection system determines whether Ni is less than [Nt/Mt];
if no, i.e., Ni is not less than [Nt/Mt], return to Step 5;
if yes, i.e., Ni is less than [Nt/Mt], execute Step 7;
Step 7, the data collection system allocates collection channels to the data source node DSNi until the number Ni of collection channels running on DSNi reaches [Nt/Mt];
Step 8, the data collection system determines whether any collection channel of the collection server cluster CSC has not yet been allocated to a data source node;
if no, i.e., no such channel exists, return to Step 1;
if yes, return to Step 5.
3. The governance and application system based on big data service according to claim 2, wherein the data collection management system directionally allocates collection tasks for non-real-time structured data to the non-real-time structured data collection system running on the collection server cluster CSCrtuda, which collects only non-real-time structured data and no other data types, and directionally stores the collected non-real-time structured data in the Hive database.
4. The governance and application system based on big data service according to claim 3, wherein the data collection management system directionally allocates collection tasks for real-time structured data to the real-time structured data collection system running on the collection server cluster CSCrtsdc, which collects only real-time structured data and no other data types, and directionally stores the collected real-time structured data in the HBase database.
5. The governance and application system based on big data service according to claim 4, wherein the data collection management system directionally allocates collection tasks for semi-structured and unstructured data to the semi-structured and unstructured data collection system running on the collection server cluster CSCssauda, which collects only semi-structured and unstructured data and no other data types, and directionally stores the collected semi-structured and unstructured data in the HDFS distributed file system.
CN202110022976.9A 2021-01-08 2021-01-08 Administration and application system based on big data service Pending CN112765121A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110022976.9A CN112765121A (en) 2021-01-08 2021-01-08 Administration and application system based on big data service

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110022976.9A CN112765121A (en) 2021-01-08 2021-01-08 Administration and application system based on big data service

Publications (1)

Publication Number Publication Date
CN112765121A true CN112765121A (en) 2021-05-07

Family

ID=75700942

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110022976.9A Pending CN112765121A (en) 2021-01-08 2021-01-08 Administration and application system based on big data service

Country Status (1)

Country Link
CN (1) CN112765121A (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103369054A (en) * 2013-07-30 2013-10-23 北京搜狐新媒体信息技术有限公司 Acquisition task management method and system
CN104065741A (en) * 2014-07-04 2014-09-24 用友软件股份有限公司 Data collection system and method
WO2016115735A1 (en) * 2015-01-23 2016-07-28 Murthy Sharad R Processing high volume network data
US20170063965A1 (en) * 2015-08-25 2017-03-02 Denis Grenader Data transfer in a collaborative file sharing system
CN108038226A (en) * 2017-12-25 2018-05-15 郑州云海信息技术有限公司 A kind of data Fast Acquisition System and method
CN108846076A (en) * 2018-06-08 2018-11-20 山大地纬软件股份有限公司 The massive multi-source ETL process method and system of supporting interface adaptation
CN109218385A (en) * 2018-06-28 2019-01-15 西安华为技术有限公司 The method and apparatus for handling data
CN109656684A (en) * 2018-12-11 2019-04-19 杭州涂鸦信息技术有限公司 A kind of partition method of Kafka, partition system and relevant apparatus
CN109740038A (en) * 2019-01-02 2019-05-10 安徽芃睿科技有限公司 Network data distributed parallel computing environment and method
CN109948079A (en) * 2019-03-11 2019-06-28 湖南衍金征信数据服务有限公司 A kind of method that distributed capture discloses page data
CN110275927A (en) * 2019-06-26 2019-09-24 浪潮卓数大数据产业发展有限公司 A kind of streaming real-time data synchronization system based on MySQL
CN111092921A (en) * 2018-10-24 2020-05-01 北大方正集团有限公司 Data acquisition method, device and storage medium
CN111694518A (en) * 2020-05-29 2020-09-22 苏州浪潮智能科技有限公司 Method, device and equipment for automatically migrating data after cluster expansion or contraction

Similar Documents

Publication Publication Date Title
CN109491790B (en) Container-based industrial Internet of things edge computing resource allocation method and system
CN105631026A (en) Security data analysis system
CN111984830A (en) Management operation and maintenance platform and data processing method
CN104461740A (en) Cross-domain colony computing resource gathering and distributing method
CN107612984B (en) Big data platform based on internet
CN106033476B (en) A kind of increment type figure calculation method under distributed computation mode in cloud computing environment
CN104969213A (en) Data stream splitting for low-latency data access
CN102833289B (en) A kind of distributed cloud computing resources tissue and method for allocating tasks
CN104239144A (en) Multilevel distributed task processing system
CN101902497B (en) Cloud computing based internet information monitoring system and method
CN112671893A (en) Data acquisition and edge calculation industrial system
CN103455633A (en) Method of distributed analysis for massive network detailed invoice data
CN104468737A (en) Storage hierarchical scheduling method and system based on service type characteristics
CN111432005A (en) Service migration method under narrow-band weak networking condition
CN115733754A (en) Resource management system based on cloud native middle platform technology and elastic construction method thereof
CN115934856A (en) Method and system for constructing comprehensive energy data assets
Lin et al. A bottom-up tree based storage approach for efficient IoT data analytics in cloud systems
CN112650579B (en) Cloud resource distribution scheduling method based on gridding
CN112765121A (en) Administration and application system based on big data service
CN115941426B (en) Multi-service resource collaboration method, system and computer equipment
CN116633880A (en) Kubenetes multi-tenant resource isolation allocation method of multi-cloud platform
CN106355315A (en) Tourism service integration system
CN111049898A (en) Method and system for realizing cross-domain architecture of computing cluster resources
CN106339956A (en) Tourism service integrated system
CN102185713B (en) Global optimization method of internet service resource distribution

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210507