CN104933119A - Big data management method

Info

Publication number: CN104933119A
Application number: CN201510306918.3A
Authority: CN
Inventors: 陈勇; 王剑冰; 陈纲
Original assignee: Fujian Fujitsu Communication Software Co Ltd
Current assignee: Fujian Fujitsu Communication Software Co Ltd
Priority date: 2015-06-05
Filing date: 2015-06-05
Publication date: 2015-09-23

Abstract

The invention provides a big data management method. In the method, data extracted from a database, or a text file, is used as input; a user configures a data processing flow according to business needs; the input data is processed; and the processing result is finally saved to a database or a big data cluster. The processing of the input data is specifically: the user configures the data processing flow as required, each processing node of the flow being a segment of structured query language (SQL) statements or a program fragment for business processing; a scheduler starts the data processing flow at scheduled times and executes the SQL statements or program fragment of each processing node in sequence; and the results obtained after execution are stored in a cache.

Description

A big data management method

Technical field

The present invention relates to the field of communication technology, and in particular to a big data management method.

Background art

Big data (also called mega data or massive data) refers to high-volume, fast-growing and diverse information assets that require new processing modes in order to deliver stronger decision-making power, insight and process-optimization capability. With the arrival of the cloud era, big data has attracted increasing attention. As the technology has developed in recent years and open source software has become prevalent, more and more excellent software has emerged in the big data field and has solved many problems in practical applications. However, such software generally has a high technical threshold and is complicated to use. The present patent application aims to reduce the difficulty of using big data: it uses simple, general-purpose SQL statements to describe business logic and provides a visualization tool to describe the flow, so that business deployment can be completed quickly.

The prior art discloses "A voltage sag data analysis method based on cloud computing technology"; see the Chinese patent with publication number 103412942A, published on 2013-11-27. That method adopts a cloud computing platform based on hadoop1.1.2 and the MapReduce programming mechanism. Waveform data in a relational database is transferred into a KV database using the transfer tool Sqoop, and the data of each individual cycle is stored in key-value pair form. Each Mapper (mapping class) reads one key-value pair from the KV database as input, traverses the cycle data and calculates the RMS (root mean square) value. The output results of all Mappers are sorted. The Reducer merges the RMS data curves of the same phase of the same event, traverses each curve and calculates the individual sag characteristic values. That invention allows voltage sag data to be processed directly on the Hadoop cloud computing platform, with multiple computers calculating in parallel, so that the computing power of multiple physical machines is combined and computational efficiency is greatly improved; the fault tolerance of the cloud platform also improves the reliability of the sag calculation results. The technical solution adopted by that invention differs from that of the present invention: the present patent application uses simple, general-purpose SQL statements to describe business logic and provides a visualization tool to describe the flow.

Summary of the invention

The technical problem to be solved by the present invention is to provide a big data management method that effectively lowers the threshold for applying big data, allows rapid deployment according to business needs, and provides one-click cluster installation, online management of servers going online and offline, and visual cluster monitoring, thereby greatly reducing operating costs.

The present invention is achieved as follows. A big data management method, wherein: data extracted from a database, or a text file, is used as input; the user configures a data processing flow according to business needs; the input data is processed; and the processing result is finally saved to a database or a big data cluster. The processing of the input data is specifically: the user configures the data processing flow as required, each processing node of the data processing flow being a segment of SQL statements or a program fragment for business processing; a scheduler starts the data processing flow at scheduled times and executes the SQL statements or program fragment of each node of the flow in sequence; and the results obtained after execution are stored in a cache.

Further, using data extracted from a database or a text file as input is specifically: database data source information is configured, and the sqoop tool is used to pull the table data of the database directly into the distributed file system in a set format; or a business program is allowed to place its data results in a designated directory of a file server, the file server monitors this directory, and any new file that is found is automatically uploaded to the distributed file system.

Further, the set format is: the tab character is used as the field delimiter and the carriage return character is used as the record separator.
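By way of illustration only, the set format can be sketched in Python as follows. The names FIELD_DELIMITER, RECORD_SEPARATOR, encode_records and decode_records are hypothetical and do not appear in the application, and treating the carriage return character as a newline ("\n") is an assumption made for this sketch.

```python
FIELD_DELIMITER = "\t"   # tab character, as stated in the text
RECORD_SEPARATOR = "\n"  # assumption: the "carriage return" is taken to be a newline


def encode_records(rows):
    """Serialize rows (sequences of values) into the set format."""
    return "".join(
        FIELD_DELIMITER.join(str(value) for value in row) + RECORD_SEPARATOR
        for row in rows
    )


def decode_records(text):
    """Parse text in the set format back into lists of string fields."""
    return [
        line.split(FIELD_DELIMITER)
        for line in text.split(RECORD_SEPARATOR)
        if line  # skip the empty piece after the final separator
    ]


text = encode_records([(1, "alice"), (2, "bob")])
records = decode_records(text)
```

In this sketch every field is read back as a string; any type conversion would be left to the processing nodes.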

Further, the method also comprises data output of the processing result, which is specifically: the user specifies database data source information, and the sqoop tool writes the data in the cache directly into a table of the database; or the cached data is written to a designated directory of the distributed file system and, according to the user's needs, is sent to a specified file server.

Further, the data processing flow is executed at scheduled times, and resource allocation and task management are carried out by a scheduling service.
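As a minimal sketch of timed execution only (resource allocation and task management are not modelled), the following Python example queues a flow to start at fixed intervals using the standard sched module. The FakeClock class, the schedule_flow function and the one-hour interval are assumptions introduced for illustration; a simulated clock is used so that the example finishes immediately.

```python
import sched


class FakeClock:
    """Simulated clock so the sketch runs without real waiting (illustrative only)."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


def schedule_flow(scheduler, interval, runs, flow):
    """Queue `runs` executions of `flow`, one every `interval` seconds."""
    for i in range(runs):
        scheduler.enter(interval * (i + 1), 1, flow)


clock = FakeClock()
scheduler = sched.scheduler(clock.time, clock.sleep)

started_at = []  # simulated start time of each run of the flow
schedule_flow(
    scheduler,
    interval=3600,  # hypothetical period of one hour
    runs=3,
    flow=lambda: started_at.append(clock.time()),
)
scheduler.run()
```

With a real clock, time.monotonic and time.sleep could be passed to sched.scheduler in place of the simulated clock.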

The present invention has the following advantages: it aims to reduce the difficulty of using big data by using simple, general-purpose SQL statements to describe business logic and providing a visualization tool to describe the flow. Data extracted from a database, or a text file, is used as input; the user configures a data processing flow according to business needs; the input data is processed; and the processing result is finally saved to a database or a big data cluster. The invention effectively lowers the threshold for applying big data, allows rapid deployment according to business needs, and provides one-click cluster installation, online management of servers going online and offline, and visual cluster monitoring, thereby greatly reducing operating costs.

Brief description of the drawings

Fig. 1 is a schematic flow chart of the method of the present invention.

Fig. 2 is a schematic diagram of the data flow of the present invention.

Detailed description of the embodiments

Referring to Fig. 1 and Fig. 2, in the big data management method of the present invention, data extracted from a database, or a text file, is used as input; the user configures a data processing flow according to business needs; the input data is processed; and the processing result is finally saved to a database or a big data cluster. The processing of the input data is specifically: the user configures the data processing flow as required, each processing node of the data processing flow being a segment of SQL statements or a program fragment for business processing; a scheduler starts the data processing flow at scheduled times and executes the SQL statements or program fragment of each node of the flow in sequence; and the results obtained after execution are stored in a cache. The data processing flow is executed at scheduled times, and resource allocation and task management are carried out by a scheduling service, so that data services can be provided 7 × 24 hours.
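The sequential execution of processing nodes can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation of the invention: an in-memory SQLite database stands in for the big data cluster, a plain dictionary stands in for the cache, and the run_flow function, the calls table and the node names are all hypothetical.

```python
import sqlite3


def run_flow(conn, nodes):
    """Run each processing node in order and return a cache of node results.

    A node is (name, step). The step is either a SQL string or a Python
    callable taking (conn, cache), standing in for a program fragment.
    """
    cache = {}
    for name, step in nodes:
        if callable(step):
            cache[name] = step(conn, cache)
        else:
            cache[name] = conn.execute(step).fetchall()
    conn.commit()
    return cache


# In-memory SQLite database standing in for the data store (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (caller TEXT, seconds INTEGER)")
conn.executemany(
    "INSERT INTO calls VALUES (?, ?)",
    [("a", 30), ("b", 20), ("a", 15)],
)

nodes = [
    # Node 1: a segment of SQL describing the business logic.
    ("totals", "SELECT caller, SUM(seconds) FROM calls GROUP BY caller ORDER BY caller"),
    # Node 2: a program fragment that reads the previous node's cached result.
    ("heavy_callers", lambda conn, cache: [c for c, s in cache["totals"] if s >= 40]),
]
cache = run_flow(conn, nodes)
```

Keeping each node's result in the cache under the node's name lets a later node build on an earlier one, which is one plausible reading of storing execution results in a cache.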

Using data extracted from a database or a text file as input is specifically: database data source information is configured, and the sqoop tool is used to pull the table data of the database directly into the distributed file system in a set format; or a business program is allowed to place its data results in a designated directory of a file server, the file server monitors this directory, and any new file that is found is automatically uploaded to the distributed file system. The set format is: the tab character is used as the field delimiter and the standard carriage return character is used as the record separator. In addition, user-defined formats are also supported.
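The directory monitoring path can be sketched as a single polling pass in Python. The sketch assumes that a local directory stands in for the distributed file system and that a file counts as new if its name has not been seen before; the upload_new_files function and the file names are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def upload_new_files(watch_dir, dfs_dir, seen):
    """Copy files not seen before from watch_dir to dfs_dir; return their names."""
    uploaded = []
    for path in sorted(Path(watch_dir).iterdir()):
        if path.is_file() and path.name not in seen:
            shutil.copy(path, Path(dfs_dir) / path.name)
            seen.add(path.name)
            uploaded.append(path.name)
    return uploaded


watch_dir = tempfile.mkdtemp()  # designated directory on the file server
dfs_dir = tempfile.mkdtemp()    # local directory standing in for the distributed file system
seen = set()

# A business program drops a result file, then one polling pass runs.
Path(watch_dir, "result_1.txt").write_text("1\talice\n")
first_pass = upload_new_files(watch_dir, dfs_dir, seen)

# A second file arrives; only the new file is uploaded on the next pass.
Path(watch_dir, "result_2.txt").write_text("2\tbob\n")
second_pass = upload_new_files(watch_dir, dfs_dir, seen)
```

A long-running monitor would call upload_new_files repeatedly at a chosen polling interval.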

The method also comprises data output of the processing result, which is specifically: the user specifies database data source information, and the sqoop tool writes the data in the cache directly into a table of the database; or the cached data is written to a designated directory of the distributed file system and, according to the user's needs, is sent to a specified file server.
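The two output paths can be sketched in Python as follows, again under stated assumptions: SQLite stands in for the target database in place of a sqoop export, a temporary local directory stands in for the designated directory of the distributed file system, and the functions export_to_table and export_to_file and the table usage_totals are hypothetical.

```python
import sqlite3
import tempfile
from pathlib import Path


def export_to_table(conn, table, rows):
    """Insert cached rows into an existing two-column table.

    The table name is interpolated directly, so it must come from trusted
    configuration in this sketch.
    """
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)
    conn.commit()


def export_to_file(directory, filename, rows):
    """Write cached rows as tab-delimited lines and return the file path."""
    path = Path(directory) / filename
    path.write_text("".join("\t".join(map(str, row)) + "\n" for row in rows))
    return path


cached_rows = [("a", 45), ("b", 20)]  # example content of the cache

# Path 1: write the cached data directly into a database table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE usage_totals (caller TEXT, seconds INTEGER)")
export_to_table(conn, "usage_totals", cached_rows)

# Path 2: write the cached data to a designated directory as a text file.
out_path = export_to_file(tempfile.mkdtemp(), "usage_totals.txt", cached_rows)
```

Sending the written file on to a specified file server is omitted from the sketch.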

In summary, the present invention aims to reduce the difficulty of using big data by using simple, general-purpose SQL statements to describe business logic and providing a visualization tool to describe the flow. Data extracted from a database, or a text file, is used as input; the user configures a data processing flow according to business needs; the input data is processed; and the processing result is finally saved to a database or a big data cluster. The invention effectively lowers the threshold for applying big data, allows rapid deployment according to business needs, and provides one-click cluster installation, online management of servers going online and offline, and visual cluster monitoring, thereby greatly reducing operating costs.

The foregoing is only a preferred embodiment of the present invention; all equivalent changes and modifications made within the scope of the claims of the present application shall fall within the scope of the present invention.

Claims

1. A big data management method, characterized in that: data extracted from a database, or a text file, is used as input; the user configures a data processing flow according to business needs; the input data is processed; and the processing result is finally saved to a database or a big data cluster; the processing of the input data is specifically: the user configures the data processing flow as required, each processing node of the data processing flow being a segment of SQL statements or a program fragment for business processing; a scheduler starts the data processing flow at scheduled times and executes the SQL statements or program fragment of each node of the flow in sequence; and the results obtained after execution are stored in a cache.

2. The big data management method according to claim 1, characterized in that: using data extracted from a database or a text file as input is specifically: database data source information is configured, and the sqoop tool is used to pull the table data of the database directly into the distributed file system in a set format; or a business program is allowed to place its data results in a designated directory of a file server, the file server monitors this directory, and any new file that is found is automatically uploaded to the distributed file system.

3. The big data management method according to claim 2, characterized in that: the set format is: the tab character is used as the field delimiter and the carriage return character is used as the record separator.

4. The big data management method according to claim 1, characterized in that: the method also comprises data output of the processing result, which is specifically: the user specifies database data source information, and the sqoop tool writes the data in the cache directly into a table of the database; or the cached data is written to a designated directory of the distributed file system and, according to the user's needs, is sent to a specified file server.

5. The big data management method according to claim 1, characterized in that: the data processing flow is executed at scheduled times, and resource allocation and task management are carried out by a scheduling service.