CN104933119A - Big data management method - Google Patents

Big data management method

Info

Publication number
CN104933119A
CN104933119A CN201510306918.3A CN201510306918A CN104933119A CN 104933119 A CN104933119 A CN 104933119A CN 201510306918 A CN201510306918 A CN 201510306918A CN 104933119 A CN104933119 A CN 104933119A
Authority
CN
China
Prior art keywords
data
database
input
user
data processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510306918.3A
Other languages
Chinese (zh)
Inventor
陈勇
王剑冰
陈纲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujian Fujitsu Communication Software Co Ltd
Original Assignee
Fujian Fujitsu Communication Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2015-06-05
Filing date
2015-06-05
Publication date
2015-09-23
Application filed by Fujian Fujitsu Communication Software Co Ltd
Priority to CN201510306918.3A
Publication of CN104933119A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a big data management method. Data extracted from a database, or a text file, is used as input; a user configures a data processing flow according to business needs and uses it to process the input data; and the processing result is finally stored in a database or a big data cluster. To process the input data, the user configures the data processing flow as required, where each processing node of the flow is a block of structured query language (SQL) statements or a business-processing program segment. A scheduler starts the data processing flow at scheduled times, executes the SQL statements or program segment of each processing node in sequence, and stores the result of execution in a cache.

Description

Big data management method
Technical field
The present invention relates to the field of communication technology, and in particular to a big data management method.
Background art
Big data (also called mega data or massive data) refers to high-volume, fast-growing, and diverse information assets that require new processing models to deliver stronger decision-making power, insight, and process-optimization capability. With the arrival of the cloud era, big data has attracted increasing attention. With technical progress and the spread of open-source software in recent years, more and more excellent software has emerged in the big data field and has solved many practical problems. However, this software generally has a high technical threshold and is complex to use. This patent application aims to make big data easier to use: it describes business logic with simple, general-purpose SQL statements, provides a visual tool for describing processing flows, and allows services to be deployed quickly.
The prior art discloses "a voltage sag data analysis method based on cloud computing technology" (Chinese patent publication No. 103412942A, published 2013-11-27). That method uses a cloud computing platform based on Hadoop 1.1.2 and the MapReduce programming model. Waveform data is transferred from a relational database into a key-value (KV) database with the transfer tool Sqoop, and the data of each specific cycle is stored as a key-value pair. Each Mapper reads one key-value pair from the KV database as input, iterates over the cycle data, and computes its RMS (root-mean-square) value. The outputs of all Mappers are sorted, and the Reducer merges the RMS values of the same phase of the same event into a curve and iterates over that curve to compute each sag characteristic value. That invention lets voltage-sag data be computed in parallel on multiple machines through the Hadoop cloud platform, truly aggregating the computing power of several physical machines and thus greatly improving computational efficiency; the fault tolerance of the cloud platform also makes the sag calculation results more reliable. The technical solution of that invention differs from the present invention, which describes business logic with simple, general-purpose SQL statements and provides a visual tool for describing processing flows.
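To make the prior-art MapReduce flow above concrete, the following Python sketch imitates its three phases in memory: a mapper computes one RMS value per stored cycle, a sort groups the values by event and phase, and a reducer turns each group into an RMS curve. It is not code from CN103412942A; the key layout `(event_id, phase, cycle_index)`, per-unit sample values, and the 0.9 per-unit threshold used to count sag cycles are all assumptions made for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Key = Tuple[str, str, int]  # assumed layout: (event_id, phase, cycle_index)


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square value of one cycle of waveform samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mapper(key: Key, samples: Sequence[float]):
    # One key-value pair per stored cycle; emit ((event, phase), (cycle, rms)).
    event_id, phase, cycle_index = key
    return (event_id, phase), (cycle_index, rms(samples))


def reducer(points: List[Tuple[int, float]], threshold: float = 0.9) -> Dict:
    # points are (cycle_index, rms) pairs for one (event, phase), already sorted.
    curve = [value for _, value in points]
    return {
        "curve": curve,
        "residual": min(curve),                                # deepest point of the sag
        "sag_cycles": sum(1 for v in curve if v < threshold),  # duration in cycles
    }


def run_job(records, threshold: float = 0.9) -> Dict:
    mapped = [mapper(k, v) for k, v in records]           # map phase
    groups = defaultdict(list)
    for group_key, point in sorted(mapped):                # shuffle/sort phase
        groups[group_key].append(point)
    return {g: reducer(p, threshold) for g, p in groups.items()}  # reduce phase
```

On a real Hadoop cluster the same mapper and reducer logic would run as distributed tasks; the in-memory version only shows how the data moves between phases.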
Summary of the invention
The technical problem to be solved by the present invention is to provide a big data management method that effectively lowers the barrier to using big data, allows rapid deployment according to business needs, and supports one-click cluster installation, online management of bringing servers into and out of service, and visual cluster monitoring, thereby greatly reducing operating costs.
The present invention is implemented as follows. In a big data management method, data extracted from a database, or a text file, is used as input; a user configures a data processing flow according to business needs and uses it to process the input data; and the processing result is finally saved to a database or a big data cluster. Processing the input data specifically comprises: the user configures the data processing flow as required, where each processing node of the flow is either a block of SQL statements or a business-processing program fragment; a scheduler starts the data processing flow at scheduled times and executes the SQL statements or program fragment of each node in sequence; and the result obtained after execution is stored in a cache.
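A minimal Python sketch of the processing flow just described, under stated assumptions: SQLite stands in for the cluster's SQL engine, each SQL node holds a single statement, a plain dictionary plays the role of the cache, and the names `ProcessingNode`, `run_flow`, and `demo` are hypothetical. The patent does not specify any of these details.

```python
import sqlite3
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Union

# Hypothetical node representation: a single SQL statement (str) or a Python
# callable standing in for a "business-processing program fragment".
Step = Union[str, Callable[[sqlite3.Connection, Dict[str, Any]], Any]]


@dataclass
class ProcessingNode:
    name: str
    step: Step


def run_flow(conn: sqlite3.Connection, nodes: List[ProcessingNode],
             cache: Dict[str, Any]) -> Dict[str, Any]:
    """Execute the nodes strictly in order, storing each node's result in the cache."""
    for node in nodes:
        if isinstance(node.step, str):
            # SQL node: run one statement and keep whatever rows it returns.
            cache[node.name] = conn.execute(node.step).fetchall()
        else:
            # Program-fragment node: may read earlier results from the cache.
            cache[node.name] = node.step(conn, cache)
    return cache


def demo() -> Dict[str, Any]:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE calls (subscriber TEXT, minutes INTEGER)")
    conn.executemany("INSERT INTO calls VALUES (?, ?)",
                     [("a", 3), ("b", 5), ("a", 4)])
    nodes = [
        ProcessingNode("totals",
                       "SELECT subscriber, SUM(minutes) FROM calls "
                       "GROUP BY subscriber ORDER BY subscriber"),
        ProcessingNode("heavy_users",
                       lambda conn, cache: [s for s, m in cache["totals"] if m > 6]),
    ]
    return run_flow(conn, nodes, {})
```

Because each node can read earlier results from the cache, a program-fragment node can post-process the output of an SQL node, which is the mix of SQL and business logic the method describes.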
Further, using data extracted from a database, or a text file, as input specifically comprises: configuring database data source information and using the Sqoop tool to import table data from the database directly into a distributed file system in a set format; or having a business program place its data results in a designated directory on a file server, which monitors that directory and automatically uploads any newly discovered file to the distributed file system.
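The file-server branch of the input step (watch a designated directory, upload any new file) can be sketched as a polling loop. In this hypothetical Python version, `upload_to_dfs` simply copies into a local directory that stands in for the distributed file system; a real deployment might instead invoke `hdfs dfs -put`. Tracking files by name and polling at a fixed interval are assumptions, not details from the patent.

```python
import shutil
import time
from pathlib import Path
from typing import List, Optional, Set


def upload_to_dfs(path: Path, dfs_dir: Path) -> None:
    """Hypothetical uploader: copies into a local directory standing in for the DFS."""
    dfs_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(path, dfs_dir / path.name)


def poll_once(watch_dir: Path, dfs_dir: Path, seen: Set[str]) -> List[str]:
    """Upload every regular file in watch_dir not seen before; return their names."""
    uploaded = []
    for path in sorted(watch_dir.iterdir()):
        if path.is_file() and path.name not in seen:
            upload_to_dfs(path, dfs_dir)
            seen.add(path.name)
            uploaded.append(path.name)
    return uploaded


def watch(watch_dir: Path, dfs_dir: Path, interval: float = 5.0,
          rounds: Optional[int] = None) -> None:
    """Poll watch_dir every `interval` seconds; run forever if rounds is None."""
    seen: Set[str] = set()
    count = 0
    while rounds is None or count < rounds:
        poll_once(watch_dir, dfs_dir, seen)
        count += 1
        time.sleep(interval)
```

A production watcher would also need to avoid uploading files that are still being written, for example by having the business program write under a temporary name and rename the file once it is complete.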
Further, the set format uses the tab character as the field delimiter and the carriage return (newline) character as the record delimiter.
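The set format can be illustrated with a pair of Python helpers that write and read tab-separated fields and newline-terminated records, plus a helper that assembles a Sqoop 1 `import` command using the same delimiters. Reading the "carriage return" record separator as `\n` is an assumption, the helper names are hypothetical, and credentials (for example via Sqoop's `-P` or `--password-file` options) are omitted.

```python
from typing import Iterable, List, Sequence

FIELD_SEP = "\t"   # field delimiter from the set format
RECORD_SEP = "\n"  # record delimiter; "carriage return" read as newline (assumption)


def to_delimited(rows: Iterable[Sequence[object]], field_sep: str = FIELD_SEP,
                 record_sep: str = RECORD_SEP) -> str:
    """Serialize rows; values must not themselves contain either delimiter."""
    return "".join(field_sep.join(str(v) for v in row) + record_sep for row in rows)


def from_delimited(text: str, field_sep: str = FIELD_SEP,
                   record_sep: str = RECORD_SEP) -> List[List[str]]:
    """Parse text back into rows of strings, skipping empty records."""
    return [rec.split(field_sep) for rec in text.split(record_sep) if rec]


def sqoop_import_args(jdbc_url: str, username: str, table: str,
                      target_dir: str) -> List[str]:
    """Argument list for a Sqoop 1 import in the tab/newline format.

    The delimiters are passed as the two-character escapes backslash-t and
    backslash-n, which Sqoop interprets itself.
    """
    return ["sqoop", "import",
            "--connect", jdbc_url,
            "--username", username,
            "--table", table,
            "--target-dir", target_dir,
            "--fields-terminated-by", "\\t",
            "--lines-terminated-by", "\\n"]
```

For example, `subprocess.run(sqoop_import_args("jdbc:mysql://db-host:3306/billing", "etl", "calls", "/data/raw/calls"), check=True)` would launch the import if Sqoop is on the PATH; the host, database, table, and directory names are placeholders. Passing other `field_sep` or `record_sep` values to the helpers corresponds to the user-defined formats the embodiment mentions.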
Further, the method also comprises exporting the processing result, which specifically comprises: using database data source information specified by the user, writing the cached data directly into a database table with the Sqoop tool; or writing the cached data into a specified directory of the distributed file system and, as the user requires, sending it to a specified file server.
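The two export branches can be sketched the same way. The first helper builds a Sqoop 1 `export` command that reads tab/newline-delimited files from a directory into a database table; the second writes cached rows in that format into an output directory, here a local path standing in for the specified distributed-file-system directory. The helper names and the Hadoop-style `part-00000` file name are assumptions, and the final transfer to a file server (for example over SFTP) is not shown.

```python
from pathlib import Path
from typing import Iterable, List, Sequence


def sqoop_export_args(jdbc_url: str, username: str, table: str,
                      export_dir: str) -> List[str]:
    """Sqoop 1 export of tab/newline-delimited files in export_dir into a table."""
    return ["sqoop", "export",
            "--connect", jdbc_url,
            "--username", username,
            "--table", table,
            "--export-dir", export_dir,
            "--input-fields-terminated-by", "\\t",
            "--input-lines-terminated-by", "\\n"]


def write_cache_to_dir(rows: Iterable[Sequence[object]], out_dir: Path,
                       file_name: str = "part-00000") -> Path:
    """Write cached rows as tab/newline text into out_dir (a local DFS stand-in)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / file_name
    text = "".join("\t".join(str(v) for v in row) + "\n" for row in rows)
    path.write_text(text, encoding="utf-8")
    return path
```

Writing the cache in the same delimited format as the input keeps both export branches interchangeable: a directory produced by `write_cache_to_dir` is exactly what the export command expects to read.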
Further, the data processing flow is executed periodically, with resource allocation and task management performed by a scheduling service.
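The periodic execution described above can be sketched as a small fixed-interval scheduler. This hypothetical Python version keeps flows in a priority queue ordered by next run time and logs every run for task management; it is driven by an explicit clock value so that it can be tested, whereas a long-running service would call `run_until(time.time())` in a loop. Resource allocation (for example YARN queues on a Hadoop cluster) is not modeled.

```python
import heapq
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass(order=True)
class ScheduledFlow:
    next_run: float                                    # the only field used for ordering
    name: str = field(compare=False)
    interval: float = field(compare=False)             # seconds between runs
    run: Callable[[], None] = field(compare=False)


class SimpleScheduler:
    """Fixed-interval scheduler driven by an explicit clock value (a sketch)."""

    def __init__(self) -> None:
        self._queue: List[ScheduledFlow] = []
        self.history: List[Tuple[float, str]] = []     # task log: (run time, flow name)

    def register(self, name: str, interval: float,
                 run: Callable[[], None], start: float = 0.0) -> None:
        if interval <= 0:
            raise ValueError("interval must be positive")
        heapq.heappush(self._queue, ScheduledFlow(start, name, interval, run))

    def run_until(self, now: float) -> None:
        """Run every flow whose next run time is <= now, rescheduling each one."""
        while self._queue and self._queue[0].next_run <= now:
            flow = heapq.heappop(self._queue)
            flow.run()
            self.history.append((flow.next_run, flow.name))
            flow.next_run += flow.interval
            heapq.heappush(self._queue, flow)
```

With an hourly flow and a daily flow both registered at time 0, `run_until(7200)` runs the hourly flow at 0, 3600, and 7200 seconds and the daily flow once at 0.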
The present invention has the following advantages. It aims to make big data easier to use by describing business logic with simple, general-purpose SQL statements and providing a visual tool for describing processing flows. Data extracted from a database, or a text file, is used as input; the user configures a data processing flow according to business needs to process the input data; and the processing result is finally saved to a database or a big data cluster. The method effectively lowers the barrier to using big data, allows rapid deployment according to business needs, and supports one-click cluster installation, online management of bringing servers into and out of service, and visual cluster monitoring, thereby greatly reducing operating costs.
Brief description of the drawings
Fig. 1 is a schematic flowchart of the method of the present invention.
Fig. 2 is a schematic diagram of the data flow of the present invention.
Embodiment
Referring to Fig. 1 and Fig. 2, in the big data management method of the present invention, data extracted from a database, or a text file, is used as input; a user configures a data processing flow according to business needs and uses it to process the input data; and the processing result is finally saved to a database or a big data cluster. Processing the input data specifically comprises: the user configures the data processing flow as required, where each processing node of the flow is either a block of SQL statements or a business-processing program fragment; a scheduler starts the data processing flow at scheduled times and executes the SQL statements or program fragment of each node in sequence; and the result obtained after execution is stored in a cache. The data processing flow is executed periodically, with resource allocation and task management performed by a scheduling service, so that data services can be provided around the clock (7 × 24 hours).
Using data extracted from a database, or a text file, as input specifically comprises: configuring database data source information and using the Sqoop tool to import table data from the database directly into the distributed file system in a set format; or having a business program place its data results in a designated directory on a file server, which monitors that directory and automatically uploads any newly discovered file to the distributed file system. The set format uses the tab character as the field delimiter and the standard carriage return (newline) character as the record delimiter. User-defined formats are also supported.
The method further comprises exporting the processing result, which specifically comprises: using database data source information specified by the user, writing the cached data directly into a database table with the Sqoop tool; or writing the cached data into a specified directory of the distributed file system and, as the user requires, sending it to a specified file server.
In summary, the present invention aims to make big data easier to use by describing business logic with simple, general-purpose SQL statements and providing a visual tool for describing processing flows. Data extracted from a database, or a text file, is used as input; the user configures a data processing flow according to business needs to process the input data; and the processing result is finally saved to a database or a big data cluster. The method effectively lowers the barrier to using big data, allows rapid deployment according to business needs, and supports one-click cluster installation, online management of bringing servers into and out of service, and visual cluster monitoring, thereby greatly reducing operating costs.
The foregoing is merely a preferred embodiment of the present invention; all equivalent changes and modifications made within the scope of the claims of this patent application shall fall within the scope of the present invention.

Claims (5)

1. A big data management method, characterized in that: data extracted from a database, or a text file, is used as input; a user configures a data processing flow according to business needs and uses it to process the input data; and the processing result is finally saved to a database or a big data cluster; wherein processing the input data specifically comprises: the user configures the data processing flow as required, each processing node of the data processing flow being a block of SQL statements or a business-processing program fragment; a scheduler starts the data processing flow at scheduled times and executes the SQL statements or program fragment of each node of the flow in sequence; and the result obtained after execution is stored in a cache.
2. The big data management method according to claim 1, characterized in that using data extracted from a database, or a text file, as input specifically comprises: configuring database data source information and using the Sqoop tool to import table data from the database directly into a distributed file system in a set format; or having a business program place its data results in a designated directory on a file server, the file server monitoring the directory and automatically uploading any newly discovered file to the distributed file system.
3. The big data management method according to claim 2, characterized in that the set format uses the tab character as the field delimiter and the carriage return (newline) character as the record delimiter.
4. The big data management method according to claim 1, characterized in that the method further comprises exporting the processing result, which specifically comprises: using database data source information specified by the user, writing the cached data directly into a database table with the Sqoop tool; or writing the cached data into a specified directory of the distributed file system and, as the user requires, sending it to a specified file server.
5. The big data management method according to claim 1, characterized in that the data processing flow is executed periodically, with resource allocation and task management performed by a scheduling service.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510306918.3A CN104933119A (en) 2015-06-05 2015-06-05 Big data management method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510306918.3A CN104933119A (en) 2015-06-05 2015-06-05 Big data management method

Publications (1)

Publication Number Publication Date
CN104933119A true CN104933119A (en) 2015-09-23

Family

ID=54120286

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510306918.3A Pending CN104933119A (en) 2015-06-05 2015-06-05 Big data management method

Country Status (1)

Country Link
CN (1) CN104933119A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090172050A1 (en) * 2008-01-02 2009-07-02 Sandisk Il Ltd. Dual representation of stored digital content
CN103049556A (en) * 2012-12-28 2013-04-17 中国科学院深圳先进技术研究院 Fast statistical query method for mass medical data
CN103810272A (en) * 2014-02-11 2014-05-21 北京邮电大学 Data processing method and system

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108363782A (en) * 2018-02-11 2018-08-03 中国联合网络通信集团有限公司 A kind of data cleaning method and Data clean system
CN110851426A (en) * 2019-11-19 2020-02-28 重庆华龙网海数科技有限公司 Data DNA visualization relation analysis system and method
CN113360558A (en) * 2021-06-04 2021-09-07 北京京东振世信息技术有限公司 Data processing method, data processing device, electronic device, and storage medium
CN113360558B (en) * 2021-06-04 2023-09-29 北京京东振世信息技术有限公司 Data processing method, data processing device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN103235811B (en) A kind of date storage method and device
CN107544984A (en) A kind of method and apparatus of data processing
CN104536965B (en) A kind of data query display systems under the conditions of big data and method
CN105069109B (en) A kind of method and system of distributed data base dilatation
CN104112026A (en) Short message text classifying method and system
CN110502583A (en) Distributed Data Synchronization method, apparatus, equipment and readable storage medium storing program for executing
CN104572415A (en) Event log recording method applicable to distributed system
CN108536745A (en) Tables of data extracting method, terminal, equipment and storage medium based on Shell
CN103944964A (en) Distributed system and method carrying out expansion step by step through same
CN104933119A (en) Big data management method
CN102508886A (en) Extensive makeup language (XML)-based method for synchronously updating increment of spatial data
CN105610899A (en) Text file parallel uploading method and device
CN105740410A (en) Data statistics method based on Hbase secondary index
CN106156227A (en) A kind of data transmission method and device
CN102404411A (en) Data synchronization method of cloud storage system
CN104636395A (en) Count processing method and device
CN106777272A (en) A kind of comparing and synchronous method
CN101393526A (en) Data synchronization method capable of implementing programmable data conversion and file conversion function
CN101645073A (en) Method for guiding prior database file into embedded type database
CN103593345A (en) Webpage flow chart editing method and system
CN104133891A (en) Method for storing massive structural data based on relational database
CN106990913B (en) A kind of distributed approach of extensive streaming collective data
CN103440302A (en) Real-time data exchange method and system
CN112130846A (en) Three-micro one-screen publishing engine system and publishing method
CN109840844B (en) Financial big data acquisition processing device and system based on FPGA

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20150923