CN105760505A - Hive-based historical data analysis and archiving method - Google Patents

Publication number
CN105760505A
Authority
CN
China
Prior art keywords
data
hbase
hive
archiving
data analysis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610098013.6A
Other languages
Chinese (zh)
Inventor
孙海峰
王传超
毛立花
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Software Group Co Ltd
Original Assignee
Inspur Software Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2016-02-23
Filing date
2016-02-23
Publication date
2016-07-13
Application filed by Inspur Software Group Co Ltd
Priority to CN201610098013.6A
Publication of CN105760505A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
    • G06F16/284 Relational databases
    • G06F16/288 Entity relationship models
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a Hive-based historical data analysis and archiving method. The method comprises constructing a relational data model and a cloud data model; setting, in a relational database, the storage mode of metadata, the storage mode of entity data, and the mapping steps of a data mapping layer; analyzing large volumes of data; performing association, decomposition, and integration on the large-volume files stored in HBase; optimizing the analysis results; storing and archiving them by different dimensions; and providing an operation interface. The method converts key/value-format data files into a structured data format through a data storage model, archives and stores the result data by partition through multi-table analysis and association, and thereby addresses the problems of rapid analysis of HBase data, archiving of HBase data, and operation by non-specialists.

Description

Hive-based historical data analysis and archiving method
Technical field
The present invention relates to the technical field of big data storage and analysis, and in particular to a Hive-based historical data analysis and archiving method.
Background
Big data brings enormous technical challenges, but also enormous opportunities for technological innovation and business. Continuously accumulated big data contains much deep knowledge and value that is absent at small data volumes. Big data analysis and mining can bring great commercial value to industries and enterprises, enable a variety of high-value-added services, and further improve their economic and social benefits. Because big data holds such deep latent value, the U.S. government regards it as "the new oil of the future", expected to have a profound impact on future technological and economic development. In the future, therefore, the scale of the data a country holds and its ability to use that data will become an important component of overall national strength, and the possession, control, and use of data will become a new focus of competition between countries and between enterprises.
Summary of the invention
The technical problem to be solved by the present invention is as follows: the invention provides a Hive-based historical data analysis and archiving method that, through a data storage model, converts key/value-format data files into a structured data format and, through multi-table analysis and association, archives and stores the result data by partition.
The technical solution adopted by the present invention is:
A Hive-based historical data analysis and archiving method. The method constructs a relational data model and a cloud data model; sets, in a relational database, the storage mode of metadata, the storage mode of entity data, and the mapping steps of the data mapping layer; analyzes large volumes of data; associates, decomposes, and integrates the large-volume files stored in HBase; optimizes the analysis results; archives and stores them by different dimensions; and provides an operation interface that is easy to use. It thereby addresses the problems of rapid analysis of HBase data, archiving of HBase data, and operation by non-specialists.
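The conversion from key/value data to a structured format is not spelled out in the text. As a minimal sketch of the idea only, the Python below pivots key/value cells of the form (row key, column qualifier, value) into one structured record per row key. The function name `cells_to_rows`, the sample cells, and the column names are hypothetical and are not taken from the patent.

```python
from collections import defaultdict


def cells_to_rows(cells, columns):
    """Pivot (row_key, qualifier, value) cells into one dict per row key."""
    grouped = defaultdict(dict)
    for row_key, qualifier, value in cells:
        grouped[row_key][qualifier] = value

    rows = []
    for row_key in sorted(grouped):
        record = {"rowkey": row_key}
        for col in columns:
            # A missing qualifier becomes None, like NULL in a relational table.
            record[col] = grouped[row_key].get(col)
        rows.append(record)
    return rows


# Hypothetical sample cells, shaped like the output of a key/value store scan.
cells = [
    ("order#001", "customer", "alice"),
    ("order#001", "amount", "12.50"),
    ("order#002", "customer", "bob"),
]
rows = cells_to_rows(cells, ["customer", "amount"])
for row in rows:
    print(row)
```

Each resulting record has the same fixed set of columns, which is what allows the data to be queried as a table afterwards.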
The deployment steps of the data analysis system involved in the method are as follows:
Step one: deploy the Hadoop-related services, such as HBase and Hive, and the HBase distributed storage nodes;
Step two: install a structured database;
Step three: deploy the data analysis system.
The specific operation steps of the method are as follows:
Step 1: map the HBase data into Hive as an external table;
Step 2: create a Hive archive table and create its partitions;
Step 3: analyze the externally mapped HBase data and store the results in a temporary table;
Step 4: verify the contents of the temporary table;
Step 5: after verification passes, insert the contents of the temporary table into the partitioned archive table;
Step 6: perform data summarization and queries through the operation interface.
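Steps 1 and 2 can be sketched as HiveQL statements. The patent gives no statements of its own, so the Python helpers below only build illustrative HiveQL strings. The `HBaseStorageHandler` class and the `hbase.columns.mapping` and `hbase.table.name` properties come from Hive's HBase integration; every table name, column name, column family (`cf`), and the partition column `dt` are assumptions made for the example.

```python
def external_hbase_table_ddl(hive_table, hbase_table, columns):
    """Build HiveQL that maps an HBase table as a Hive external table (step 1).

    columns is a list of (hive_name, hive_type, hbase_mapping) tuples; the
    HBase row key uses the special mapping ':key'.
    """
    col_defs = ", ".join(f"{name} {htype}" for name, htype, _ in columns)
    mapping = ",".join(hbase_col for _, _, hbase_col in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {hive_table} ({col_defs})\n"
        "STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'\n"
        f"WITH SERDEPROPERTIES ('hbase.columns.mapping' = '{mapping}')\n"
        f"TBLPROPERTIES ('hbase.table.name' = '{hbase_table}')"
    )


def archive_table_ddl(table, columns, partition_col):
    """Build HiveQL for a partitioned archive table (step 2)."""
    col_defs = ", ".join(f"{name} {htype}" for name, htype in columns)
    return (
        f"CREATE TABLE IF NOT EXISTS {table} ({col_defs})\n"
        f"PARTITIONED BY ({partition_col} STRING)"
    )


def add_partition_ddl(table, partition_col, value):
    """Build HiveQL that creates one partition of the archive table (step 2)."""
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION ({partition_col}='{value}')"
    )


# Hypothetical names throughout: hbase_orders, orders, cf, orders_archive, dt.
columns = [
    ("rowkey", "STRING", ":key"),
    ("customer", "STRING", "cf:customer"),
    ("amount", "DOUBLE", "cf:amount"),
]
mapping_ddl = external_hbase_table_ddl("hbase_orders", "orders", columns)
archive_ddl = archive_table_ddl(
    "orders_archive", [("customer", "STRING"), ("total", "DOUBLE")], "dt"
)
partition_ddl = add_partition_ddl("orders_archive", "dt", "2016-01")
print(mapping_ddl, archive_ddl, partition_ddl, sep=";\n\n")
```

Because the first table is external, dropping it in Hive would leave the underlying HBase data in place, which suits a source that is only being read for archiving.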
Hive is a data warehouse tool built on Hadoop. It can map structured data files to database tables and provides SQL query functionality, converting SQL statements into MapReduce jobs for execution. Its advantage is a low learning cost: simple MapReduce statistics can be implemented quickly with SQL-like statements, without developing dedicated MapReduce applications, which makes it well suited to statistical analysis in a data warehouse.
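To make the SQL-to-MapReduce idea concrete, the toy Python below shows how an aggregate such as `SELECT customer, SUM(amount) FROM orders GROUP BY customer` breaks down into map, shuffle, and reduce phases. It is a conceptual sketch in a single process, not Hive's actual query planner, and the record layout and function names are assumptions for illustration.

```python
from collections import defaultdict


def map_phase(records):
    """Emit one (key, value) pair per record: the GROUP BY key and the summed field."""
    for record in records:
        yield record["customer"], record["amount"]


def shuffle(pairs):
    """Group all values that share a key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Apply the aggregate (here SUM) to each key's list of values."""
    return {key: sum(values) for key, values in groups.items()}


# Hypothetical input records.
orders = [
    {"customer": "alice", "amount": 10},
    {"customer": "bob", "amount": 5},
    {"customer": "alice", "amount": 7},
]
totals = reduce_phase(shuffle(map_phase(orders)))
print(totals)
```

Writing the single SQL statement is far shorter than writing these phases by hand, which is the learning-cost advantage the text describes.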
The beneficial effects of the present invention are:
Through a data storage model, the invention converts key/value-format data files into a structured data format and, through multi-table analysis and association, archives and stores the result data by partition. It thereby addresses the problems of rapid analysis of HBase data, archiving of HBase data, and operation by non-specialists.
Brief description of the drawings
Fig. 1 is a flow chart of the present invention.
Detailed description
The present invention is further described below through specific embodiments, with reference to the accompanying drawing:
Embodiment 1:
A Hive-based historical data analysis and archiving method. The method constructs a relational data model and a cloud data model; sets, in a relational database, the storage mode of metadata, the storage mode of entity data, and the mapping steps of the data mapping layer; analyzes large volumes of data; associates, decomposes, and integrates the large-volume files stored in HBase; optimizes the analysis results; archives and stores them by different dimensions; and provides an operation interface that is easy to use. It thereby addresses the problems of rapid analysis of HBase data, archiving of HBase data, and operation by non-specialists.
Embodiment 2:
On the basis of Embodiment 1, the deployment steps of the data analysis system involved in the method of this embodiment are as follows:
Step one: deploy the Hadoop-related services, such as HBase and Hive, and the HBase distributed storage nodes;
Step two: install a structured database;
Step three: deploy the data analysis system.
Embodiment 3:
As shown in Fig. 1, on the basis of Embodiment 1 or 2, the specific operation steps of the method of this embodiment are as follows:
Step 1: map the HBase data into Hive as an external table;
Step 2: create a Hive archive table and create its partitions;
Step 3: analyze the externally mapped HBase data and store the results in a temporary table;
Step 4: verify the contents of the temporary table;
Step 5: after verification passes, insert the contents of the temporary table into the partitioned archive table;
Step 6: perform data summarization and queries through the operation interface.
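Steps 3 to 6 follow a stage, verify, load, then query pattern. The sketch below illustrates that pattern with Python's built-in `sqlite3` module standing in for Hive, purely so that it can run without a cluster; it is not the patent's implementation. The table names (`source_orders`, `tmp_orders`, `orders_archive`), the month-based partition column `dt`, and the verification rule (staged data is non-empty and its total matches the source) are all assumptions introduced for illustration. In Hive itself, the load in step 5 would more likely be written as an `INSERT OVERWRITE TABLE ... PARTITION (...)` statement.

```python
import sqlite3


def archive_month(conn, month):
    """Stage one month of data, verify it, then load it into the archive.

    Returns the number of rows archived. All table names are hypothetical.
    """
    cur = conn.cursor()

    # Step 3: analyze the source data and store the result in a temporary table.
    cur.execute(
        "CREATE TEMP TABLE IF NOT EXISTS tmp_orders (customer TEXT, total INTEGER)"
    )
    cur.execute("DELETE FROM tmp_orders")
    cur.execute(
        "INSERT INTO tmp_orders "
        "SELECT customer, SUM(amount) FROM source_orders "
        "WHERE substr(order_date, 1, 7) = ? GROUP BY customer",
        (month,),
    )

    # Step 4: verify the temporary table before touching the archive.
    staged_rows, staged_total = cur.execute(
        "SELECT COUNT(*), COALESCE(SUM(total), 0) FROM tmp_orders"
    ).fetchone()
    source_total = cur.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM source_orders "
        "WHERE substr(order_date, 1, 7) = ?",
        (month,),
    ).fetchone()[0]
    if staged_rows == 0 or staged_total != source_total:
        conn.rollback()
        raise ValueError(f"verification failed for {month}")

    # Step 5: replace the target partition with the verified rows.
    cur.execute("DELETE FROM orders_archive WHERE dt = ?", (month,))
    cur.execute(
        "INSERT INTO orders_archive SELECT customer, total, ? FROM tmp_orders",
        (month,),
    )
    conn.commit()
    return staged_rows


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE source_orders (customer TEXT, amount INTEGER, order_date TEXT)"
)
conn.execute("CREATE TABLE orders_archive (customer TEXT, total INTEGER, dt TEXT)")
conn.executemany(
    "INSERT INTO source_orders VALUES (?, ?, ?)",
    [
        ("alice", 10, "2016-01-05"),
        ("alice", 7, "2016-01-20"),
        ("bob", 5, "2016-01-09"),
        ("bob", 3, "2016-02-01"),
    ],
)
conn.commit()

archived = archive_month(conn, "2016-01")

# Step 6: a summary query of the kind an operation interface might issue.
summary = conn.execute(
    "SELECT dt, COUNT(*), SUM(total) FROM orders_archive GROUP BY dt"
).fetchall()
print(archived, summary)
```

Deleting the target partition before inserting makes the load repeatable: running the same month twice leaves one copy of the data rather than two.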
The above embodiments serve only to illustrate the present invention and do not limit it. A person of ordinary skill in the relevant technical field may make various changes and modifications without departing from the spirit and scope of the present invention; all equivalent technical solutions therefore fall within the scope of the invention, and the scope of patent protection of the present invention shall be defined by the claims.

Claims (3)

1. A Hive-based historical data analysis and archiving method, characterized in that: the method constructs a relational data model and a cloud data model; sets, in a relational database, the storage mode of metadata, the storage mode of entity data, and the mapping steps of the data mapping layer; analyzes large volumes of data; associates, decomposes, and integrates the large-volume files stored in HBase; optimizes the analysis results; archives and stores them by different dimensions; and provides an operation interface.
2. The Hive-based historical data analysis and archiving method according to claim 1, characterized in that the deployment steps of the data analysis system involved in the method are as follows:
Step one: deploy the Hadoop-related HBase and Hive services and the HBase distributed storage nodes;
Step two: install a structured database;
Step three: deploy the data analysis system.
3. The Hive-based historical data analysis and archiving method according to claim 1 or 2, characterized in that the specific operation steps of the method are as follows:
Step 1: map the HBase data into Hive as an external table;
Step 2: create a Hive archive table and create its partitions;
Step 3: analyze the externally mapped HBase data and store the results in a temporary table;
Step 4: verify the contents of the temporary table;
Step 5: after verification passes, insert the contents of the temporary table into the partitioned archive table;
Step 6: perform data summarization and queries through the operation interface.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610098013.6A CN105760505A (en) 2016-02-23 2016-02-23 Hive-based historical data analysis and archiving method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610098013.6A CN105760505A (en) 2016-02-23 2016-02-23 Hive-based historical data analysis and archiving method

Publications (1)

Publication Number Publication Date
CN105760505A true CN105760505A (en) 2016-07-13

Family

ID=56331049

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610098013.6A Pending CN105760505A (en) 2016-02-23 2016-02-23 Hive-based historical data analysis and archiving method

Country Status (1)

Country Link
CN (1) CN105760505A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102521246A (en) * 2011-11-11 2012-06-27 国网信息通信有限公司 Cloud data warehouse system
CN103678665A (en) * 2013-12-24 2014-03-26 焦点科技股份有限公司 Heterogeneous large data integration method and system based on data warehouses
US20140156638A1 (en) * 2012-11-30 2014-06-05 Orbis Technologies, Inc. Ontology harmonization and mediation systems and methods

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106227862A (en) * 2016-07-29 2016-12-14 浪潮软件集团有限公司 E-commerce data integration method based on distribution
CN106250465A (en) * 2016-07-29 2016-12-21 沈阳华创风能有限公司 A kind of method and device improving database filing efficiency
CN106844496A (en) * 2016-12-26 2017-06-13 山东中创软件商用中间件股份有限公司 Data transmission scheduling method, device and server based on ESB
CN106844496B (en) * 2016-12-26 2020-04-10 山东中创软件商用中间件股份有限公司 Data transmission scheduling method and device based on enterprise service bus and server
CN108268565A (en) * 2017-01-04 2018-07-10 北京京东尚科信息技术有限公司 Method and system based on data warehouse processing user browsing behavior data
CN106909645A (en) * 2017-02-21 2017-06-30 中国科学院电子学研究所 A kind of space-time data organization of unity method of expansible definition
CN106909645B (en) * 2017-02-21 2019-03-26 中国科学院电子学研究所 A kind of space-time data organization of unity method of expansible definition
CN108470045A (en) * 2018-03-06 2018-08-31 平安科技(深圳)有限公司 The method and storage medium that electronic device, data chain type are filed
CN108470045B (en) * 2018-03-06 2020-02-18 平安科技(深圳)有限公司 Electronic device, data chain archiving method and storage medium
US11106649B2 (en) 2018-03-06 2021-08-31 Ping An Technology (Shenzhen) Co., Ltd. Electronic apparatus, data chain archiving method, system and storage medium

Similar Documents

Publication Publication Date Title
CN105760505A (en) Hive-based historical data analysis and archiving method
US9904694B2 (en) NoSQL relational database (RDB) data movement
CN107038207B (en) Data query method, data processing method and device
CN111046630B (en) Syntax tree extraction method of JSON data
CN104133858B (en) Intelligence analysis system with double engines and method based on row storage
CN104123288B (en) A kind of data query method and device
Karnitis et al. Migration of relational database to document-oriented database: structure denormalization and data transformation
CN105701098B (en) The method and apparatus for generating index for the table in database
US11941034B2 (en) Conversational database analysis
CN105912594B (en) SQL statement processing method and system
CN103778133A (en) Database object changing method and device
CN111159180A (en) Data processing method and system based on data resource directory construction
CN104239377A (en) Platform-crossing data retrieval method and device
CN103778148A (en) Life cycle management method and equipment for data file of Hadoop distributed file system
CN108446391A (en) Processing method, device, electronic equipment and the computer-readable medium of data
CN104102701A (en) Hive-based method for filing and inquiring historical data
US11556571B2 (en) Phrase indexing
CN104391941A (en) Method for rapidly establishing full-text retrieval tool for common files
CN103177046B (en) A kind of data processing method based on row storage data base and equipment
CN110287379B (en) Table splitting and data extracting method based on logic tree
CN101645073A (en) Method for guiding prior database file into embedded type database
CN110968555B (en) Dimension data processing method and device
US9881055B1 (en) Language conversion based on S-expression tabular structure
CN111221967A (en) Language data classification storage system based on block chain architecture
KR101256922B1 (en) Method for distributed decision tree induction algorithm for prediction and analysis

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160713

RJ01 Rejection of invention patent application after publication