CN105760505A - Hive-based historical data analysis and archiving method

Info

Publication number: CN105760505A
Application number: CN201610098013.6A
Authority: CN
Inventors: 孙海峰; 王传超; 毛立花
Original assignee: Inspur Software Group Co Ltd
Current assignee: Inspur Software Group Co Ltd
Priority date: 2016-02-23
Filing date: 2016-02-23
Publication date: 2016-07-13

Abstract

The invention discloses a Hive-based method for analyzing and archiving historical data. The method constructs a relational data model and a cloud data model; defines, in a relational database, the storage mode for metadata, the storage mode for entity data, and the mapping steps of a data mapping layer; analyzes large volumes of data; decomposes, associates, and integrates the large data files stored in HBase; optimizes the analysis results; archives them by different dimensions; and provides an operation interface. Through the data storage model, the method converts key/value data files into a structured data format and, through multi-table analysis and association, archives the result data in partitions. This addresses the problems of fast analysis of HBase data, archiving of HBase data, and operation by non-specialist users.

Description

Hive-based historical data analysis and archiving method

Technical field

The present invention relates to the technical field of big data storage and analysis, and in particular to a Hive-based method for analyzing and archiving historical data.

Background technology

Big data brings enormous technical challenges, but also enormous opportunities for technological innovation and business. Continuously accumulated big data contains deep knowledge and value that are absent at small data volumes. Big data analysis and mining will bring great commercial value to industries and enterprises, enable a variety of high value-added services, and further improve their economic and social benefits. Because big data holds such deep latent value, the U.S. government regards it as "the new oil of the future," which will profoundly influence future technological and economic development. In the future, therefore, the scale of the data a country holds and its ability to use that data will become an important component of overall national strength, and the possession, control, and use of data will become a new focus of competition between countries and between enterprises.

Summary of the invention

The technical problem to be solved by the present invention is as follows: the invention provides a Hive-based method for analyzing and archiving historical data, in which a data storage model converts key/value data files into a structured data format, and multi-table analysis and association are used to archive the result data in partitions.

The technical solution adopted by the present invention is as follows:

A Hive-based method for analyzing and archiving historical data. The method constructs a relational data model and a cloud data model; defines, in a relational database, the storage mode for metadata, the storage mode for entity data, and the mapping steps of the data mapping layer; analyzes large volumes of data; decomposes, associates, and integrates the large data files stored in HBase; optimizes the analysis results; archives them by different dimensions; and provides an operation interface that is easy to use. This addresses the problems of fast analysis of HBase data, archiving of HBase data, and operation by non-specialist users.
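The conversion from key/value form to a structured form described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the cell layout (row key, "family:qualifier" column, value), the function name cells_to_rows, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict


def cells_to_rows(cells, columns):
    """Pivot key/value cells into one structured record per row key.

    cells   : iterable of (row_key, "family:qualifier", value) tuples
              (an assumed, simplified view of HBase cells)
    columns : dict mapping "family:qualifier" to a structured field name
    """
    grouped = defaultdict(dict)
    for row_key, column, value in cells:
        grouped[row_key][column] = value

    rows = []
    for row_key in sorted(grouped):
        row = {"row_key": row_key}
        for column, field in columns.items():
            # A sparse row may lack a cell; represent the gap as None.
            row[field] = grouped[row_key].get(column)
        rows.append(row)
    return rows


# Hypothetical sample data, for illustration only.
cells = [
    ("order-001", "info:customer", "alice"),
    ("order-001", "info:amount", "120.50"),
    ("order-002", "info:customer", "bob"),
]
columns = {"info:customer": "customer", "info:amount": "amount"}
rows = cells_to_rows(cells, columns)
for row in rows:
    print(row)
```

Each output record has the same fixed set of fields, which is what allows the data to be queried as a table and joined with other tables.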

The data analysis system involved in the method is deployed as follows:

First step: deploy the Hadoop-related services such as HBase and Hive, together with the HBase distributed storage nodes;

Second step: install a structured database;

Third step: deploy the data analysis system.

The specific operating steps of the method are as follows:

Step 1: map the HBase data into Hive as an external table;

Step 2: create the Hive archive table and create its partitions;

Step 3: analyze the externally mapped HBase data and store the results in a temporary table;

Step 4: verify the contents of the temporary table;

Step 5: after verification passes, insert the contents of the temporary table into the partitioned archive table;

Step 6: summarize and query the data through the operation interface.
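The steps above can be sketched as a sequence of HiveQL statements. The Python helper below only builds the statement text; it does not connect to a cluster. The helper name build_archive_plan, the table prefixes ext_, tmp_, and archive_, the partition column period, and the sample column mapping are hypothetical. The storage handler class and the hbase.columns.mapping and hbase.table.name properties are those of Hive's HBase integration. Step 3 is reduced here to a plain SELECT; a real analysis would typically filter, join, or aggregate.

```python
def build_archive_plan(hbase_table, mapping, partition_value):
    """Return HiveQL statements for steps 1 to 5 as a list of strings.

    mapping : dict of Hive column name -> HBase "family:qualifier"
    """
    columns = list(mapping)  # Hive column names, in insertion order
    typed = ", ".join(f"{c} STRING" for c in columns)
    hbase_cols = ",".join([":key"] + [mapping[c] for c in columns])
    select_list = ", ".join(["row_key"] + columns)
    return [
        # Step 1: external Hive table mapped onto the existing HBase table.
        f"CREATE EXTERNAL TABLE ext_{hbase_table} (row_key STRING, {typed}) "
        "STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' "
        f"WITH SERDEPROPERTIES ('hbase.columns.mapping' = '{hbase_cols}') "
        f"TBLPROPERTIES ('hbase.table.name' = '{hbase_table}')",
        # Step 2: partitioned archive table, then the target partition.
        f"CREATE TABLE IF NOT EXISTS archive_{hbase_table} "
        f"(row_key STRING, {typed}) PARTITIONED BY (period STRING)",
        f"ALTER TABLE archive_{hbase_table} ADD IF NOT EXISTS "
        f"PARTITION (period = '{partition_value}')",
        # Step 3: analysis result stored in a temporary (staging) table.
        f"CREATE TABLE tmp_{hbase_table} AS "
        f"SELECT {select_list} FROM ext_{hbase_table}",
        # Step 4: a simple check an operator could run before archiving.
        f"SELECT COUNT(*) FROM tmp_{hbase_table}",
        # Step 5: load the verified rows into the archive partition.
        f"INSERT INTO TABLE archive_{hbase_table} "
        f"PARTITION (period = '{partition_value}') "
        f"SELECT {select_list} FROM tmp_{hbase_table}",
    ]


# Hypothetical table and columns, for illustration only.
plan = build_archive_plan(
    "orders",
    {"customer": "info:customer", "amount": "info:amount"},
    "2016-02",
)
for statement in plan:
    print(statement + ";")
```

Step 6 is not shown because the source does not describe the operation interface in enough detail to sketch it.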

Hive is a data warehouse tool built on Hadoop. It can map structured data files to database tables and provides SQL query functionality by converting SQL statements into MapReduce jobs for execution. Its advantage is a low learning cost: simple MapReduce statistics can be implemented quickly with SQL-like statements, without developing dedicated MapReduce applications, which makes Hive well suited to statistical analysis in a data warehouse.
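To make the relationship between an SQL-like statement and a MapReduce job concrete, the following Python sketch emulates, in memory, the map, shuffle, and reduce phases that correspond to a query such as SELECT customer, COUNT(*) FROM orders GROUP BY customer. It is a conceptual illustration only and does not reproduce Hive's actual execution plan; the function names and the sample rows are assumptions.

```python
from collections import defaultdict


def map_phase(rows, key_field):
    """Emit one (group key, 1) pair per input row."""
    for row in rows:
        yield row[key_field], 1


def shuffle(pairs):
    """Group the emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values of each group, which yields COUNT(*) per key."""
    return {key: sum(values) for key, values in groups.items()}


# Hypothetical rows standing in for a table named orders.
orders = [
    {"customer": "alice"},
    {"customer": "bob"},
    {"customer": "alice"},
]
counts = reduce_phase(shuffle(map_phase(orders, "customer")))
print(counts)
```

With Hive, the single GROUP BY statement replaces all three hand-written functions, which is the low learning cost the text refers to.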

The beneficial effects of the invention are as follows:

Through the data storage model, the invention converts key/value data files into a structured data format and, through multi-table analysis and association, archives the result data in partitions. This addresses the problems of fast analysis of HBase data, archiving of HBase data, and operation by non-specialist users.

Brief description of the drawings

Fig. 1 is a flow chart of the present invention.

Detailed description of the embodiments

The present invention is further described below through specific embodiments, with reference to the accompanying drawing:

Embodiment 1:

Embodiment 2:

On the basis of Embodiment 1, the data analysis system involved in the method of this embodiment is deployed as follows:

First step: deploy the Hadoop-related services such as HBase and Hive, together with the HBase distributed storage nodes;

Second step: install a structured database;

Third step: deploy the data analysis system.

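Embodiment 3:

As shown in Fig. 1, on the basis of Embodiment 1 or 2, the specific operating steps of the method of this embodiment are as follows:

Step 1: map the HBase data into Hive as an external table;

Step 2: create the Hive archive table and create its partitions;

Step 3: analyze the externally mapped HBase data and store the results in a temporary table;

Step 4: verify the contents of the temporary table;

Step 5: after verification passes, insert the contents of the temporary table into the partitioned archive table;

Step 6: summarize and query the data through the operation interface.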
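The staging, verification, and partitioned insertion in this embodiment can be illustrated with a small in-memory Python sketch. The source does not specify what the verification consists of, so the checks below (matching row count, unique row keys, non-negative amounts) are assumptions, as are the field names, the month-based partition key, and the sample rows.

```python
from collections import defaultdict


def analyze(mapped_rows):
    """Build the temporary table: typed values plus a partition key."""
    temp = []
    for row in mapped_rows:
        temp.append({
            "row_key": row["row_key"],
            "month": row["created"][:7],    # e.g. "2015-11" from "2015-11-03"
            "amount": float(row["amount"]),
        })
    return temp


def verify(temp, expected_count):
    """Assumed checks: row count matches, keys unique, amounts non-negative."""
    keys = [r["row_key"] for r in temp]
    return (
        len(temp) == expected_count
        and len(set(keys)) == len(keys)
        and all(r["amount"] >= 0 for r in temp)
    )


def archive(temp, store):
    """Append each verified row to the partition named by its month."""
    for r in temp:
        store[r["month"]].append(r)
    return store


# Hypothetical rows standing in for the externally mapped HBase data.
mapped = [
    {"row_key": "order-001", "created": "2015-11-03", "amount": "120.50"},
    {"row_key": "order-002", "created": "2015-11-19", "amount": "80.00"},
    {"row_key": "order-003", "created": "2015-12-01", "amount": "15.25"},
]

temp = analyze(mapped)
archive_store = defaultdict(list)
if verify(temp, expected_count=len(mapped)):
    archive(temp, archive_store)

for month in sorted(archive_store):
    print(month, len(archive_store[month]))
```

Keeping the verification between the temporary table and the archive means that a failed check leaves the archive untouched, which matches the ordering of Steps 4 and 5.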

The above embodiments merely illustrate the present invention and do not limit it. Those of ordinary skill in the relevant technical field may make various changes and modifications without departing from the spirit and scope of the invention. All equivalent technical solutions therefore fall within the scope of the invention, and the scope of patent protection of the invention shall be defined by the claims.

Claims

1. A Hive-based method for analyzing and archiving historical data, characterized in that the method constructs a relational data model and a cloud data model; defines, in a relational database, the storage mode for metadata, the storage mode for entity data, and the mapping steps of a data mapping layer; analyzes large volumes of data; decomposes, associates, and integrates the large data files stored in HBase; optimizes the analysis results; archives them by different dimensions; and provides an operation interface.

2. The Hive-based method for analyzing and archiving historical data according to claim 1, characterized in that the data analysis system involved in the method is deployed as follows:

First step: deploy the Hadoop-related services such as HBase and Hive, together with the HBase distributed storage nodes;

Second step: install a structured database;

Third step: deploy the data analysis system.

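3. The Hive-based method for analyzing and archiving historical data according to claim 1 or 2, characterized in that the specific operating steps of the method are as follows:

Step 1: map the HBase data into Hive as an external table;

Step 2: create the Hive archive table and create its partitions;

Step 3: analyze the externally mapped HBase data and store the results in a temporary table;

Step 4: verify the contents of the temporary table;

Step 5: after verification passes, insert the contents of the temporary table into the partitioned archive table;

Step 6: summarize and query the data through the operation interface.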