CN110765173A

CN110765173A - Data management method and system under big data environment

Info

Publication number: CN110765173A
Application number: CN201910811160.7A
Authority: CN
Inventors: 李卫群; 张涛; 陆苇; 雷厚宇; 兰海翔
Original assignee: Guizhou Li Chuang Technology Development Co Ltd
Current assignee: Guizhou Li Chuang Technology Development Co Ltd
Priority date: 2019-08-30
Filing date: 2019-08-30
Publication date: 2020-02-07

Abstract

The invention relates to a data management method and system in a big data environment. The method comprises the following steps: collecting big data from multiple data sources using an ETL tool and preprocessing it to form preprocessed big data; performing algorithm analysis and/or data mining and/or data calculation and/or data storage processing on the preprocessed big data according to preset requirements to form basic environment big data; and providing a data visualization service for the basic environment big data through a resource scheduling interface. The invention provides a basic technical environment for big data operations and realizes multi-source data access, data acquisition and processing, data table management, file management, data exploration and model building. It supports the storage of TB-level data and accommodates the storage demands of explosive data growth. It provides an application framework covering ETL-based data access, distributed computing, stream computing and in-memory computing, and offers big data mining and algorithm modeling capabilities.

Description

Data management method and system under big data environment

Technical Field

The invention relates to the field of big data, in particular to a data management method and a data management system under a big data environment.

Background

Big data refers to a data set that cannot be captured, managed and processed by conventional software tools within a certain time frame. It is a massive, fast-growing and diversified information asset that requires new processing modes in order to deliver stronger decision-making power, insight discovery and process optimization capability. Once big data resources have been obtained, how to manage them is an important problem that urgently needs to be solved.

Disclosure of Invention

The technical problem to be solved by the invention is to provide a data management method and a data management system under a big data environment, which can effectively manage big data.

The technical solution of the invention to the above technical problem is as follows: a data management method in a big data environment comprises the following steps:

S1, acquiring big data from multiple data sources using an ETL tool, and preprocessing the big data to form preprocessed big data;

S2, performing algorithm analysis and/or data mining and/or data calculation and/or data storage processing on the preprocessed big data according to preset requirements to form basic environment big data;

S3, providing a data visualization service for the basic environment big data through a resource scheduling interface.
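By way of illustration only, the three steps S1 to S3 might be sketched in Python as follows. The function names, the record layout, the rule of dropping records without an `id`, and the three handler names are assumptions made for this sketch; the disclosure does not prescribe any of them.

```python
from collections import Counter
from typing import Callable, Iterable

Record = dict[str, object]


def collect_and_preprocess(sources: Iterable[Iterable[Record]]) -> list[Record]:
    """S1: merge records from every source, then drop records without an 'id'
    (a hypothetical preprocessing rule chosen for this sketch)."""
    merged = [record for source in sources for record in source]
    return [r for r in merged if r.get("id") is not None]


def build_base_environment(records: list[Record], requirements: set[str]) -> dict[str, object]:
    """S2: run only the processing kinds named in the preset requirements."""
    handlers: dict[str, Callable[[list[Record]], object]] = {
        "calculation": lambda rows: len(rows),                                # record count
        "mining": lambda rows: Counter(str(r.get("kind")) for r in rows),     # kind histogram
        "storage": lambda rows: list(rows),                                   # keep a copy
    }
    return {name: handlers[name](records) for name in sorted(requirements) if name in handlers}


def serve(base: dict[str, object], resource: str) -> object:
    """S3: stand-in for the resource scheduling interface; returns one prepared result."""
    if resource not in base:
        raise KeyError(f"resource {resource!r} was not prepared in S2")
    return base[resource]


if __name__ == "__main__":
    db_rows = [{"id": 1, "kind": "order"}, {"id": None, "kind": "order"}]
    file_rows = [{"id": 2, "kind": "log"}]
    clean = collect_and_preprocess([db_rows, file_rows])
    base = build_base_environment(clean, {"calculation", "mining"})
    print(serve(base, "calculation"))
```

Passing the requirements as a set mirrors the "and/or" wording of step S2: any combination of processing kinds can be requested, and kinds that were not requested are simply absent from the result.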

The beneficial effects of the invention are as follows: the data management method in a big data environment provides a basic technical environment for big data operations and realizes multi-source data access, data acquisition and processing, data table management, file management, data exploration and model building; it supports the storage of TB-level data and accommodates the storage demands of explosive data growth; it provides an application framework covering ETL-based data access, distributed computing, stream computing and in-memory computing, offers big data mining and algorithm modeling capabilities, and supports multi-language integration; big data can thus be managed effectively.

On the basis of the above technical solution, the invention can be further improved as follows.

Further, the data sources include databases, NoSQL databases, text files, and unstructured databases.

Further, the preprocessing includes extraction, cleaning, and conversion.

Further, the algorithm analysis comprises machine learning, algorithm modeling and deep learning; the data mining comprises SQL query, interactive query and search query; the data calculation comprises in-memory computing, stream computing and batch computing; the types of data storage include in-memory storage, columnar storage, data warehouses, and distributed file systems.

Further, the data visualization service includes data management, platform management, and security management.

Based on the above data management method in a big data environment, the invention also provides a data management system in a big data environment.

A data management system in a big data environment comprises the following modules:

an acquisition and preprocessing module, used for acquiring big data from multiple data sources using an ETL tool and preprocessing the big data to form preprocessed big data;

a big data infrastructure module, used for performing algorithm analysis and/or data mining and/or data calculation and/or data storage processing on the preprocessed big data according to preset requirements to form basic environment big data;

and a service providing module, used for providing a data visualization service for the basic environment big data through a resource scheduling interface.

The beneficial effects of the invention are as follows: the data management system in a big data environment provides a basic technical environment for big data operations and realizes multi-source data access, data acquisition and processing, data table management, file management, data exploration and model building; it supports the storage of TB-level data and accommodates the storage demands of explosive data growth; it provides an application framework covering ETL-based data access, distributed computing, stream computing and in-memory computing, offers big data mining and algorithm modeling capabilities, and supports multi-language integration; big data can thus be managed effectively.

Drawings

FIG. 1 is a flow chart of a data management method in a big data environment according to the present invention;

FIG. 2 is a block diagram of a data management system in a big data environment according to the present invention.

Detailed Description

The principles and features of the invention are described below with reference to the drawings. The examples given serve only to explain the invention and are not intended to limit its scope.

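As shown in FIG. 1, a data management method in a big data environment includes the following steps:

S1, acquiring big data from multiple data sources using an ETL tool, and preprocessing the big data to form preprocessed big data;

S2, performing algorithm analysis and/or data mining and/or data calculation and/or data storage processing on the preprocessed big data according to preset requirements to form basic environment big data;

S3, providing a data visualization service for the basic environment big data through a resource scheduling interface.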

In this particular embodiment: the data sources include databases, NoSQL databases, text files, and unstructured databases.
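As a hedged illustration of multi-source access, the sketch below reads a relational table, a set of JSON documents (standing in for a NoSQL store) and delimited text into one common record shape. The reader function names, the `sensor` table and the use of the Python standard library in place of a dedicated ETL tool are all assumptions of this sketch.

```python
import csv
import io
import json
import sqlite3
from typing import Iterator

Record = dict[str, object]


def read_database(conn: sqlite3.Connection, table: str) -> Iterator[Record]:
    """Relational source: yield each row of a table as a dict."""
    cursor = conn.execute(f"SELECT * FROM {table}")  # table name is trusted in this sketch
    columns = [col[0] for col in cursor.description]
    for row in cursor:
        yield dict(zip(columns, row))


def read_documents(json_lines: str) -> Iterator[Record]:
    """NoSQL-style source: one JSON document per line."""
    for line in json_lines.splitlines():
        if line.strip():
            yield json.loads(line)


def read_text(csv_text: str) -> Iterator[Record]:
    """Text-file source: comma-separated text with a header row."""
    yield from csv.DictReader(io.StringIO(csv_text))


def demo() -> list[Record]:
    """Collect records from all three hypothetical sources into one list."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sensor (id INTEGER, value REAL)")
    conn.execute("INSERT INTO sensor VALUES (1, 20.5)")
    records = list(read_database(conn, "sensor"))
    records += read_documents('{"id": 2, "value": 21.0}\n')
    records += read_text("id,value\n3,19.8\n")
    conn.close()
    return records
```

Records read from text keep their values as strings, so type normalization is left to the preprocessing stage.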

In this particular embodiment: the preprocessing comprises extraction, cleaning, and conversion.
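The three preprocessing stages could, for example, be chained as in the following sketch. The field names `id` and `value`, the rule that rows with missing values or duplicate ids are discarded, and the target types are hypothetical choices made for illustration only.

```python
from typing import Iterable

Record = dict[str, object]


def extract(raw_rows: Iterable[Record], fields: tuple[str, ...]) -> list[Record]:
    """Extraction: keep only the fields of interest (missing fields become None)."""
    return [{f: row.get(f) for f in fields} for row in raw_rows]


def clean(rows: Iterable[Record]) -> list[Record]:
    """Cleaning: drop rows with missing or blank values and rows whose id was already seen."""
    seen: set[str] = set()
    result: list[Record] = []
    for row in rows:
        if any(v is None or str(v).strip() == "" for v in row.values()):
            continue
        key = str(row["id"]).strip()
        if key in seen:
            continue
        seen.add(key)
        result.append(row)
    return result


def transform(rows: Iterable[Record]) -> list[Record]:
    """Conversion: cast every field to a common type (int id, float value)."""
    return [{"id": int(str(r["id"])), "value": float(str(r["value"]))} for r in rows]


def preprocess(raw_rows: Iterable[Record]) -> list[Record]:
    """Run extraction, cleaning and conversion in order."""
    return transform(clean(extract(raw_rows, ("id", "value"))))
```

Cleaning runs before conversion so that blank or missing values never reach the type casts.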

In this particular embodiment: the algorithm analysis comprises machine learning, algorithm modeling and deep learning; the data mining comprises SQL query, interactive query and search query; the data calculation comprises in-memory computing, stream computing and batch computing; the types of data storage include in-memory storage, columnar storage, data warehouses, and distributed file systems.
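To make the distinction between batch computing and stream computing concrete, the following sketch computes the same statistic, a mean, in both styles. The choice of statistic and the names `batch_mean`, `RunningMean` and `stream_means` are assumptions of this sketch and do not come from the disclosure.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


def batch_mean(values: list[float]) -> float:
    """Batch calculation: the whole data set is available before computing."""
    if not values:
        raise ValueError("batch_mean needs at least one value")
    return sum(values) / len(values)


@dataclass
class RunningMean:
    """Stream calculation: keep constant-size state and update it per record."""
    count: int = 0
    mean: float = 0.0

    def update(self, value: float) -> float:
        # Incremental mean update: new_mean = old_mean + (value - old_mean) / n
        self.count += 1
        self.mean += (value - self.mean) / self.count
        return self.mean


def stream_means(values: Iterable[float]) -> Iterator[float]:
    """Yield the mean seen so far after each incoming value."""
    state = RunningMean()
    for value in values:
        yield state.update(value)
```

The batch version needs the full list in memory, whereas the streaming version holds only a count and the current mean, which is why it suits unbounded inputs.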

In this particular embodiment: the data visualization service includes data management, platform management, and security management. The data management comprises metadata management, data access management, data extraction management and data table management; the platform management comprises cluster management, cluster monitoring and task scheduling management; the security management includes authentication center management and user management.
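A minimal sketch of how data table metadata management and user management might interact is given below. The `ManagementService` facade, the per-user read grants and the `TableMetadata` fields are hypothetical; the disclosure names the management categories but not their interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class TableMetadata:
    name: str
    columns: dict[str, str]  # column name -> type name
    owner: str


@dataclass
class ManagementService:
    """Toy facade over data management (table metadata) and security management (users)."""
    tables: dict[str, TableMetadata] = field(default_factory=dict)
    grants: dict[str, set[str]] = field(default_factory=dict)  # user -> readable tables

    def register_table(self, meta: TableMetadata) -> None:
        """Record the table's metadata and let its owner read it."""
        self.tables[meta.name] = meta
        self.grants.setdefault(meta.owner, set()).add(meta.name)

    def grant_read(self, user: str, table: str) -> None:
        """Allow a user to read a registered table."""
        if table not in self.tables:
            raise KeyError(f"unknown table {table!r}")
        self.grants.setdefault(user, set()).add(table)

    def describe(self, user: str, table: str) -> dict[str, str]:
        """Return a copy of the column metadata if the user has read access."""
        if table not in self.grants.get(user, set()):
            raise PermissionError(f"{user!r} may not read {table!r}")
        return dict(self.tables[table].columns)
```

For example, after registering a table owned by `alice`, a call to `describe("bob", ...)` raises `PermissionError` until `grant_read("bob", ...)` has been called.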

The data management method in a big data environment provides a basic technical environment for big data operations and realizes multi-source data access, data acquisition and processing, data table management, file management, data exploration and model building; it supports the storage of TB-level data and accommodates the storage demands of explosive data growth; it provides an application framework covering ETL-based data access, distributed computing, stream computing and in-memory computing, offers big data mining and algorithm modeling capabilities, and supports multi-language integration; big data can thus be managed effectively.

The data management system in a big data environment provides a basic technical environment for big data operations and realizes multi-source data access, data acquisition and processing, data table management, file management, data exploration and model building; it supports the storage of TB-level data and accommodates the storage demands of explosive data growth; it provides an application framework covering ETL-based data access, distributed computing, stream computing and in-memory computing, offers big data mining and algorithm modeling capabilities, and supports multi-language integration; big data can thus be managed effectively.

The above description covers only the preferred embodiments of the invention and is not to be construed as limiting the invention; any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the invention shall fall within its scope of protection.

Claims

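1. A data management method in a big data environment, characterized by comprising the following steps:

S1, acquiring big data from multiple data sources using an ETL tool, and preprocessing the big data to form preprocessed big data;

S2, performing algorithm analysis and/or data mining and/or data calculation and/or data storage processing on the preprocessed big data according to preset requirements to form basic environment big data;

S3, providing a data visualization service for the basic environment big data through a resource scheduling interface.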

2. The data management method in a big data environment according to claim 1, characterized in that: the data sources include databases, NoSQL databases, text files, and unstructured databases.

3. The data management method in a big data environment according to claim 1 or 2, characterized in that: the preprocessing comprises extraction, cleaning, and conversion.

4. The data management method in a big data environment according to claim 1 or 2, characterized in that: the algorithm analysis comprises machine learning, algorithm modeling and deep learning; the data mining comprises SQL query, interactive query and search query; the data calculation comprises in-memory computing, stream computing and batch computing; the types of data storage include in-memory storage, columnar storage, data warehouses, and distributed file systems.

5. The data management method in a big data environment according to claim 1 or 2, characterized in that: the data visualization service includes data management, platform management, and security management.

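6. A data management system in a big data environment, characterized by comprising the following modules:

an acquisition and preprocessing module, used for acquiring big data from multiple data sources using an ETL tool and preprocessing the big data to form preprocessed big data;

a big data infrastructure module, used for performing algorithm analysis and/or data mining and/or data calculation and/or data storage processing on the preprocessed big data according to preset requirements to form basic environment big data;

and a service providing module, used for providing a data visualization service for the basic environment big data through a resource scheduling interface.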

7. The data management system in a big data environment according to claim 6, characterized in that: the data sources include databases, NoSQL databases, text files, and unstructured databases.

8. The data management system in a big data environment according to claim 6 or 7, characterized in that: the preprocessing comprises extraction, cleaning, and conversion.

9. The data management system in a big data environment according to claim 6 or 7, characterized in that: the algorithm analysis comprises machine learning, algorithm modeling and deep learning; the data mining comprises SQL query, interactive query and search query; the data calculation comprises in-memory computing, stream computing and batch computing; the types of data storage include in-memory storage, columnar storage, data warehouses, and distributed file systems.

10. The data management system in a big data environment according to claim 6 or 7, characterized in that: the data visualization service includes data management, platform management, and security management.