CN110765173A - Data management method and system under big data environment - Google Patents

Publication number
CN110765173A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910811160.7A
Other languages
Chinese (zh)
Inventor
李卫群
张涛
陆苇
雷厚宇
兰海翔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guizhou Li Chuang Technology Development Co Ltd
Original Assignee
Guizhou Li Chuang Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-08-30
Filing date
2019-08-30
Publication date
2020-02-07
Application filed by Guizhou Li Chuang Technology Development Co Ltd
Priority to CN201910811160.7A
Publication of CN110765173A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a data management method and system in a big data environment. The method comprises the following steps: collecting big data from multiple data sources with an ETL tool and preprocessing it to form preprocessed big data; performing algorithm analysis and/or data mining and/or data computation and/or data storage processing on the preprocessed big data according to preset requirements to form basic environment big data; and providing a data visualization service for the basic environment big data through a resource scheduling interface. The invention provides a basic technical environment for big data operations and implements multi-source data access, data acquisition and processing, data table management, file management, data exploration, and model building. It supports TB-level data storage and accommodates explosive growth in data volume. It also provides an application framework covering ETL-based data access, distributed computing, stream computing, and in-memory computing, together with big data mining and algorithm modeling capabilities.

Description

Data management method and system under big data environment
Technical Field
The invention relates to the field of big data, in particular to a data management method and a data management system under a big data environment.
Background
Big data refers to data sets that cannot be captured, managed, and processed with conventional software tools within an acceptable time frame. It is a massive, fast-growing, and diverse information asset that requires new processing modes in order to deliver stronger decision-making power, insight discovery, and process optimization. Once big data resources have been obtained, how to manage them effectively is an important and pressing problem.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a data management method and a data management system under a big data environment, which can effectively manage big data.
The technical solution to the above problem is as follows: a data management method in a big data environment, comprising the following steps:
S1, collecting big data from multiple data sources with an ETL tool, and preprocessing the big data to form preprocessed big data;
S2, performing algorithm analysis and/or data mining and/or data computation and/or data storage processing on the preprocessed big data according to preset requirements to form basic environment big data;
S3, providing a data visualization service for the basic environment big data through a resource scheduling interface.
The beneficial effects of the invention are as follows: the data management method in a big data environment provides a basic technical environment for big data operations and implements multi-source data access, data acquisition and processing, data table management, file management, data exploration, and model building; it supports TB-level data storage and accommodates explosive growth in data volume; it provides an application framework covering ETL-based data access, distributed computing, stream computing, and in-memory computing, offers big data mining and algorithm modeling capabilities, and supports multi-language integration, so that big data can be managed effectively.
On the basis of the technical scheme, the invention can be further improved as follows.
Further, the data sources include databases, NoSQL databases, text files, and unstructured databases.
Further, the preprocessing includes extraction, cleaning, and conversion.
Further, the algorithm analysis includes machine learning, algorithm modeling, and deep learning; the data mining includes SQL queries, interactive queries, and search queries; the data computation includes in-memory computation, stream computation, and batch computation; and the types of data storage include in-memory storage, columnar storage, data warehouses, and distributed file systems.
Further, the data visualization service includes data management, platform management, and security management.
Based on the data management method under the big data environment, the invention also provides a data management system under the big data environment.
A data management system in a big data environment comprises the following modules:
an acquisition and preprocessing module, configured to collect big data from multiple data sources with an ETL tool and to preprocess the big data to form preprocessed big data;
a big data infrastructure module, configured to perform algorithm analysis and/or data mining and/or data computation and/or data storage processing on the preprocessed big data according to preset requirements to form basic environment big data;
and a service providing module, configured to provide a data visualization service for the basic environment big data through a resource scheduling interface.
The beneficial effects of the invention are as follows: the data management system in a big data environment provides a basic technical environment for big data operations and implements multi-source data access, data acquisition and processing, data table management, file management, data exploration, and model building; it supports TB-level data storage and accommodates explosive growth in data volume; it provides an application framework covering ETL-based data access, distributed computing, stream computing, and in-memory computing, offers big data mining and algorithm modeling capabilities, and supports multi-language integration, so that big data can be managed effectively.
On the basis of the technical scheme, the invention can be further improved as follows.
Further, the data sources include databases, NoSQL databases, text files, and unstructured databases.
Further, the preprocessing includes extraction, cleaning, and conversion.
Further, the algorithm analysis includes machine learning, algorithm modeling, and deep learning; the data mining includes SQL queries, interactive queries, and search queries; the data computation includes in-memory computation, stream computation, and batch computation; and the types of data storage include in-memory storage, columnar storage, data warehouses, and distributed file systems.
Further, the data visualization service includes data management, platform management, and security management.
Drawings
FIG. 1 is a flow chart of a data management method in a big data environment according to the present invention;
FIG. 2 is a block diagram of a data management system in a big data environment according to the present invention.
Detailed Description
The principles and features of the invention are described below with reference to the accompanying drawings. The examples are given for illustration only and are not intended to limit the scope of the invention.
As shown in FIG. 1, a data management method in a big data environment includes the following steps:
S1, collecting big data from multiple data sources with an ETL tool, and preprocessing the big data to form preprocessed big data;
S2, performing algorithm analysis and/or data mining and/or data computation and/or data storage processing on the preprocessed big data according to preset requirements to form basic environment big data;
S3, providing a data visualization service for the basic environment big data through a resource scheduling interface.
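The three steps can be pictured as a small function pipeline. The following Python sketch is illustrative only and is not taken from the application: the record layout, the function names, and the `count_by_city` task are all hypothetical, and in-memory lists stand in for real data sources.

```python
from typing import Callable, Iterable

Record = dict  # hypothetical record type: one row as a plain dictionary


def s1_collect_and_preprocess(sources: Iterable[Iterable[Record]]) -> list[Record]:
    """S1: gather records from several sources and drop empty ones."""
    return [rec for source in sources for rec in source if rec]


def s2_process(records: list[Record], task: Callable[[list[Record]], dict]) -> dict:
    """S2: apply one preset processing task (analysis, mining, computation or storage)."""
    return task(records)


def s3_serve(result: dict) -> str:
    """S3: hand the processed result to a display layer, here as plain text."""
    return ", ".join(f"{key}={value}" for key, value in sorted(result.items()))


def count_by_city(records: list[Record]) -> dict:
    """Example preset task (an assumption): count records per city."""
    counts: dict = {}
    for rec in records:
        counts[rec["city"]] = counts.get(rec["city"], 0) + 1
    return counts


# Two stand-in sources: rows from a database and rows parsed from a text file.
database_rows = [{"city": "Guiyang"}, {"city": "Zunyi"}]
text_file_rows = [{"city": "Guiyang"}, {}]

records = s1_collect_and_preprocess([database_rows, text_file_rows])
report = s3_serve(s2_process(records, count_by_city))
print(report)
```

Passing the task as a parameter mirrors the "according to preset requirements" wording: the pipeline stays the same while the processing applied in S2 changes.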
In this particular embodiment: the data sources include databases, NoSQL databases, text files, and unstructured databases.
In this particular embodiment: the preprocessing includes extraction, cleaning, and conversion.
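As a minimal sketch of what extraction, cleaning, and conversion could look like for a text-file source, the Python example below parses CSV text, discards incomplete rows, and casts fields to numeric types. The sample data, field names, and cleaning rules are assumptions made for illustration; the application does not specify them.

```python
import csv
import io

# Made-up sample input standing in for a text-file data source.
RAW_TEXT = "id,temperature\n1, 21.5 \n2,\n3,abc\n4,19\n"


def extract(text: str) -> list[dict]:
    """Extraction: parse raw CSV text into row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows: list[dict]) -> list[dict]:
    """Cleaning: strip whitespace and discard rows with a missing temperature."""
    stripped = [{k: (v or "").strip() for k, v in row.items()} for row in rows]
    return [row for row in stripped if row["temperature"]]


def convert(rows: list[dict]) -> list[dict]:
    """Conversion: cast fields to numeric types, skipping values that do not parse."""
    converted = []
    for row in rows:
        try:
            converted.append({"id": int(row["id"]), "temperature": float(row["temperature"])})
        except ValueError:
            continue
    return converted


preprocessed = convert(clean(extract(RAW_TEXT)))
print(preprocessed)
```

Row 2 is removed during cleaning because its temperature is empty, and row 3 is removed during conversion because `abc` is not a number.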
In this particular embodiment: the algorithm analysis includes machine learning, algorithm modeling, and deep learning; the data mining includes SQL queries, interactive queries, and search queries; the data computation includes in-memory computation, stream computation, and batch computation; and the types of data storage include in-memory storage, columnar storage, data warehouses, and distributed file systems.
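The difference between batch computation and stream computation can be illustrated with a single statistic. In the Python sketch below, which is an illustration and not part of the application, the batch version sees the whole data set at once, while the stream version updates its result as each value arrives. The function names and sample readings are hypothetical.

```python
from statistics import fmean
from typing import Iterable, Iterator


def batch_mean(values: list[float]) -> float:
    """Batch computation: the full data set is available before processing."""
    return fmean(values)


def stream_mean(values: Iterable[float]) -> Iterator[float]:
    """Stream computation: update the mean incrementally as each value arrives."""
    count, mean = 0, 0.0
    for value in values:
        count += 1
        mean += (value - mean) / count  # incremental mean update
        yield mean


readings = [2.0, 4.0, 6.0, 8.0]
running = list(stream_mean(readings))
print(batch_mean(readings), running)
```

Both approaches agree once the stream has consumed every value; the stream version additionally exposes an up-to-date result after each element without storing the data set.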
In this particular embodiment: the data visualization service includes data management, platform management, and security management. The data management includes metadata management, data access management, data extraction management, and data table management; the platform management includes cluster management, cluster monitoring, and task scheduling management; and the security management includes authentication center management and user management.
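The application does not describe how task scheduling management works. One common ingredient of such a component is ordering tasks by their dependencies, which the following Python sketch shows with the standard-library `graphlib` module (Python 3.9 or later). The task names and the dependency graph are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical task names; each task maps to the set of tasks it depends on.
dependencies = {
    "clean": {"extract"},
    "convert": {"clean"},
    "load_warehouse": {"convert"},
    "refresh_dashboard": {"load_warehouse"},
}

# static_order() yields every task after all of its prerequisites.
execution_order = list(TopologicalSorter(dependencies).static_order())
print(execution_order)
```

A production scheduler would also handle timing, retries, and cluster resources; this sketch covers dependency ordering only.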
The data management method in a big data environment provides a basic technical environment for big data operations and implements multi-source data access, data acquisition and processing, data table management, file management, data exploration, and model building; it supports TB-level data storage and accommodates explosive growth in data volume; it provides an application framework covering ETL-based data access, distributed computing, stream computing, and in-memory computing, offers big data mining and algorithm modeling capabilities, and supports multi-language integration, so that big data can be managed effectively.
Based on the data management method under the big data environment, the invention also provides a data management system under the big data environment.
A data management system in a big data environment comprises the following modules:
an acquisition and preprocessing module, configured to collect big data from multiple data sources with an ETL tool and to preprocess the big data to form preprocessed big data;
a big data infrastructure module, configured to perform algorithm analysis and/or data mining and/or data computation and/or data storage processing on the preprocessed big data according to preset requirements to form basic environment big data;
and a service providing module, configured to provide a data visualization service for the basic environment big data through a resource scheduling interface.
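One way to picture how the three modules fit together is as three small classes, with the infrastructure module selecting operations by name according to the preset requirements. The Python sketch below is an assumption-laden illustration: the class names follow the module names in the text, but the `run` methods, the operation registry, and the sample records are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


class AcquisitionPreprocessingModule:
    def run(self, sources: list[list[dict]]) -> list[dict]:
        # Merge all sources and keep only non-empty records.
        return [rec for src in sources for rec in src if rec]


@dataclass
class BigDataInfrastructureModule:
    # Maps a preset requirement name to the operation that fulfils it.
    operations: dict[str, Callable[[list[dict]], object]] = field(default_factory=dict)

    def run(self, records: list[dict], requirements: list[str]) -> dict[str, object]:
        # Execute only the operations named in the preset requirements.
        return {name: self.operations[name](records) for name in requirements}


class ServiceProvidingModule:
    def run(self, results: dict[str, object]) -> list[str]:
        # Stand-in for a visualization service: one display line per result.
        return [f"{name}: {value}" for name, value in results.items()]


infrastructure = BigDataInfrastructureModule(
    operations={
        "data_calculation": lambda recs: sum(r["amount"] for r in recs),
        "data_storage": lambda recs: len(recs),
    }
)
records = AcquisitionPreprocessingModule().run([[{"amount": 3}], [{"amount": 4}, {}]])
results = infrastructure.run(records, ["data_calculation", "data_storage"])
lines = ServiceProvidingModule().run(results)
print(lines)
```

Keeping the operations in a registry means a new kind of processing can be added without changing the other two modules.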
In this particular embodiment: the data sources include databases, NoSQL databases, text files, and unstructured databases.
In this particular embodiment: the preprocessing includes extraction, cleaning, and conversion.
In this particular embodiment: the algorithm analysis includes machine learning, algorithm modeling, and deep learning; the data mining includes SQL queries, interactive queries, and search queries; the data computation includes in-memory computation, stream computation, and batch computation; and the types of data storage include in-memory storage, columnar storage, data warehouses, and distributed file systems.
In this particular embodiment: the data visualization service includes data management, platform management, and security management. The data management includes metadata management, data access management, data extraction management, and data table management; the platform management includes cluster management, cluster monitoring, and task scheduling management; and the security management includes authentication center management and user management.
The data management system in a big data environment provides a basic technical environment for big data operations and implements multi-source data access, data acquisition and processing, data table management, file management, data exploration, and model building; it supports TB-level data storage and accommodates explosive growth in data volume; it provides an application framework covering ETL-based data access, distributed computing, stream computing, and in-memory computing, offers big data mining and algorithm modeling capabilities, and supports multi-language integration, so that big data can be managed effectively.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (10)

1. A data management method in a big data environment, characterized by comprising the following steps:
S1, collecting big data from multiple data sources with an ETL tool, and preprocessing the big data to form preprocessed big data;
S2, performing algorithm analysis and/or data mining and/or data computation and/or data storage processing on the preprocessed big data according to preset requirements to form basic environment big data;
S3, providing a data visualization service for the basic environment big data through a resource scheduling interface.
2. The data management method in a big data environment according to claim 1, characterized in that: the data sources include databases, NoSQL databases, text files, and unstructured databases.
3. The data management method in a big data environment according to claim 1 or 2, characterized in that: the preprocessing includes extraction, cleaning, and conversion.
4. The data management method in a big data environment according to claim 1 or 2, characterized in that: the algorithm analysis includes machine learning, algorithm modeling, and deep learning; the data mining includes SQL queries, interactive queries, and search queries; the data computation includes in-memory computation, stream computation, and batch computation; and the types of data storage include in-memory storage, columnar storage, data warehouses, and distributed file systems.
5. The data management method in a big data environment according to claim 1 or 2, characterized in that: the data visualization service includes data management, platform management, and security management.
6. A data management system in a big data environment, characterized by comprising the following modules:
an acquisition and preprocessing module, configured to collect big data from multiple data sources with an ETL tool and to preprocess the big data to form preprocessed big data;
a big data infrastructure module, configured to perform algorithm analysis and/or data mining and/or data computation and/or data storage processing on the preprocessed big data according to preset requirements to form basic environment big data;
and a service providing module, configured to provide a data visualization service for the basic environment big data through a resource scheduling interface.
7. The data management system in a big data environment according to claim 6, characterized in that: the data sources include databases, NoSQL databases, text files, and unstructured databases.
8. The data management system in a big data environment according to claim 6 or 7, characterized in that: the preprocessing includes extraction, cleaning, and conversion.
9. The data management system in a big data environment according to claim 6 or 7, characterized in that: the algorithm analysis includes machine learning, algorithm modeling, and deep learning; the data mining includes SQL queries, interactive queries, and search queries; the data computation includes in-memory computation, stream computation, and batch computation; and the types of data storage include in-memory storage, columnar storage, data warehouses, and distributed file systems.
10. The data management system in a big data environment according to claim 6 or 7, characterized in that: the data visualization service includes data management, platform management, and security management.
CN201910811160.7A 2019-08-30 2019-08-30 Data management method and system under big data environment Pending CN110765173A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910811160.7A CN110765173A (en) 2019-08-30 2019-08-30 Data management method and system under big data environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910811160.7A CN110765173A (en) 2019-08-30 2019-08-30 Data management method and system under big data environment

Publications (1)

Publication Number Publication Date
CN110765173A (en) 2020-02-07

Family

ID=69329261

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910811160.7A Pending CN110765173A (en) 2019-08-30 2019-08-30 Data management method and system under big data environment

Country Status (1)

Country Link
CN (1) CN110765173A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112597342A (en) * 2020-12-15 2021-04-02 福建省星云大数据应用服务有限公司 Environment-friendly data management method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9092502B1 (en) * 2013-02-25 2015-07-28 Leidos, Inc. System and method for correlating cloud-based big data in real-time for intelligent analytics and multiple end uses
CN105095653A (en) * 2015-07-13 2015-11-25 湖南互动传媒有限公司 Basic service system for medical large data application
CN107361396A (en) * 2017-07-10 2017-11-21 红云红河烟草(集团)有限责任公司 Tobacco based on big data dries the prediction of silk moisture and control system
CN107920126A (en) * 2017-11-30 2018-04-17 河南云保遥感科技有限公司 Big data management method between a kind of distributed space under cloud environment

Similar Documents

Publication Publication Date Title
Yang et al. A system architecture for manufacturing process analysis based on big data and process mining techniques
CN110750650A (en) Construction method and device of enterprise knowledge graph
CN106709012A (en) Method and device for analyzing big data
CN105608758A (en) Big data analysis platform apparatus and method based on algorithm configuration and distributed stream computing
CN106126601A (en) A kind of social security distributed preprocess method of big data and system
CN110990664A (en) Big data operation management system
CN103186541A (en) Generation method and device for mapping relationship
CN105556517A (en) Smart search refinement
CN104317970A (en) Data flow type processing method based on data processing center
CN106777142A (en) Service layer's system and method based on mobile Internet mass data
CN104216966A (en) Method supporting index creation in various modes
CN103970891A (en) Method for inquiring user interest information based on context
CN114238388A (en) Heterogeneous data collection and retrieval system based on multiple protocols
Kun et al. Application of big data technology in scientific research data management of military enterprises
CN115237857A (en) Log processing method and device, computer equipment and storage medium
Sundarakumar et al. A heuristic approach to improve the data processing in big data using enhanced Salp Swarm algorithm (ESSA) and MK-means algorithm
CN104573074A (en) High-speed calculating and analyzing method based on hospital data
CN111159152A (en) Secondary operation and maintenance data fusion method based on big data processing technology
CN110765173A (en) Data management method and system under big data environment
CN112288317B (en) Industrial big data analysis platform and method based on multi-source heterogeneous data governance
CN111737490B (en) Knowledge graph ontology model generation method and device based on banking channel
CN107423035B (en) Product data management system in software development process
CN113254517A (en) Service providing method based on internet big data
CN112650739A (en) Data storage processing method and device for coal mine data middling station
CN107992590B (en) Big data system beneficial to information comparison

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 2020-02-07)