CN110765173A - Data management method and system under big data environment - Google Patents
- Publication number
- CN110765173A (application number CN201910811160.7A)
- Authority
- CN
- China
- Prior art keywords
- data
- big data
- big
- environment
- management
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/254—Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Probability & Statistics with Applications (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention relates to a data management method and system in a big data environment. The method comprises the following steps: acquiring big data from multiple data sources using an ETL tool, and preprocessing the big data to form preprocessed big data; performing algorithm analysis and/or data mining and/or data calculation and/or data storage processing on the preprocessed big data according to preset requirements to form basic environment big data; and providing a data visualization service for the basic environment big data through a resource scheduling interface. The invention provides a basic technical environment for big data operations and implements multi-source data access, data acquisition and processing, data table management, file management, data exploration, and model building. It supports storage of TB-level data and accommodates the storage demands of explosive growth in data volume. It provides an application framework covering data access through the ETL tool, distributed computing, stream computing, and in-memory computing, and offers big data mining and algorithm modeling capabilities.
Description
Technical Field
The invention relates to the field of big data, in particular to a data management method and a data management system under a big data environment.
Background
Big data refers to data sets that cannot be captured, managed, and processed by conventional software tools within a given time frame. It is a massive, fast-growing, and diversified information asset that requires new processing models in order to deliver stronger decision-making power, insight discovery, and process optimization capability. Once big data resources have been obtained, how to manage them is an important problem that urgently needs to be solved.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a data management method and a data management system under a big data environment, which can effectively manage big data.
The technical solution to the above problem is as follows: a data management method in a big data environment, comprising the following steps:
S1, acquiring big data from multiple data sources using an ETL tool, and preprocessing the big data to form preprocessed big data;
S2, performing algorithm analysis and/or data mining and/or data calculation and/or data storage processing on the preprocessed big data according to preset requirements to form basic environment big data;
S3, providing a data visualization service for the basic environment big data through a resource scheduling interface.
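The three steps can be read as a simple function pipeline. The following Python sketch is illustrative only and is not taken from the patent: the function names, the dictionary-based record format, and the idea of expressing the "preset requirements" as a mapping of named operations are all assumptions.

```python
from typing import Any, Callable, Dict, Iterable, List

Record = Dict[str, Any]  # hypothetical record format, not specified by the patent


def acquire_and_preprocess(sources: Iterable[Iterable[Record]]) -> List[Record]:
    """S1 (stand-in for the ETL tool): merge all sources and drop empty records."""
    return [rec for src in sources for rec in src if rec]


def build_base_environment(
    records: List[Record],
    operations: Dict[str, Callable[[List[Record]], Any]],
) -> Dict[str, Any]:
    """S2: run whichever operations the preset requirements name."""
    return {name: op(records) for name, op in operations.items()}


def serve(base_env: Dict[str, Any], key: str) -> Any:
    """S3 (stand-in for the resource scheduling interface): return one result."""
    return base_env[key]


# Usage: two small in-memory sources and two assumed operations.
sources = [[{"v": 1}, {}], [{"v": 3}]]
operations = {
    "count": len,
    "total": lambda rs: sum(r["v"] for r in rs),
}
env = build_base_environment(acquire_and_preprocess(sources), operations)
print(serve(env, "count"), serve(env, "total"))
```

Keeping each step as a separate function mirrors the S1 to S3 split, so any one stage could be swapped for a real ETL tool, compute engine, or dashboard without touching the others.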
The beneficial effects of the invention are as follows: the data management method in a big data environment provides a basic technical environment for big data operations and implements multi-source data access, data acquisition and processing, data table management, file management, data exploration, and model building; it supports storage of TB-level data and accommodates the storage demands of explosive growth in data volume; it provides an application framework covering data access through the ETL tool, distributed computing, stream computing, and in-memory computing, offers big data mining and algorithm modeling capabilities, and supports multi-language integration; big data can therefore be managed effectively.
On the basis of the technical scheme, the invention can be further improved as follows.
Further, the data sources include databases, NoSQL databases, text files, and unstructured databases.
Further, the preprocessing includes extraction, cleaning, and conversion.
Further, the algorithm analysis comprises machine learning, algorithm modeling, and deep learning; the data mining comprises SQL query, interactive query, and search query; the data calculation comprises in-memory computing, stream computing, and batch computing; the types of data storage include in-memory storage, columnar storage, data warehouses, and distributed file systems.
Further, the data visualization service includes data management, platform management, and security management.
Based on the data management method under the big data environment, the invention also provides a data management system under the big data environment.
A data management system in a big data environment comprises the following modules:
an acquisition and preprocessing module, configured to acquire big data from multiple data sources using an ETL tool and to preprocess the big data to form preprocessed big data;
a big data infrastructure module, configured to perform algorithm analysis and/or data mining and/or data calculation and/or data storage processing on the preprocessed big data according to preset requirements to form basic environment big data;
a service providing module, configured to provide a data visualization service for the basic environment big data through a resource scheduling interface.
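One way to picture how the three modules fit together is as three small classes wired into one system object. This is a minimal sketch under stated assumptions: the class names simply echo the module names, while the `enabled` list (standing in for the "preset requirements"), the callable-based sources, and the plain-text rendering are hypothetical choices that the patent does not specify.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]  # hypothetical record format


@dataclass
class AcquisitionPreprocessingModule:
    """Acquires records from every configured source and drops empty ones."""
    sources: List[Callable[[], List[Record]]]

    def run(self) -> List[Record]:
        records: List[Record] = []
        for read in self.sources:
            records.extend(rec for rec in read() if rec)
        return records


@dataclass
class BigDataInfrastructureModule:
    """Runs only the operations selected by the preset requirements."""
    operations: Dict[str, Callable[[List[Record]], Any]]
    enabled: List[str]  # assumed representation of the preset requirements

    def run(self, records: List[Record]) -> Dict[str, Any]:
        return {name: self.operations[name](records) for name in self.enabled}


class ServiceProvidingModule:
    """Renders results as text; a real system would feed a dashboard instead."""

    def render(self, base_env: Dict[str, Any]) -> str:
        return "\n".join(f"{k}: {v}" for k, v in sorted(base_env.items()))


@dataclass
class DataManagementSystem:
    acquisition: AcquisitionPreprocessingModule
    infrastructure: BigDataInfrastructureModule
    service: ServiceProvidingModule

    def run(self) -> str:
        records = self.acquisition.run()
        return self.service.render(self.infrastructure.run(records))


system = DataManagementSystem(
    AcquisitionPreprocessingModule([lambda: [{"x": 2}, {"x": 5}, {}]]),
    BigDataInfrastructureModule(
        operations={
            "count": len,
            "max_x": lambda rs: max(r["x"] for r in rs),
            "unused": lambda rs: None,  # registered but not enabled
        },
        enabled=["count", "max_x"],
    ),
    ServiceProvidingModule(),
)
print(system.run())
```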
The beneficial effects of the invention are as follows: the data management system in a big data environment provides a basic technical environment for big data operations and implements multi-source data access, data acquisition and processing, data table management, file management, data exploration, and model building; it supports storage of TB-level data and accommodates the storage demands of explosive growth in data volume; it provides an application framework covering data access through the ETL tool, distributed computing, stream computing, and in-memory computing, offers big data mining and algorithm modeling capabilities, and supports multi-language integration; big data can therefore be managed effectively.
On the basis of the technical scheme, the invention can be further improved as follows.
Further, the data sources include databases, NoSQL databases, text files, and unstructured databases.
Further, the preprocessing includes extraction, cleaning, and conversion.
Further, the algorithm analysis comprises machine learning, algorithm modeling, and deep learning; the data mining comprises SQL query, interactive query, and search query; the data calculation comprises in-memory computing, stream computing, and batch computing; the types of data storage include in-memory storage, columnar storage, data warehouses, and distributed file systems.
Further, the data visualization service includes data management, platform management, and security management.
Drawings
FIG. 1 is a flow chart of a data management method in a big data environment according to the present invention;
FIG. 2 is a block diagram of a data management system in a big data environment according to the present invention.
Detailed Description
The principles and features of this invention are described below in conjunction with the following drawings, which are set forth by way of illustration only and are not intended to limit the scope of the invention.
As shown in FIG. 1, a data management method in a big data environment comprises the following steps:
S1, acquiring big data from multiple data sources using an ETL tool, and preprocessing the big data to form preprocessed big data;
S2, performing algorithm analysis and/or data mining and/or data calculation and/or data storage processing on the preprocessed big data according to preset requirements to form basic environment big data;
S3, providing a data visualization service for the basic environment big data through a resource scheduling interface.
In this particular embodiment: the data sources include databases, NoSQL databases, text files, and unstructured databases.
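As a rough illustration of multi-source access, the sketch below reads from three stand-in sources and tags every record with its origin. The patent does not name any concrete products, so everything here is an assumption: an in-memory SQLite database stands in for a database, JSON lines stand in for documents exported from a NoSQL store, and the helper names are hypothetical.

```python
import io
import json
import sqlite3
from typing import Any, Dict, Iterable, List


def read_database(conn: sqlite3.Connection, query: str) -> List[Dict[str, Any]]:
    """Run a SELECT and return each row as a column-name -> value dict."""
    cur = conn.execute(query)
    cols = [c[0] for c in cur.description]
    return [dict(zip(cols, row)) for row in cur.fetchall()]


def read_documents(json_lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Parse one JSON document per non-blank line."""
    return [json.loads(line) for line in json_lines if line.strip()]


def read_text(lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Wrap each non-blank text line in a record."""
    return [{"text": line.rstrip("\n")} for line in lines if line.strip()]


def acquire(conn, query, json_lines, text_lines) -> List[Dict[str, Any]]:
    """Merge all sources into one list, tagging each record with its source."""
    tagged = (
        ("database", read_database(conn, query)),
        ("nosql", read_documents(json_lines)),
        ("text", read_text(text_lines)),
    )
    return [{"source": name, **row} for name, rows in tagged for row in rows]


# Usage with in-memory stand-ins for the three source types.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
conn.execute("INSERT INTO t VALUES (1, 'a')")
records = acquire(
    conn,
    "SELECT id, name FROM t",
    io.StringIO('{"id": 2}\n\n'),
    io.StringIO("hello\n"),
)
print(records)
```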
In this particular embodiment: the preprocessing comprises extraction, cleaning, and conversion.
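The three preprocessing stages could look like the following minimal sketch. The CSV input, the `id` and `amount` column names, and the specific cleaning rules (trim whitespace, drop rows with missing fields, drop duplicate ids) are assumptions chosen for illustration; the patent does not define them.

```python
import csv
import io
from typing import Dict, List


def extract(raw_csv: str) -> List[Dict[str, str]]:
    """Extraction: parse raw CSV text into one dict per row."""
    return list(csv.DictReader(io.StringIO(raw_csv)))


def clean(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Cleaning: trim whitespace, drop incomplete rows and duplicate ids."""
    seen = set()
    cleaned = []
    for row in rows:
        values = {k: (v or "").strip() for k, v in row.items() if k is not None}
        key = values.get("id")
        if not key or not values.get("amount") or key in seen:
            continue
        seen.add(key)
        cleaned.append(values)
    return cleaned


def convert(rows: List[Dict[str, str]]) -> List[Dict[str, float]]:
    """Conversion: cast string fields to typed values (assumes valid numbers)."""
    return [{"id": int(r["id"]), "amount": float(r["amount"])} for r in rows]


raw = "id,amount\n1, 10.5 \n1,10.5\n2,\n3,7\n"
result = convert(clean(extract(raw)))
print(result)
```

Here the duplicate row for id 1 and the row with a missing amount are dropped during cleaning, so only ids 1 and 3 reach the conversion stage.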
In this particular embodiment: the algorithm analysis comprises machine learning, algorithm modeling, and deep learning; the data mining comprises SQL query, interactive query, and search query; the data calculation comprises in-memory computing, stream computing, and batch computing; the types of data storage include in-memory storage, columnar storage, data warehouses, and distributed file systems.
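To make the distinction between batch and stream computing concrete, the sketch below computes the same mean in both styles. It is a toy example rather than the patent's implementation: the function names are hypothetical, and a production system would normally rely on a dedicated batch or stream engine.

```python
from typing import Iterable, Iterator, List


def batch_mean(values: List[float]) -> float:
    """Batch computing: the whole data set is available before computing."""
    return sum(values) / len(values)


def stream_mean(values: Iterable[float]) -> Iterator[float]:
    """Stream computing: update the mean incrementally as each value arrives."""
    count = 0
    mean = 0.0
    for value in values:
        count += 1
        mean += (value - mean) / count  # incremental mean update
        yield mean


data = [2.0, 4.0, 6.0]
print(batch_mean(data))         # one result after all data is seen
print(list(stream_mean(data)))  # one running result per arriving value
```

The streaming version keeps only a counter and the current mean, which is why this style suits unbounded inputs, while the batch version needs the full list in memory.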
In this particular embodiment: the data visualization service includes data management, platform management, and security management. The data management comprises metadata management, data access management, data extraction management and data table management; the platform management comprises cluster management, cluster monitoring and task scheduling management; the security management includes authentication center management and user management.
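The grouping into data management, platform management, and security management suggests a single entry point that checks the caller and then routes the request to the right area. The sketch below is one possible reading and not the patent's design: the `ServiceInterface` class, its per-user area grants, and the `describe_table` action are all hypothetical.

```python
from typing import Any, Callable, Dict, Set, Tuple


class ServiceInterface:
    """Routes (area, action) requests to handlers after a permission check."""

    def __init__(self) -> None:
        self._handlers: Dict[Tuple[str, str], Callable[..., Any]] = {}
        self._grants: Dict[str, Set[str]] = {}  # user -> areas the user may call

    def register(self, area: str, action: str, handler: Callable[..., Any]) -> None:
        self._handlers[(area, action)] = handler

    def grant(self, user: str, area: str) -> None:
        """Security management stand-in: allow a user to call one area."""
        self._grants.setdefault(user, set()).add(area)

    def call(self, user: str, area: str, action: str, **kwargs: Any) -> Any:
        if area not in self._grants.get(user, set()):
            raise PermissionError(f"{user} may not access {area}")
        if (area, action) not in self._handlers:
            raise LookupError(f"unknown action {area}.{action}")
        return self._handlers[(area, action)](**kwargs)


# Usage: a data management action backed by a toy metadata table.
tables = {"orders": ["id", "amount"]}
svc = ServiceInterface()
svc.register("data", "describe_table", lambda name: tables[name])
svc.grant("alice", "data")
print(svc.call("alice", "data", "describe_table", name="orders"))
```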
The data management method in a big data environment provides a basic technical environment for big data operations and implements multi-source data access, data acquisition and processing, data table management, file management, data exploration, and model building; it supports storage of TB-level data and accommodates the storage demands of explosive growth in data volume; it provides an application framework covering data access through the ETL tool, distributed computing, stream computing, and in-memory computing, offers big data mining and algorithm modeling capabilities, and supports multi-language integration; big data can therefore be managed effectively.
Based on the data management method under the big data environment, the invention also provides a data management system under the big data environment.
A data management system in a big data environment comprises the following modules:
an acquisition and preprocessing module, configured to acquire big data from multiple data sources using an ETL tool and to preprocess the big data to form preprocessed big data;
a big data infrastructure module, configured to perform algorithm analysis and/or data mining and/or data calculation and/or data storage processing on the preprocessed big data according to preset requirements to form basic environment big data;
a service providing module, configured to provide a data visualization service for the basic environment big data through a resource scheduling interface.
In this particular embodiment: the data sources include databases, NoSQL databases, text files, and unstructured databases.
In this particular embodiment: the preprocessing comprises extraction, cleaning, and conversion.
In this particular embodiment: the algorithm analysis comprises machine learning, algorithm modeling, and deep learning; the data mining comprises SQL query, interactive query, and search query; the data calculation comprises in-memory computing, stream computing, and batch computing; the types of data storage include in-memory storage, columnar storage, data warehouses, and distributed file systems.
In this particular embodiment: the data visualization service includes data management, platform management, and security management. The data management comprises metadata management, data access management, data extraction management and data table management; the platform management comprises cluster management, cluster monitoring and task scheduling management; the security management includes authentication center management and user management.
The data management system in a big data environment provides a basic technical environment for big data operations and implements multi-source data access, data acquisition and processing, data table management, file management, data exploration, and model building; it supports storage of TB-level data and accommodates the storage demands of explosive growth in data volume; it provides an application framework covering data access through the ETL tool, distributed computing, stream computing, and in-memory computing, offers big data mining and algorithm modeling capabilities, and supports multi-language integration; big data can therefore be managed effectively.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
Claims (10)
1. A data management method in a big data environment, characterized by comprising the following steps:
S1, acquiring big data from multiple data sources using an ETL tool, and preprocessing the big data to form preprocessed big data;
S2, performing algorithm analysis and/or data mining and/or data calculation and/or data storage processing on the preprocessed big data according to preset requirements to form basic environment big data;
S3, providing a data visualization service for the basic environment big data through a resource scheduling interface.
2. The data management method in a big data environment according to claim 1, characterized in that: the data sources include databases, NoSQL databases, text files, and unstructured databases.
3. The data management method in a big data environment according to claim 1 or 2, characterized in that: the preprocessing comprises extraction, cleaning, and conversion.
4. The data management method in a big data environment according to claim 1 or 2, characterized in that: the algorithm analysis comprises machine learning, algorithm modeling, and deep learning; the data mining comprises SQL query, interactive query, and search query; the data calculation comprises in-memory computing, stream computing, and batch computing; the types of data storage include in-memory storage, columnar storage, data warehouses, and distributed file systems.
5. The data management method in big data environment according to claim 1 or 2, characterized in that: the data visualization service includes data management, platform management, and security management.
6. A data management system in a big data environment, characterized by comprising the following modules:
an acquisition and preprocessing module, configured to acquire big data from multiple data sources using an ETL tool and to preprocess the big data to form preprocessed big data;
a big data infrastructure module, configured to perform algorithm analysis and/or data mining and/or data calculation and/or data storage processing on the preprocessed big data according to preset requirements to form basic environment big data;
a service providing module, configured to provide a data visualization service for the basic environment big data through a resource scheduling interface.
7. The data management system in a big data environment according to claim 6, characterized in that: the data sources include databases, NoSQL databases, text files, and unstructured databases.
8. The data management system in a big data environment according to claim 6 or 7, characterized in that: the preprocessing comprises extraction, cleaning, and conversion.
9. The data management system in a big data environment according to claim 6 or 7, characterized in that: the algorithm analysis comprises machine learning, algorithm modeling, and deep learning; the data mining comprises SQL query, interactive query, and search query; the data calculation comprises in-memory computing, stream computing, and batch computing; the types of data storage include in-memory storage, columnar storage, data warehouses, and distributed file systems.
10. The data management system in big data environment according to claim 6 or 7, characterized in that: the data visualization service includes data management, platform management, and security management.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910811160.7A CN110765173A (en) | 2019-08-30 | 2019-08-30 | Data management method and system under big data environment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910811160.7A CN110765173A (en) | 2019-08-30 | 2019-08-30 | Data management method and system under big data environment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110765173A (en) | 2020-02-07 |
Family
ID=69329261
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910811160.7A Pending CN110765173A (en) | 2019-08-30 | 2019-08-30 | Data management method and system under big data environment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110765173A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112597342A (en) * | 2020-12-15 | 2021-04-02 | 福建省星云大数据应用服务有限公司 | Environment-friendly data management method |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9092502B1 (en) * | 2013-02-25 | 2015-07-28 | Leidos, Inc. | System and method for correlating cloud-based big data in real-time for intelligent analytics and multiple end uses |
CN105095653A (en) * | 2015-07-13 | 2015-11-25 | 湖南互动传媒有限公司 | Basic service system for medical large data application |
CN107361396A (en) * | 2017-07-10 | 2017-11-21 | 红云红河烟草(集团)有限责任公司 | Tobacco based on big data dries the prediction of silk moisture and control system |
CN107920126A (en) * | 2017-11-30 | 2018-04-17 | 河南云保遥感科技有限公司 | Big data management method between a kind of distributed space under cloud environment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20200207 |