CN113626414A - Data dimension reduction and denoising method for high-dimensional data set - Google Patents

Publication number
CN113626414A
Authority
CN
China
Prior art keywords
data
dimensional
cloud platform
unit
dimension reduction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110990653.9A
Other languages
Chinese (zh)
Inventor
夏晨益
徐建平
敖飞翔
姚保明
陈俊梅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Jiangxi Electric Power Co ltd Yingtan Power Supply Branch
State Grid Corp of China SGCC
Original Assignee
State Grid Jiangxi Electric Power Co ltd Yingtan Power Supply Branch
State Grid Corp of China SGCC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Jiangxi Electric Power Co ltd Yingtan Power Supply Branch and State Grid Corp of China SGCC

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/217 Database tuning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a data dimension reduction and denoising method for a high-dimensional data set. The method comprises the following steps: a high-dimensional data set is input into a cloud platform through a data input module; the cloud platform transmits the data to a central processing unit; and the central processing unit classifies the data set as linear or nonlinear. A stacked (superposition) denoising autoencoder, a one-dimensional convolutional neural network, the central processing unit and a result detection report module together form a noise reduction and dimension reduction system for high-dimensional data. According to the invention, this system improves the accuracy of dimension reduction for high-dimensional data and provides a more objective, comprehensive, precise and flexible dimension reduction mode. It addresses two problems: the number of potential feature subspaces grows exponentially with the input dimension, so the search space grows exponentially as well; and a large number of irrelevant features in the data generate unnecessary noise that interferes with selecting the data subspace that best represents the relevant attributes.

Description

Data dimension reduction and denoising method for high-dimensional data set
Technical Field
The invention relates to the technical field of data processing, in particular to a data dimension reduction and denoising method for a high-dimensional data set.
Background
As the input dimension increases, the number of potential feature subspaces grows exponentially, and the search space grows exponentially with it. At the same time, a large number of irrelevant features in the data generate unnecessary noise, which interferes with selecting the data subspace that best represents the relevant attributes, so that real intrusion behavior is masked. In addition, a data stream has a time-series character and arrives in grouped form, and a multidimensional CNN cannot handle these characteristics of a data stream well.
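The exponential growth mentioned above can be made concrete with a short sketch. It assumes that a "feature subspace" means a non-empty subset of the input features (an interpretation, not something the text defines), in which case d features give 2^d - 1 candidates. The function names are illustrative only.

```python
from itertools import combinations


def count_feature_subspaces(d: int) -> int:
    """Number of non-empty subsets of d features: 2**d - 1."""
    return 2 ** d - 1


def enumerate_feature_subspaces(features):
    """Yield every non-empty subset of the given feature names."""
    for k in range(1, len(features) + 1):
        yield from combinations(features, k)


if __name__ == "__main__":
    # The count doubles (plus one) with every added feature.
    for d in (5, 10, 20, 40):
        print(d, count_feature_subspaces(d))
```

Exhaustive enumeration is only practical for a handful of features, which is the motivation for learning a reduced representation instead of searching the subsets directly.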
Disclosure of Invention
To solve the problems described in the background, the present invention provides a data dimension reduction and denoising method for a high-dimensional data set. The method improves the accuracy of dimension reduction and reduces data noise. It addresses the problems that, as the input dimension increases, the number of potential feature subspaces grows exponentially and the search space grows exponentially with it, while a large number of irrelevant features in the data generate unnecessary noise that interferes with selecting the data subspace that best represents the relevant attributes.
To achieve this purpose, the invention provides the following technical scheme: a data dimension reduction and denoising method for a high-dimensional data set, comprising the following steps:
S1: a high-dimensional data set is input into a cloud platform through a data input module; the cloud platform transmits the data to a central processing unit; the central processing unit classifies the data set as linear or nonlinear, and the data are processed with a stacked (superposition) denoising autoencoder, a long short-term memory network and principal component analysis, respectively;
S2: the stacked denoising autoencoder learns the data features and eliminates some irrelevant attributes; it also reduces the dimensionality of the feature vector output by the one-hot encoding (OHE); finally, the data are mapped to a one-dimensional convolutional neural network for display;
S3: the dimension-reduced data enter a result detection report module to form an output report, which is output back to the cloud platform as feedback.
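Step S2 refers to the feature vector produced by OHE encoding, read here as one-hot encoding. A minimal sketch of that encoding shows why its output is sparse and grows with the number of categories, which is what the autoencoder is then asked to compress. The `protocols` column and the helper names are hypothetical and are not taken from the patent.

```python
def fit_categories(values):
    """Return the sorted distinct categories seen in a column."""
    return sorted(set(values))


def one_hot(value, categories):
    """Encode one categorical value as a 0/1 vector over `categories`."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories]


# Hypothetical categorical column; each distinct value adds one dimension.
protocols = ["tcp", "udp", "icmp", "tcp"]
cats = fit_categories(protocols)
encoded = [one_hot(p, cats) for p in protocols]
```

Each encoded row has exactly one non-zero entry, so a column with many distinct values produces wide, mostly empty vectors.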
Preferably, during dimension reduction, the data dimension matching unit and the generation time matching unit contained in the data log recording module record the data by data dimension and generation time and store them in the storage unit, and a user can retrieve past data through the cloud platform.
Preferably, a data feature learning unit is provided inside the rule model processing module, and the data feature learning unit can learn the flow features of the rule model through a one-dimensional convolutional neural network.
Preferably, the output end of the data feature learning unit is electrically connected to a Monte Carlo search tree, and the data feature learning unit can perform record searches on the learning data through the Monte Carlo search tree.
Preferably, a result enhancement unit is provided inside the result detection report module; it can enhance the dimension-reduced data mapped in the one-dimensional convolutional neural network, which improves the accuracy of the data display.
Preferably, the result detection report module, the central processing unit and the storage unit are in data communication with the cloud platform through the wireless transmission unit.
Preferably, the cloud platform encrypts the data through the encryption unit while transmitting them.
Preferably, when receiving the high-dimensional data set, the cloud platform can select its dimensionality through the pre-matching unit.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention forms a noise reduction and dimension reduction system for high-dimensional data from the stacked (superposition) denoising autoencoder, the one-dimensional convolutional neural network, the central processing unit and the result detection report module. According to the invention, this improves the accuracy of dimension reduction, provides a more objective, comprehensive, precise and flexible dimension reduction mode, reduces the repetitive workload caused by data noise, and shortens the dimension reduction cycle. It addresses the problems that the number of potential feature subspaces grows exponentially with the input dimension, so that the search space grows exponentially as well, while a large number of irrelevant features in the data generate unnecessary noise that interferes with selecting the data subspace that best represents the relevant attributes.
Drawings
FIG. 1 is a schematic flow chart of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in FIG. 1, the data dimension reduction and denoising method for a high-dimensional data set provided by the present invention includes the following steps:
S1: a high-dimensional data set is input into a cloud platform through a data input module; the cloud platform transmits the data to a central processing unit; the central processing unit classifies the data set as linear or nonlinear, and the data are processed with a stacked (superposition) denoising autoencoder, a long short-term memory network and principal component analysis, respectively;
S2: the stacked denoising autoencoder learns the data features and eliminates some irrelevant attributes; it also reduces the dimensionality of the feature vector output by the one-hot encoding (OHE); finally, the data are mapped to a one-dimensional convolutional neural network for display;
S3: the dimension-reduced data enter a result detection report module to form an output report, which is output back to the cloud platform as feedback.
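The control flow of steps S1 to S3 can be sketched as a small dispatcher. This is only an illustration of the routing idea under stated assumptions: the patent does not say how the linear/nonlinear decision is made, so the `kind` label is passed in by the caller, and the reducers are placeholders that merely truncate columns rather than real models. All names here (`run_pipeline`, `REDUCERS`, `OutputReport`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Matrix = List[List[float]]


@dataclass
class OutputReport:
    """Hypothetical contents of the report formed in step S3."""
    kind: str
    input_dim: int
    output_dim: int
    rows: int


def truncate(k: int) -> Callable[[Matrix], Matrix]:
    """Placeholder reducer: keeps the first k columns of every row."""
    return lambda data: [row[:k] for row in data]


# Assumed registry: one reducer per data type decided in step S1.
REDUCERS: Dict[str, Callable[[Matrix], Matrix]] = {
    "linear": truncate(2),     # stands in for principal component analysis
    "nonlinear": truncate(3),  # stands in for the autoencoder + LSTM branch
}


def run_pipeline(data: Matrix, kind: str) -> Tuple[Matrix, OutputReport]:
    """Route the data by type, reduce it, and build the output report."""
    if kind not in REDUCERS:
        raise ValueError(f"unknown data type: {kind!r}")
    reduced = REDUCERS[kind](data)
    report = OutputReport(kind, len(data[0]), len(reduced[0]), len(data))
    return reduced, report
```

Keeping the reducers behind a registry means a real PCA or autoencoder implementation can replace a placeholder without touching the routing or reporting code.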
During dimension reduction, the data dimension matching unit and the generation time matching unit contained in the data log recording module record the data by data dimension and generation time and store them in the storage unit, and a user can retrieve past data through the cloud platform.
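A minimal in-memory sketch of such a log, indexed by data dimension and filtered by generation time, might look as follows. The class and field names are assumptions for illustration; the patent does not describe the storage layout or the query interface.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass(frozen=True)
class LogRecord:
    dimension: int          # number of features in the stored sample
    generated_at: datetime  # generation time of the sample
    payload: tuple          # the sample itself


class DataLog:
    """Stores samples grouped by dimension; queries filter by time range."""

    def __init__(self) -> None:
        self._by_dimension: Dict[int, List[LogRecord]] = defaultdict(list)

    def record(self, payload, generated_at: datetime) -> LogRecord:
        rec = LogRecord(len(payload), generated_at, tuple(payload))
        self._by_dimension[rec.dimension].append(rec)
        return rec

    def query(self, dimension: int, start: datetime, end: datetime):
        """Records of one dimension generated in [start, end], oldest first."""
        hits = [r for r in self._by_dimension.get(dimension, [])
                if start <= r.generated_at <= end]
        return sorted(hits, key=lambda r: r.generated_at)
```

A user-facing retrieval call would then reduce to `query(dimension, start, end)` against whatever persistent store replaces the dictionary.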
In a further configuration of the invention, a data feature learning unit is provided inside the rule model processing module, and the data feature learning unit can learn the flow features of the rule model through a one-dimensional convolutional neural network.
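The basic building blocks of a one-dimensional convolutional layer can be sketched in a few lines. The example below is an assumption-laden illustration, not the patent's network: it uses a hand-picked kernel instead of learned weights, and the `signal` values are made up. It shows how a sliding kernel picks out local patterns in a sequence, which is the property that makes 1-D CNNs suitable for flow-like data.

```python
def conv1d(signal, kernel):
    """Valid-mode 1-D cross-correlation (no padding, stride 1)."""
    n, k = len(signal), len(kernel)
    if k > n:
        raise ValueError("kernel longer than signal")
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(n - k + 1)]


def relu(xs):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in xs]


def max_pool1d(xs, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(xs[i:i + size]) for i in range(0, len(xs) - size + 1, size)]


# Made-up sequence; the kernel [-1, 1] responds to rising edges.
signal = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0]
kernel = [-1.0, 1.0]
features = max_pool1d(relu(conv1d(signal, kernel)), 2)
```

In a trained network the kernel weights are learned from data and many kernels run in parallel, but the sliding-window computation is the same.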
In a further configuration of the invention, the output end of the data feature learning unit is electrically connected to a Monte Carlo search tree, and the data feature learning unit can perform record searches on the learning data through the Monte Carlo search tree.
In a further configuration of the invention, a result enhancement unit is provided inside the result detection report module; it can enhance the dimension-reduced data mapped in the one-dimensional convolutional neural network, which improves the accuracy of the data display.
In a further configuration of the invention, the result detection report module, the central processing unit and the storage unit are in data communication with the cloud platform through the wireless transmission unit.
In a further configuration of the invention, the cloud platform encrypts the data through the encryption unit while transmitting them.
In a further configuration of the invention, when receiving the high-dimensional data set, the cloud platform can select its dimensionality through the pre-matching unit.
The working principle and use of the invention are as follows. The high-dimensional data set is input into the cloud platform through the data input module; on receiving it, the cloud platform can select its dimensionality through the pre-matching unit and then transmits the data to the central processing unit. The central processing unit determines the data type through the data type judging module and, according to the type, passes the data to the nonlinear processing module or the linear processing module. The data are processed by the stacked (superposition) denoising autoencoder, which learns the data features and eliminates some irrelevant attributes, and which also reduces the dimensionality of the feature vector output by the one-hot encoding (OHE). A long short-term memory network is added to the nonlinear dimension reduction module in order to preserve the integrity and usability of important features and data associations during denoising.
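The denoising autoencoder step can be illustrated with a very small sketch: inputs are corrupted with masking noise, and a network with a narrow hidden layer is trained to reconstruct the clean inputs, so that the hidden code becomes the reduced representation. Everything specific below is an assumption made for the example: a single hidden layer rather than a stack, tanh activation, plain stochastic gradient descent, and synthetic 6-dimensional data generated from 2 latent values. The patent does not specify any of these choices.

```python
import math
import random


def corrupt(x, drop_prob, rng):
    """Masking noise: zero each input feature with probability drop_prob."""
    return [0.0 if rng.random() < drop_prob else v for v in x]


class DenoisingAutoencoder:
    def __init__(self, n_in, n_hidden, rng):
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(n_in)]
        self.b2 = [0.0] * n_in

    def encode(self, x):
        """Reduced representation: tanh(W1 x + b1)."""
        return [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def decode(self, h):
        """Linear reconstruction: W2 h + b2."""
        return [sum(w * v for w, v in zip(row, h)) + b
                for row, b in zip(self.w2, self.b2)]

    def train_step(self, clean, noisy, lr):
        """One SGD step on 0.5 * ||decode(encode(noisy)) - clean||^2."""
        h = self.encode(noisy)
        err = [y - c for y, c in zip(self.decode(h), clean)]
        # Back-propagate to the hidden layer before w2 is modified.
        dh = [sum(self.w2[i][j] * err[i] for i in range(len(err))) for j in range(len(h))]
        dz = [d * (1.0 - hj * hj) for d, hj in zip(dh, h)]
        for i, e in enumerate(err):
            for j, hj in enumerate(h):
                self.w2[i][j] -= lr * e * hj
            self.b2[i] -= lr * e
        for j, d in enumerate(dz):
            for k, xk in enumerate(noisy):
                self.w1[j][k] -= lr * d * xk
            self.b1[j] -= lr * d

    def loss(self, data):
        """Mean squared reconstruction error per sample on clean inputs."""
        total = 0.0
        for x in data:
            y = self.decode(self.encode(x))
            total += sum((yi - xi) ** 2 for yi, xi in zip(y, x))
        return total / len(data)


def make_data(n, rng):
    """Synthetic 6-D samples that really depend on only 2 latent values."""
    rows = []
    for _ in range(n):
        a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
        rows.append([a, b, a, b, a, b])
    return rows


rng = random.Random(0)
data = make_data(64, rng)
model = DenoisingAutoencoder(n_in=6, n_hidden=2, rng=rng)
loss_before = model.loss(data)
for _ in range(200):
    for x in data:
        model.train_step(x, corrupt(x, 0.2, rng), lr=0.02)
loss_after = model.loss(data)
codes = [model.encode(x) for x in data]  # 2-D representation of 6-D inputs
```

A stacked variant would train a second autoencoder of this kind on `codes`, and so on, with each layer producing a narrower representation than the one before it.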
The long short-term memory network controls the memory process over the time sequence and extracts temporal dependencies, which helps make the dimension reduction process more robust. Principal component analysis is added to the linear dimension reduction module; it is a classic linear dimension reduction method that is lightweight, fast and effective. Finally, the data are mapped to a one-dimensional convolutional neural network for display, and the result enhancement unit enhances the dimension-reduced data mapped inside the network. The dimension-reduced data then enter the result detection report module to form an output report, which is output back to the cloud platform as feedback. During dimension reduction, the data dimension matching unit and the generation time matching unit contained in the data log recording module record the data by data dimension and generation time and store them in the storage unit, and a user can retrieve past data through the cloud platform.
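For the linear branch, principal component analysis can be sketched without any external library: center the data, form the covariance matrix, find its leading eigenvector, and project onto it. The version below extracts only the first component by power iteration, which is a simplification chosen to keep the example short; the helper names and the sample data are illustrative assumptions.

```python
import math


def mean_center(data):
    """Subtract the per-column mean; returns (centered rows, means)."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in data], means


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_component(cov, iters=500):
    """Leading eigenvector of a symmetric PSD matrix by power iteration.

    Starts from a uniform vector, so it assumes that vector is not
    orthogonal to the leading eigenvector.
    """
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def project(centered, component):
    """1-D scores: dot product of each centered row with the component."""
    return [sum(a * b for a, b in zip(row, component)) for row in centered]


# Points lying exactly on the line y = 2x: one component captures them fully.
points = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
centered, _ = mean_center(points)
component = top_component(covariance(centered))
scores = project(centered, component)
```

Further components can be obtained by deflating the covariance matrix and repeating the iteration; a production implementation would normally use a library eigen- or singular-value decomposition instead.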
In summary, the data dimension reduction and denoising method for a high-dimensional data set forms a noise reduction and dimension reduction system from the stacked (superposition) denoising autoencoder, the one-dimensional convolutional neural network, the central processing unit and the result detection report module. According to the invention, this improves the accuracy of dimension reduction for high-dimensional data, provides a more objective, comprehensive, precise and flexible dimension reduction mode, reduces the repetitive workload caused by data noise, and shortens the dimension reduction cycle. It addresses the problems that the number of potential feature subspaces grows exponentially with the input dimension, so that the search space grows exponentially as well, while a large number of irrelevant features in the data generate unnecessary noise that interferes with selecting the data subspace that best represents the relevant attributes.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (8)

1. A data dimension reduction and denoising method for a high-dimensional data set, characterized in that the method comprises the following steps:
S1: a high-dimensional data set is input into a cloud platform through a data input module; the cloud platform transmits the data to a central processing unit; the central processing unit classifies the data set as linear or nonlinear, and the data are processed with a stacked (superposition) denoising autoencoder, a long short-term memory network and principal component analysis, respectively;
S2: the stacked denoising autoencoder learns the data features and eliminates some irrelevant attributes; it also reduces the dimensionality of the feature vector output by the one-hot encoding (OHE); finally, the data are mapped to a one-dimensional convolutional neural network for display;
S3: the dimension-reduced data enter a result detection report module to form an output report, which is output back to the cloud platform as feedback.
2. The method of claim 1, wherein: during dimension reduction, the data dimension matching unit and the generation time matching unit contained in the data log recording module record the data by data dimension and generation time and store them in the storage unit, and a user can retrieve past data through the cloud platform.
3. The method of claim 1, wherein: a data feature learning unit is provided inside the rule model processing module and can learn the flow features of the rule model processing module through a one-dimensional convolutional neural network.
4. The method of claim 3, wherein: the output end of the data feature learning unit is electrically connected to a Monte Carlo search tree, and the data feature learning unit can perform record searches on the learning data through the Monte Carlo search tree.
5. The method of claim 1, wherein: a result enhancement unit is provided inside the result detection report module and can enhance the dimension-reduced data mapped in the one-dimensional convolutional neural network, which improves the accuracy of the data display.
6. The method of claim 2, wherein: the result detection report module, the central processing unit and the storage unit are in data communication with the cloud platform through the wireless transmission unit.
7. The method of claim 1, wherein: the cloud platform encrypts the data through the encryption unit while transmitting them.
8. The method of claim 1, wherein: when receiving the high-dimensional data set, the cloud platform can select its dimensionality through the pre-matching unit.
CN202110990653.9A 2021-08-26 2021-08-26 Data dimension reduction and denoising method for high-dimensional data set Pending CN113626414A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110990653.9A CN113626414A (en) 2021-08-26 2021-08-26 Data dimension reduction and denoising method for high-dimensional data set

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110990653.9A CN113626414A (en) 2021-08-26 2021-08-26 Data dimension reduction and denoising method for high-dimensional data set

Publications (1)

Publication Number Publication Date
CN113626414A (en) 2021-11-09

Family

ID=78387922

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110990653.9A Pending CN113626414A (en) 2021-08-26 2021-08-26 Data dimension reduction and denoising method for high-dimensional data set

Country Status (1)

Country Link
CN (1) CN113626414A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115756030A (en) * 2022-09-26 2023-03-07 杭州湛川智能技术有限公司 Monitoring system for BIM data center machine room management and use method thereof

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104850925A (en) * 2014-02-17 2015-08-19 北京索为高科***技术有限公司 Integrated management system for process data
CN108173704A (en) * 2017-11-24 2018-06-15 中国科学院声学研究所 A kind of method and device of the net flow assorted based on representative learning
US20180253640A1 (en) * 2017-03-01 2018-09-06 Stc.Unm Hybrid architecture system and method for high-dimensional sequence processing
CN110164128A (en) * 2019-04-23 2019-08-23 银江股份有限公司 A kind of City-level intelligent transportation analogue system
WO2020168796A1 (en) * 2019-02-19 2020-08-27 深圳先进技术研究院 Data augmentation method based on high-dimensional spatial sampling
CN111860552A (en) * 2019-04-28 2020-10-30 中国科学院计算机网络信息中心 Model training method and device based on nuclear self-encoder and storage medium
CN111950651A (en) * 2020-08-21 2020-11-17 中国科学院计算机网络信息中心 High-dimensional data processing method and device
CN112966753A (en) * 2021-03-09 2021-06-15 哈尔滨工业大学 Automatic data dimension reduction method based on entropy stability constraint

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
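Li Xiaojian, Xie Xiaoyao, Xu Yang, Zhang Sicong: "Fast identification method for malicious TLS traffic based on CNN-SIndRNN" (基于CNN⁃SIndRNN的恶意TLS流量快速识别方法), Computer Engineering (计算机工程), pages 148 - 154 *
Hang Mengxin, Chen Wei, Zhang Renjie: "Abnormal traffic detection based on an improved one-dimensional convolutional neural network" (基于改进的一维卷积神经网络的异常流量检测), Journal of Computer Applications (计算机应用), pages 433 - 440 *
Wu Pin, Sun Junwu, Feng Weibing: "Model order reduction method based on autoencoder and LSTM" (基于自编码器和LSTM的模型降阶方法), Acta Aerodynamica Sinica (空气动力学学报), vol. 39, no. 1, pages 73 - 81 *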

Similar Documents

Publication Publication Date Title
CN110046304B (en) User recommendation method and device
CN109741112B (en) User purchase intention prediction method based on mobile big data
US20220044133A1 (en) Detection of anomalous data using machine learning
CN111368096A (en) Knowledge graph-based information analysis method, device, equipment and storage medium
CN110347786B (en) Semantic model tuning method and system
US20170034111A1 (en) Method and Apparatus for Determining Key Social Information
CN116843400A (en) Block chain carbon emission transaction anomaly detection method and device based on graph representation learning
CN111598712B (en) Training and searching method for data feature generator in social media cross-modal search
CN110472659B (en) Data processing method, device, computer readable storage medium and computer equipment
CN115617614A (en) Log sequence anomaly detection method based on time interval perception self-attention mechanism
CN116049650A (en) RFSFD-T network-based radio frequency signal fingerprint identification method and system
CN113626414A (en) Data dimension reduction and denoising method for high-dimensional data set
Shou et al. MOOC Dropout Prediction Based on Multidimensional Time‐Series Data
CN115098789A (en) Neural network-based multi-dimensional interest fusion recommendation method and device and related equipment
CN116976505A (en) Click rate prediction method of decoupling attention network based on information sharing
CN109902273B (en) Modeling method and device for keyword generation model
CN116956289B (en) Method for dynamically adjusting potential blacklist and blacklist
CN116306780B (en) Dynamic graph link generation method
CN115827878B (en) Sentence emotion analysis method, sentence emotion analysis device and sentence emotion analysis equipment
CN116186708A (en) Class identification model generation method, device, computer equipment and storage medium
CN113259369B (en) Data set authentication method and system based on machine learning member inference attack
Bakery et al. A new double truncated generalized gamma model with some applications
CN114257565B (en) Method, system and server for mining potential threat domain names
CN114550157A (en) Bullet screen gathering identification method and device
CN112131570A (en) PCA-based password hard code detection method, device and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Xia Chenyi, Xu Jianping, Ao Feixiang, Yao Baoming, Chen Junmei, Yan Juan, Ning Peng, Liu Tianhua
Inventor before: Xia Chenyi, Xu Jianping, Ao Feixiang, Yao Baoming, Chen Junmei