WO2024060543A1 - Real-time data processing method and system - Google Patents

Real-time data processing method and system

Publication number
WO2024060543A1
WO2024060543A1 PCT/CN2023/082711 CN2023082711W WO2024060543A1 WO 2024060543 A1 WO2024060543 A1 WO 2024060543A1 CN 2023082711 W CN2023082711 W CN 2023082711W WO 2024060543 A1 WO2024060543 A1 WO 2024060543A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
real
time data
time
analysis
Prior art date
Application number
PCT/CN2023/082711
Other languages
French (fr)
Chinese (zh)
Inventor
闫荣新
Original Assignee
河北网新科技集团股份有限公司
Application filed by 河北网新科技集团股份有限公司
Publication of WO2024060543A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/252 Integrating or interfacing systems involving database management systems between a Database Management System and a front-end application
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G06F16/2471 Distributed queries
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification

Definitions

  • The present invention relates to the technical field of data processing, specifically to a real-time data processing method and system.
  • Real-time data processing is the collection and processing of field data by a computer at the time the data is generated.
  • To work with different integrated systems, a real-time database must first provide high-speed data collection and data processing.
  • A real-time data processing method comprises the following steps:
  • Step 1: The system collects real-time data from the distributed control system, directly or indirectly, and determines the real-time processing object.
  • Step 2: The front-end data objects are imported into a centralized large-scale distributed database or distributed storage cluster. After some simple cleaning and preprocessing, Twitter's Storm is used to perform stream computation on the real-time data.
  • Step 3: Ordinary analysis and classified summarization are performed on the large volume of preprocessed real-time data stored in the distributed database or distributed computing cluster.
  • Step 4: Data fusion is applied to the statistically analyzed real-time data to perform automatic detection, association, correlation, estimation and combination. Data fusion merges the information received by different sensors to obtain a determination of the target state or target characteristics.
  • Step 5: The collected and processed real-time data is stored again so that it can be queried and analyzed.
  • Step 6: The data produced by mining computation and analysis, or by data fusion and storage, is output to the client.
  • A real-time data processing system comprises:
  • Data collection, which collects the real-time data that external sensors and input devices feed through the computer network into the distributed control system, and aggregates that data;
  • Import preprocessing, which imports the collected real-time data into a distributed database or distributed storage cluster and performs preprocessing and stream computation on it;
  • Statistical analysis, which performs ordinary analysis and classification of the preprocessed real-time data in the distributed database or distributed storage cluster;
  • Data fusion, which performs automatic detection, association, correlation, estimation and combination on the statistically analyzed real-time data, fusing it into the required target features or a judgment of the target features;
  • Data storage, which stores the fused real-time data again so that it can be queried and analyzed;
  • Data mining, which performs advanced predictive computation on the fused real-time data based on various algorithms;
  • Data output, which delivers output to the client after data storage or data mining.
  • The data collection method is divided into direct data collection and indirect data collection.
  • Direct data collection receives real-time data directly from the distributed control system.
  • In indirect data collection, the data collection computer does not communicate directly with the on-site distributed control system. Instead, a host computer is placed above the distributed control system and collects real-time data through the interface that the distributed control system provides. The off-site data collection computer communicates with the host computer to obtain the required real-time data.
  • In direct data collection, the distributed control system can use standard ODBC (Open Database Connectivity), DDE (Dynamic Data Exchange) or OLE (Object Linking and Embedding), and can connect to the collection program inside the computer over the network for real-time data collection.
  • In indirect data collection, the host computer is attached to the control grid of the distributed control system through a network card and communicates with the data interface of the on-site distributed control system.
  • The host computer generally handles the collected real-time data in one of two ways. In the first, the real-time data is stored on the local hard disk as a database, spreadsheet or text file, and the remote data collection computer retrieves it at regular intervals. In the second, the host computer actively sends the collected real-time data to the data collection computer at regular intervals.
  • Statistical analysis uses EMC's Greenplum, Oracle's Exadata and the MySQL-based columnar store Infobright to analyze the real-time data.
  • The data fusion algorithms are divided into two types: real-time data fusion with feedback and weighted-filter real-time data fusion.
  • The real-time data fusion algorithm with feedback addresses the real-time requirements of the fusion process.
  • It emphasizes real-time adaptive grading of different categories of data, so that urgent data is fused quickly and transmitted to the user.
  • The weighted-filter real-time data fusion algorithm uses a support-degree function matrix between data to perform weighted fusion of multiple sets of data, then substitutes the fusion result for the filter value in Kalman filtering, achieving real-time dynamic fusion of multiple sets of measurement data.
  • Data storage commonly uses NAND flash storage for real-time data.
  • Data mining commonly uses K-Means for clustering, SVM for statistical learning and Naive Bayes for classification to analyze, compute and predict on real-time data.
  • The invention provides a real-time data processing method and system with the following beneficial effects:
  • Data collection, import preprocessing and statistical analysis provide rapid early-stage processing of real-time data.
  • Data fusion, data storage and data output support in-depth analysis and classification of real-time data, as well as fast retrieval and intuitive display.
  • Data fusion, data mining and data output allow big-data-based mining and predictive analysis to be performed on real-time data once normal collection, analysis and display are satisfied. This makes it easier to analyze and compute how and in what context the real-time data arises and to display the results on the client, increasing the reliability and scientific rigor of real-time data analysis and processing.
  • Figure 1 is a flow chart of the real-time data processing method of the present invention.
  • Figure 2 is a composition diagram of the real-time data processing system of the present invention.
  • Figure 3 is a composition diagram of the data collection method of the present invention.
  • Figure 4 is a composition diagram of the data fusion algorithm types of the present invention.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention relates to the technical field of data processing and provides a real-time data processing method comprising the following steps. Step 1: a system collects real-time data from a distributed control system, directly or indirectly, and determines the real-time processing object. Step 2: the front-end data objects are imported into a centralized large-scale distributed database or distributed storage cluster and, after some simple cleaning and preprocessing, Twitter's Storm is used to perform stream computation on the real-time data. Data acquisition, import preprocessing and statistical analysis provide rapid early-stage processing of real-time data. Data fusion, data mining and data output allow big-data-based mining and predictive analysis to be performed on real-time data once normal acquisition, analysis and display are satisfied; the principles and background behind the real-time data are analyzed and computed, then output and displayed to a client, improving the reliability and scientific rigor of real-time data analysis and processing.

Description

A real-time data processing method and system
Technical field
The present invention relates to the technical field of data processing, specifically to a real-time data processing method and system.
Background art
Real-time data processing is the collection and processing of field data by a computer at the time the data is generated. To work with different integrated systems, a real-time database must first provide high-speed data collection and data processing.
Current real-time data processing methods and systems can only collect, analyze and store real-time data and then output it to the client. They cannot perform in-depth mining or predictive analysis of real-time data, nor can they explain why the data occurred. In the context of big data in particular, traditional real-time data processing methods and systems cannot meet processing requirements.
Technical problem
The invention addresses the problem described above: current real-time data processing methods and systems can only collect, analyze and store real-time data and then output it to the client, cannot perform in-depth mining or predictive analysis, cannot explain why the data occurred, and cannot meet processing requirements in the context of big data. Its purpose is to achieve in-depth mining and predictive analysis of real-time data.
Technical solution
The present invention is realized through the following technical solution. A real-time data processing method comprises the following steps:
Step 1: The system collects real-time data from the distributed control system, directly or indirectly, and determines the real-time processing object.
Step 2: The front-end data objects are imported into a centralized large-scale distributed database or distributed storage cluster. After some simple cleaning and preprocessing, Twitter's Storm is used to perform stream computation on the real-time data.
Step 3: Ordinary analysis and classified summarization are performed on the large volume of preprocessed real-time data stored in the distributed database or distributed computing cluster.
Step 4: Data fusion is applied to the statistically analyzed real-time data to perform automatic detection, association, correlation, estimation and combination. Data fusion merges the information received by different sensors to obtain a determination of the target state or target characteristics.
Step 5: The collected and processed real-time data is stored again so that it can be queried and analyzed. Alternatively, mining computation based on various algorithms is performed on the fused real-time data, on top of the existing data, to produce predictions; this serves higher-level data analysis needs and supports real-time analysis of big data.
Step 6: The data produced by mining computation and analysis, or by data fusion and storage, is output to the client.
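The six steps can be read as a chain of small functions. The following Python sketch is only an illustration under stated assumptions: the `Reading` type, its field names, the use of a per-sensor mean as the "classified summary" and a plain average as the "fusion" are all hypothetical stand-ins, not details given in the patent, and an in-memory list stands in for both the distributed control system and the storage layer.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Reading:
    sensor_id: str  # hypothetical field names, not taken from the patent
    value: float


def collect(raw):
    """Step 1: gather readings (an in-memory list stands in for the DCS)."""
    return [Reading(sensor_id, value) for sensor_id, value in raw]


def preprocess(readings):
    """Step 2: simple cleaning; drop readings whose value is NaN."""
    return [r for r in readings if r.value == r.value]  # NaN != NaN


def analyze(readings):
    """Step 3: classify by sensor and summarize each class with its mean."""
    groups = {}
    for r in readings:
        groups.setdefault(r.sensor_id, []).append(r.value)
    return {sensor_id: mean(values) for sensor_id, values in groups.items()}


def fuse(summary):
    """Step 4: combine per-sensor summaries into one estimate (plain average)."""
    return mean(list(summary.values()))


def run_pipeline(raw, store):
    """Steps 1-6 chained; `store` is any list-like sink standing in for storage."""
    summary = analyze(preprocess(collect(raw)))
    fused = fuse(summary)
    store.append(fused)  # Step 5: store again for later queries
    return {"per_sensor": summary, "fused": fused}  # Step 6: output to client
```

Calling `run_pipeline` with a few `(sensor_id, value)` pairs and an empty list returns the per-sensor summary and the fused value, and leaves the fused value in the list. A real deployment would replace each function with the corresponding subsystem.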
A real-time data processing system comprises:
Data collection, which collects the real-time data that external sensors and input devices feed through the computer network into the distributed control system, and aggregates that data;
Import preprocessing, which imports the collected real-time data into a distributed database or distributed storage cluster and performs preprocessing and stream computation on it;
Statistical analysis, which performs ordinary analysis and classification of the preprocessed real-time data in the distributed database or distributed storage cluster;
Data fusion, which performs automatic detection, association, correlation, estimation and combination on the statistically analyzed real-time data, fusing it into the required target features or a judgment of the target features;
Data storage, which stores the fused real-time data again so that it can be queried and analyzed;
Data mining, which performs advanced predictive computation on the fused real-time data based on various algorithms;
Data output, which delivers output to the client after data storage or data mining.
Further, the data collection method is divided into direct data collection and indirect data collection. Direct data collection receives real-time data directly from the distributed control system. In indirect data collection, the data collection computer does not communicate directly with the on-site distributed control system. Instead, a host computer is placed above the distributed control system and collects real-time data through the interface that the distributed control system provides. The off-site data collection computer communicates with the host computer to obtain the required real-time data.
Further, in direct data collection the distributed control system can use standard ODBC (Open Database Connectivity), DDE (Dynamic Data Exchange) or OLE (Object Linking and Embedding), and can connect to the collection program inside the computer over the network for real-time data collection.
Further, in indirect data collection the host computer is attached to the control grid of the distributed control system through a network card and communicates with the data interface of the on-site distributed control system. The host computer generally handles the collected real-time data in one of two ways. In the first, the real-time data is stored on the local hard disk as a database, spreadsheet or text file, and the remote data collection computer retrieves it at regular intervals. In the second, the host computer actively sends the collected real-time data to the data collection computer at regular intervals.
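The two handling modes amount to a pull model and a push model. The sketch below illustrates both in Python under several assumptions that do not come from the patent: the host stores readings as a CSV text file, "taking the data away" is modeled by deleting the file after reading it, the tag names are invented, and the periodic timer is left out (a scheduler would call these functions at regular intervals).

```python
import csv
import os
import tempfile


def host_write_batch(path, rows):
    """Host computer: append collected readings to a local text (CSV) file."""
    with open(path, "a", newline="") as f:
        csv.writer(f).writerows(rows)


def collector_pull(path):
    """Mode 1 (pull): the remote collector reads the file and takes the data away."""
    if not os.path.exists(path):
        return []
    with open(path, newline="") as f:
        rows = [(tag, float(value)) for tag, value in csv.reader(f)]
    os.remove(path)  # assumed meaning of "take away": host starts a fresh file
    return rows


def host_push(rows, send):
    """Mode 2 (push): the host actively hands each batch to a `send` callable."""
    send(list(rows))


def demo():
    """Run both modes once with hypothetical tag names."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "readings.csv")
        host_write_batch(path, [("TI-101", 71.5), ("PI-202", 3.2)])
        pulled = collector_pull(path)
    received = []
    host_push([("TI-101", 71.6)], received.append)
    return pulled, received
```

In the pull mode the collector decides when to fetch, so the host only needs local disk; in the push mode the host needs a network path to the collector, represented here by the `send` callable.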
Further, the statistical analysis uses EMC's Greenplum, Oracle's Exadata, and the MySQL-based column store Infobright to analyze the real-time data.
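The products named above are queried with SQL, so the "ordinary analysis and classified summarization" they perform can be pictured as a grouped aggregate. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in for those databases; the table name `readings` and its columns are hypothetical.

```python
import sqlite3
from typing import Iterable, List, Tuple


def summarize(readings: Iterable[Tuple[str, float]]) -> List[tuple]:
    """Return (tag, count, mean, maximum) per tag, ordered by tag."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE readings (tag TEXT, value REAL)")
        conn.executemany("INSERT INTO readings VALUES (?, ?)", readings)
        return conn.execute(
            "SELECT tag, COUNT(*), AVG(value), MAX(value) "
            "FROM readings GROUP BY tag ORDER BY tag"
        ).fetchall()
    finally:
        conn.close()
```

The same `GROUP BY` query shape would apply on a distributed store; only the connection and the scale change.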
Further, the data fusion algorithms are divided into a real-time data fusion algorithm with feedback and a weighted-filtering real-time data fusion algorithm. The real-time data fusion algorithm with feedback addresses the real-time requirements of the fusion process: it emphasizes real-time adaptive grading of different categories of data, so that urgent data is fused quickly and transmitted to the user. The weighted-filtering real-time data fusion algorithm uses a matrix of support-degree functions between the data to perform weighted fusion of multiple sets of data, and substitutes the fusion result for the filter value in Kalman filtering, thereby achieving real-time dynamic fusion of multiple sets of measurement data.
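The weighted-filtering algorithm can be sketched as follows. The source does not give the support-degree function or the filter model, so this sketch assumes a Gaussian support function, weights proportional to each reading's total support, and a scalar constant-state Kalman filter that takes the fused value as its measurement. The names `support_matrix`, `fuse`, and `ScalarKalman` are illustrative only.

```python
import math
from typing import List, Sequence


def support_matrix(values: Sequence[float], scale: float) -> List[List[float]]:
    """Assumed Gaussian support degree: 1.0 for identical readings, near 0 for distant ones."""
    return [[math.exp(-((a - b) / scale) ** 2) for b in values] for a in values]


def fuse(values: Sequence[float], scale: float = 1.0) -> float:
    """Weight each reading by the total support it receives from all readings."""
    support = [sum(row) for row in support_matrix(values, scale)]
    return sum(s * v for s, v in zip(support, values)) / sum(support)


class ScalarKalman:
    """One-dimensional Kalman filter with a constant-state model."""

    def __init__(self, x0: float, p0: float, q: float, r: float) -> None:
        self.x, self.p, self.q, self.r = x0, p0, q, r

    def update(self, z: float) -> float:
        self.p += self.q                # predict: state unchanged, uncertainty grows
        k = self.p / (self.p + self.r)  # Kalman gain
        self.x += k * (z - self.x)      # correct the estimate with measurement z
        self.p *= 1.0 - k
        return self.x
```

At each time step, the readings from all sensors would be passed to `fuse` and the result to `ScalarKalman.update`. A reading that disagrees with the others receives little support and therefore little weight.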
Further, the data storage commonly uses NAND flash for real-time data storage, and the data mining commonly uses K-Means for clustering, SVM for statistical learning, and Naive Bayes for classification to analyze, compute, and predict on the real-time data.
Beneficial Effects
The present invention provides a real-time data processing method and system, with the following beneficial effects:
Data collection, import preprocessing, and statistical analysis provide rapid early-stage processing of real-time data. Data fusion, data storage, and data output support in-depth analysis and classification of the real-time data, as well as its fast retrieval and intuitive display. Data fusion, data mining, and data output allow big-data-based mining and predictive analysis to be performed on the real-time data once normal collection, analysis, and display are in place, which makes it easier to analyze and compute the principles and background behind the real-time data and to output the results to the client, increasing the reliability and rigor of real-time data analysis and processing.
Description of Drawings
Figure 1 is a flow chart of the real-time data processing method of the present invention;
Figure 2 is a composition diagram of the real-time data processing system of the present invention;
Figure 3 is a composition diagram of the data collection methods of the present invention;
Figure 4 is a composition diagram of the data fusion algorithm types of the present invention.
Best Mode for Carrying Out the Invention
An embodiment of the real-time data processing method and system is as follows:
Embodiment:
Referring to Figures 1 to 4, a real-time data processing method includes the following steps:
Step 1: the system collects real-time data directly or indirectly from the distributed control system and determines the objects for real-time processing;
Step 2: the front-end data objects are imported into a centralized large-scale distributed database or distributed storage cluster; after some simple cleaning and preprocessing of the real-time data, Twitter's Storm is used to perform stream computation on the data;
Step 3: ordinary analysis and classified summarization are performed on the large volume of preprocessed real-time data stored in the distributed database or distributed computing cluster;
Step 4: data fusion is applied to the statistically analyzed real-time data to perform automatic detection, association, correlation, estimation, and combination; data fusion merges the information received by different sensors to obtain a determination of the target state or target features;
Step 5: the collected and processed real-time data is stored again so that it can be conveniently queried and analyzed;
or mining computations based on various algorithms are performed on the fused real-time data, on top of the existing data, to produce predictions, thereby meeting higher-level data analysis needs and supporting real-time analysis of big data;
Step 6: the data resulting from mining analysis or from fusion and storage is output and provided to the client.
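The six steps above can be read as a linear pipeline. The following Python sketch shows only the control flow; every stage body is a deliberately trivial placeholder (dropping missing values, per-sensor means, a plain average as "fusion"), and the function names and the `(sensor id, value)` layout are assumptions rather than anything specified in the source.

```python
from statistics import mean
from typing import Dict, Iterable, List, Optional, Tuple

Reading = Tuple[str, Optional[float]]  # (sensor id, value); hypothetical layout


def collect(source: Iterable[Reading]) -> List[Reading]:
    """Step 1: gather readings (directly or via a host computer)."""
    return list(source)


def preprocess(readings: List[Reading]) -> List[Tuple[str, float]]:
    """Step 2: simple cleaning, here dropping missing values."""
    return [(sid, v) for sid, v in readings if v is not None]


def summarize(readings: List[Tuple[str, float]]) -> Dict[str, float]:
    """Step 3: ordinary analysis, here a per-sensor mean."""
    groups: Dict[str, List[float]] = {}
    for sid, v in readings:
        groups.setdefault(sid, []).append(v)
    return {sid: mean(vs) for sid, vs in groups.items()}


def fuse(per_sensor: Dict[str, float]) -> float:
    """Step 4: combine sensors into one estimate (plain average as a placeholder)."""
    return mean(list(per_sensor.values()))


def run(source: Iterable[Reading], store: List[float]) -> str:
    """Steps 5 and 6: store the fused value, then format it for the client."""
    fused = fuse(summarize(preprocess(collect(source))))
    store.append(fused)
    return f"fused={fused:.2f}"
```

For example, `run([("a", 1.0), ("a", 3.0), ("b", None), ("b", 4.0)], store)` averages sensor `a` to 2.0, keeps the single valid reading 4.0 for sensor `b`, and stores the fused value 3.0.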
A real-time data processing system includes:
Data collection, which gathers the real-time data that external sensors and input devices feed through the computer network into the distributed control system, and aggregates it;
Import preprocessing, which imports the collected real-time data into a distributed database or distributed storage cluster and performs preprocessing and stream computation on it;
Statistical analysis, which performs ordinary analysis and classification on the preprocessed real-time data in the distributed database or distributed storage cluster;
Data fusion, which, after statistical analysis of the real-time data, performs automatic detection, association, correlation, estimation, and combination to fuse the data into the required target features or a determination of the target features;
Data storage, which stores the fused real-time data again so that it can be conveniently queried and analyzed;
Data mining, which performs advanced predictive computation based on various algorithms on the fused real-time data;
Data output, which delivers the results of data storage or data mining to the client.
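The import preprocessing module combines cleaning with stream computation. The sketch below illustrates the idea with plain Python generators; it does not use the Storm API, and the plausibility bounds and window size are assumed parameters chosen for the example.

```python
from collections import deque
from typing import Iterable, Iterator, Optional


def clean(stream: Iterable[Optional[float]], low: float, high: float) -> Iterator[float]:
    """Drop missing values and readings outside the plausible range [low, high]."""
    for value in stream:
        if value is not None and low <= value <= high:
            yield value


def moving_average(stream: Iterable[float], window: int) -> Iterator[float]:
    """Emit the mean of the last `window` values as each new value arrives."""
    recent = deque(maxlen=window)
    for value in stream:
        recent.append(value)
        yield sum(recent) / len(recent)
```

Because both stages are generators, each reading flows through cleaning and aggregation as it arrives, without waiting for the full data set. In a stream processor the same two stages would typically become separate processing nodes.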
Data collection is divided into direct data collection and indirect data collection. In direct data collection, real-time data is collected directly from the distributed control system. In indirect data collection, the data collection computer does not communicate directly with the on-site distributed control system; instead, a host computer is placed above the distributed control system and collects real-time data through the interface that the distributed control system provides, and the off-site data collection computer communicates with the host computer to obtain the required real-time data.
In direct data collection, the distributed control system may use standard ODBC (Open Database Connectivity), DDE (Dynamic Data Exchange), or OLE (Object Linking and Embedding), and may be connected over a network to the collection program running on the computer to perform real-time data collection.
In indirect data collection, the host computer is attached to the control network of the distributed control system through a network card and communicates with the data interface of the on-site distributed control system. The host computer generally handles the collected real-time data in one of two ways: either it stores the real-time data on its local hard disk as a database, spreadsheet, or text file, from which the remote data collection computer retrieves the data at regular intervals; or it actively sends the collected real-time data to the data collection computer at regular intervals.
The statistical analysis uses EMC's Greenplum, Oracle's Exadata, and the MySQL-based column store Infobright to analyze the real-time data.
The data fusion algorithms are divided into a real-time data fusion algorithm with feedback and a weighted-filtering real-time data fusion algorithm. The real-time data fusion algorithm with feedback addresses the real-time requirements of the fusion process: it emphasizes real-time adaptive grading of different categories of data, so that urgent data is fused quickly and transmitted to the user. The weighted-filtering real-time data fusion algorithm uses a matrix of support-degree functions between the data to perform weighted fusion of multiple sets of data, and substitutes the fusion result for the filter value in Kalman filtering, thereby achieving real-time dynamic fusion of multiple sets of measurement data.
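The feedback-based algorithm is described only at the level of "adaptive grading" and "urgent data first". One possible reading, sketched below under stated assumptions, grades each reading by its distance from a running baseline, releases the most urgent grade first, and feeds each released value back into the baseline so that later grading adapts. The class `GradedQueue`, the two-level grading, and the smoothing factor `alpha` are all hypothetical.

```python
import heapq
from itertools import count
from typing import List, Tuple


class GradedQueue:
    """Order readings so that the most urgent grade is released first."""

    def __init__(self, baseline: float, urgent_delta: float) -> None:
        self.baseline = baseline
        self.urgent_delta = urgent_delta
        self._heap: List[Tuple[int, int, float]] = []
        self._order = count()  # tie-breaker that keeps arrival order within a grade

    def grade(self, value: float) -> int:
        """Grade 0 is urgent (far from the baseline); grade 1 is routine."""
        return 0 if abs(value - self.baseline) >= self.urgent_delta else 1

    def put(self, value: float) -> None:
        heapq.heappush(self._heap, (self.grade(value), next(self._order), value))

    def release(self, alpha: float = 0.5) -> float:
        """Pop the most urgent reading and feed it back into the baseline."""
        _, _, value = heapq.heappop(self._heap)
        self.baseline += alpha * (value - self.baseline)
        return value
```

With a baseline of 50.0 and `urgent_delta` of 10.0, a reading of 75.0 is released ahead of earlier routine readings, and the baseline then moves to 62.5, which changes how subsequent readings are graded.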
Data storage commonly uses NAND flash for real-time data storage, and data mining commonly uses K-Means for clustering, SVM for statistical learning, and Naive Bayes for classification to analyze, compute, and predict on the real-time data.
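Of the three mining algorithms named above, K-Means is the simplest to show in full. The sketch below is a plain-Python version of Lloyd's algorithm restricted to scalar data; the restriction to one dimension, the caller-supplied initial centers, and the fixed iteration count are simplifications made for illustration.

```python
from typing import List, Sequence, Tuple


def kmeans_1d(
    values: Sequence[float], centers: Sequence[float], iterations: int = 10
) -> Tuple[List[float], List[int]]:
    """Lloyd's algorithm on scalar data; returns final centers and cluster labels."""
    centers = list(centers)
    labels: List[int] = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest center.
        labels = [
            min(range(len(centers)), key=lambda c: abs(v - centers[c]))
            for v in values
        ]
        # Update step: each center moves to the mean of its members.
        for c in range(len(centers)):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:  # leave an empty cluster's center where it is
                centers[c] = sum(members) / len(members)
    return centers, labels
```

On real-time data, such clustering might be used to separate normal operating points from unusual ones; the choice of features and of the number of clusters is left open by the source.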
Data collection, import preprocessing, and statistical analysis provide rapid early-stage processing of real-time data. Data fusion, data storage, and data output support in-depth analysis and classification of the real-time data, as well as its fast retrieval and intuitive display. Data fusion, data mining, and data output allow big-data-based mining and predictive analysis to be performed on the real-time data once normal collection, analysis, and display are in place, which makes it easier to analyze and compute the principles and background behind the real-time data and to output the results to the client, increasing the reliability and rigor of real-time data analysis and processing.
Although embodiments of the present invention have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, substitutions, and variations can be made to these embodiments without departing from the principles and spirit of the invention; the scope of the invention is defined by the appended claims and their equivalents.

Claims (8)

  1. A real-time data processing method, characterized by including the following steps:
    Step 1: the system collects real-time data directly or indirectly from the distributed control system and determines the objects for real-time processing;
    Step 2: the front-end data objects are imported into a centralized large-scale distributed database or distributed storage cluster; after some simple cleaning and preprocessing of the real-time data, Twitter's Storm is used to perform stream computation on the data;
    Step 3: ordinary analysis and classified summarization are performed on the large volume of preprocessed real-time data stored in the distributed database or distributed computing cluster;
    Step 4: data fusion is applied to the statistically analyzed real-time data to perform automatic detection, association, correlation, estimation, and combination; data fusion merges the information received by different sensors to obtain a determination of the target state or target features;
    Step 5: the collected and processed real-time data is stored again so that it can be conveniently queried and analyzed;
    or mining computations based on various algorithms are performed on the fused real-time data, on top of the existing data, to produce predictions, thereby meeting higher-level data analysis needs and supporting real-time analysis of big data;
    Step 6: the data resulting from mining analysis or from fusion and storage is output and provided to the client.
  2. A real-time data processing system, characterized by including:
    Data collection, which gathers the real-time data that external sensors and input devices feed through the computer network into the distributed control system, and aggregates it;
    Import preprocessing, which imports the collected real-time data into a distributed database or distributed storage cluster and performs preprocessing and stream computation on it;
    Statistical analysis, which performs ordinary analysis and classification on the preprocessed real-time data in the distributed database or distributed storage cluster;
    Data fusion, which, after statistical analysis of the real-time data, performs automatic detection, association, correlation, estimation, and combination to fuse the data into the required target features or a determination of the target features;
    Data storage, which stores the fused real-time data again so that it can be conveniently queried and analyzed;
    Data mining, which performs advanced predictive computation based on various algorithms on the fused real-time data;
    Data output, which delivers the results of data storage or data mining to the client.
  3. The real-time data processing system according to claim 2, characterized in that: the data collection is divided into direct data collection and indirect data collection; in the direct data collection, real-time data is collected directly from the distributed control system; in the indirect data collection, the data collection computer does not communicate directly with the on-site distributed control system; instead, a host computer is placed above the distributed control system and collects real-time data through the interface that the distributed control system provides, and the off-site data collection computer communicates with the host computer to obtain the required real-time data.
  4. The real-time data processing system according to claim 3, characterized in that: in the direct data collection, the distributed control system may use standard ODBC (Open Database Connectivity), DDE (Dynamic Data Exchange), or OLE (Object Linking and Embedding), and may be connected over a network to the collection program running on the computer to perform real-time data collection.
  5. The real-time data processing system according to claim 3, characterized in that: in the indirect data collection, the host computer is attached to the control network of the distributed control system through a network card and communicates with the data interface of the on-site distributed control system; the host computer generally handles the collected real-time data in one of two ways: either it stores the real-time data on its local hard disk as a database, spreadsheet, or text file, from which the remote data collection computer retrieves the data at regular intervals; or it actively sends the collected real-time data to the data collection computer at regular intervals.
  6. The real-time data processing system according to claim 2, characterized in that: the statistical analysis uses EMC's Greenplum, Oracle's Exadata, and the MySQL-based column store Infobright to analyze the real-time data.
  7. The real-time data processing system according to claim 2, characterized in that: the data fusion algorithms are divided into a real-time data fusion algorithm with feedback and a weighted-filtering real-time data fusion algorithm; the real-time data fusion algorithm with feedback addresses the real-time requirements of the fusion process, emphasizing real-time adaptive grading of different categories of data so that urgent data is fused quickly and transmitted to the user; the weighted-filtering real-time data fusion algorithm uses a matrix of support-degree functions between the data to perform weighted fusion of multiple sets of data, and substitutes the fusion result for the filter value in Kalman filtering, thereby achieving real-time dynamic fusion of multiple sets of measurement data.
  8. The real-time data processing system according to claim 2, characterized in that: the data storage commonly uses NAND flash for real-time data storage, and the data mining commonly uses K-Means for clustering, SVM for statistical learning, and Naive Bayes for classification to analyze, compute, and predict on the real-time data.
PCT/CN2023/082711 2022-09-20 2023-03-21 Real-time data processing method and system WO2024060543A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211141240.4 2022-09-20
CN202211141240.4A CN115481183A (en) 2022-09-20 2022-09-20 Real-time data processing method and system

Publications (1)

Publication Number Publication Date
WO2024060543A1 (en) 2024-03-28

Family

ID=84424387

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/082711 WO2024060543A1 (en) 2022-09-20 2023-03-21 Real-time data processing method and system

Country Status (2)

Country Link
CN (1) CN115481183A (en)
WO (1) WO2024060543A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115481183A (en) * 2022-09-20 2022-12-16 河北网新科技集团股份有限公司 Real-time data processing method and system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104484409A (en) * 2014-12-16 2015-04-01 芜湖乐锐思信息咨询有限公司 Data mining method for big data processing
US20160035152A1 (en) * 2013-12-31 2016-02-04 Agnik, Llc Vehicle data mining based on vehicle onboard analysis and cloud-based distributed data stream mining algorithm
CN107967347A (en) * 2017-12-07 2018-04-27 湖北三新文化传媒有限公司 Batch data processing method, server, system and storage medium
CN109581981A (en) * 2018-12-06 2019-04-05 山东大学 A kind of data fusion system and its working method based on data assessment Yu system coordination module
CN114328688A (en) * 2021-12-27 2022-04-12 国网河北省电力有限公司信息通信分公司 Management and control platform for electric power energy big data
CN115481183A (en) * 2022-09-20 2022-12-16 河北网新科技集团股份有限公司 Real-time data processing method and system

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105678398A (en) * 2015-12-24 2016-06-15 国家电网公司 Power load forecasting method based on big data technology, and research and application system based on method
CN106651633B (en) * 2016-10-09 2021-02-02 国网浙江省电力公司信息通信分公司 Power utilization information acquisition system based on big data technology and acquisition method thereof
CN106850249A (en) * 2016-10-26 2017-06-13 中国电力技术装备有限公司郑州电力设计院 Communication network prewarning analysis system based on big data analysis
CN107730394B (en) * 2017-09-07 2021-07-06 国网山东省电力公司淄博供电公司 Multi-element heterogeneous data fusion method for panoramic power grid based on big data

Also Published As

Publication number Publication date
CN115481183A (en) 2022-12-16

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23866843

Country of ref document: EP

Kind code of ref document: A1