CN113407723A - Multi-source heterogeneous power load data fusion method, device, equipment and storage medium - Google Patents
- Publication number
- CN113407723A (application CN202110809255.2A)
- Authority
- CN
- China
- Prior art keywords
- data
- fusion
- heterogeneous
- source
- text
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
- G06Q10/06393—Score-carding, benchmarking or key performance indicator [KPI] analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
Abstract
The application relates to a multi-source heterogeneous power load data fusion method, device, equipment and storage medium. The method comprises the following steps: acquiring multi-source heterogeneous texts of power load data and performing format normalization on them; extracting key characters from the heterogeneous texts to construct a knowledge dictionary, and obtaining an object database of the heterogeneous texts through multi-source object name classification; using the parallel processing of the MapReduce programming model to match the knowledge dictionary with the object names and then match the object names with the object data; and performing longitudinal parameter fusion and transverse parameter fusion on the multi-source matching results after Reduce processing. Through multi-level processing of multi-source heterogeneous power grid load big data, the application provides a complete power big data strategy covering both fusion and evaluation, and the data fusion is practical and efficient.
Description
Technical Field
The application relates to the field of big data, in particular to a multi-source heterogeneous power load data fusion method, device, equipment and storage medium.
Background
Power load data are large in order of magnitude, varied in type, and fast-changing, and are therefore typical big data. At present, each department usually builds and maintains its own independent model parameter library according to its professional requirements; because a cooperative management mechanism is lacking, consistency among these model parameter libraries is difficult to guarantee.
Patent document CN107402976A discloses a power grid multi-source data fusion method and system based on a multi-element heterogeneous model. It establishes a unified model of each source system's data and calculates the matching degree between models by traversing and comparing them, thereby achieving automatic integrated fusion of more than 90% of the data; however, it provides no method specific to power load big data, so it is weakly targeted. Patent document CN103617557A proposes a multi-source heterogeneous power grid operation parameter analysis system, but it introduces no big data processing technology for handling massive parameters, so its data processing efficiency needs improvement. Further technical innovation is therefore needed to make load data fusion more targeted and to improve the processing technology.
Disclosure of Invention
In view of the foregoing, it is necessary to provide a multi-source heterogeneous power load data fusion method, apparatus, device, and storage medium for solving the above technical problems.
In a first aspect, an embodiment of the present invention provides a multi-source heterogeneous power load data fusion method, including the following steps:
acquiring a multi-source heterogeneous text of power load data, and performing format normalization processing on the heterogeneous text;
extracting key characters from the heterogeneous text to construct a knowledge dictionary, and obtaining an object database of the heterogeneous text through multi-source object name classification;
matching the knowledge dictionary with the object name by adopting a MapReduce programming model parallel processing technology, and then matching the object name with object data;
and performing longitudinal parameter fusion and transverse parameter fusion on the multi-source matching result after Reduce processing.
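The text does not specify how the format normalization in the first step is carried out. The following is a minimal sketch, assuming that each source delivers rows with its own field names and units; the names `FIELD_ALIASES`, `UNIT_TO_MW` and `normalize_record` are hypothetical and do not come from the application.

```python
from typing import Any

# Hypothetical mapping from source-specific field names to canonical ones.
FIELD_ALIASES = {
    "name": "object_name", "ObjName": "object_name", "object": "object_name",
    "p": "value", "load": "value", "P_MW": "value",
    "unit": "unit",
}

# Hypothetical conversion factors from source units to megawatts.
UNIT_TO_MW = {"MW": 1.0, "kW": 1e-3, "W": 1e-6}


def normalize_record(raw: dict[str, Any], source: str) -> dict[str, Any]:
    """Map one raw row to the canonical form {source, object_name, value_mw}."""
    canonical: dict[str, Any] = {}
    for key, val in raw.items():
        target = FIELD_ALIASES.get(key)
        if target is not None:  # fields without a known alias are ignored
            canonical[target] = val
    unit = canonical.pop("unit", "MW")  # assume MW when no unit is given
    return {
        "source": source,
        "object_name": str(canonical["object_name"]).strip(),
        "value_mw": float(canonical["value"]) * UNIT_TO_MW[unit],
    }
```

Once every source is reduced to the same record shape, the later steps (dictionary construction, matching, fusion) can treat Excel, DAT and CIM inputs uniformly.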
Further, after the fusion of the heterogeneous texts, the method further includes evaluating the fused data, which includes:
completing online correction of the parameters through real-time verification of the data against the knowledge dictionary and the object database;
comparing the data before fusion with the data after fusion, and eliminating error values and abnormal values generated in the fusion process;
ensuring the uniqueness of the data of the same object, and eliminating the redundancy caused by repeated data in the fusion process;
and evaluating time effectiveness and space effectiveness so that the fused data are real-time and comprehensive.
Further, the extracting key characters from the heterogeneous text to construct a knowledge dictionary, and obtaining an object database of the heterogeneous text by multi-source object name classification includes:
acquiring a multi-source heterogeneous text format comprising an Excel file, a DAT file and a CIM file from a Web Service interface of a power system;
obtaining the knowledge dictionary by extracting key characters and screening names of the normalized heterogeneous texts to remove duplicates;
and classifying the normalized heterogeneous text by multiple object names and sorting corresponding numerical values to obtain the object database.
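The construction of the knowledge dictionary and the object database can be sketched as follows. This is an illustration only: the key-character rule in `extract_key` (lowercasing and dropping separators) and the record layout (`source`, `object_name`, `value_mw`) are assumptions, since the application does not define them.

```python
import re
from collections import defaultdict


def extract_key(name: str) -> str:
    """Hypothetical key-character rule: lowercase, drop punctuation, spaces and underscores."""
    return re.sub(r"[\W_]+", "", name.lower())


def build_knowledge_dictionary(records):
    """Map each distinct key to the first object name seen for it (name de-duplication)."""
    dictionary = {}
    for rec in records:
        dictionary.setdefault(extract_key(rec["object_name"]), rec["object_name"])
    return dictionary


def build_object_database(records):
    """Group the numerical values by object name and then by source, in arrival order."""
    database = defaultdict(lambda: defaultdict(list))
    for rec in records:
        database[rec["object_name"]][rec["source"]].append(rec["value_mw"])
    return {name: dict(by_source) for name, by_source in database.items()}
```

Keeping names in the dictionary and values in the database mirrors the separation of object names from object values that the description attributes to the fusion strategy.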
Further, performing longitudinal parameter fusion and transverse parameter fusion on the multi-source matching result after Reduce processing includes:
eliminating parameter differences of the same professional department at different levels, and completing longitudinal parameter fusion by introducing a difference function;
and completing transverse parameter fusion by fusing parameters between different professional departments at the same dispatching level.
In another aspect, an embodiment of the present invention further provides a multi-source heterogeneous power load data fusion system, including:
the load data preprocessing module is used for acquiring a multi-source heterogeneous text of the power load data and carrying out format normalization processing on the heterogeneous text;
the data classification module is used for extracting key characters from the heterogeneous text to construct a knowledge dictionary and obtaining an object database of the heterogeneous text through multi-source object name classification;
the data matching module is used for matching the knowledge dictionary with the object names by adopting a MapReduce programming model parallel processing technology and then matching the object names with the object data;
and the parameter fusion module is used for performing longitudinal parameter fusion and transverse parameter fusion on the multi-source matching result after Reduce processing.
Further, the system further comprises a fusion evaluation module, wherein the fusion evaluation module is used for:
completing online correction of the parameters through real-time verification of the data against the knowledge dictionary and the object database;
comparing the data before fusion with the data after fusion, and eliminating error values and abnormal values generated in the fusion process;
ensuring the uniqueness of the data of the same object, and eliminating the redundancy caused by repeated data in the fusion process;
and evaluating time effectiveness and space effectiveness so that the fused data are real-time and comprehensive.
Further, the data classification module includes a text normalization unit, and the text normalization unit is configured to:
acquiring a multi-source heterogeneous text format comprising an Excel file, a DAT file and a CIM file from a Web Service interface of a power system;
obtaining the knowledge dictionary by extracting key characters and screening names of the normalized heterogeneous texts to remove duplicates;
and classifying the normalized heterogeneous text by multiple object names and sorting corresponding numerical values to obtain the object database.
Further, the parameter fusion module includes a classification fusion unit, and the classification fusion unit is configured to:
eliminating parameter differences of the same professional department at different levels, and completing longitudinal parameter fusion by introducing a difference function;
and completing transverse parameter fusion by fusing parameters between different professional departments at the same dispatching level.
The embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and capable of running on the processor, and when the processor executes the computer program, the following steps are implemented:
acquiring a multi-source heterogeneous text of power load data, and performing format normalization processing on the heterogeneous text;
extracting key characters from the heterogeneous text to construct a knowledge dictionary, and obtaining an object database of the heterogeneous text through multi-source object name classification;
matching the knowledge dictionary with the object name by adopting a MapReduce programming model parallel processing technology, and then matching the object name with object data;
and performing longitudinal parameter fusion and transverse parameter fusion on the multi-source matching result after Reduce processing.
An embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the following steps:
acquiring a multi-source heterogeneous text of power load data, and performing format normalization processing on the heterogeneous text;
extracting key characters from the heterogeneous text to construct a knowledge dictionary, and obtaining an object database of the heterogeneous text through multi-source object name classification;
matching the knowledge dictionary with the object name by adopting a MapReduce programming model parallel processing technology, and then matching the object name with object data;
and performing longitudinal parameter fusion and transverse parameter fusion on the multi-source matching result after Reduce processing.
The multi-source heterogeneous power load data fusion method, device, equipment and storage medium comprise a fusion strategy for power load data and a corresponding method for evaluating data fusion quality. The object database and the knowledge dictionary in the fusion strategy separate the object names from the object values of the multi-source load data, the MapReduce parallel processing technology improves the matching efficiency, and longitudinal parameter fusion and transverse parameter fusion fuse the results of Reduce processing. The evaluation method evaluates the data fusion quality and, at the same time, checks the fused data against the object database and the knowledge dictionary in real time, thereby achieving online repair. Through multi-level processing of multi-source heterogeneous power grid load big data, the embodiment of the invention provides a complete power big data strategy covering both fusion and evaluation, and the data fusion is practical and efficient.
Drawings
FIG. 1 is a schematic flow chart diagram of a multi-source heterogeneous power load data fusion method in one embodiment;
FIG. 2 is a schematic flow chart diagram illustrating a method for evaluating fused data according to one embodiment;
FIG. 3 is a flow diagram that illustrates the construction of a knowledge dictionary and database in one embodiment;
FIG. 4 is a schematic flow diagram of longitudinal and lateral data fusion in one embodiment;
FIG. 5 is a block diagram of a multi-source heterogeneous power load data fusion system in one embodiment;
FIG. 6 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In one embodiment, as shown in fig. 1, a multi-source heterogeneous power load data fusion method is provided, which includes the following steps:
step 101, acquiring a multi-source heterogeneous text of power load data, and performing format normalization processing on the heterogeneous text;
step 102, extracting key characters from the heterogeneous text to construct a knowledge dictionary, and obtaining an object database of the heterogeneous text through multi-source object name classification;
step 103, matching the knowledge dictionary with the object name by adopting a MapReduce programming model parallel processing technology, and then matching the object name with object data;
and step 104, performing longitudinal parameter fusion and transverse parameter fusion on the multi-source matching result after Reduce processing.
Specifically, the method comprises a fusion strategy for power load data and a corresponding method for evaluating data fusion quality. The object database and the knowledge dictionary in the fusion strategy separate the object names from the object values of the multi-source load data, and the MapReduce parallel processing technology improves the matching efficiency. MapReduce is a programming model for parallel operations on large-scale data sets (larger than 1 TB): the implementation specifies a Map function that maps a group of key-value pairs into a group of new key-value pairs, and a concurrent Reduce function that processes together all mapped key-value pairs sharing the same key. Longitudinal parameter fusion and transverse parameter fusion then fuse the results of Reduce processing. The evaluation method evaluates the data fusion quality and, at the same time, checks the fused data against the object database and the knowledge dictionary in real time, thereby achieving online repair. Through multi-level processing of multi-source heterogeneous power grid load big data, the embodiment of the invention provides a complete power big data strategy covering both fusion and evaluation, and the data fusion is practical and efficient.
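The Map and Reduce roles described above can be illustrated with a small in-memory sketch. It runs in a single process and only mimics the programming model; it is not a distributed implementation, and the record layout, the key rule and the function names (`map_phase`, `shuffle`, `reduce_phase`, `match`) are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record, dictionary):
    """Map: emit (canonical name, (source, value)) when the record's key is in the dictionary."""
    key = "".join(ch for ch in record["object_name"].lower() if ch.isalnum())
    if key in dictionary:
        yield dictionary[key], (record["source"], record["value_mw"])


def shuffle(pairs):
    """Group the mapped pairs by key, as a framework would between Map and Reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(name, values):
    """Reduce: collect all values that share one canonical object name, per source."""
    by_source = defaultdict(list)
    for source, value in values:
        by_source[source].append(value)
    return name, dict(by_source)


def match(records, dictionary):
    """Run Map, shuffle and Reduce over the records and return the matched result."""
    pairs = chain.from_iterable(map_phase(r, dictionary) for r in records)
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Records whose key is absent from the knowledge dictionary produce no Map output here; a real pipeline would probably route them to the online correction step instead of dropping them.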
In one embodiment, as shown in fig. 2, the evaluation method of the fused data includes:
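step 201, completing online correction of the parameters through real-time verification of the data against the knowledge dictionary and the object database;
step 202, comparing the data before fusion with the data after fusion, and eliminating error values and abnormal values generated in the fusion process;
step 203, ensuring the uniqueness of the data of the same object, and eliminating the redundancy caused by repeated data in the fusion process;
and step 204, evaluating time effectiveness and space effectiveness to ensure that the fused data are real-time and comprehensive.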
Specifically, after data fusion is completed, the data fusion quality needs to be evaluated in terms of integrity, accuracy, uniqueness and effectiveness, and the parameters are corrected online through real-time verification of the data against the knowledge dictionary and the object database. The integrity evaluation comprises object quantity evaluation and data quantity evaluation, and ensures that the number of objects and the corresponding data remain complete after data fusion. The accuracy evaluation compares the data before fusion with the data after fusion to prevent the fusion process from generating error values and abnormal values. The uniqueness evaluation guarantees that the data of the same object are unique and prevents redundancy caused by repeated data. The effectiveness evaluation comprises time effectiveness and space effectiveness, and ensures that the data are real-time and comprehensive. After these four characteristics are evaluated, the object names are checked against the knowledge dictionary and the object data are checked against the object database in real time, and the fused data are corrected online.
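The four quality checks can be sketched as a single report function. The concrete criteria below are assumptions chosen for illustration (a relative tolerance against the pre-fusion median for accuracy, a maximum data age for time effectiveness); the application names the four characteristics but gives no thresholds, and space effectiveness is left out of this sketch.

```python
from statistics import median


def evaluate_fusion(before, after, timestamps, now, max_age_s=300.0, tol=0.2):
    """Report integrity, uniqueness, accuracy and time-effectiveness problems.

    before:     {object_name: [values from the sources before fusion]}
    after:      list of (object_name, fused_value) rows after fusion
    timestamps: {object_name: time of last update, in seconds}
    """
    fused_names = [name for name, _ in after]
    # Integrity: every object present before fusion should appear after fusion.
    missing = sorted(set(before) - set(fused_names))
    # Uniqueness: each object should have exactly one fused row.
    duplicates = sorted({n for n in fused_names if fused_names.count(n) > 1})
    # Accuracy: flag fused values far from the pre-fusion median (relative tolerance).
    abnormal = set()
    for name, value in after:
        if name in before:
            ref = median(before[name])
            if abs(value - ref) > tol * max(abs(ref), 1e-9):
                abnormal.add(name)
    # Time effectiveness: flag objects whose data are older than max_age_s.
    stale = sorted(n for n, t in timestamps.items() if now - t > max_age_s)
    return {"missing": missing, "duplicates": duplicates,
            "abnormal": sorted(abnormal), "stale": stale}
```

Each non-empty list in the report identifies the objects to re-check against the knowledge dictionary or the object database during online correction.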
In one embodiment, as shown in fig. 3, the process of knowledge dictionary and database construction includes:
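step 301, acquiring multi-source heterogeneous texts, in formats including Excel files, DAT files and CIM files, from a Web Service interface of the power system;
step 302, obtaining the knowledge dictionary by extracting key characters from the normalized heterogeneous texts and screening names to remove duplicates;
and step 303, classifying the normalized heterogeneous texts through multiple object names and sorting corresponding numerical values to obtain the object database.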
Specifically, the multi-source heterogeneous text format obtained through the Web Service interface comprises an Excel file, a DAT file, a CIM file and the like; the knowledge dictionary is obtained by extracting key characters from the normalized text data and screening and de-duplicating names; the object database is obtained by classifying the normalized text data by multiple object names and sorting corresponding numerical values; MapReduce parallel processing divides the object database into n object sets for parallel matching, so that the efficiency of matching the object names with the knowledge dictionary and matching the object data with the object names is improved.
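The division of the object database into n object sets for parallel matching can be sketched with a thread pool. This is a stand-in for a MapReduce cluster, not the application's implementation; the round-robin split, the key rule and the names `partition`, `match_subset` and `parallel_match` are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(names, n):
    """Split the object names into n roughly equal sets (round-robin)."""
    return [names[i::n] for i in range(n)]


def match_subset(names, dictionary):
    """Match each object name in one subset against the knowledge dictionary."""
    matched = {}
    for name in names:
        key = "".join(ch for ch in name.lower() if ch.isalnum())
        if key in dictionary:
            matched[name] = dictionary[key]
    return matched


def parallel_match(names, dictionary, n=4):
    """Match the n subsets concurrently and merge the partial results."""
    merged = {}
    with ThreadPoolExecutor(max_workers=n) as pool:
        parts = partition(names, n)
        for partial in pool.map(lambda part: match_subset(part, dictionary), parts):
            merged.update(partial)
    return merged
```

Because the subsets are disjoint, the partial results can be merged without conflict, which is what makes the matching step easy to parallelize.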
In one embodiment, as shown in fig. 4, the process of vertical and horizontal data fusion of data includes:
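step 401, eliminating parameter differences of the same professional department at different levels, and completing longitudinal parameter fusion by introducing a difference function;
and step 402, completing transverse parameter fusion by fusing parameters between different professional departments at the same dispatching level.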
Specifically, longitudinal and transverse parameter fusion covers a wide range of multi-source heterogeneous power load big data, drawn from PMUs (phasor measurement units), SCADA (supervisory control and data acquisition), fault recorders and user acquisition systems. In addition, MapReduce parallel processing is introduced for processing power load big data, which improves data processing efficiency. In the evaluation after fusion, the quality of the data fusion result is evaluated from four aspects, and online correction is completed through real-time checking against the knowledge dictionary and the object database.
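The application does not define the difference function or the rule used to combine parameters, so the following sketch is built entirely on assumptions: a relative difference as the difference function, a threshold against a reference level for longitudinal fusion, and a weighted mean across departments for transverse fusion. All names and default values are hypothetical.

```python
def difference(a: float, b: float) -> float:
    """Hypothetical difference function: relative difference between two values."""
    scale = max(abs(a), abs(b), 1e-9)
    return abs(a - b) / scale


def fuse_longitudinal(values_by_level: dict[str, float],
                      reference_level: str,
                      threshold: float = 0.05) -> float:
    """Fuse one department's values across dispatch levels.

    Values whose difference from the reference level exceeds the threshold
    are discarded; the remaining values are averaged.
    """
    ref = values_by_level[reference_level]
    kept = [v for v in values_by_level.values() if difference(v, ref) <= threshold]
    return sum(kept) / len(kept)


def fuse_transverse(values_by_department: dict[str, float],
                    weights: dict[str, float]) -> float:
    """Fuse same-level values from different departments by a weighted mean."""
    total = sum(weights[d] for d in values_by_department)
    return sum(weights[d] * v for d, v in values_by_department.items()) / total
```

Under these assumptions, longitudinal fusion removes level-to-level discrepancies within one department first, and transverse fusion then reconciles the resulting values across departments at the same level.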
It should be understood that, although the steps in the above flowcharts are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in the above flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 5, there is provided a multi-source heterogeneous power load data fusion system, comprising:
the load data preprocessing module 501 is configured to obtain a multi-source heterogeneous text of power load data, and perform format normalization processing on the heterogeneous text;
the data classification module 502 is used for extracting key characters from the heterogeneous text to construct a knowledge dictionary, and obtaining an object database of the heterogeneous text through multi-source object name classification;
the data matching module 503 is configured to match the knowledge dictionary with the object names by using a MapReduce programming model parallel processing technology, and then match the object names with the object data;
and the parameter fusion module 504 is configured to perform longitudinal parameter fusion and transverse parameter fusion on the multi-source matching result after the Reduce processing.
In one embodiment, the system further comprises a fusion evaluation module, which is used for:
completing online correction of the parameters through real-time verification of the data against the knowledge dictionary and the object database;
comparing the data before fusion with the data after fusion, and eliminating error values and abnormal values generated in the fusion process;
ensuring the uniqueness of the data of the same object, and eliminating the redundancy caused by repeated data in the fusion process;
and evaluating time effectiveness and space effectiveness so that the fused data are real-time and comprehensive.
In one embodiment, as shown in fig. 5, the data classification module 502 includes a text normalization unit 5021, which is configured to:
acquiring a multi-source heterogeneous text format comprising an Excel file, a DAT file and a CIM file from a Web Service interface of a power system;
obtaining the knowledge dictionary by extracting key characters and screening names of the normalized heterogeneous texts to remove duplicates;
and classifying the normalized heterogeneous text by multiple object names and sorting corresponding numerical values to obtain the object database.
In one embodiment, as shown in fig. 5, the parameter fusion module 504 includes a classification fusion unit 5041, and the classification fusion unit 5041 is configured to:
eliminating parameter differences of the same professional department at different levels, and completing longitudinal parameter fusion by introducing a difference function;
and completing transverse parameter fusion by fusing parameters between different professional departments at the same dispatching level.
For specific limitations of the multi-source heterogeneous power load data fusion system, reference may be made to the above limitations of the multi-source heterogeneous power load data fusion method, which are not repeated here. Each module in the system can be realized wholly or partly by software, hardware, or a combination thereof. The modules can be embedded in, or independent of, a processor in the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can call them and execute the corresponding operations.
FIG. 6 is a diagram illustrating the internal structure of a computer device in one embodiment. As shown in fig. 6, the computer device includes a processor, a memory, a network interface, an input device, and a display screen connected through a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and may also store a computer program that, when executed by the processor, causes the processor to implement the multi-source heterogeneous power load data fusion method. The internal memory may also store a computer program that, when executed by the processor, causes the processor to perform the multi-source heterogeneous power load data fusion method. The display screen of the computer device can be a liquid crystal display screen or an electronic ink display screen. The input device can be a touch layer covering the display screen; a key, a trackball or a touch pad arranged on the housing of the computer device; or an external keyboard, touch pad or mouse.
Those skilled in the art will appreciate that the architecture shown in fig. 6 is merely a block diagram of some of the structures associated with the disclosed aspects and does not limit the computing devices to which the disclosed aspects apply; a particular computing device may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program:
acquiring a multi-source heterogeneous text of power load data, and performing format normalization processing on the heterogeneous text;
extracting key characters from the heterogeneous text to construct a knowledge dictionary, and obtaining an object database of the heterogeneous text through multi-source object name classification;
matching the knowledge dictionary with the object name by adopting a MapReduce programming model parallel processing technology, and then matching the object name with object data;
and performing longitudinal parameter fusion and transverse parameter fusion on the multi-source matching result after Reduce processing.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
completing online correction of the parameters through real-time verification of the data against the knowledge dictionary and the object database;
comparing the data before fusion with the data after fusion, and eliminating error values and abnormal values generated in the fusion process;
ensuring the uniqueness of the data of the same object, and eliminating the redundancy caused by repeated data in the fusion process;
and evaluating time effectiveness and space effectiveness so that the fused data are real-time and comprehensive.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
acquiring a multi-source heterogeneous text format comprising an Excel file, a DAT file and a CIM file from a Web Service interface of a power system;
obtaining the knowledge dictionary by extracting key characters and screening names of the normalized heterogeneous texts to remove duplicates;
and classifying the normalized heterogeneous text by multiple object names and sorting corresponding numerical values to obtain the object database.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
eliminating parameter differences of the same professional department at different levels, and completing longitudinal parameter fusion by introducing a difference function;
and completing transverse parameter fusion by fusing parameters between different professional departments at the same dispatching level.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:
acquiring a multi-source heterogeneous text of power load data, and performing format normalization processing on the heterogeneous text;
extracting key characters from the heterogeneous text to construct a knowledge dictionary, and obtaining an object database of the heterogeneous text through multi-source object name classification;
matching the knowledge dictionary with the object names using the parallel processing technology of the MapReduce programming model, and then matching the object names with the object data;
and performing longitudinal parameter fusion and transverse parameter fusion on the multi-source matching result after Reduce processing.
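The format normalization in the first step above amounts to bringing every source into one common record shape before any matching happens. The sketch below shows this for two inputs only: a tab-separated text standing in for a DAT export, and a deliberately simplified XML layout standing in for a CIM file. Both layouts and the record fields are hypothetical; real CIM files follow a much richer schema, and Excel files would need a spreadsheet library.

```python
import csv
import io
import xml.etree.ElementTree as ET


def from_delimited(text, delimiter="\t"):
    """Parse rows of the form: object name, parameter, value."""
    records = []
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        name, parameter, value = row
        records.append(
            {"object": name.strip(), "parameter": parameter.strip(), "value": float(value)}
        )
    return records


def from_xml(text):
    """Parse a simplified, hypothetical layout:
    <data><object name="..."><param name="...">value</param></object></data>
    """
    root = ET.fromstring(text)
    return [
        {"object": obj.get("name"), "parameter": param.get("name"), "value": float(param.text)}
        for obj in root.iter("object")
        for param in obj.iter("param")
    ]
```

Once both parsers return the same dictionary shape, the later steps can treat records from any source uniformly.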
In one embodiment, the computer program, when executed by the processor, further performs the steps of:
completing online correction of the parameters through real-time verification of the data in the knowledge dictionary and the object database;
comparing the data before fusion with the data after fusion, and eliminating error values and abnormal values generated in the fusion process;
ensuring the uniqueness of the data for the same object, and eliminating the redundancy generated by repeated data in the fusion process;
and evaluating the time effectiveness and the space effectiveness, so that the data are fused in real time and comprehensively.
In one embodiment, the computer program, when executed by the processor, further performs the steps of:
acquiring multi-source heterogeneous text, in formats including Excel files, DAT files and CIM files, from a Web Service interface of a power system;
obtaining the knowledge dictionary by extracting key characters from the normalized heterogeneous text and screening the names to remove duplicates;
and classifying the normalized heterogeneous text by object name across the multiple sources and sorting the corresponding numerical values to obtain the object database.
In one embodiment, the computer program, when executed by the processor, further performs the steps of:
eliminating parameter differences within the same professional department across different levels by introducing a difference function, thereby completing longitudinal parameter fusion;
and completing transverse parameter fusion by fusing parameters between different professional departments at the same scheduling level.
It will be understood by those skilled in the art that all or part of the processes of the methods in the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above.
The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application. Their description is relatively specific and detailed, but it should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.
Claims (10)
1. A multi-source heterogeneous power load data fusion method is characterized by comprising the following steps:
acquiring a multi-source heterogeneous text of power load data, and performing format normalization processing on the heterogeneous text;
extracting key characters from the heterogeneous text to construct a knowledge dictionary, and obtaining an object database of the heterogeneous text through multi-source object name classification;
matching the knowledge dictionary with the object names using the parallel processing technology of the MapReduce programming model, and then matching the object names with the object data;
and performing longitudinal parameter fusion and transverse parameter fusion on the multi-source matching result after Reduce processing.
2. The method of claim 1, further comprising, after fusing the heterogeneous text, an evaluation of fused data, the evaluation of fused data comprising:
completing online correction of the parameters through real-time verification of the data in the knowledge dictionary and the object database;
comparing the data before fusion with the data after fusion, and eliminating error values and abnormal values generated in the fusion process;
ensuring the uniqueness of the data for the same object, and eliminating the redundancy generated by repeated data in the fusion process;
and evaluating the time effectiveness and the space effectiveness, so that the data are fused in real time and comprehensively.
3. The method of claim 2, wherein the extracting key characters from the heterogeneous text to construct a knowledge dictionary, and obtaining an object database of the heterogeneous text by multi-source object name classification comprises:
acquiring multi-source heterogeneous text, in formats including Excel files, DAT files and CIM files, from a Web Service interface of a power system;
obtaining the knowledge dictionary by extracting key characters from the normalized heterogeneous text and screening the names to remove duplicates;
and classifying the normalized heterogeneous text by object name across the multiple sources and sorting the corresponding numerical values to obtain the object database.
4. The method according to claim 2, wherein the performing longitudinal parameter fusion and transverse parameter fusion on the Reduce-processed multi-source matching result comprises:
eliminating parameter differences within the same professional department across different levels by introducing a difference function, thereby completing longitudinal parameter fusion;
and completing transverse parameter fusion by fusing parameters between different professional departments at the same scheduling level.
5. A multi-source heterogeneous power load data fusion system, comprising:
the load data preprocessing module is used for acquiring a multi-source heterogeneous text of the power load data and carrying out format normalization processing on the heterogeneous text;
the data classification module is used for extracting key characters from the heterogeneous text to construct a knowledge dictionary and obtaining an object database of the heterogeneous text through multi-source object name classification;
the data matching module is used for matching the knowledge dictionary with the object names by adopting a MapReduce programming model parallel processing technology and then matching the object names with the object data;
and the parameter fusion module is used for performing longitudinal parameter fusion and transverse parameter fusion on the multi-source matching result after Reduce processing.
6. The multi-source heterogeneous power load data fusion system of claim 5, further comprising a fusion assessment module to:
completing online correction of the parameters through real-time verification of the data in the knowledge dictionary and the object database;
comparing the data before fusion with the data after fusion, and eliminating error values and abnormal values generated in the fusion process;
ensuring the uniqueness of the data for the same object, and eliminating the redundancy generated by repeated data in the fusion process;
and evaluating the time effectiveness and the space effectiveness, so that the data are fused in real time and comprehensively.
7. The multi-source heterogeneous power load data fusion system of claim 5, wherein the data classification module comprises a text normalization unit to:
acquiring multi-source heterogeneous text, in formats including Excel files, DAT files and CIM files, from a Web Service interface of a power system;
obtaining the knowledge dictionary by extracting key characters from the normalized heterogeneous text and screening the names to remove duplicates;
and classifying the normalized heterogeneous text by object name across the multiple sources and sorting the corresponding numerical values to obtain the object database.
8. The multi-source heterogeneous power load data fusion system of claim 5, wherein the parameter fusion module comprises a classification fusion unit configured to:
eliminating parameter differences within the same professional department across different levels by introducing a difference function, thereby completing longitudinal parameter fusion;
and completing transverse parameter fusion by fusing parameters between different professional departments at the same scheduling level.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method of any of claims 1 to 4 are implemented when the computer program is executed by the processor.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110809255.2A CN113407723A (en) | 2021-07-16 | 2021-07-16 | Multi-source heterogeneous power load data fusion method, device, equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110809255.2A CN113407723A (en) | 2021-07-16 | 2021-07-16 | Multi-source heterogeneous power load data fusion method, device, equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113407723A true CN113407723A (en) | 2021-09-17 |
Family
ID=77686754
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110809255.2A Pending CN113407723A (en) | 2021-07-16 | 2021-07-16 | Multi-source heterogeneous power load data fusion method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113407723A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113836940A (en) * | 2021-09-26 | 2021-12-24 | 中国南方电网有限责任公司 | Knowledge fusion method and device in electric power metering field and computer equipment |
CN114970667A (en) * | 2022-03-30 | 2022-08-30 | 国网吉林省电力有限公司 | Multi-source heterogeneous energy data fusion method |
CN116303392A (en) * | 2023-03-02 | 2023-06-23 | 重庆市规划和自然资源信息中心 | Multi-source data table management method for real estate registration data |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103617557A (en) * | 2013-11-06 | 2014-03-05 | 广东电网公司电力科学研究院 | Multi-source heterogeneous power grid operation parameter analysis system |
CN105184424A (en) * | 2015-10-19 | 2015-12-23 | 国网山东省电力公司菏泽供电公司 | Mapreduced short period load prediction method of multinucleated function learning SVM realizing multi-source heterogeneous data fusion |
CN107402976A (en) * | 2017-07-03 | 2017-11-28 | 国网山东省电力公司经济技术研究院 | Power grid multi-source data fusion method and system based on multi-element heterogeneous model |
CN108170752A (en) * | 2017-12-21 | 2018-06-15 | 山东合天智汇信息技术有限公司 | metadata management method and system based on template |
CN109086573A (en) * | 2018-07-30 | 2018-12-25 | 东北师范大学 | Multi-source biology big data convergence platform |
CN109165202A (en) * | 2018-07-04 | 2019-01-08 | 华南理工大学 | A kind of preprocess method of multi-source heterogeneous big data |
CN110781249A (en) * | 2019-10-16 | 2020-02-11 | 华电国际电力股份有限公司技术服务分公司 | Knowledge graph-based multi-source data fusion method and device for thermal power plant |
CN111897875A (en) * | 2020-07-31 | 2020-11-06 | 平安科技(深圳)有限公司 | Fusion processing method and device for urban multi-source heterogeneous data and computer equipment |
CN112214928A (en) * | 2020-09-27 | 2021-01-12 | 贵州电网有限责任公司 | Multi-source data processing and fusing method and system for low-voltage power distribution network |
CN113051249A (en) * | 2021-03-22 | 2021-06-29 | 江苏杰瑞信息科技有限公司 | Cloud service platform design method based on multi-source heterogeneous big data fusion |
-
2021
- 2021-07-16 CN CN202110809255.2A patent/CN113407723A/en active Pending
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113836940A (en) * | 2021-09-26 | 2021-12-24 | 中国南方电网有限责任公司 | Knowledge fusion method and device in electric power metering field and computer equipment |
CN113836940B (en) * | 2021-09-26 | 2024-04-12 | 南方电网数字电网研究院股份有限公司 | Knowledge fusion method and device in electric power metering field and computer equipment |
CN114970667A (en) * | 2022-03-30 | 2022-08-30 | 国网吉林省电力有限公司 | Multi-source heterogeneous energy data fusion method |
CN114970667B (en) * | 2022-03-30 | 2024-03-29 | 国网吉林省电力有限公司 | Multi-source heterogeneous energy data fusion method |
CN116303392A (en) * | 2023-03-02 | 2023-06-23 | 重庆市规划和自然资源信息中心 | Multi-source data table management method for real estate registration data |
CN116303392B (en) * | 2023-03-02 | 2023-09-01 | 重庆市规划和自然资源信息中心 | Multi-source data table management method for real estate registration data |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113407723A (en) | Multi-source heterogeneous power load data fusion method, device, equipment and storage medium | |
US11176028B2 (en) | System, method and storage device for CIM/E model standard compliance test | |
CN104573906B (en) | System and method for analyzing oscillation stability in power transmission system | |
Barbosa et al. | Using performance profiles to analyze the results of the 2006 CEC constrained optimization competition | |
Zhu et al. | Metanetwork framework for integrated performance assessment under uncertainty in construction projects | |
CN108460068A (en) | Method, apparatus, storage medium and the terminal that report imports and exports | |
Mandelli et al. | Dynamic PRA: an overview of new algorithms to generate, analyze and visualize data | |
Yang et al. | DEJIT: a differential evolution algorithm for effort-aware just-in-time software defect prediction | |
CN114528688A (en) | Method and device for constructing reliability digital twin model and computer equipment | |
US20120226484A1 (en) | Calculation simulation system and method thereof | |
CN110597726A (en) | Safety management method, device, equipment and storage medium for avionic system | |
Vartziotis et al. | Learn to Code Sustainably: An Empirical Study on LLM-based Green Code Generation | |
CN114743703A (en) | Reliability analysis method, device, equipment and storage medium for nuclear power station unit | |
Chatterjee et al. | NHPP-Based software reliability growth modeling and optimal release policy for N-Version programming system with increasing fault detection rate under imperfect debugging | |
CN113919609A (en) | Power distribution network model quality comprehensive evaluation method and system | |
Sarker et al. | Cp-sam: Cyber-power security assessment and resiliency analysis tool for distribution system | |
Thomas et al. | An innovative and automated solution for NERC PRC-027-1 compliance | |
Karimishad et al. | Probabilistic transient stability assessment using two-point estimate method | |
CN111105140A (en) | Comprehensive risk assessment method for running state of power distribution network | |
Wang et al. | Empirical study on the correlation between software structural modifications and its fault-proneness | |
Zhi-bo et al. | Analysis of software process effectiveness based on orthogonal defect classification | |
CN103677849A (en) | Embedded software credibility guaranteeing method | |
Singh et al. | Prediction of software quality model using gene expression programming | |
Shen et al. | Research on energy digital twin quality model based on data driven | |
CN115270310B (en) | Method for determining structural reliability design index of external culvert casing of aero-engine |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20210917 |