CN112241391A - Method and system for extracting, cleaning and integrating power data of power supply company - Google Patents
- Publication number: CN112241391A
- Application number: CN202011115601.9A
- Authority: CN (China)
- Prior art keywords: data, power, power supply, cleaning, supply company
- Legal status: Withdrawn (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F16/2465: Query processing support for facilitating data mining operations in structured databases
- G06F16/116: Details of conversion of file system types or formats
- G06F16/215: Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
- G06Q10/06375: Prediction of business process outcome or impact based on a proposed change
- G06Q50/06: Energy or water supply
Abstract
The invention discloses a method and a system for extracting, cleaning and integrating power data of a power supply company. The method comprises the following steps in sequence: S1: data extraction, namely extracting power data from the monthly production and operation reports of county-level power supply companies; S2: data cleaning, namely performing data quality verification on the extracted power data and cleaning abnormal data from it; S3: data integration, namely integrating the cleaned power data of each month into a complete data set, so as to form a complete internal data set of the power supply company. The system comprises a data extraction module, a data cleaning module and a data integration module. The invention addresses the defect in the prior art that the value of the data in the monthly production and operation reports is poorly utilized.
Description
Technical Field
The invention relates to the technical field of power systems, in particular to a method and a system for extracting, cleaning and integrating power data of a power supply company.
Background
Electric power is a forerunner of economic development and the foundation of local economic growth. Power data directly reflects the vitality and characteristics of regional economic development. As the main entity of power supply, a power supply company is responsible for guaranteeing a safe and reliable regional energy supply and for serving regional economic development.
To support the balanced development of regional power grid companies and promote stable regional economic development, a data mining and analysis system for monthly production and operation reports needs to be built. Such a system studies in depth the value of the data in the monthly production and operation reports of county-level power supply companies and combines it with regional characteristic data. Through multi-dimensional mining and comprehensive analysis, it guides county-level power supply companies to improve the quality and efficiency of power grid production and operation, to uncover production and operation problems, to evaluate the current state of regional economic development, and to judge future development trends. It also uncovers problems and deficiencies in the monthly reports of each county-level power supply company, improves the completeness and accuracy of those reports, and optimizes the mode and dimensions of their comprehensive analysis. This provides decision support for provincial-level and prefecture-level power supply companies in optimizing resource allocation, assists the related improvement and remediation work of each county-level power supply company, and promotes balanced and healthy regional development.
However, the prior art has the following defects: at present, provincial power supply companies do not have a sufficiently accurate overall grasp of the data in the monthly production and operation reports of county-level power supply companies, and those reports differ from county to county in format, statistical dimensions and the like. As a result, provincial power supply companies cannot accurately grasp the power data of the county-level power supply companies.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a method and a system for extracting, cleaning and integrating power data of a power supply company, and solves the defect that the provincial power supply company cannot accurately master the power data of the county power supply company in the prior art.
The technical scheme adopted by the invention for solving the problems is as follows:
The method for extracting, cleaning and integrating the power data of a power supply company comprises the following steps in sequence:
S1: data extraction, namely extracting power data from the monthly production and operation reports of county-level power supply companies;
S2: data cleaning, namely performing data quality verification on the extracted power data and cleaning abnormal data from it;
S3: data integration, namely integrating the cleaned power data of each month into a complete data set, so as to form a complete internal data set of the power supply company.
Through these steps, the provincial power supply company extracts the power data from the monthly production and operation reports of the county-level power supply companies, cleans out abnormal data, and integrates the cleaned data. This effectively overcomes the differences in format, statistical dimensions and the like among the monthly reports of the county-level power supply companies, and improves the accuracy with which the provincial power supply company grasps their power data.
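The three steps can be sketched as a small pipeline. This is only an illustration under stated assumptions: the `Record` fields, the function names, the indicator name `line_loss_rate` and the sample values are all hypothetical and do not come from the patent, and the quality check shown (a value must parse as a number) stands in for whatever verification rules a real deployment would use.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    county: str             # county-level power supply company (hypothetical field)
    month: str              # reporting month, e.g. "2020-09"
    indicator: str          # e.g. "line_loss_rate" (hypothetical name)
    value: Optional[float]  # None when the reported text is not a number


def extract(raw_rows):
    """S1: turn raw (county, month, indicator, text) rows into records."""
    records = []
    for county, month, indicator, text in raw_rows:
        try:
            value = float(text)
        except (TypeError, ValueError):
            value = None
        records.append(Record(county, month, indicator, value))
    return records


def clean(records):
    """S2: drop records that fail a basic quality check."""
    return [r for r in records if r.value is not None and r.county and r.month]


def integrate(records):
    """S3: merge all months into one data set keyed by (county, month, indicator)."""
    return {(r.county, r.month, r.indicator): r.value for r in records}


# Made-up sample rows, for illustration only.
raw = [
    ("County A", "2020-08", "line_loss_rate", "4.2"),
    ("County A", "2020-09", "line_loss_rate", "n/a"),
    ("County B", "2020-09", "line_loss_rate", "3.9"),
]
dataset = integrate(clean(extract(raw)))
```

Keeping each step a pure function of the previous step's output mirrors the "in sequence" wording of the method and makes each stage testable on its own.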
Preferably, the data extraction includes: sorting out the data structure and constructing a data extraction program.
Sorting out the data structure makes the extraction target clear, and the data extraction program allows the extraction to run automatically, which improves extraction efficiency.
Preferably, sorting out the data structure includes: confirming the data crawling objects and converting file formats.
The data crawling objects are the data in the tables attached to the monthly production and operation reports of each county-level power supply company, which allows the source of the power data to be determined according to the extraction requirement. File format conversion means converting files of different formats (including doc, pdf, wps and rar) into a uniform file format in batches, in preparation for data extraction and mining.
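As a rough sketch of how a batch conversion might be planned, the snippet below sorts incoming report files by extension before any converter is invoked. The patent does not name a conversion tool, so none is called here. The function name `plan_conversion`, the category names and the sample file names are hypothetical, and the `.docx` target is taken from the preferred embodiment.

```python
from pathlib import PurePath

TARGET = ".docx"                         # target format per the preferred embodiment
CONVERTIBLE = {".doc", ".pdf", ".wps"}   # document formats named in the text
ARCHIVES = {".rar"}                      # archives must be unpacked before conversion


def plan_conversion(filenames):
    """Group files by the action needed to reach the uniform target format."""
    plan = {"keep": [], "convert": [], "unpack": [], "unknown": []}
    for name in filenames:
        suffix = PurePath(name).suffix.lower()
        if suffix == TARGET:
            plan["keep"].append(name)
        elif suffix in CONVERTIBLE:
            # Record the (source, destination) pair for a later converter run.
            plan["convert"].append((name, str(PurePath(name).with_suffix(TARGET))))
        elif suffix in ARCHIVES:
            plan["unpack"].append(name)
        else:
            plan["unknown"].append(name)
    return plan


# Hypothetical file names, for illustration only.
plan = plan_conversion(
    ["a_2020-09.DOC", "b_2020-09.pdf", "c.rar", "d.docx", "e.txt"]
)
```

Separating planning from conversion lets the batch be reviewed (for example, to spot unknown formats) before an external converter is run on the `convert` list.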
Preferably, the data crawling objects comprise the power data in the industry electricity consumption classification table, the electricity sales detail statistical table, the line loss rate statistical table, the detail table of 10 kV high-loss and negative-loss lines and station areas, and the detail table of 10 kV heavy-load lines and station areas.
These attached tables contain various key power data, and crawling them makes it possible to grasp the power information conveniently, comprehensively and accurately.
Preferably, the target format of the file format conversion is .docx.
The .docx format is convenient for identification, labeling and processing.
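One reason a .docx target is convenient is that the file is a ZIP archive whose main part, `word/document.xml`, stores tables as `w:tbl` / `w:tr` / `w:tc` elements, so table text can be read with the Python standard library alone. The sketch below is an assumption about how an extraction program could do this, not the patent's implementation. `read_docx_tables` is a hypothetical name, and the sketch ignores merged cells, treats nested tables naively, and builds a tiny in-memory document only to demonstrate the call.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by .docx documents.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def read_docx_tables(source):
    """Return every table in a .docx file as a list of rows of cell text."""
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    tables = []
    for tbl in root.iter(W + "tbl"):
        rows = []
        for tr in tbl.findall(W + "tr"):
            # Concatenate all text runs inside each cell.
            rows.append(
                ["".join(t.text or "" for t in tc.iter(W + "t"))
                 for tc in tr.findall(W + "tc")]
            )
        tables.append(rows)
    return tables


# Minimal in-memory document with one 2x2 table (made-up content).
xml = (
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    "<w:body><w:tbl>"
    "<w:tr><w:tc><w:p><w:r><w:t>County</w:t></w:r></w:p></w:tc>"
    "<w:tc><w:p><w:r><w:t>Line loss rate</w:t></w:r></w:p></w:tc></w:tr>"
    "<w:tr><w:tc><w:p><w:r><w:t>A</w:t></w:r></w:p></w:tc>"
    "<w:tc><w:p><w:r><w:t>4.2</w:t></w:r></w:p></w:tc></w:tr>"
    "</w:tbl></w:body></w:document>"
)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("word/document.xml", xml)
buf.seek(0)
tables = read_docx_tables(buf)
```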
Preferably, the data cleaning targets cases including: tables in the monthly production and operation report are embedded as pictures, so that the data cannot be crawled; the table form and data dimensions in the monthly production and operation report are inconsistent with those of other reports; and table headers are duplicated in the monthly production and operation report, so that useless fields are crawled.
These cases seriously affect the accuracy of power data extraction, so cleaning the power data in these cases greatly improves that accuracy.
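The three cleaning cases could be handled along the following lines. This is a minimal sketch under assumptions: the function names, the expected-header check and the sample rows are hypothetical, and a picture-only table is merely flagged for manual entry (or OCR) rather than recovered.

```python
def clean_table(rows):
    """Return (header, data_rows) with blank rows and repeated header rows removed."""
    rows = [[str(cell).strip() for cell in row] for row in rows]
    rows = [row for row in rows if any(row)]          # drop fully blank rows
    if not rows:
        return [], []
    header = rows[0]
    data = [row for row in rows[1:] if row != header]  # drop duplicated headers
    return header, data


def matches_schema(header, expected):
    """Check that a table has the same columns as the other reports."""
    return header == list(expected)


def needs_manual_entry(tables, image_count):
    """Flag a report whose tables appear to be embedded as pictures."""
    return not tables and image_count > 0


# Made-up rows for illustration: one repeated header and one blank row.
rows = [
    ["Line", "Loss rate (%)"],
    ["Line 1", " 12.5"],
    ["Line", "Loss rate (%)"],
    ["", ""],
    ["Line 2", "-0.8"],
]
header, data = clean_table(rows)
consistent = matches_schema(header, ["Line", "Loss rate (%)"])
flagged = needs_manual_entry([], image_count=2)
```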
Preferably, the power data includes industry electricity consumption information, electricity sales, line loss rate, power supply line information, and station area and line details.
These indicators objectively reflect the power situation and provide guidance for keeping track of the condition of county-level power supply companies.
The system for extracting, cleaning and integrating the power data of a power supply company comprises a data extraction module, a data cleaning module and a data integration module;
the data extraction module is used for extracting power data from the monthly production and operation reports of county-level power supply companies;
the data cleaning module is used for verifying the data quality of the extracted power data and cleaning abnormal data from it;
the data integration module is used for integrating the cleaned power data of each month into a complete data set, so as to form a complete internal data set of the power supply company.
Through these modules, the provincial power supply company extracts the power data from the monthly production and operation reports of the county-level power supply companies, cleans out abnormal data, and integrates the cleaned data. This effectively overcomes the differences in format, statistical dimensions and the like among the monthly reports of the county-level power supply companies, and improves the accuracy with which the provincial power supply company grasps their power data.
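What the data integration module produces can be illustrated with a small sketch that merges per-month data into one data set and lists the county and month combinations that are still missing. The nested-dictionary layout, the function names and the indicator `sold_mwh` are assumptions made for this example, and the figures are invented.

```python
def integrate_months(monthly):
    """Merge {month: {county: {indicator: value}}} into {county: {month: {...}}}."""
    dataset = {}
    for month in sorted(monthly):
        for county, indicators in monthly[month].items():
            dataset.setdefault(county, {})[month] = dict(indicators)
    return dataset


def missing_reports(dataset, months):
    """List (county, month) pairs for which no cleaned data was integrated."""
    return sorted(
        (county, month)
        for county, by_month in dataset.items()
        for month in months
        if month not in by_month
    )


# Invented cleaned monthly data, for illustration only.
monthly = {
    "2020-08": {"County A": {"sold_mwh": 1200.0}, "County B": {"sold_mwh": 950.0}},
    "2020-09": {"County A": {"sold_mwh": 1180.0}},
}
dataset = integrate_months(monthly)
gaps = missing_reports(dataset, ["2020-08", "2020-09"])
```

Reporting the gaps alongside the merged data is one way to judge whether the integrated set is actually complete for the period of interest.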
Compared with the prior art, the invention has the following beneficial effects:
(1) the invention effectively overcomes the differences in format, statistical dimensions and the like among the monthly production and operation reports of county-level power supply companies, thereby improving the accuracy with which the provincial power supply company grasps their power data;
(2) the invention makes the extraction target clear, and the data extraction program allows automatic extraction, thereby improving extraction efficiency;
(3) the invention helps determine the source of the power data according to the extraction requirement;
(4) the invention makes it possible to grasp the power information conveniently, comprehensively and accurately;
(5) the target format of the file format conversion is .docx, which is convenient for identification, labeling and processing;
(6) the accuracy of power data extraction is greatly improved;
(7) the invention objectively reflects the power situation and provides guidance for keeping track of the condition of county-level power supply companies.
Drawings
FIG. 1 is a schematic flow chart of the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited to these examples.
Example 1
The method for extracting, cleaning and integrating the power data of a power supply company comprises the following steps in sequence:
S1: data extraction, namely extracting power data from the monthly production and operation reports of county-level power supply companies;
S2: data cleaning, namely performing data quality verification on the extracted power data and cleaning abnormal data from it;
S3: data integration, namely integrating the cleaned power data of each month into a complete data set, so as to form a complete internal data set of the power supply company.
Through these steps, the provincial power supply company extracts the power data from the monthly production and operation reports of the county-level power supply companies, cleans out abnormal data, and integrates the cleaned data. This effectively overcomes the differences in format, statistical dimensions and the like among the monthly reports of the county-level power supply companies, and improves the accuracy with which the provincial power supply company grasps their power data.
Preferably, the data extraction includes: sorting out the data structure and constructing a data extraction program.
Sorting out the data structure makes the extraction target clear, and the data extraction program allows the extraction to run automatically, which improves extraction efficiency.
Example 2
To better illustrate the present invention, this embodiment is a further optimization of Example 1. It includes all the technical features of Example 1 and additionally includes the following technical features:
Preferably, sorting out the data structure includes: confirming the data crawling objects and converting file formats.
The data crawling objects are the data in the tables attached to the monthly production and operation reports of each county-level power supply company, which allows the source of the power data to be determined according to the extraction requirement. File format conversion means converting files of different formats (including doc, pdf, wps and rar) into a uniform file format in batches, in preparation for data extraction and mining.
Preferably, the data crawling objects comprise the power data in the industry electricity consumption classification table, the electricity sales detail statistical table, the line loss rate statistical table, the detail table of 10 kV high-loss and negative-loss lines and station areas, and the detail table of 10 kV heavy-load lines and station areas.
These attached tables contain various key power data, and crawling them makes it possible to grasp the power information conveniently, comprehensively and accurately.
Preferably, the target format of the file format conversion is .docx.
The .docx format is convenient for identification, labeling and processing.
Preferably, the data cleaning targets cases including: tables in the monthly production and operation report are embedded as pictures, so that the data cannot be crawled; the table form and data dimensions in the monthly production and operation report are inconsistent with those of other reports; and table headers are duplicated in the monthly production and operation report, so that useless fields are crawled.
These cases seriously affect the accuracy of power data extraction, so cleaning the power data in these cases greatly improves that accuracy.
Preferably, the power data includes industry electricity consumption information, electricity sales, line loss rate, power supply line information, and station area and line details.
These indicators objectively reflect the power situation and provide guidance for keeping track of the condition of county-level power supply companies.
Example 3
The system for extracting, cleaning and integrating the power data of a power supply company comprises a data extraction module, a data cleaning module and a data integration module;
the data extraction module is used for extracting power data from the monthly production and operation reports of county-level power supply companies;
the data cleaning module is used for verifying the data quality of the extracted power data and cleaning abnormal data from it;
the data integration module is used for integrating the cleaned power data of each month into a complete data set, so as to form a complete internal data set of the power supply company.
Through these modules, the provincial power supply company extracts the power data from the monthly production and operation reports of the county-level power supply companies, cleans out abnormal data, and integrates the cleaned data. This effectively overcomes the differences in format, statistical dimensions and the like among the monthly reports of the county-level power supply companies, and improves the accuracy with which the provincial power supply company grasps their power data.
As described above, the present invention can be readily implemented.
The foregoing is only a preferred embodiment of the present invention and does not limit the invention in any way. Any simple modification, equivalent replacement or improvement made to the above embodiments within the spirit and principle of the present invention still falls within the protection scope of the present invention.
Claims (8)
1. A method for extracting, cleaning and integrating power data of a power supply company, characterized by comprising the following steps in sequence:
S1: data extraction, namely extracting power data from the monthly production and operation reports of county-level power supply companies;
S2: data cleaning, namely performing data quality verification on the extracted power data and cleaning abnormal data from it;
S3: data integration, namely integrating the cleaned power data of each month into a complete data set, so as to form a complete internal data set of the power supply company.
2. The method for extracting, cleaning and integrating power data of a power supply company according to claim 1, characterized in that the data extraction comprises: sorting out the data structure and constructing a data extraction program.
3. The method for extracting, cleaning and integrating power data of a power supply company according to claim 2, characterized in that sorting out the data structure comprises: confirming the data crawling objects and converting file formats.
4. The method for extracting, cleaning and integrating power data of a power supply company according to claim 3, characterized in that the data crawling objects comprise the power data in the industry electricity consumption classification table, the electricity sales detail statistical table, the line loss rate statistical table, the detail table of 10 kV high-loss and negative-loss lines and station areas, and the detail table of 10 kV heavy-load lines and station areas.
5. The method for extracting, cleaning and integrating power data of a power supply company according to claim 3, characterized in that the target format of the file format conversion is .docx.
6. The method for extracting, cleaning and integrating power data of a power supply company according to claim 1, characterized in that the data cleaning targets cases including: tables in the monthly production and operation report are embedded as pictures, so that the data cannot be crawled; the table form and data dimensions in the monthly production and operation report are inconsistent with those of other reports; and table headers are duplicated in the monthly production and operation report, so that useless fields are crawled.
7. The method for extracting, cleaning and integrating power data of a power supply company according to claim 1, characterized in that the power data comprises industry electricity consumption information, electricity sales, line loss rate, power supply line information, and station area and line details.
8. A system for extracting, cleaning and integrating power data of a power supply company, characterized by comprising a data extraction module, a data cleaning module and a data integration module;
the data extraction module is used for extracting power data from the monthly production and operation reports of county-level power supply companies;
the data cleaning module is used for verifying the data quality of the extracted power data and cleaning abnormal data from it;
the data integration module is used for integrating the cleaned power data of each month into a complete data set, so as to form a complete internal data set of the power supply company.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011115601.9A CN112241391A (en) | 2020-10-19 | 2020-10-19 | Method and system for extracting, cleaning and integrating power data of power supply company |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011115601.9A CN112241391A (en) | 2020-10-19 | 2020-10-19 | Method and system for extracting, cleaning and integrating power data of power supply company |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112241391A true CN112241391A (en) | 2021-01-19 |
Family
ID=74169126
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011115601.9A Withdrawn CN112241391A (en) | 2020-10-19 | 2020-10-19 | Method and system for extracting, cleaning and integrating power data of power supply company |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112241391A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114037560A (en) * | 2021-11-12 | 2022-02-11 | 国网福建省电力有限公司 | Company power supply comprehensive meter data acquisition method based on purchase and sale synchronization |
CN114066215A (en) * | 2021-11-12 | 2022-02-18 | 国网福建省电力有限公司 | Company caliber electricity selling itemized access method based on purchasing and selling synchronization |
-
2020
- 2020-10-19 CN CN202011115601.9A patent/CN112241391A/en not_active Withdrawn
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102567859B (en) | Data integrated management system of intelligentized power supply system | |
CN108446396B (en) | Power data processing method based on improved CIM model | |
CN112241391A (en) | Method and system for extracting, cleaning and integrating power data of power supply company | |
CN106251094B (en) | 10kV business expansion and installation work order transaction analysis device and analysis method | |
CN112615428A (en) | Line loss analysis and treatment system and method | |
CN112308437B (en) | Line loss management method, system, device and storage medium based on big data analysis | |
CN104361086A (en) | Data integration method for measurable asset entire life-cycle management system | |
CN106503240A (en) | A kind of power equipment image analysis data base construction method and device | |
CN112100223A (en) | Power grid enterprise power equipment marketing and distribution run-through data acquisition and processing method | |
CN103247087A (en) | City distribution network graphical intelligent anti-misoperation system and method of system | |
CN110719445A (en) | Remote meter reading system and method based on image recognition | |
CN110852646A (en) | On-site fault processing management system based on mobile operation terminal | |
CN108376324B (en) | System and method for managing metering assets of carrier collector | |
CN107194529B (en) | Power distribution network reliability economic benefit analysis method and device based on mining technology | |
CN201417948Y (en) | Distribution network status and operating mode optimizing system based on DSCADA system | |
CN115115470A (en) | Green data center carbon emission management method based on emission factor method | |
CN115062948A (en) | Power system measurement method based on Internet of things | |
CN111049157B (en) | Distribution network transformer reactive compensation condition analysis method | |
CN114003774A (en) | A big data information collection system of electric power for wisdom city | |
CN110852606A (en) | Production early report data object analysis method based on regulation cloud | |
CN111478340A (en) | Distribution network line reactive compensation condition analysis method | |
CN111260311A (en) | Electric quantity data platform system and analysis method | |
CN113592806B (en) | Intelligent comprehensive self-information checking system for transformer substation | |
CN103345659A (en) | Processing method for optimizing connecting information of new consumers of power distribution network | |
CN111461515B (en) | Intelligent analysis method for transformer substation vacant interval based on electric power big data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
WW01 | Invention patent application withdrawn after publication | ||
Application publication date: 20210119 |