CN109446184A - Power generation big data preprocessing method and system based on a big data analysis platform - Google Patents


Info

Publication number
CN109446184A
Authority
CN
China
Prior art keywords
data
big data
analysis platform
power generation
data analysis
Prior art date
Legal status
Granted
Application number
CN201810989231.8A
Other languages
Chinese (zh)
Other versions
CN109446184B (en)
Inventor
刘文哲
肖祥武
邹光球
李号彩
文雯
向春波
李志金
姜鑫
白全生
胡卫生
尹晓峰
周宏贵
刘克勤
谢小鹏
张博
Current Assignee
Hunan Datang Xianyi Technology Co Ltd
Original Assignee
Hunan Datang Xianyi Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Hunan Datang Xianyi Technology Co Ltd
Priority to CN201810989231.8A
Publication of CN109446184A
Application granted
Publication of CN109446184B
Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Economics (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Testing And Monitoring For Control Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method and system for preprocessing power generation big data based on a big data analysis platform. The method comprises: extracting the operation data of a power plant's generating units from the plant's real-time database and uploading it to the big data analysis platform; and, when the operation data of the generating units needs to be retrieved, filtering the operation data according to a generating-unit startup/shutdown decision rule, so that startup and shutdown data are deleted from the operation data obtained from the big data analysis platform. Through data acquisition and storage together with unit startup/shutdown data filtering, the invention obtains standardized, clean, continuous, large-volume data as required, for use in subsequent big data statistics, big data mining and similar applications.

Description

Power generation big data preprocessing method and system based on a big data analysis platform
Technical field
The present invention relates to the field of electric power informatization, and in particular to a method and system for preprocessing power generation big data based on a big data analysis platform.
Background art
With the application and development of information technology in power generation systems, power plants have become increasingly digitized and have accumulated massive amounts of historical data. Traditional data mining methods based on the analysis of limited samples can no longer meet the power industry's need to extract knowledge and information quickly from such massive data. Applying big data technology to mine the power big data of generation systems can reveal detailed information that the raw data does not disclose on its own, greatly increasing the value contained in power big data. The application of power big data technology is an inevitable requirement of the informatization and intelligent development of the power industry, and a key technology for realizing smart power plants and smart energy.
The equipment condition-monitoring information obtained from the many instruments and sensors in a power generation system covers many categories, is very large in volume, and often has ambiguous meaning. In mining the power big data of generation enterprises, the systems and equipment are strongly coupled and the index calculation formulas are complex. A certain proportion of the massive stored raw data is incomplete, inconsistent or abnormal ("dirty" data), which seriously reduces the efficiency of big data mining and modeling and can make the results unsatisfactory.
At present, research on and application of power big data are still at an early stage. How to apply big data technology to analyze and mine the power big data of generation enterprises and extract the value hidden deep within it is an urgent problem. Obtaining good data samples is essential for big data mining to produce satisfactory results, so the data must be preprocessed before big data analysis and mining. Data preprocessing is the most important and most tedious step, typically accounting for about 70% of the workload of the entire mining and analysis process.
Transient data recorded while a generating unit starts up, shuts down or changes load are affected by the measurement means and measurement accuracy, and by the limitations of the plant's thermodynamic calculation formulas. As a result, the power generation big data stored in a plant's historical database inevitably contains incomplete, inconsistent and inaccurate data, which degrades the execution efficiency of data mining algorithms and can even bias the results.
Summary of the invention
The object of the present invention is to provide a method and system for preprocessing power generation big data based on a big data analysis platform, so as to solve the technical problem that power generation big data is incomplete, inconsistent and inaccurate.
To achieve the above object, the present invention provides a method for preprocessing power generation big data based on a big data analysis platform, comprising the following steps:
S1: extracting the operation data of the power plant's generating units from the plant's real-time database, and uploading it to the big data analysis platform;
S2: when the operation data of the generating units needs to be retrieved, filtering the operation data according to a generating-unit startup/shutdown decision rule, and deleting the startup/shutdown data from the operation data of the generating units obtained from the big data analysis platform.
As further improvements of the method of the present invention:
In step S2, data are judged to be shutdown data when both of the following conditions hold: load ≤ 8 MW and rotational speed ≤ 2900 r/min.
After step S2 is completed, the method further comprises:
S3: detecting one-dimensional noise and replacing the outliers.
Step S3 comprises: detecting one-dimensional noise with the box-plot method, taking the upper quartile of the sorted sample data plus 1.5 times the interquartile range (the difference between the upper and lower quartiles) as the upper limit of healthy data, and the lower quartile minus 1.5 times the interquartile range as the lower limit; and replacing the detected outliers by linear interpolation.
The method further comprises:
S4: judging, according to the load variation of the generating unit, whether the unit is operating at a steady working condition, and deleting the operation data recorded while the unit is in an unstable period.
In S4, operating conditions during load increase and load decrease are judged to be unstable periods.
The method further comprises:
S5: detecting local outliers in the operation data and filtering them out.
Step S5 comprises: computing the local outlier factor of every point with a KNN-based local outlier factor (LOF) algorithm; judging from each point's local outlier factor whether it is an abnormal point; and filtering out points that are abnormal.
The operation data in step S5 comprise the operation data of all indicators, and the operation data of load and net coal consumption rate.
Based on the same general technical concept, the present invention also provides a system for preprocessing power generation big data based on a big data analysis platform, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of any of the above methods when executing the computer program.
The present invention has the following advantages:
Through data acquisition and storage and unit startup/shutdown data filtering, the method and system of the present invention obtain standardized, clean, continuous, large-volume data as required, for use in subsequent big data statistics, big data mining and similar applications.
Through one-dimensional noise detection and processing, steady-state judgment of unit operating conditions, and local outlier detection and processing, the method and system of the present invention remove noise from the collected power generation big data. This reduces the data-handling burden on power generation big data mining algorithms, improves data quality, and thereby improves the efficiency and accuracy of subsequent power generation big data mining and analysis.
In addition to the objects, features and advantages described above, the present invention has other objects, features and advantages, which are described in further detail below with reference to the accompanying drawings.
Brief description of the drawings
The accompanying drawings, which form part of this application, are provided for a further understanding of the present invention. The exemplary embodiments of the invention and their descriptions serve to explain the invention and do not unduly limit it. In the drawings:
Fig. 1 is a schematic flowchart of the power generation big data preprocessing method based on a big data analysis platform according to preferred embodiment 1 of the present invention;
Fig. 2 is a schematic flowchart of the power generation big data preprocessing method based on a big data analysis platform according to preferred embodiment 2 or 3 of the present invention.
Detailed description of the embodiments
The embodiments of the present invention are described in detail below with reference to the accompanying drawings, but the present invention can be implemented in many different ways as defined and covered by the claims.
Referring to Fig. 1, the power generation big data preprocessing method based on a big data analysis platform of the present invention comprises the following steps:
S1: extracting the operation data of the power plant's generating units from the plant's real-time database, and uploading it to the big data analysis platform;
S2: when the operation data of the generating units needs to be retrieved, filtering the operation data according to a generating-unit startup/shutdown decision rule, and deleting the startup/shutdown data from the operation data of the generating units obtained from the big data analysis platform.
Data acquisition and storage combined with unit startup/shutdown data filtering removes the startup/shutdown process data, which change rapidly and adversely affect data statistics and mining. Weeding out this portion of the data provides more accurate, cleaner data for subsequent big data statistics, big data mining and similar uses.
Embodiment 1:
Referring to Fig. 1, the power generation big data preprocessing method based on a big data analysis platform of this embodiment comprises the following steps:
S1: extracting the operation data of the power plant's generating units from the plant's real-time database, generating TXT text data, and uploading it to the big data analysis platform, where data cleaning and data mining are performed on the stored power generation big data;
S2: when the operation data of the generating units needs to be retrieved, filtering the operation data according to a generating-unit startup/shutdown decision rule, and deleting the startup/shutdown data from the operation data of the generating units obtained from the big data analysis platform. Data are judged to be shutdown data when load ≤ 8 MW and rotational speed ≤ 2900 r/min hold simultaneously. Startup/shutdown data are unsteady-state data, whereas power generation big data analysis is generally meaningful only for steady states. Data acquisition and storage combined with unit startup/shutdown data filtering removes the rapidly changing startup/shutdown process data, which would otherwise adversely affect data statistics and mining.
S3: detecting one-dimensional noise and replacing outliers. This comprises: detecting one-dimensional noise with the box-plot method, taking the upper quartile of the sorted sample data plus 1.5 times the interquartile range as the upper limit of healthy data and the lower quartile minus 1.5 times the interquartile range as the lower limit; and replacing the detected outliers by linear interpolation.
S4: judging, according to the load variation of the generating unit, whether the unit is at a steady working condition. Operating conditions during load increase and load decrease are judged to be unstable periods, and the operation data recorded in unstable periods are deleted.
S5: detecting local outliers in the operation data and filtering them out. In this embodiment, a KNN-based LOF algorithm computes the local outlier factor of every point, each point's local outlier factor is used to judge whether it is an abnormal point, and abnormal points are filtered out. Local outlier detection is performed in two passes: first on the operation data of all indicators, then on the operation data of load and net coal consumption rate.
Through one-dimensional noise detection and processing, steady-state judgment of unit operating conditions, and local outlier detection and processing, the above steps remove noise from the collected power generation big data. This reduces the data-handling burden on power generation big data mining algorithms, improves data quality, and thereby improves the efficiency and accuracy of subsequent power generation big data mining and analysis.
Embodiment 2:
Referring to Fig. 2, the power generation big data preprocessing method based on a big data analysis platform of this embodiment comprises the following steps:
S1: data acquisition and storage. The operation data of the power plant's generating units are extracted from the real-time database of the plant-level supervisory information system (SIS, a plant-level automation information system integrating real-time process monitoring, optimization control and production management), generated as TXT text data, and uploaded to the HDFS (Hadoop Distributed File System) distributed storage of the big data analysis platform. By merging the files and converting their format, the TXT data files are converted to Parquet format and stored in the big data analysis platform. Data files in the platform are mainly stored in HDFS, which supports storage of large data volumes; log files generated during operation are stored in HBase (the Hadoop database, a highly reliable, high-performance, column-oriented, scalable distributed storage system), which supports fast, efficient reads and writes.
S2: unit startup/shutdown data filtering. During unit operation there are periods of startup and shutdown in which the data change rapidly, adversely affecting data statistics and mining, so this portion of the data must be weeded out. According to the startup/shutdown decision rule for thermal power units, the judgment is based mainly on the unit load and rotational speed indicators. In this embodiment, data satisfying both load ≤ 8 MW and rotational speed ≤ 2900 r/min are judged to be shutdown data.
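A minimal sketch of the shutdown rule above. The record layout (dicts with hypothetical keys `load` in MW and `speed` in r/min) and the function names are assumptions for illustration; only the two thresholds come from the text.

```python
def is_shutdown(load_mw, speed_rpm, load_limit=8.0, speed_limit=2900.0):
    """Shutdown rule from the text: load <= 8 MW AND speed <= 2900 r/min."""
    return load_mw <= load_limit and speed_rpm <= speed_limit


def drop_shutdown(records):
    """Split records into (kept, removed) according to the shutdown rule.

    Each record is assumed to be a dict with 'load' (MW) and 'speed' (r/min).
    """
    kept, removed = [], []
    for rec in records:
        target = removed if is_shutdown(rec["load"], rec["speed"]) else kept
        target.append(rec)
    return kept, removed


records = [
    {"load": 600, "speed": 3000},
    {"load": 5, "speed": 1200},   # both conditions hold: shutdown data
    {"load": 5, "speed": 3000},   # only the load condition holds: kept
    {"load": 300, "speed": 2950},
]
kept, removed = drop_shutdown(records)
```

Because the rule is a logical AND, a record with low load but speed above 2900 r/min is kept, since only one of the two conditions holds.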
S3: one-dimensional noise detection and processing. Noise or abnormal points are inevitably introduced during plant data acquisition: manually collected data suffer from entry errors, and automatically collected data inevitably contain noise caused by sensors, transmission, system reads and similar processes. To deal with this, one-dimensional noise detection and processing is applied. This step detects one-dimensional noise with a box plot, examines outliers relative to the adjacent data, and then determines replacement values from the adjacent data with an outlier-processing method.
A box plot (boxplot) shows the overall distribution of the data. It uses five statistics of a data set (the minimum, first quartile, median, third quartile and maximum) to reflect the center and spread of the distribution. All data in the set are sorted in ascending order and divided into four equal parts; the values at the three cut points are the quartiles. From these statistics a box plot is drawn: the box contains most of the normal data, and values beyond its upper and lower boundaries are abnormal data.
The upper and lower boundaries are calculated as follows:
$$A_U = Q_3 + 1.5\,\mathrm{IQR} = Q_3 + 1.5\,(Q_3 - Q_1) \tag{1}$$
$$A_L = Q_1 - 1.5\,\mathrm{IQR} = Q_1 - 1.5\,(Q_3 - Q_1) \tag{2}$$
Parameters: A_U is the upper limit of the box; A_L is the lower limit of the box; Q_1 is the lower quartile (the 25% quantile); Q_3 is the upper quartile (the 75% quantile); IQR = Q_3 − Q_1 is the interquartile range; the coefficient 1.5 is a typical value accumulated through extensive analysis and experience.
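Formulas (1) and (2) can be sketched in Python as below. The text does not say how quartiles are interpolated between samples; this sketch assumes the `inclusive` method of Python's `statistics.quantiles`, which interpolates linearly between order statistics. The function names are hypothetical.

```python
import statistics


def box_plot_bounds(values, k=1.5):
    """Return (A_L, A_U) = (Q1 - k*IQR, Q3 + k*IQR) as in formulas (1)-(2)."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, k=1.5):
    """Return True for every value outside the healthy range [A_L, A_U]."""
    lower, upper = box_plot_bounds(values, k)
    return [v < lower or v > upper for v in values]


# Q1 = 3 and Q3 = 7, so IQR = 4 and the healthy range is [-3, 13].
bounds = box_plot_bounds([1, 2, 3, 4, 5, 6, 7, 8, 9])
```

Other quartile conventions (for example the `exclusive` method) give slightly different bounds on small samples; on the very large samples described here the difference is usually negligible.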
Outliers detected by the box-plot method can be handled in many ways, such as replacement by the mean, the median or the mode. Because power generation big data are mostly continuous variables, and an abnormal point should stay consistent with the trend of the nearby normal points, linear interpolation is used: the linear interpolation component takes the normal points immediately before and after the abnormal moment and replaces the original outlier with their mean. This treatment makes the data smoother and eliminates the influence of noisy data.
S4: steady-state judgment of unit operating conditions. When a thermal power unit is in a variable operating condition such as load increase or decrease, some indicators change ahead of or behind the load. At such times calculated values such as coal consumption deviate considerably from the actual values: the computed coal consumption is a false value that does not reflect the real situation. The lead and lag of indicators under variable operating conditions can therefore be handled by judging whether the unit is at a steady working condition.
To judge whether the generating unit's operating condition is stable, a stability determination is made on the indicators that characterize the condition: for each such indicator at a given moment, check whether the change in its value over a period before and after that moment (the absolute change or the rate of change) exceeds a set range. If any of these indicators exceeds its set range at that moment, the unit is deemed to be in an unsteady state at that moment, and the data for that moment are excluded from the relevant statistics and analysis. The unit load, main steam pressure, main steam temperature, reheat steam temperature, feedwater flow and feedwater temperature are mainly selected as the steady-state judgment indicators for thermal power units. According to a given decision rule, a stability judgment is made on each ten-minute block of data in the sample, and data judged unstable are removed; the window is then advanced by 5 min to form a new 10-min small sample, which is judged again.
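The replacement step can be sketched as follows, assuming equally spaced samples. For a single isolated outlier this reduces to the mean of the normal points immediately before and after it, as described above; for a run of consecutive outliers the sketch interpolates linearly between the nearest normal neighbours, and outliers at either end of the series take the nearest normal value. Both of those extensions are our assumptions, not stated in the text.

```python
import bisect


def interpolate_outliers(values, is_outlier):
    """Replace flagged samples by linear interpolation between the nearest
    normal samples on each side (samples assumed equally spaced)."""
    normal = [i for i, bad in enumerate(is_outlier) if not bad]
    if not normal:
        raise ValueError("no normal samples to interpolate from")
    out = list(values)
    for i, bad in enumerate(is_outlier):
        if not bad:
            continue
        pos = bisect.bisect_left(normal, i)  # first normal index after i
        prev = normal[pos - 1] if pos > 0 else None
        nxt = normal[pos] if pos < len(normal) else None
        if prev is None:      # leading outliers: copy the next normal value
            out[i] = values[nxt]
        elif nxt is None:     # trailing outliers: copy the previous normal value
            out[i] = values[prev]
        else:                 # interior: straight line between the neighbours
            t = (i - prev) / (nxt - prev)
            out[i] = values[prev] + t * (values[nxt] - values[prev])
    return out


# A single spike between 11 and 13 becomes their mean, 12.0.
repaired = interpolate_outliers([10, 11, 999, 13], [False, False, True, False])
```

Keeping the list of normal indices sorted lets `bisect` find both neighbours in logarithmic time, so the pass stays fast on long series.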
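A minimal sketch of the sliding-window steady-state check follows. It assumes equally spaced samples, so that a 10-minute window is `window` samples and the 5-minute advance is `step` samples (the defaults 10 and 5 correspond to one-minute sampling, which is an assumption). Each indicator's change within a window is measured as max minus min, and a sample is kept if at least one steady window covers it. The per-indicator limits are placeholders supplied by the caller; the actual limits belong to the patent's decision rule and are not reproduced here.

```python
def steady_mask(series, limits, window=10, step=5):
    """Return one boolean per sample: True if a steady window covers it.

    series: indicator name -> list of equally spaced samples (same length).
    limits: indicator name -> largest allowed (max - min) inside one window.
    """
    lengths = {len(vals) for vals in series.values()}
    if len(lengths) != 1:
        raise ValueError("all indicators need the same number of samples")
    n = lengths.pop()
    mask = [False] * n
    # Slide a `window`-sample block forward by `step` samples at a time.
    for start in range(0, n - window + 1, step):
        end = start + window
        steady = all(
            max(vals[start:end]) - min(vals[start:end]) <= limits[name]
            for name, vals in series.items()
        )
        if steady:
            # Every sample inside a steady window is kept.
            for i in range(start, end):
                mask[i] = True
    return mask


# Hypothetical data: flat load for 10 minutes, then a 20 MW/min ramp.
load = [300.0] * 10 + [300.0 + 20 * i for i in range(10)]
pressure = [16.0] * 20
mask = steady_mask({"load": load, "main_steam_pressure": pressure},
                   {"load": 15.0, "main_steam_pressure": 0.3})
```

Samples at the tail that no full window covers are treated as unsteady in this sketch.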
S5: local outlier detection and processing.
This step is implemented with a local outlier factor (LOF) algorithm based on KNN (K-nearest neighbors). The method combines the K-nearest-neighbor algorithm with LOF: by constructing a hybrid spill tree, it computes the k-distance of each point quickly and efficiently, which optimizes the distance-based LOF algorithm that computes each point's local outlier factor. This effectively improves the efficiency of the algorithm and allows it to handle high-dimensional and large data sets.
The algorithm judges whether a point is abnormal mainly by comparing the density of the point with the density of the points in its K-neighborhood. The algorithm outputs the local outlier factor of every point. The further a point's outlier factor falls below 1, the more the point's density exceeds that of the points in its K-neighborhood; the further it rises above 1, the more the point's density falls below that of the points in its K-neighborhood, and the more likely the point is an abnormal point. The specific steps of the algorithm are as follows:
(1) Use the K-nearest-neighbor (KNN) algorithm to output the K-distance of every point;
(2) Find all points in the K-neighborhood of each point;
(3) Compute, by formula (3), the reachability distance from each point in a point's K-neighborhood to that point, where p is the point being evaluated and o is a point in the K-neighborhood of p:
$$\text{reach-distance}_k(p, o) = \max\{k\text{-distance}(o),\ d(p, o)\} \tag{3}$$
where reach-distance_k(p, o) is the k-th reachability distance of point o to point p, k-distance(o) is the k-distance of point o, and d(p, o) is the distance between point p and point o.
(4) Compute the local reachability density of each point by formula (4):
$$\mathrm{lrd}_k(p) = \frac{|N_k(p)|}{\sum_{o \in N_k(p)} \text{reach-distance}_k(p, o)} \tag{4}$$
where lrd_k(p) is the local reachability density of point p, N_k(p) is the K-neighborhood of point p, and |N_k(p)| is the number of points in the K-neighborhood of p.
(5) Compute the local outlier factor of each point from the local reachability densities by formula (5):
$$\mathrm{LOF}_k(p) = \frac{1}{|N_k(p)|} \sum_{o \in N_k(p)} \frac{\mathrm{lrd}_k(o)}{\mathrm{lrd}_k(p)} \tag{5}$$
where LOF_k(p) is the local outlier factor of point p.
(6) When the computed LOF_k(p) ≥ 2.5, the point is deleted directly from the data set.
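Steps (1) to (6) can be sketched as below using formulas (3) to (5) directly. This sketch finds k-nearest neighbours by brute force (all O(n²) pairwise distances) instead of the hybrid spill tree described above, so it only suits small samples; it also assumes Euclidean distance, and the function names are hypothetical.

```python
import math


def lof_scores(points, k):
    """Local outlier factor of each point (brute-force k-NN, Euclidean).

    Assumes len(points) > k and that no point has k or more exact
    duplicates (otherwise a local reachability density becomes infinite).
    """
    n = len(points)
    if n <= k:
        raise ValueError("need more than k points")
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Step (1): k-distance(p) is the distance to the k-th nearest other point.
    k_dist = [sorted(dist[i][j] for j in range(n) if j != i)[k - 1]
              for i in range(n)]
    # Step (2): N_k(p) holds every other point within k-distance(p) (ties kept).
    neigh = [[j for j in range(n) if j != i and dist[i][j] <= k_dist[i]]
             for i in range(n)]

    def reach(i, j):
        # Step (3), formula (3): max{k-distance(o), d(p, o)} with p=i, o=j.
        return max(k_dist[j], dist[i][j])

    # Step (4), formula (4): local reachability density.
    lrd = [len(neigh[i]) / sum(reach(i, j) for j in neigh[i]) for i in range(n)]
    # Step (5), formula (5): mean ratio of neighbour density to own density.
    return [sum(lrd[j] for j in neigh[i]) / (len(neigh[i]) * lrd[i])
            for i in range(n)]


def drop_by_lof(points, k, threshold=2.5):
    """Step (6): delete every point whose LOF is >= threshold."""
    scores = lof_scores(points, k)
    return [p for p, s in zip(points, scores) if s < threshold]


cluster = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5)]
points = cluster + [(10, 10)]
cleaned = drop_by_lof(points, k=2)
```

For the small example, the points inside the unit square score close to 1 while the far point (10, 10) scores well above 2.5, so only it is removed. For production-sized data, an indexed nearest-neighbour search would replace the brute-force distance matrix.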
Embodiment 3:
Referring to Fig. 2, in this embodiment the big data analysis platform is used to collect 184 historical energy-consumption indicators (such as load, main steam pressure and net coal consumption rate) of a supercritical 600 MW unit at a power plant over the most recent year. The sample data are cleaned and preprocessed with the above big data preprocessing method: non-genuine data are rejected and the operating conditions are judged for steady state, so that healthy data under steady working conditions are obtained for data mining and analysis. The specific steps are as follows:
S1: Data acquisition and storage.
Based on the historical operating data of unit #3, the data for the most recent year were collected: 525,600 records in total, about 4.5 GB. The data were collected in two batches as TXT files. By merging the data and converting the format, the data were combined into a single file and stored in the HDFS file storage system.
S2: Unit startup/shutdown data filtering.
During unit operation there are some startup and shutdown processes, and this portion of the data is weeded out. The main reference data points are load and rotational speed: load is set to ≤ 8 MW and rotational speed to ≤ 2900 r/min, and data meeting both conditions simultaneously are shutdown data. After filtering, 409,553 records remain and 116,047 are shutdown records.
S3: One-dimensional outlier detection and processing.
For noisy data, outliers are first detected with a box plot and then replaced by linear interpolation. The quartiles of each one-dimensional attribute are found after sorting, and the upper and lower limits of the attribute's normal values are computed from the lower quartile, the upper quartile and 1.5 times their difference; values outside this range are outliers. After startup/shutdown filtering, some indicator parameters of the power generation data still take zero values, which would distort the box-plot quartiles. Box-plot anomaly detection is therefore applied in two modes: box-plot processing with zeros removed, and box-plot processing without zero removal.
Based on statistics of the zero values, 46 attributes were selected for box-plot anomaly detection with zeros removed, and another 128 attributes for box-plot anomaly detection without zero removal. The detection results are recorded in an added column, with 1 for an abnormal value and 0 for a normal value.
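The two box-plot modes and the 0/1 flag column can be sketched as follows. The text does not say how zero readings themselves are treated in the zeros-removed mode; this sketch assumes they are excluded from the quartile calculation and never flagged. The quartile convention (`statistics.quantiles` with `method="inclusive"`) and the function name `zero_aware_flags` are likewise assumptions.

```python
import statistics


def zero_aware_flags(values, drop_zeros, k=1.5):
    """Return a 0/1 flag per value (1 = abnormal) from box-plot bounds.

    With drop_zeros=True, the quartiles are computed from non-zero values
    only and zero readings are never flagged (an assumption).
    """
    basis = [v for v in values if v != 0] if drop_zeros else list(values)
    q1, _median, q3 = statistics.quantiles(basis, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    flags = []
    for v in values:
        if drop_zeros and v == 0:
            flags.append(0)  # zeros are left to other checks
        else:
            flags.append(1 if v < lower or v > upper else 0)
    return flags


# Hypothetical attribute: ten zero readings plus six real values.
readings = [0] * 10 + [100, 101, 102, 103, 104, 200]
with_zeros = zero_aware_flags(readings, drop_zeros=False)
without_zeros = zero_aware_flags(readings, drop_zeros=True)
```

In this example, the ten zero readings pull Q1 down to 0 and widen the interquartile range so much that the value 200 passes as normal; once zeros are excluded from the quartiles, 200 is flagged.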
S4: Steady-state judgment of unit operating conditions.
During actual operation of a thermal power unit, the operating state of the systems and equipment changes continually under boundary constraints such as load, coal quality and environment, and operation alternates continuously through the cycle steady state → transition → steady state. Data mining and analysis must be based on the unit's steady state, so steady-state judgment is required.
In the collected energy-consumption data sample, a combination of six characteristic indicators is used for the judgment; the specific indicators and conditions are given in Table 1:
Table 1 Steady working condition determination for thermal power units
where δ_load is the load value, A_max is the maximum load and A_min is the minimum load;
δ_main steam pressure is the main steam pressure value, B_max is the maximum main steam pressure and B_min is the minimum main steam pressure;
δ_main steam temperature is the main steam temperature value, C_max is the maximum main steam temperature and C_min is the minimum main steam temperature;
δ_reheat steam temperature is the reheat steam temperature value, D_max is the maximum reheat steam temperature and D_min is the minimum reheat steam temperature;
δ_feedwater flow is the feedwater flow value, E_max is the maximum feedwater flow and E_min is the minimum feedwater flow;
T_feedwater temperature is the feedwater temperature value.
During the steady-state judgment, ten minutes of data are evaluated each time. If the conditions are met, the data are marked as steady; if any of the steady-state conditions is not met, the window is advanced by 5 min, the next 5 min of data are taken to form a new 10-min block, and the unit's steady state is judged again. Unsteady data are removed. After screening, 308,978 steady-state records remain and 100,575 records are unsteady.
S5: Local outlier detection and processing.
Local outliers are detected and processed with an improved KNN-LOF algorithm, in which the k-distance computation is optimized with a KNN algorithm. In the local outlier factor algorithm, the k-distance is determined by the chosen parameter K, the LOF factor of every data point is computed, and the data are then filtered according to their LOF factors.
In this case, local outlier factor detection and processing are performed twice in succession on the data output by the steps above. The first pass processes all indicators globally. The second pass, driven by the data-mining needs of energy-consumption analysis, processes the net coal consumption rate indicator locally: the net coal consumption rate is calculated in real time from many parameters via thermodynamic formulas, and because the calculation is complex and has many influencing factors, the result inevitably contains large errors. Therefore, targeting the relationship between load and coal consumption, the KNN-LOF algorithm component is applied to the two attributes load and net coal consumption rate as a two-dimensional problem.
Given the large data volume (more than 200,000 records), the best scheme was adopted after comparative analysis. In the first local outlier factor pass, K is set to 500 and data with an LOF factor greater than 2.5 are filtered out, an initial treatment that removes points far from the cluster. In the second pass, K is set to 500 and data with an LOF factor greater than 2.8 are filtered out. After processing, 216,007 records remain and 20,677 have been weeded out.
Embodiment 4:
The present invention also provides a power generation big data preprocessing system based on a big data analysis platform, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the computer program, the processor implements the steps of any of the methods described above.
In summary, the present invention uses a big data analysis platform as its tool and preprocesses the collected power generation big data through data acquisition and storage, unit startup/shutdown data filtering, one-dimensional noise detection and processing, steady-state judgment of unit operating conditions, and local outlier detection and processing. This addresses problems such as noise and anomalies in power big data. Preprocessing improves data quality and lets the data better suit the specific Spark big data platform mining tools. It effectively improves the quality of big data mining and reduces actual mining processing time. The result is standardized, clean, continuous, large-volume data as required, which reduces the data processing burden on power generation big data mining algorithms, improves data quality, and in turn improves the efficiency and accuracy of subsequent power generation big data mining and analysis.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit it; for those skilled in the art, the invention may be variously modified and varied. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall fall within its protection scope.

Claims (10)

1. A power generation big data preprocessing method based on a big data analysis platform, characterized by comprising the following steps:
S1: extracting the operation data of the power plant's generating units from the plant's real-time database and uploading them to the big data analysis platform;
S2: when the operation data of the generating units need to be called, filtering the operation data according to a generating-unit startup/shutdown judgment rule, so as to delete the startup/shutdown data of the generating units from the operation data obtained on the big data analysis platform.
2. The power generation big data preprocessing method based on a big data analysis platform according to claim 1, characterized in that, in step S2, the judgment condition for startup/shutdown data is that load ≤ 8 MW and rotational speed ≤ 2900 r/min are satisfied simultaneously.
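Claim 2's startup/shutdown rule reduces to a simple row filter. A sketch in Python, where the field names `load_mw` and `speed_rpm` and the function names are hypothetical; both conditions must hold for a row to count as startup/shutdown data and be deleted.

```python
from typing import Dict, List

Row = Dict[str, float]


def is_start_stop(load_mw: float, speed_rpm: float) -> bool:
    """Claim 2 rule: load <= 8 MW AND speed <= 2900 r/min (both must hold)."""
    return load_mw <= 8.0 and speed_rpm <= 2900.0


def drop_start_stop(rows: List[Row]) -> List[Row]:
    """Remove startup/shutdown rows; `load_mw` and `speed_rpm` are hypothetical field names."""
    return [r for r in rows if not is_start_stop(r["load_mw"], r["speed_rpm"])]
```

A row with low load but full speed (or vice versa) is kept, since only the conjunction of the two conditions marks startup/shutdown.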
3. The power generation big data preprocessing method based on a big data analysis platform according to claim 1, characterized in that, after step S2 is completed, the method further comprises:
S3: detecting one-dimensional noise and replacing outliers.
4. The power generation big data preprocessing method based on a big data analysis platform according to claim 3, characterized in that step S3 comprises: detecting one-dimensional noise with the box plot method, taking the upper quartile of the sorted sample data plus 1.5 times the interquartile range (the difference between the upper and lower quartiles) as the upper limit of healthy data, and the lower quartile minus 1.5 times the interquartile range as the lower limit of healthy data; and replacing the detected outliers by linear interpolation.
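A sketch of the box plot test and linear-interpolation replacement in claim 4. It assumes the inclusive quartile method of Python's `statistics.quantiles` (the claim does not fix a quartile convention) and fills outliers at either end of the series with the nearest healthy value, also an assumption, since linear interpolation needs healthy neighbours on both sides. The function names are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def box_limits(values: Sequence[float], whisker: float = 1.5) -> Tuple[float, float]:
    """Healthy-data limits: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - whisker * iqr, q3 + whisker * iqr


def replace_outliers(values: Sequence[float], whisker: float = 1.5) -> List[float]:
    """Replace values outside the box plot limits by linear interpolation."""
    lo, hi = box_limits(values, whisker)
    healthy = [i for i, v in enumerate(values) if lo <= v <= hi]
    out = list(values)
    if not healthy:
        return out
    for i, v in enumerate(values):
        if lo <= v <= hi:
            continue
        left = max((h for h in healthy if h < i), default=None)
        right = min((h for h in healthy if h > i), default=None)
        if left is None:            # leading outlier: copy nearest healthy value
            out[i] = values[right]
        elif right is None:         # trailing outlier: copy nearest healthy value
            out[i] = values[left]
        else:                       # interpolate between healthy neighbours
            frac = (i - left) / (right - left)
            out[i] = values[left] + frac * (values[right] - values[left])
    return out
```

For example, in `[10, 11, 12, 100, 14, 15, 16]` the value 100 lies above the upper limit and is replaced by 13.0, midway between its healthy neighbours.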
5. The power generation big data preprocessing method based on a big data analysis platform according to claim 1, characterized in that the method further comprises:
S4: judging, from the load variation of the generating unit, whether the unit is in a steady operating condition, and deleting the operation data recorded while the unit is in an unsteady condition.
6. The power generation big data preprocessing method based on a big data analysis platform according to claim 5, characterized in that, in S4, operating conditions in which the generating unit is increasing or decreasing load are determined to be unsteady periods.
7. The power generation big data preprocessing method based on a big data analysis platform according to any one of claims 1 to 6, characterized in that the method further comprises:
S5: detecting local outliers in the operation data and filtering them out.
8. The power generation big data preprocessing method based on a big data analysis platform according to claim 7, characterized in that
step S5 comprises: calculating the local outlier factor of every point using the KNN-based LOF algorithm; judging from a point's local outlier factor whether the point is an outlier; and filtering out points judged to be outliers.
9. The power generation big data preprocessing method based on a big data analysis platform according to claim 8, characterized in that the operation data in step S5 comprise: the operation data of all indicators, and the operation data of load and net coal consumption rate.
10. A power generation big data preprocessing system based on a big data analysis platform, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method of any one of claims 1 to 9.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810989231.8A CN109446184B (en) 2018-08-28 2018-08-28 Big data analysis platform-based power generation big data preprocessing method and system

Publications (2)

Publication Number Publication Date
CN109446184A true CN109446184A (en) 2019-03-08
CN109446184B CN109446184B (en) 2020-04-14

Family

ID=65530089

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810989231.8A Active CN109446184B (en) 2018-08-28 2018-08-28 Big data analysis platform-based power generation big data preprocessing method and system

Country Status (1)

Country Link
CN (1) CN109446184B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102708180A (en) * 2012-05-09 2012-10-03 北京华电天仁电力控制技术有限公司 Data mining method in unit operation mode based on real-time historical library
CN104574212A (en) * 2015-01-09 2015-04-29 南京南瑞集团公司 Hydraulic power plant comprehensive data analysis method
CN106677996A (en) * 2016-12-29 2017-05-17 科诺伟业风能设备(北京)有限公司 Method for detecting vibration anomaly of tower drum of wind generating set
CN106897941A (en) * 2017-01-03 2017-06-27 北京国能日新***控制技术有限公司 A kind of blower fan method for processing abnormal data and device based on quartile box traction substation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhao Yifan et al.: "A review of research on data cleaning methods" (数据清洗方法研究综述), Software Guide (《软件导刊》) *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110188094A (en) * 2019-05-29 2019-08-30 国网山东省电力公司电力科学研究院 A kind of main transformer oil chromatography data cleaning method based on LOF algorithm
CN110443376A (en) * 2019-08-30 2019-11-12 中国南方电网有限责任公司超高压输电公司贵阳局 State analysis method and its application module based on non-supervisory machine learning algorithm
CN110443376B (en) * 2019-08-30 2024-05-17 中国南方电网有限责任公司超高压输电公司贵阳局 State analysis method based on non-supervision machine learning algorithm and application module thereof
CN112528558A (en) * 2020-12-04 2021-03-19 湘潭大学 Underground gas concentration prediction method and device based on long-term and short-term memory neural network
CN114236448A (en) * 2021-11-23 2022-03-25 国网山东省电力公司日照供电公司 Metering device troubleshooting system based on big data
CN116166655A (en) * 2023-04-25 2023-05-26 尚特杰电力科技有限公司 Big data cleaning system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Power generation big data preprocessing method and system based on big data analysis platform

Effective date of registration: 20210923

Granted publication date: 20200414

Pledgee: Huarong Xiangjiang Bank Co.,Ltd. Xiangjiang New Area Branch

Pledgor: Hunan Datang Xianyi Technology Co.,Ltd.

Registration number: Y2021430000057

PC01 Cancellation of the registration of the contract for pledge of patent right

Date of cancellation: 20221012

Granted publication date: 20200414

Pledgee: Huarong Xiangjiang Bank Co.,Ltd. Xiangjiang New Area Branch

Pledgor: Hunan Datang Xianyi Technology Co.,Ltd.

Registration number: Y2021430000057