CN108062387A - Real-time data cleaning and conversion method for TAS systems - Google Patents
Real-time data cleaning and conversion method for TAS systems
- Publication number
- CN108062387A
- Authority
- CN
- China
- Prior art keywords
- data
- real time
- tas
- database
- conversion method
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24553—Query execution of query operations
- G06F16/24561—Intermediate data storage techniques for performance improvement
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/254—Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to a real-time data cleaning and conversion method for TAS systems. The data sources of the TAS system are fused to obtain the data to be cleaned; the data to be cleaned are processed to obtain clean data; and the clean data are stored in a fusion database, so that data can be shared among multiple systems and real-time data cleaning and conversion is achieved. Because the TAS system data pass through an extract/transform/load process, the front end no longer needs to fetch data from each separate system when displaying them, which greatly shortens query time and greatly improves the timeliness of data display. Configuration is also more flexible and metadata are easier to manage. During data processing, data transmission is more reliable and more secure, meeting stricter requirements, achieving efficient fusion, reducing synchronization time, and improving timeliness.
Description
Technical field
The present invention relates to the technical field of data cleaning and database modeling, and in particular to a real-time data cleaning and conversion method for TAS systems.
Background art
Data are a vital resource of the modern enterprise and the basis for scientific management and decision analysis. At present, electric power enterprises spend large amounts of money and time building online transaction processing (OLTP) business systems and data decision systems to record transaction processing and the various related data collected by devices. According to statistics, the data volume doubles every 2 to 3 years. These data hold enormous application value, yet the data that an enterprise actually pays attention to usually account for only about 2% to 4% of the total.
Data cleaning requires ETL (Extract-Transform-Load) technology, which mainly serves to: 1. eliminate data errors and correct missing data according to rules; 2. fuse multi-source data into structured input; 3. provide documented measures of data reliability. The industry generally uses general-purpose tools, but these can have security and flexibility problems. The data cleaning and conversion algorithm of the present invention solves the above problems and in addition: 1. makes it easy to manage metadata directly; 2. synchronizes in real time, efficiently and conveniently.
On this basis, and in view of the limitations of the current state of the art described above, the present invention proposes a real-time data cleaning and conversion method for TAS systems that enables data to be fused efficiently, reduces synchronization time, and improves timeliness.
Summary of the invention
In order to solve the problem that the prior art cannot meet the power industry's stricter requirements on real-time data synchronization, data validity, and security, the present invention proposes a real-time data cleaning and conversion method for TAS systems that enables data to be fused efficiently, reduces synchronization time, and improves timeliness.
The technical solution adopted by the present invention to solve this technical problem is as follows:
A real-time data cleaning and conversion method for TAS systems, in which the data sources of the TAS system are fused to obtain the data to be cleaned, the data to be cleaned are processed to obtain clean data, and the clean data are stored in a fusion database, so that data are shared among multiple systems and real-time data cleaning and conversion is achieved. The specific steps include:
Step 1: an ETL system extracts, according to specified rules, the data deployed on different servers, the database data, and the unstructured data, and stores the extracted data in an intermediate-layer ODS;
Step 2: the ODS data are filtered layer by layer by way of design objectives, cleaning deliverables, or standardization deliverables, to obtain clean data;
Step 3: the clean data are inserted or updated into the fusion database according to the designed business rules.
Further, step 1 specifically comprises:
101. A data-capture service module is provided in the ETL system; the module obtains the IP addresses of the different servers from the configuration file and then connects to the applications on each server;
102. Data are extracted from each server according to the specified rules, and the extracted data are stored in the intermediate-layer ODS.
Further, the data sources in step 1 include ODBC database structured data sources, flat files, XML data sources, and logs.
Further, the filtering in step 2 obtains the current state of a device through the application on the server: if the device is in the running state, the corresponding data are captured; otherwise, if the device is in a state such as deactivated or removed, the data are discarded and do not enter the fusion database.
Further, the discarded data are incomplete data, erroneous data, and duplicate data; erroneous data are repaired after being discarded and are extracted again after repair.
Further, the specific insert-or-update process in step 3 is:
The clean data are compared by CRC code. If the codes are identical, the data are inserted or updated into the fusion database. If they differ, CRC matching is performed: according to the indicator in the source system, records that conform to the rules are inserted or updated into the fusion database, and records that do not conform are ignored.
Further, the insert-or-update process in step 3 also includes:
The fields that have changed in records whose CRC codes differ are subjected to overwrite processing. If they can be overwritten, the record is inserted or updated into the fusion database; if they cannot be overwritten, the dimension attributes of the updated data are repaired and the record is then inserted or updated into the fusion database.
Compared with the prior art, the beneficial effects of the present invention are as follows. Because the TAS system data pass through an extract/transform/load process, the front end no longer needs to fetch data from each separate system when displaying them, which greatly shortens query time and greatly improves the timeliness of data display. Configuration is also more flexible and metadata are easier to manage. During data processing, data transmission is more reliable and more secure, meeting stricter requirements, achieving efficient fusion, reducing synchronization time, and improving timeliness.
Brief description of the drawings
Fig. 1 is the logical architecture diagram of data extraction in the first embodiment of the invention;
Fig. 2 is the logical architecture diagram of data filtering in the second embodiment of the invention;
Fig. 3 is the logical architecture diagram of data loading in the third embodiment of the invention.
Detailed description of the embodiments
In order to make the purpose, technical solution, and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here merely illustrate the invention and do not limit it.
In the real-time data cleaning and conversion method for TAS systems of the present invention, the data sources of the TAS system are fused to obtain the data to be cleaned, the data to be cleaned are processed to obtain clean data, and the clean data are stored in a fusion database, so that data are shared among multiple systems and real-time data cleaning and conversion is achieved. The specific steps include:
Step 1: an ETL system extracts, according to specified rules, the data deployed on different servers, the database data, and the unstructured data, and stores the extracted data in an intermediate-layer ODS;
Step 2: the ODS data are filtered layer by layer by way of design objectives, cleaning deliverables, or standardization deliverables, to obtain clean data;
Step 3: the clean data are inserted or updated into the fusion database according to the designed business rules.
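The three steps above can be illustrated with a minimal sketch. It is not the patented implementation: the function name `run_pipeline`, the record layout, and the idea of passing each stage in as a plain callable are assumptions made only for illustration.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]  # hypothetical record layout


def run_pipeline(
    sources: Iterable[Callable[[], List[Record]]],
    screens: Iterable[Callable[[Record], bool]],
    load: Callable[[Record], None],
) -> int:
    """Run extract -> filter -> load once and return the number of rows loaded."""
    # Step 1: extract from every source into the staging area (the ODS).
    ods: List[Record] = []
    for extract in sources:
        ods.extend(extract())

    # Step 2: filter layer by layer; each screen sees what the previous one kept.
    clean = ods
    for keep in screens:
        clean = [rec for rec in clean if keep(rec)]

    # Step 3: insert or update each clean record into the fusion database.
    for rec in clean:
        load(rec)
    return len(clean)
```

A caller would supply one extractor per server or file, one predicate per cleaning rule, and a `load` callable that writes to the fusion database.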
Further, step 1 specifically comprises:
101. A data-capture service module is provided in the ETL system; the module obtains the IP addresses of the different servers from the configuration file and then connects to the applications on each server;
102. Data are extracted from each server according to the specified rules, and the extracted data are stored in the intermediate-layer ODS.
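As a rough illustration of sub-step 101, the sketch below reads server addresses from a configuration file and lists the endpoints a capture service would connect to. The INI layout, the section names, the port numbers, and the function `load_endpoints` are all hypothetical; the patent does not specify the configuration format. The service names are borrowed from the first embodiment.

```python
import configparser

# Hypothetical configuration; addresses come from the documentation range 192.0.2.0/24.
SAMPLE_CONFIG = """
[server1]
ip = 192.0.2.10
services = RTSrv:9001, HisSrv:9002, DBSrv:9003

[server2]
ip = 192.0.2.11
services = RTSrv:9001
"""


def load_endpoints(config_text):
    """Return a list of (ip, service_name, port) tuples, one per service to connect to."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    endpoints = []
    for section in parser.sections():
        ip = parser[section]["ip"]
        for item in parser[section]["services"].split(","):
            name, port = item.strip().split(":")
            endpoints.append((ip, name, int(port)))
    return endpoints
```

The actual connection step is left out on purpose, because the protocol spoken by the on-site services is not described in the text.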
In a specific implementation, the data sources in step 1 include ODBC database structured data sources, flat files, XML data sources, and logs.
In a specific implementation, the filtering in step 2 obtains the current state of a device through the application on the server: if the device is in the running state, the corresponding data are captured; otherwise, if the device is in a state such as deactivated or removed, the data are discarded and do not enter the fusion database.
In a specific implementation, the discarded data are incomplete data, erroneous data, and duplicate data; erroneous data are repaired after being discarded and are extracted again after repair.
In a specific implementation, the specific insert-or-update process in step 3 is:
The clean data are compared by CRC code. If the codes are identical, the data are inserted or updated into the fusion database. If they differ, CRC matching is performed: according to the indicator in the source system, records that conform to the rules are inserted or updated into the fusion database, and records that do not conform are ignored.
In a specific implementation, the insert-or-update process in step 3 also includes:
The fields that have changed in records whose CRC codes differ are subjected to overwrite processing. If they can be overwritten, the record is inserted or updated into the fusion database; if they cannot be overwritten, the dimension attributes of the updated data are repaired and the record is then inserted or updated into the fusion database.
As a first embodiment of the present invention, on the basis of the above scheme, the logical architecture of data extraction in step 1 is shown in Fig. 1. SysDataMergeSrv denotes the module that provides the data-capture service; server 1 through server n denote the n servers deployed on site; and RTSrv, HisSrv, and DBSrv denote the service application modules deployed on each on-site server.
RTSrv is an in-memory database service, mainly used to store real-time information from on-site FTU equipment, including load, voltage, and similar information. HisSrv is mainly used to store fault information, including fault descriptions, fault diagrams, and the like. DBSrv is mainly used to store fixed information about on-site FTU equipment, such as tower number, line, and substation.
The data sources are divided into ODBC database structured data sources, flat files, XML data sources, and log data sources.
In data processing, the ODBC structured data sources mainly handle data held in SQL Server across servers and in different system environments.
Flat files are mainly used to transmit data sources: for servers whose data sit on an intranet system, a penetration service sends the flat files by FTP to the data aggregation interface, so that the data are gathered in one place. Reading and writing files directly through file-system I/O is significantly faster than inserting into and querying a DBMS, and it also allows preparation for bulk loading.
The data extraction process is as follows. The SysDataMergeSrv service obtains the IP address of each on-site server from the configuration file and then connects to applications such as RTSrv, HisSrv, and DBSrv on each server. According to the data requirements of the aggregation platform, it obtains the real-time operating conditions of the equipment from RTSrv, including load and voltage; obtains the line information of the equipment from DBSrv, including line name, tower number, and substation; and obtains fault signals and fault description information from HisSrv. It then organizes the information and writes it to the database according to the structure of the underlying database tables.
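The organizing step at the end of the extraction process can be sketched as a join of the three service outputs on a device identifier. This is only an illustration: the dictionary shapes, the field names, and the function `build_rows` are assumptions, since the text does not give the actual table structure.

```python
def build_rows(realtime, line_info, faults):
    """Combine per-device data from RTSrv, DBSrv and HisSrv into flat rows.

    realtime:  {device_id: {"load": ..., "voltage": ...}}          (from RTSrv)
    line_info: {device_id: {"line": ..., "tower_no": ...,
                            "substation": ...}}                    (from DBSrv)
    faults:    {device_id: [fault description, ...]}               (from HisSrv)
    """
    rows = []
    for device_id, rt in realtime.items():
        info = line_info.get(device_id, {})
        rows.append({
            "device_id": device_id,
            "load": rt.get("load"),
            "voltage": rt.get("voltage"),
            "line": info.get("line"),
            "tower_no": info.get("tower_no"),
            "substation": info.get("substation"),
            # A device with no recorded faults gets an empty list.
            "faults": faults.get(device_id, []),
        })
    return rows
```

Missing line information is kept as `None` here rather than dropped, so that a later cleaning stage can flag the row as incomplete.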
As a second embodiment of the present invention, on the basis of the above scheme, the cleaning and filtering process mainly obtains the current state of a device through RTSrv: if the device is in the running state, the corresponding data are captured; otherwise, if the device is in a state such as deactivated or removed, the data are discarded and do not enter the fusion database.
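A minimal sketch of this state-based filter follows. The state label `"running"`, the field name `device_id`, and the decision to drop records from devices whose state is unknown are assumptions for illustration only.

```python
RUNNING = "running"  # hypothetical label for the running state


def filter_by_state(records, device_states):
    """Keep only records whose device is currently in the running state.

    device_states maps device_id to the state reported by the real-time service.
    Devices that are deactivated, removed, or missing from the map are dropped.
    """
    kept = []
    for rec in records:
        if device_states.get(rec["device_id"]) == RUNNING:
            kept.append(rec)
    return kept
```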
The main cleaning rules are:
A. Incomplete data: some information that should be present is missing, for example the device name or the branch company name is missing, or a main table in the TAS system cannot be matched with its detail table;
B. Erroneous data: these arise because the business system is not robust enough and writes input directly to the back-end database without validating it, for example numeric data entered as full-width digit characters, string data followed by a carriage return, an incorrect date format, or a date out of range. Such data must also be classified. Problems such as full-width characters or invisible characters before or after the data can only be found by writing SQL statements, after which the client is asked to correct them in the business system before they are extracted again. Errors such as an incorrect date format or an out-of-range date cause the ETL run to fail; they must be picked out of the business system database with SQL and handed to the responsible business department for correction within a set time limit, and the data are extracted again after correction;
C. Duplicate data: these are especially common in dimension tables. All fields of the duplicate records are exported for the client to confirm and tidy up.
Data cleaning is an iterative process that finally yields clean data.
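Rules A to C can be sketched as a small classifier. The text describes finding these problems with SQL in the business system; the Python version below is a stand-in that shows the same checks on staged records. The field names, the required-field list, the date format `%Y-%m-%d`, and the assumption that all staged values are strings are hypothetical.

```python
import unicodedata
from datetime import datetime

REQUIRED = ("device_name", "branch_name")  # hypothetical required fields


def find_problem(rec):
    """Return 'incomplete', 'erroneous', or None for a single staged record."""
    # Rule A: information that should be present is missing.
    for field in REQUIRED:
        if not rec.get(field):
            return "incomplete"
    # Rule B: full-width digits change under NFKC normalisation.
    load = rec.get("load", "")
    if unicodedata.normalize("NFKC", load) != load:
        return "erroneous"
    # Rule B: a trailing carriage return or newline after string data.
    if any(isinstance(v, str) and v != v.rstrip("\r\n") for v in rec.values()):
        return "erroneous"
    # Rule B: wrong date format or a date that is out of range.
    try:
        datetime.strptime(rec.get("date", ""), "%Y-%m-%d")
    except ValueError:
        return "erroneous"
    return None


def classify(records):
    """Split records into clean rows and (reason, record) rejects."""
    clean, rejected, seen = [], [], set()
    for rec in records:
        problem = find_problem(rec)
        key = tuple(sorted(rec.items()))
        # Rule C: a record whose every field repeats an earlier clean record.
        if problem is None and key in seen:
            problem = "duplicate"
        if problem is None:
            seen.add(key)
            clean.append(rec)
        else:
            rejected.append((problem, rec))
    return clean, rejected
```

The rejects keep their full field set, in line with rule C's requirement that all fields of duplicate records be exported for the client to review.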
The logical architecture of data filtering is shown in Fig. 2. Filters are defined in the data-quality metadata, and the filters are obtained and run batch by batch. Each filter is executed against the data gathered in the intermediate-layer ODS, and the quality measurements produce an error event fact table. Based on the quality dimension of the batch, the process determines whether a fatal error has occurred: if yes, it stops; if no, it creates an audit dimension record, obtains the audit dimension database, and continues the ETL.
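The control flow of Fig. 2 as described above can be sketched as follows. The names `Screen` and `run_screens`, the shape of an error event, and the contents of the audit record are assumptions; the text names the error event fact table and the audit dimension but does not define their columns.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Screen:
    """One filter from the data-quality metadata (hypothetical structure)."""
    name: str
    check: Callable[[Dict], bool]  # returns True when the record passes
    fatal: bool = False            # a failure of a fatal screen stops the ETL


def run_screens(batch_id, records, screens):
    """Run every screen over a batch.

    Returns (error_events, audit_record). audit_record is None when a fatal
    error was found, which signals the caller to stop the ETL run.
    """
    error_events = []
    for screen in screens:
        for row_index, rec in enumerate(records):
            if not screen.check(rec):
                error_events.append({
                    "batch": batch_id,
                    "screen": screen.name,
                    "row": row_index,
                    "fatal": screen.fatal,
                })
    if any(event["fatal"] for event in error_events):
        return error_events, None
    audit_record = {"batch": batch_id, "rows": len(records), "errors": len(error_events)}
    return error_events, audit_record
```

Non-fatal failures are recorded but do not interrupt the batch, which matches the "continue the ETL" branch of the description.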
As a third embodiment of the present invention, on the basis of the above scheme, the logical architecture of data loading in step 3 is shown in Fig. 3. The extracted source data are compared by CRC code. If the codes are identical, the dimension entity, that is, the cleaned and converted data that are needed, is obtained and updated into the fusion database. If the CRC codes differ, one of the following is applied:
A. For a new record in the source system, assign a surrogate key, set the date indicator, and insert the record into the dimension entity, which gives the fusion database after insertion;
B. Perform CRC matching and ignore records that do not conform to the rules;
C. Obtain the changed fields and perform overwrite processing. If they can be overwritten, assign a surrogate key and set the date indicator; records that conform to the rules are then inserted into the fusion database, and records that do not conform are updated into the fusion database after the most recent key has been updated.
After the required dimension entity has been obtained by the three methods above, the most recent key is updated and assigned, which completes the real-time data cleaning and conversion method.
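A simplified sketch of CRC-based change detection during loading follows. It uses `zlib.crc32` from the Python standard library and an in-memory dictionary in place of the fusion database. Several choices are assumptions rather than the patented procedure: the field names, the sequential surrogate key, and in particular treating a matching CRC as "nothing to write", which is the conventional use of a checksum comparison and a simplification of the branches listed above.

```python
import zlib

FIELDS = ("line", "tower_no", "substation")  # hypothetical dimension attributes


def row_crc(row, fields=FIELDS):
    """CRC-32 over the tracked attributes of one source row."""
    payload = "|".join(str(row[f]) for f in fields)
    return zlib.crc32(payload.encode("utf-8"))


def load_row(fusion, row, key="device_id", fields=FIELDS):
    """Insert or update one row in the fusion store and report what happened."""
    crc = row_crc(row, fields)
    current = fusion.get(row[key])
    if current is None:
        # New record in the source: assign a surrogate key and insert.
        fusion[row[key]] = {
            "surrogate_key": len(fusion) + 1,
            "crc": crc,
            **{f: row[f] for f in fields},
        }
        return "inserted"
    if current["crc"] == crc:
        # Same checksum: the tracked attributes have not changed.
        return "unchanged"
    # Different checksum: find the changed fields and overwrite them.
    changed = [f for f in fields if current[f] != row[f]]
    for f in changed:
        current[f] = row[f]
    current["crc"] = crc
    return "overwritten:" + ",".join(changed)
```

Storing the checksum next to the row means each incoming record costs one hash and one lookup rather than a field-by-field comparison.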
With the original system, the detailed loading of 100,000 records takes 20.16 s; after processing with this ETL algorithm, loading completes in only 0.98 s. The timeliness of data display is greatly improved, configuration is more flexible, and metadata management is more convenient. During data processing, data transmission is more reliable and more secure, meeting stricter requirements.
The basic principles, main features, and advantages of the present invention have been shown and described above. Those skilled in the art should understand that the invention is not limited to the above embodiments; the embodiments and the description merely illustrate the principles of the invention. Various changes and improvements may be made without departing from the spirit and scope of the invention, and all such changes and improvements fall within the claimed scope of protection. The claimed scope of the invention is defined by the appended claims and their equivalents.
Claims (7)
1. A real-time data cleaning and conversion method for TAS systems, characterized in that the data sources of the TAS system are fused to obtain the data to be cleaned, the data to be cleaned are processed to obtain clean data, and the clean data are stored in a fusion database, so that data are shared among multiple systems and real-time data cleaning and conversion is achieved, the specific steps including:
Step 1: an ETL system extracts, according to specified rules, the data deployed on different servers, the database data, and the unstructured data, and stores the extracted data in an intermediate-layer ODS;
Step 2: the ODS data are filtered layer by layer by way of design objectives, cleaning deliverables, or standardization deliverables, to obtain clean data;
Step 3: the clean data are inserted or updated into the fusion database according to the designed business rules.
2. The real-time data cleaning and conversion method for TAS systems according to claim 1, characterized in that step 1 specifically comprises:
101. A data-capture service module is provided in the ETL system; the module obtains the IP addresses of the different servers from the configuration file and then connects to the applications on each server;
102. Data are extracted from each server according to the specified rules, and the extracted data are stored in the intermediate-layer ODS.
3. The real-time data cleaning and conversion method for TAS systems according to claim 1, characterized in that the data sources in step 1 include ODBC database structured data sources, flat files, XML data sources, and logs.
4. The real-time data cleaning and conversion method for TAS systems according to claim 1, characterized in that the filtering in step 2 obtains the current state of a device through the application on the server: if the device is in the running state, the corresponding data are captured; otherwise, if the device is in a state such as deactivated or removed, the data are discarded and do not enter the fusion database.
5. The real-time data cleaning and conversion method for TAS systems according to claim 4, characterized in that the discarded data are incomplete data, erroneous data, and duplicate data; the erroneous data are repaired after being discarded and are extracted again after repair.
6. The real-time data cleaning and conversion method for TAS systems according to claim 1, characterized in that the specific insert-or-update process in step 3 is:
The clean data are compared by CRC code. If the codes are identical, the data are inserted or updated into the fusion database. If they differ, CRC matching is performed: according to the indicator in the source system, records that conform to the rules are inserted or updated into the fusion database, and records that do not conform are ignored.
7. The real-time data cleaning and conversion method for TAS systems according to claim 6, characterized in that the insert-or-update process in step 3 also includes:
The fields that have changed in records whose CRC codes differ are subjected to overwrite processing. If they can be overwritten, the record is inserted or updated into the fusion database; if they cannot be overwritten, the dimension attributes of the updated data are repaired and the record is then inserted or updated into the fusion database.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711338916.8A CN108062387A (en) | 2017-12-14 | 2017-12-14 | Real-time data cleaning and conversion method for TAS systems |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108062387A (en) | 2018-05-22 |
Family
ID=62138615
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711338916.8A Pending CN108062387A (en) | 2017-12-14 | 2017-12-14 | A kind of real time data cleaning and conversion method towards TAS systems |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108062387A (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102722582A (en) * | 2012-06-07 | 2012-10-10 | 陈浩 | System and method for integrating data on basis of reverse clearing |
US20170124164A1 (en) * | 2014-06-23 | 2017-05-04 | International Business Machines Corporation | Etl tool interface for remote mainframes |
CN106951442A (en) * | 2017-02-15 | 2017-07-14 | 中国保险信息技术管理有限责任公司 | Data interactive method and device between a kind of heterogeneous database |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109582667A (en) * | 2018-10-16 | 2019-04-05 | 中国电力科学研究院有限公司 | A kind of multiple database mixing storage method and system based on power regulation big data |
CN109635350A (en) * | 2018-11-16 | 2019-04-16 | 海南电网有限责任公司电力科学研究院 | A kind of data replacement method that the trend section for emulation automatically generates and system |
CN109635350B (en) * | 2018-11-16 | 2024-01-05 | 海南电网有限责任公司电力科学研究院 | Data replacement method and system for automatically generating simulated power flow section |
CN109977110A (en) * | 2019-04-28 | 2019-07-05 | 杭州数梦工场科技有限公司 | Data cleaning method, device and equipment |
CN112199366A (en) * | 2019-04-28 | 2021-01-08 | 杭州数梦工场科技有限公司 | Data table processing method, device and equipment |
CN111258993A (en) * | 2020-01-09 | 2020-06-09 | 佛山科学技术学院 | Method and device for filtering abnormal data of industrial big data |
CN112463737A (en) * | 2020-11-17 | 2021-03-09 | 中科金审(北京)科技有限公司 | System and method for rapidly acquiring data aiming at multi-format data intelligent matching template |
CN112732696A (en) * | 2021-01-21 | 2021-04-30 | 中科三清科技有限公司 | Data cleaning method and device applied to atmospheric environment monitoring and storage medium |
CN112732696B (en) * | 2021-01-21 | 2022-02-08 | 中科三清科技有限公司 | Data cleaning method and device applied to atmospheric environment monitoring and storage medium |
CN113158233A (en) * | 2021-03-29 | 2021-07-23 | 重庆首亨软件股份有限公司 | Data preprocessing method and device and computer storage medium |
CN113158233B (en) * | 2021-03-29 | 2023-06-27 | 重庆首亨软件股份有限公司 | Data preprocessing method and device and computer storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108062387A (en) | Real-time data cleaning and conversion method for TAS systems | |
CN107958057B (en) | Code generation method and device for data migration in heterogeneous database | |
EP3513314B1 (en) | System for analysing data relationships to support query execution | |
CN103460208B (en) | For loading data into the method and system of temporal data warehouse | |
KR101117244B1 (en) | Method and system for linking business entities | |
CN102236672B (en) | A kind of data lead-in method and device | |
CN110321113B (en) | Integrated assembly line system taking project batches as standards and working method thereof | |
CN104272247B (en) | Metadata-driven undo method and system | |
CN107003935A (en) | Optimize database duplicate removal | |
CN102426587B (en) | Method for customizing and inquiring heterogeneous BOM (Bill of Materiel) based on complex product | |
US20100280990A1 (en) | Etl for process data warehouse | |
CN100578498C (en) | Data integral service system and method | |
CN111400354B (en) | Machine tool manufacturing BOM (Bill of Material) storage query and tree structure construction method based on MES (manufacturing execution System) | |
CN104102652A (en) | Unstructured data storage system and method | |
CN111563130A (en) | Data credible data management method and system based on block chain technology | |
CN107657049A (en) | A kind of data processing method based on data warehouse | |
CN104636337B (en) | A kind of data cleansing storage method for value-added tax | |
CN106503158A (en) | Method of data synchronization and device | |
WO2019076001A1 (en) | Information updating method and device | |
CN107729448A (en) | A kind of data handling system based on data warehouse | |
CN106802905A (en) | A kind of synergistic data exchange method of isomorphism PLM system | |
CN111125069B (en) | Data cleaning fusion system | |
CN106155838A (en) | A kind of database back-up data restoration methods and device | |
CN109308290A (en) | A kind of efficient data cleaning conversion method based on CIM | |
CN109871378A (en) | The data acquisition and processing (DAP) method and system of big data platform |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20180522 |