CN109522746A - Data processing method, electronic device and computer storage medium - Google Patents


Info

Publication number
CN109522746A
Authority
CN
China
Prior art keywords
data
rule
mentioned
project
normalisation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811323448.1A
Other languages
Chinese (zh)
Inventor
肖涌川
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Ping An Medical Health Technology Service Co Ltd
Original Assignee
Ping An Medical and Healthcare Management Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Medical and Healthcare Management Co Ltd
Priority to CN201811323448.1A
Publication of CN109522746A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the present application discloses a data processing method, an electronic device and a computer storage medium, relating to data processing and data normalization technology. The method includes: obtaining initial data and a data label carried by the initial data; performing desensitization on the initial data to obtain target data; obtaining a normalization rule corresponding to the data label, the normalization rule including an integrity check rule and a format rule; and processing the target data according to the normalization rule to obtain standard data that satisfies the integrity check rule and the format rule. Normalizing the data in this way makes subsequent data storage or processing easier and can improve the efficiency and accuracy of data processing.

Description

Data processing method, electronic device and computer storage medium
Technical field
The present application relates to the technical field of data processing, and in particular to a data processing method, an electronic device and a computer storage medium.
Background art
Big data is widely regarded as a strategic new type of resource; the massive data generated in the current era, and the development of related technologies, drive innovation in services. Big data carries enormous commercial value. As the volume of data in circulation grows rapidly, the quality of data from different sources is uneven, and there is no single concept or standard of data quality. Data quality is generally described quantitatively in terms of accuracy, completeness, timeliness and consistency. For users in the big data era, the large volume of data to be stored and processed, together with the variety and complexity of data sources and data structures, poses many challenges for big data analysis and application. To take full advantage of the opportunities big data offers, users must have reliable, accurate, timely, high-quality data: only by extracting implicit, useful information from large-scale, high-quality data can they make decisions that are more accurate and better aligned with market and customer needs. Users therefore pay increasing attention to data quality and its importance.
In the medical and health field, large amounts of medical insurance data must be processed in every area, so data processing must become more efficient while still guaranteeing accuracy. The data sources themselves are complex and may contain errors, and their data types and formats can differ greatly. As a result, analysis results cannot be obtained accurately during data processing and analysis, the system may produce deviations or even errors, and the stability and efficiency of data processing are low.
Summary of the invention
Embodiments of the present application provide a data processing method, an electronic device and a computer storage medium, relating to data processing and data normalization technology. They make it easy to obtain normalized data for subsequent data storage or processing, which can improve the efficiency and accuracy of data processing.
In a first aspect, an embodiment of the present application provides a data processing method, the method including:
Obtaining initial data and a data label carried by the initial data;
Performing desensitization on the initial data to obtain target data;
Obtaining a normalization rule corresponding to the data label, the normalization rule including an integrity check rule and a format rule;
Processing the target data according to the normalization rule to obtain standard data that satisfies the integrity check rule and the format rule.
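The four steps of the first aspect can be pictured as a small pipeline. The following Python sketch is illustrative only. The `NormalizationRule` structure, the fixed `***` mask and the rule table keyed by label are assumptions made for the example; they are not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Record = Dict[str, str]


@dataclass
class NormalizationRule:
    # Hypothetical structure: required items (integrity check rule) plus
    # per-field converters (format rule).
    required_fields: List[str]
    formatters: Dict[str, Callable[[str], str]] = field(default_factory=dict)


def desensitize(record: Record, sensitive: List[str]) -> Record:
    # Placeholder policy: replace each sensitive value with a fixed mask.
    return {k: ("***" if k in sensitive else v) for k, v in record.items()}


def process(record: Record, label: str,
            rules: Dict[str, NormalizationRule], sensitive: List[str]) -> Record:
    target = desensitize(record, sensitive)           # desensitize -> target data
    rule = rules[label]                               # rule chosen by data label
    missing = [f for f in rule.required_fields if not target.get(f)]
    if missing:                                       # integrity check rule
        raise ValueError(f"vacant items: {missing}")
    # Format rule: convert fields that have a formatter, keep the rest as-is.
    return {k: rule.formatters.get(k, lambda x: x)(v) for k, v in target.items()}


rules = {"hospital_a": NormalizationRule(
    required_fields=["patient_id", "visit_date"],
    formatters={"visit_date": lambda s: s.replace("/", "-")})}
print(process({"patient_id": "P001", "name": "Zhang San", "visit_date": "2018/11/08"},
              "hospital_a", rules, sensitive=["name"]))
# {'patient_id': 'P001', 'name': '***', 'visit_date': '2018-11-08'}
```

In this sketch, keeping the rules in a table keyed by data label means that supporting a new source format only requires a new rule entry. The pipeline code itself does not change.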
In a possible implementation, where the normalization rule includes the integrity check rule, processing the target data according to the normalization rule includes:
Detecting whether any item in the target data is vacant; if so, generating a report containing the vacant items and outputting the report to prompt completion of the vacant items.
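A vacancy check of this kind can be sketched in a few lines. The following Python example assumes that records are dictionaries and that "vacant" means missing, `None` or blank. The report wording is hypothetical.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def vacancy_report(records: List[Record], required: List[str]) -> List[str]:
    """List every required item that is missing, None, or blank."""
    report = []
    for i, rec in enumerate(records):
        for item in required:
            value = rec.get(item)
            if value is None or value.strip() == "":
                report.append(f"record {i}: item '{item}' is vacant, please complete it")
    return report


records = [
    {"patient_id": "P001", "admission_time": ""},
    {"patient_id": "P002", "admission_time": "2018-11-08 09:30"},
]
for line in vacancy_report(records, ["patient_id", "admission_time"]):
    print(line)
# record 0: item 'admission_time' is vacant, please complete it
```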
In a possible implementation, the normalization rule further includes a duplicate check rule, and processing the target data according to the normalization rule further includes:
Detecting the similarity between items in the target data; if the similarity is higher than a first threshold, determining that the items are duplicate items, and either merging the data of the duplicate items or deleting the data of one of them.
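The patent does not say how similarity is measured. The following Python sketch substitutes the standard library's `difflib.SequenceMatcher` ratio, which lies in [0, 1]. The default `first_threshold` of 0.9 is an assumed value. An item is treated as a duplicate when its similarity to an already kept item is higher than the threshold, and it is then dropped.

```python
from difflib import SequenceMatcher
from typing import List


def similarity(a: str, b: str) -> float:
    # difflib ratio in [0, 1]; the patent does not name a similarity measure.
    return SequenceMatcher(None, a, b).ratio()


def drop_duplicates(items: List[str], first_threshold: float = 0.9) -> List[str]:
    """Keep an item only if it is not more similar than the threshold to any kept item."""
    kept: List[str] = []
    for item in items:
        if all(similarity(item, k) <= first_threshold for k in kept):
            kept.append(item)
    return kept


items = [
    "Zhang San|2018-11-08|Appendectomy",
    "Zhang San|2018-11-08|Appendectomy ",   # trailing space: near-duplicate
    "Li Si|2018-11-09|Fracture",
]
print(drop_duplicates(items))
# ['Zhang San|2018-11-08|Appendectomy', 'Li Si|2018-11-09|Fracture']
```

The pairwise comparison is quadratic in the number of items. For large volumes of medical insurance data, a real system would more likely block records first (for example by patient ID) and only compare records within each block.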
In a possible implementation, where the normalization rule includes the format rule, processing the target data according to the normalization rule further includes:
Detecting whether the item fields in the target data match the template fields; if an item field does not match, converting it, according to the format rule, into a target field that matches the template field.
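As an illustration of the format rule, the following Python sketch takes a date field as the example. It assumes that the template field format is `YYYY-MM-DD` and that sources use a few other known layouts. Both format lists are hypothetical. A value that matches no known layout is returned as `None`, so it can be treated as wrong data.

```python
from datetime import datetime
from typing import Optional

TEMPLATE_FORMAT = "%Y-%m-%d"                         # assumed template field format
SOURCE_FORMATS = ["%Y/%m/%d", "%Y.%m.%d", "%Y%m%d"]  # assumed formats seen in sources


def to_template(value: str) -> Optional[str]:
    """Return the value converted to the template format, or None if impossible."""
    value = value.strip()
    for fmt in [TEMPLATE_FORMAT] + SOURCE_FORMATS:
        try:
            return datetime.strptime(value, fmt).strftime(TEMPLATE_FORMAT)
        except ValueError:
            continue                              # try the next known format
    return None                                   # unmatched: candidate wrong data


for raw in ["2018-11-08", "2018/11/08", "20181108", "2018-02-30"]:
    print(raw, "->", to_template(raw))
# 2018-11-08 -> 2018-11-08
# 2018/11/08 -> 2018-11-08
# 20181108 -> 2018-11-08
# 2018-02-30 -> None
```

Parsing with `strptime`, rather than only matching a regular expression, also rejects impossible dates such as 30 February.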
In a possible implementation, the method further includes:
If wrong data is detected in an item field that does not match the template field, generating an error record;
Determining the item category of the item field, searching a correction database for correction data that corresponds to the item category and matches the item field, and outputting prompt information containing the correction data.
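The following Python sketch shows the error record and the correction lookup. It assumes a field-to-category mapping and a correction database that holds the valid values for each category. Both tables are hypothetical. "Matching" is approximated with `difflib.get_close_matches`, and the 0.8 cutoff is an assumed value.

```python
from difflib import get_close_matches
from typing import Dict, List, Optional, Tuple

# Hypothetical mapping from item field to item category.
FIELD_CATEGORY: Dict[str, str] = {"dept_name": "department", "sex": "gender"}
# Hypothetical correction database: valid values per item category.
CORRECTION_DB: Dict[str, List[str]] = {
    "department": ["Cardiology", "Orthopedics", "Neurology"],
    "gender": ["male", "female"],
}
error_records: List[Tuple[str, str]] = []


def correction_prompt(field: str, value: str) -> Optional[str]:
    """Log the wrong value and return a prompt with the closest valid value, if any."""
    error_records.append((field, value))                      # error record
    category = FIELD_CATEGORY.get(field)                       # item category
    if category is None:
        return None
    match = get_close_matches(value, CORRECTION_DB[category], n=1, cutoff=0.8)
    return f"{field}: '{value}' looks wrong, did you mean '{match[0]}'?" if match else None


print(correction_prompt("dept_name", "Cardiologyy"))
# dept_name: 'Cardiologyy' looks wrong, did you mean 'Cardiology'?
```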
In a possible implementation, after the standard data that satisfies the integrity check rule and the format rule is obtained, the method further includes:
Obtaining a pre-stored data template corresponding to the data label, and comparing the standard data with the pre-stored data template to obtain a data score;
Determining whether the data score is higher than a first score threshold, and storing the standard data if it is.
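The patent does not define how the data score is computed. As one possible reading, the following Python sketch scores a record as the percentage of template items that are present and match an expected pattern. The template patterns and the threshold of 80 are illustrative assumptions.

```python
import re
from typing import Dict, List

# Hypothetical pre-stored data template for one data label: item -> expected pattern.
TEMPLATES: Dict[str, Dict[str, str]] = {
    "hospital_a": {
        "patient_id": r"P\d{3,}",
        "admission_time": r"\d{4}-\d{2}-\d{2}",
        "amount": r"\d+(\.\d{1,2})?",
    }
}
stored: List[Dict[str, str]] = []


def data_score(record: Dict[str, str], label: str) -> float:
    """Percentage of template items present in the record and matching their pattern."""
    template = TEMPLATES[label]
    hits = sum(1 for item, pattern in template.items()
               if re.fullmatch(pattern, record.get(item, "")))
    return 100.0 * hits / len(template)


def store_if_qualified(record: Dict[str, str], label: str,
                       first_score_threshold: float = 80.0) -> bool:
    if data_score(record, label) > first_score_threshold:
        stored.append(record)
        return True
    return False


good = {"patient_id": "P001", "admission_time": "2018-11-08", "amount": "356.50"}
bad = {"patient_id": "P002", "admission_time": "2018-11-08", "amount": "12.345"}
print(data_score(good, "hospital_a"), store_if_qualified(good, "hospital_a"))  # 100.0 True
print(round(data_score(bad, "hospital_a"), 1), store_if_qualified(bad, "hospital_a"))  # 66.7 False
```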
In a possible implementation, after the standard data that satisfies the integrity check rule and the format rule is obtained, the method further includes:
Generating a data detection report that includes the data detection time of the initial data, the data duplication rate, the data qualification rate and/or the processing applied to the target data according to the normalization rule. The data duplication rate is the proportion of duplicate data in the target data, and the data qualification rate is the proportion of standard data in the target data.
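Both rates are simple proportions of the target data. The following Python sketch uses illustrative numbers: 200 target records, 10 duplicates and 180 qualified records give a duplication rate of 10 / 200 = 5% and a qualification rate of 180 / 200 = 90%. The report's dictionary layout is an assumption.

```python
from datetime import datetime
from typing import Dict, List


def detection_report(total: int, duplicates: int, qualified: int,
                     actions: List[str]) -> Dict[str, object]:
    """Duplication and qualification rates as fractions of the target data."""
    if total <= 0:
        raise ValueError("target data is empty")
    return {
        "detected_at": datetime.now().isoformat(timespec="seconds"),
        "duplication_rate": duplicates / total,
        "qualification_rate": qualified / total,
        "actions": actions,
    }


report = detection_report(200, 10, 180, ["deduplicated 10 records", "reformatted dates"])
print(report["duplication_rate"], report["qualification_rate"])  # 0.05 0.9
```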
In a second aspect, an embodiment of the present application provides an electronic device, including an acquisition module, a desensitization module and a data processing module, where:
The acquisition module is configured to obtain initial data and a data label carried by the initial data;
The desensitization module is configured to perform desensitization on the initial data to obtain target data;
The acquisition module is further configured to obtain a normalization rule corresponding to the data label, the normalization rule including an integrity check rule and a format rule;
The data processing module is configured to process the target data according to the normalization rule to obtain standard data that satisfies the integrity check rule and the format rule.
In a third aspect, an embodiment of the present application further provides an electronic device, including a processor, an input device, an output device and a memory that are connected to one another. The memory is configured to store a computer program that includes program instructions, and the processor is configured to call the program instructions to perform the method of the first aspect or any possible implementation thereof.
In a fourth aspect, an embodiment of the present application provides a computer storage medium storing a computer program. The computer program includes program instructions that, when executed by a processor, cause the processor to perform the method of the first aspect or any possible implementation thereof.
In the embodiments of the present application, initial data and the data label it carries are obtained, and the initial data is desensitized to obtain target data. The normalization rule corresponding to the data label, which includes an integrity check rule and a format rule, is then obtained. Finally, the target data is processed according to the normalization rule to obtain standard data that satisfies both rules. Normalizing the data in this way makes subsequent data storage or processing easier and can improve the efficiency and accuracy of data processing.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below.
Fig. 1 is a schematic flowchart of a data processing method according to an embodiment of the present application;
Fig. 2 is a schematic flowchart of a data processing method according to another embodiment of the present application;
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of another electronic device according to an embodiment of the present application.
Detailed description of embodiments
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present application. Reference herein to an "embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. Occurrences of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
The terms "first", "second" and the like in the specification, the claims and the drawings of the present application are used to distinguish different objects, not to describe a particular order. In addition, the terms "include" and "have", and any variants of them, are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device that includes a series of steps or units is not limited to the listed steps or units. It may optionally include steps or units that are not listed, or other steps or units inherent to the process, method, product or device.
It should also be understood that the terms used in this specification are only for the purpose of describing specific embodiments and are not intended to limit the present application. As used in this specification and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes these combinations.
As used in this specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining" or "in response to detecting". Similarly, the phrase "if it is determined" or "if [the described condition or event] is detected" may be interpreted, depending on the context, as "once it is determined", "in response to determining", "once [the described condition or event] is detected" or "in response to detecting [the described condition or event]".
For a better understanding of the embodiments of the present application, the method to which they apply is described below.
The electronic device mentioned in the embodiments of the present application may include various handheld devices with wireless communication functions, vehicle-mounted devices, wearable devices (such as smart watches, smart bands and pedometers), computing devices or other processing devices connected to a wireless modem, and various forms of user equipment (UE), mobile stations (MS), terminal devices and the like. For ease of description, all of the devices mentioned above are referred to as electronic devices.
Referring to Fig. 1, which is a schematic flowchart of a data processing method according to an embodiment of the present application, the method can be applied to the electronic device described above. As shown in Fig. 1, the method may include:
101. Obtain initial data and the data label carried by the initial data.
The initial data in this embodiment may be medical insurance data. Medical insurance generally refers to basic medical insurance, a social security system established to compensate workers for economic losses caused by illness. A medical insurance fund is established through contributions from employers and individuals. When an insured person falls ill and incurs medical expenses, the medical insurance agency pays a certain amount of economic compensation. This process involves processing a large amount of medical insurance data.
The initial data may be data in any type of document produced by document editing software, such as WORD or EXCEL files.
The electronic device may communicate with a terminal device and receive the initial data from it. Step 101 may be executed immediately after the initial data is received. Alternatively, a preset execution time may be stored in the electronic device: the user schedules the data processing in advance, and step 101 is executed at the preset time.
To process data quickly and in a standardized way, this embodiment classifies the initial data. Initial data from different sources may carry different data labels, and these labels allow the initial data to be classified and processed accordingly. Data labels may be assigned by data type. For example, different hospitals report data in different formats, and this data must be normalized before a database can be built. Each format corresponds to one data label, so the data format can be determined from the label.
102. Desensitize the initial data to obtain target data.
Data desensitization in this embodiment means transforming certain sensitive information according to desensitization rules, so that private and sensitive data is reliably protected. Where customer security data or commercially sensitive data is involved, real data is transformed for testing use without violating system rules. For example, personal information such as ID card numbers, mobile phone numbers, card numbers and customer numbers must all be desensitized. Data desensitization is one of the database security technologies, which mainly include database vulnerability scanning, database encryption, database firewalls, data desensitization and database security auditing systems. Database security risks include database dumping, bulk data scraping and credential stuffing.
Desensitization rules may be preset in this embodiment, specifying which types of data are to be desensitized. The rules can be configured according to actual needs by selecting the data types to desensitize. In this embodiment, the main desensitization rules may cover names, ID card numbers, address information and telephone numbers.
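A rule table for the four field types named above might look like the following Python sketch. Which characters each rule keeps is an assumption chosen for illustration: the surname, the 6-digit region code and last 4 characters of an 18-character resident ID, the first 3 and last 4 digits of an 11-digit mobile number, and the first 6 characters of an address. The patent does not specify these details.

```python
import re
from typing import Callable, Dict


def mask_name(name: str) -> str:
    # Keep the first character (usually the surname in Chinese names).
    return name[:1] + "*" * (len(name) - 1) if name else name


def mask_id_card(id_no: str) -> str:
    # 18-character resident ID: keep the 6-digit region code and last 4 characters.
    if len(id_no) <= 10:
        return "*" * len(id_no)
    return id_no[:6] + "*" * (len(id_no) - 10) + id_no[-4:]


def mask_phone(phone: str) -> str:
    # 11-digit mobile number: keep first 3 and last 4 digits; mask anything else fully.
    m = re.fullmatch(r"(\d{3})\d{4}(\d{4})", phone)
    return f"{m.group(1)}****{m.group(2)}" if m else "*" * len(phone)


def mask_address(address: str) -> str:
    # Keep the first 6 characters (roughly province and city).
    return address[:6] + "*" * max(len(address) - 6, 0)


RULES: Dict[str, Callable[[str], str]] = {
    "name": mask_name, "id_card": mask_id_card,
    "phone": mask_phone, "address": mask_address,
}


def desensitize(record: Dict[str, str]) -> Dict[str, str]:
    # Fields without a rule pass through unchanged.
    return {k: RULES[k](v) if k in RULES else v for k, v in record.items()}


print(desensitize({"name": "张三", "id_card": "110101199003071234",
                   "phone": "13812345678", "diagnosis": "J45"}))
# {'name': '张*', 'id_card': '110101********1234', 'phone': '138****5678', 'diagnosis': 'J45'}
```

If a value does not have the expected shape, the phone rule masks it completely instead of passing it through. This prevents malformed input from leaking unmasked.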
In the medical insurance field, desensitizing data improves information security and protects user privacy.
After desensitization, step 103 may be executed, and the target data can then undergo the specific normalization.
103. Obtain the normalization rule corresponding to the data label.
Before data analysis, the data usually needs to be normalized first, and the analysis is then performed on the normalized data. Data normalization is the indexation of statistical data. It mainly involves two aspects: making indicators point in the same direction, and making them dimensionless. Direction alignment handles data of different natures. Directly summing indicators of different natures does not correctly reflect the combined effect of different forces, so inverse indicators must first be converted so that all indicators act on the evaluation scheme in the same direction. Only then does summing them give a correct result. Dimensionless processing mainly addresses the comparability of data. There are many normalization methods; commonly used ones include min-max normalization, Z-score normalization and decimal scaling normalization. Normalization converts the raw data into dimensionless index values, so that all indicator values are on the same order of magnitude and can be used together in comprehensive evaluation and analysis.
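The three methods named above have standard textbook definitions, and the following Python sketch implements them directly. Min-max normalization maps x to (x - min) / (max - min). Z-score normalization maps x to (x - mean) / std; this sketch assumes the population standard deviation. Decimal scaling maps x to x / 10^j, where j is the smallest integer that makes every scaled absolute value less than 1. The guards for constant input are a design choice of this sketch.

```python
from statistics import mean, pstdev
from typing import List


def min_max(xs: List[float]) -> List[float]:
    # x' = (x - min) / (max - min), mapped into [0, 1]
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0] * len(xs)          # constant column: no spread to scale
    return [(x - lo) / (hi - lo) for x in xs]


def z_score(xs: List[float]) -> List[float]:
    # x' = (x - mean) / std, using the population standard deviation
    mu, sigma = mean(xs), pstdev(xs)
    if sigma == 0:
        return [0.0] * len(xs)
    return [(x - mu) / sigma for x in xs]


def decimal_scaling(xs: List[float]) -> List[float]:
    # x' = x / 10**j, with j the smallest integer such that max|x'| < 1
    m = max(abs(x) for x in xs)
    j = 0
    while m / 10 ** j >= 1:
        j += 1
    return [x / 10 ** j for x in xs]


print(min_max([10, 20, 30]))              # [0.0, 0.5, 1.0]
print(z_score([2, 4, 4, 4, 5, 5, 7, 9]))  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
print(decimal_scaling([-980, 15, 420]))   # [-0.98, 0.015, 0.42]
```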
As data volumes keep growing, the original data structures may become unsuitable and fail to meet various requirements. Replacing the database or the data structure then makes it necessary to convert the data itself.
Transcoding data, also called data conversion (data transfer), is the process of changing data from one representation to another. Data transmission in the medical insurance system involves database replacement and data structure replacement, which in turn require conversion of the data itself.
The correspondence between data labels and normalization rules can be understood as a data conversion standard, which is used to transcode the data in a standardized way.
A data conversion standard is a complete plan for encoding data according to field, record and document requirements, so that the data can be converted through a specified medium. A data model is a prerequisite for developing the coding rules, and acting as an intermediary is a main feature of a conversion standard. Once data has been optimized according to the conversion standard, it can be exchanged effectively without problems, and optimized products and database structures can be stored, applied and maintained effectively.
The electronic device may store the correspondence between data labels and normalization rules, so that data with different data labels is processed with different normalization rules. The data label determines which normalization rule is needed; that rule is then obtained, and step 104 may be executed.
104. Process the target data according to the normalization rule to obtain standard data that satisfies the integrity check rule and the format rule.
Because the initial data comes from different sources, the transcoding process can be tailored to each source.
Specifically, normalization rules for standardizing data are stored in the electronic device in advance, and there may be several of them. The correspondence between data labels and normalization rules is stored as well. This correspondence is used to obtain the normalization rule for the data label carried by the initial data, and the transcoding process is then executed with that rule.
Data cleaning is the process of re-examining and verifying data. Its aim is to delete duplicate information, correct existing errors and ensure data consistency.
The normalization rule may include an integrity check rule for checking the integrity of the target data. It can detect missing or omitted data, and such data can be completed according to the normalization rule.
The normalization rule may include a duplicate check rule for detecting duplicate data in the target data, which can then be deleted.
The normalization rule may include a format rule, according to which the format of the target data can be unified.
The normalization rule may also cover the handling of wrong data. Errors typically arise because the business system is not robust: it writes input directly into the back-end database without validating it. Examples include numeric data entered as full-width digit characters, a carriage return after string data, incorrect date formats and out-of-range dates. Such data also needs to be classified. Problems such as full-width characters, or invisible characters before or after the data, can only be found by writing SQL statements; the client is then asked to correct the data in the business system before it is extracted again. Errors such as incorrect date formats or out-of-range dates cause ETL jobs to fail. They can be picked out of the business system database with SQL and handed to the responsible business department for correction within a deadline, after which the data is extracted again.
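The patent locates these errors with SQL against the business database. The following Python sketch substitutes an application-side check for the same error classes. The character ranges, the 1900-01-01 lower date bound and the error labels are all assumptions. Note that the number check uses `[0-9]` rather than `\d`, because Python's `\d` also matches full-width digits.

```python
import re
from datetime import date
from typing import List

# Control characters, zero-width space and the full-width (ideographic) space.
INVISIBLE = re.compile(r"[\u0000-\u001f\u007f\u200b\u3000]")
FULL_WIDTH_DIGITS = re.compile(r"[\uff10-\uff19]")


def check_value(value: str, kind: str) -> List[str]:
    """Return a list of error labels for one raw field value."""
    errors = []
    if INVISIBLE.search(value):
        errors.append("invisible character")
    if FULL_WIDTH_DIGITS.search(value):
        errors.append("full-width digit")
    if kind == "number" and not re.fullmatch(r"-?[0-9]+(\.[0-9]+)?", value.strip()):
        errors.append("not a number")
    if kind == "date":
        try:
            d = date.fromisoformat(value.strip())
        except ValueError:
            errors.append("invalid date format")
        else:
            if not date(1900, 1, 1) <= d <= date.today():
                errors.append("date out of range")
    return errors


print(check_value("123\r", "number"))       # ['invisible character']
print(check_value("2018-13-01", "date"))    # ['invalid date format']
```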
Processing the target data according to the normalization rule, mainly through the integrity check rule and the format rule, yields standard data that satisfies both rules.
Optionally, after the standard data is obtained, subsequent data processing can be performed on it.
Optionally, the standard data can be stored in a first space.
The normalized standard data can be recognized by the system and used for data analysis and processing. It can also be stored in a preset space for later querying or retrieval.
In this embodiment, initial data and the data label it carries are obtained, and the initial data is desensitized to obtain target data. The normalization rule corresponding to the data label, which includes an integrity check rule and a format rule, is then obtained, and the target data is processed according to it. The result is standard data that satisfies both rules. Normalizing the data in this way makes subsequent data storage or processing easier and can improve the efficiency and accuracy of data processing.
Referring to Fig. 2, which is a schematic flowchart of another data processing method according to an embodiment of the present application, the embodiment shown in Fig. 2 may be obtained on the basis of the embodiment shown in Fig. 1. As shown in Fig. 2, the method may include:
201. Obtain initial data and the data label carried by the initial data.
For step 201, refer to the detailed description of step 101 in the embodiment shown in Fig. 1; the details are not repeated here.
202. Desensitize the initial data to obtain target data.
For step 202, refer to the detailed description of step 102 in the embodiment shown in Fig. 1; the details are not repeated here.
203. Obtain the normalization rule corresponding to the data label.
The normalization rule may include an integrity check rule, a duplicate check rule and/or a format rule. If it includes the integrity check rule, step 204 may be executed; if it includes the duplicate check rule, step 205 may be executed; and if it includes the format rule, step 206 may be executed.
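Running only the steps that the normalization rule contains amounts to a small dispatch table. In the following Python sketch, the three step functions are deliberately simplified, hypothetical stand-ins for steps 204 to 206. For example, the integrity step drops vacant records here, whereas the patent reports them. The rule is modeled as an ordered list of step names.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Step = Callable[[List[Record]], List[Record]]


def integrity_step(records: List[Record]) -> List[Record]:
    # Stand-in for step 204: set aside records with a vacant item.
    return [r for r in records if all(v.strip() for v in r.values())]


def duplicate_step(records: List[Record]) -> List[Record]:
    # Stand-in for step 205: exact-duplicate removal, first occurrence wins.
    seen, out = set(), []
    for r in records:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out


def format_step(records: List[Record]) -> List[Record]:
    # Stand-in for step 206: trim surrounding whitespace in every field.
    return [{k: v.strip() for k, v in r.items()} for r in records]


STEPS: Dict[str, Step] = {
    "integrity": integrity_step, "duplicate": duplicate_step, "format": format_step,
}


def apply_rule(records: List[Record], rule_parts: List[str]) -> List[Record]:
    """Run only the steps the normalization rule contains, in the given order."""
    for part in rule_parts:
        records = STEPS[part](records)
    return records


data = [{"id": "P001", "dept": "Cardiology"},
        {"id": "P001", "dept": "Cardiology "},
        {"id": "", "dept": "Neurology"}]
print(apply_rule(data, ["integrity", "duplicate", "format"]))
# [{'id': 'P001', 'dept': 'Cardiology'}, {'id': 'P001', 'dept': 'Cardiology'}]
print(apply_rule(data, ["integrity", "format", "duplicate"]))
# [{'id': 'P001', 'dept': 'Cardiology'}]
```

The two calls show that step order matters. If formatting runs before the duplicate check, records that differ only in whitespace are recognized as duplicates.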
For step 203, refer to the detailed description of step 103 in the embodiment shown in Fig. 1; the details are not repeated here.
204. Where the normalization rule includes the integrity check rule, detect whether any item in the target data is vacant; if so, generate a report containing the vacant items and output it to prompt completion of the vacant items.
When the normalization rule obtained for the data label includes the integrity check rule, the integrity of the target data can be checked by detecting whether any item in it is vacant. A vacancy is an item that has not been filled in. Such data mainly involves missing information, for example a missing supplier name, branch name or customer region, or a main table in the business system that cannot be matched with its detail table. Data with vacancies can be filtered out and written into the data warehouse only after it has been completed according to the integrity check rule. Specifically, if vacancies exist, the vacant items can be collated into a report, and the report can be output to prompt the user to complete them. For example, the report may record that the admission time item has no data (was not filled in). The report can also be sent to the source of the initial data with a request to enter the vacant items. Optionally, some missing values can be derived from the same data source or from other data sources: the mean, maximum, minimum or a more sophisticated probability estimate can replace a missing value, achieving the cleaning goal. The integrity check rule may therefore also include a completion rule, through which vacant items are completed. For example, the admission time in the target data may not be filled in, but the electronic device can obtain other related data for the user, such as the user's electronic medical record. The electronic device can then read the admission time recorded in the electronic medical record and complete the admission time in the target data. The electronic device can also generate a completion record so that completion information can be queried.
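The completion rule described above can be sketched in two parts. The first part completes a vacant item from another source, such as the admission time from an electronic medical record; here the other source is a hypothetical dictionary keyed by `patient_id`. The second part fills missing numeric values with the mean, one of the statistics the text mentions. Each completion from another source is logged so that it can be queried later.

```python
from statistics import mean
from typing import Dict, List, Optional


def complete_from_source(records: List[Dict[str, str]], item: str,
                         other_source: Dict[str, str], log: List[str]) -> None:
    """Fill vacant `item` values in place from another source keyed by patient_id."""
    for rec in records:
        if not rec.get(item, "").strip():
            value = other_source.get(rec["patient_id"])
            if value:
                rec[item] = value
                log.append(f"{rec['patient_id']}: {item} completed from other source")


def impute_mean(values: List[Optional[float]]) -> List[float]:
    """Replace missing numeric values with the mean of the known ones."""
    known = [v for v in values if v is not None]
    fill = mean(known)
    return [fill if v is None else v for v in values]


records = [{"patient_id": "P001", "admission_time": ""}]
emr = {"P001": "2018-11-08 09:30"}          # hypothetical EMR lookup
completion_log: List[str] = []
complete_from_source(records, "admission_time", emr, completion_log)
print(records[0]["admission_time"], completion_log)
print(impute_mean([10.0, None, 20.0]))       # [10.0, 15.0, 20.0]
```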
205. Detect the similarity between items in the target data; if the similarity is higher than the first threshold, determine that the items are duplicate items and delete the data of one of them.
When the normalization rule includes the duplicate check rule, duplicate data in the target data can be detected so that the data can be deduplicated.
Optionally, records in a database whose attribute values are identical are regarded as duplicate records. Whether two records are equal is detected by checking whether their attribute values are equal, and equal records are merged into one record (i.e., merge/purge). The electronic device can obtain the similarity of items in the target data by comparing the fields in the data. A first threshold can also be stored in the electronic device; when the similarity is higher than the first threshold, the items can be judged to be duplicate items, and the data of the duplicate items can then be deleted. For example, with the first threshold set to 98%, if the electronic device detects two items in the target data whose similarity is 98.7%, it deletes the data of one of them, or merges the data of the duplicate items into a single copy, and retains the analysis record, which is stored in the electronic device.
Optionally, a second threshold can also be stored in the electronic device. When the similarity is lower than the first threshold but higher than the second threshold, the items can be judged to be high-similarity items and stored in a high-similarity record for users to query and consult. Whether they are actually duplicates can also be judged manually, which improves the accuracy of data processing.
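The two-threshold duplicate check can be sketched as follows. Similarity is computed here with the standard library's `difflib.SequenceMatcher.ratio()` over the concatenated field values; the source does not say which similarity measure is used, so that choice and the 0.90 second threshold are assumptions, while 0.98 follows the example above.

```python
from difflib import SequenceMatcher

FIRST_THRESHOLD = 0.98   # above this: duplicate (the 98% example in the text)
SECOND_THRESHOLD = 0.90  # hypothetical; above this: high-similarity item


def record_key(record: dict) -> str:
    """Concatenate field values in a stable key order for comparison."""
    return "|".join(str(record[k]) for k in sorted(record))


def similarity(a: dict, b: dict) -> float:
    return SequenceMatcher(None, record_key(a), record_key(b)).ratio()


def deduplicate(records: list):
    """Split records into kept, removed duplicates, and high-similarity items for review."""
    kept, duplicates, high_similarity = [], [], []
    for rec in records:
        best = max((similarity(rec, k) for k in kept), default=0.0)
        if best > FIRST_THRESHOLD:
            duplicates.append(rec)  # one copy is already kept; drop this one
        else:
            if best > SECOND_THRESHOLD:
                high_similarity.append(rec)  # keep, but flag for manual judgement
            kept.append(rec)
    return kept, duplicates, high_similarity
```

Every record is compared with every kept record, which is quadratic; for large volumes of medical insurance data a blocking key (for example, comparing only records from the same hospital) would usually be needed.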
In the embodiments of the present application, deduplicating massive volumes of medical insurance data saves storage space. Deleting duplicate data can greatly reduce the number of storage media required and thus reduce cost. It can even make disk-based storage systems cheaper than tape libraries while providing better performance. Storage systems that support data deduplication are therefore particularly suitable for data backup.
Data deduplication can also improve write performance. Disk write performance is limited, typically around 100 MB/s for sequential writes. If deduplication is performed as data is written, part of the data never needs to be written to disk, which improves write performance.
Data deduplication can also save network bandwidth. If deduplication is performed on the client, only new data is transmitted to the storage system, which reduces the volume of data sent over the network and thus saves bandwidth.
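As an illustration of client-side deduplication, the toy sketch below splits data into fixed-size chunks, identifies each chunk by its SHA-256 digest, and transmits only the chunks the store does not already hold. The in-memory `DedupStore` stands in for a remote storage system and is hypothetical; production systems typically use larger or content-defined chunks.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: each distinct chunk is kept once, keyed by its SHA-256."""

    def __init__(self):
        self.chunks = {}

    def has(self, digest: str) -> bool:
        return digest in self.chunks

    def put(self, digest: str, payload: bytes) -> None:
        self.chunks[digest] = payload


def client_upload(data: bytes, store: DedupStore, chunk_size: int = 4096):
    """Deduplicate on the client: send only chunks the store lacks.

    Returns (bytes_sent, recipe); the recipe lists the chunk digests
    needed to rebuild `data` from the store."""
    sent, recipe = 0, []
    for start in range(0, len(data), chunk_size):
        piece = data[start:start + chunk_size]
        digest = hashlib.sha256(piece).hexdigest()
        if not store.has(digest):
            store.put(digest, piece)  # stands in for a network transfer
            sent += len(piece)
        recipe.append(digest)
    return sent, recipe


def rebuild(recipe, store: DedupStore) -> bytes:
    return b"".join(store.chunks[d] for d in recipe)
```

Uploading `b"AAAABBBBAAAA"` with `chunk_size=4` sends 8 bytes, because the repeated `AAAA` chunk is sent only once.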
206. In response to the normalization rule including a format rule, detect whether each item field in the target data matches the template field; if it does not, convert the unmatched item field, according to the format rule, into a target field that matches the template field.
Specifically, possible erroneous values or outliers can be identified by statistical analysis, for example deviation analysis, to find values that do not follow the expected distribution or regression equation. Data values can also be checked against a simple rule base (common-sense rules, business-specific rules, etc.), or data can be detected and cleaned using constraints between different attributes or external data.
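One simple form of the statistical check is a z-score test, shown below together with a rule-base check. Both are assumptions chosen for illustration: the text names the techniques but not a specific test, and the example rules in the usage line are hypothetical.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold: float = 3.0):
    """Return indices of values more than `threshold` population standard
    deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing deviates
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


def rule_violations(record: dict, rules: dict):
    """Check a record against simple rules: field -> predicate that returns True if valid."""
    return [f for f, ok in rules.items() if f in record and not ok(record[f])]
```

For example, `rule_violations({"age": 200, "fee": 10}, {"age": lambda a: 0 <= a <= 150, "fee": lambda x: x >= 0})` returns `["age"]`. With the population standard deviation, a single outlier among n values can reach a z-score of at most sqrt(n - 1), so small batches need a lower threshold or a robust measure such as the median absolute deviation.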
Multiple template fields can be stored in the electronic device. For each type of data, the item fields in the target data can be compared with the template field: if an item field conforms to the format of the template field, it matches; otherwise it does not. For example, suppose a visit date is March 11, 2017, the visit date in the target data is formatted as "2017.3.11", and the format rule for visit dates is "YYYY (year)-MM (month)-DD (day)" (the template field). The electronic device can automatically identify the visit date "2017.3.11", compare it with the template field, detect that its format does not match, and convert it into the matching format "2017-03-11", thereby unifying the data format.
If they do not match, the unmatched item field can be converted, according to the format rule, into a target field that matches the template field.
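A minimal sketch of the format conversion for the visit-date example, assuming the template field is `YYYY-MM-DD` and that source dates use `.`, `/`, or `-` as separators; other source formats would need their own patterns.

```python
import re
from datetime import date

TEMPLATE = re.compile(r"\d{4}-\d{2}-\d{2}")                       # template field: YYYY-MM-DD
LOOSE_DATE = re.compile(r"(\d{4})[./-](\d{1,2})[./-](\d{1,2})")  # e.g. 2017.3.11


def to_template_date(value: str) -> str:
    """Convert a visit date to the YYYY-MM-DD template, or raise ValueError."""
    value = value.strip()
    if TEMPLATE.fullmatch(value):
        return value  # already matches the template field's format
    m = LOOSE_DATE.fullmatch(value)
    if not m:
        raise ValueError(f"unrecognized date format: {value!r}")
    year, month, day = map(int, m.groups())
    return date(year, month, day).isoformat()  # also rejects impossible dates
```

Here `to_template_date("2017.3.11")` returns `"2017-03-11"`.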
207. If an item field that does not match the template field is detected to contain erroneous data, generate an error record.
In some cases, an item field fails to match the template field not because of a format problem but because the input data is wrong. For example, again for visit-date data, suppose the visit date is March 11, 2017, but the visit date recorded in the target data is "2017.3.111". The electronic device can detect that the value of the target data is out of range: "111" would be read as a day of the month, but it exceeds the normal date range (a month has at most 31 days), so the device can determine that the item field contains erroneous data. Because the correct value cannot be determined from such erroneous data, an error record can be generated, recorded in the report for the user to review, and output as a prompt, which facilitates correcting the data and obtaining accurate data.
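The value-range check in step 207 can be sketched as below: the date parts are parsed and passed to `datetime.date`, which raises `ValueError` for an impossible day such as 111, and the failure is appended to an error log rather than being silently corrected. The layout of the error-log entries is an assumption.

```python
import re
from datetime import date

DOTTED = re.compile(r"(\d+)\.(\d+)\.(\d+)")  # year.month.day, any number of digits


def check_visit_date(row: int, value: str, error_log: list):
    """Return the date in YYYY-MM-DD form, or None after appending an error record."""
    m = DOTTED.fullmatch(value)
    if m is None:
        error_log.append({"row": row, "value": value, "reason": "not a year.month.day value"})
        return None
    year, month, day = map(int, m.groups())
    try:
        return date(year, month, day).isoformat()
    except ValueError as exc:  # e.g. day 111 is out of range for the month
        error_log.append({"row": row, "value": value, "reason": str(exc)})
        return None
```

The error log can then be written into the report and shown to the user, since the intended day cannot be recovered from "111".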
Steps 204 to 206 may be executed sequentially in any order, or simultaneously.
208. Determine the item category of the item field, search the correction database corresponding to that item category for correction data that matches the item field, and output prompt information containing the correction data.
Some erroneous data can be corrected automatically. A correction database can be stored in the electronic device, containing correction data for different item categories. The electronic device can determine the item category of the item field, search the correction database for correction data that corresponds to that category and matches the item field, and then output prompt information containing the correction data, suggesting that the user correct the erroneous data.
For example, if the department field for a transfer between departments in the target data is filled in incorrectly (for example, it contains a miswritten character), the device can determine that the item category of the transfer department is hospitalization-course information and obtain the correction data for that category from the correction database. This correction data includes all departments, so by recognizing the erroneous data the device can determine the correction data, i.e., the correct department name, such as dermatology or neurosurgery.
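A minimal sketch of the correction-database lookup in step 208, assuming the database maps each item category to its list of valid values and that a miswritten value can be matched by string similarity with `difflib.get_close_matches`. The category name, department list, and 0.6 cutoff are hypothetical; English names are used only for readability, whereas real data would hold Chinese department names.

```python
from difflib import get_close_matches

# Hypothetical correction database: item category -> valid values.
CORRECTION_DB = {
    "hospital_course": ["Dermatology", "Neurosurgery", "Cardiology", "Orthopedics"],
}
# Hypothetical mapping from item field to item category.
FIELD_CATEGORY = {"transfer_department": "hospital_course"}


def suggest_correction(field: str, value: str):
    """Return a prompt suggesting the closest valid value, or None if the value
    is already valid or nothing is close enough."""
    candidates = CORRECTION_DB.get(FIELD_CATEGORY.get(field), [])
    if value in candidates:
        return None
    matches = get_close_matches(value, candidates, n=1, cutoff=0.6)
    if not matches:
        return None
    return f"{field}: {value!r} looks wrong; suggested correction: {matches[0]!r}"
```

The function only builds a prompt; as in the text, the user decides whether to apply the suggested correction.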
After the target data has been processed through the above steps, standard data that satisfies the integrity check rule and the format rule is obtained.
Optionally, the method may further include: obtaining a pre-stored data template corresponding to the data label, and comparing the standard data with the pre-stored data template to obtain a data score;
judging whether the data score is higher than a first score threshold and, if so, storing the standard data in groups.
Specifically, it can be judged whether the standard data meets data quality requirements. Pre-stored data templates for different data can be stored in the electronic device, and the data label determines which pre-stored data template applies to the data. A pre-stored data template may include a data structure template, a data format template, and the like, as well as the corresponding data scoring rules, for example deducting 1 point for each data format error and 0.5 points for each unfilled data item. After obtaining the pre-stored data template, the device can compare the standard data with it and score the standard data to obtain the data score. All target data uploaded in one batch can be compared and scored together, or the data can be compared and scored in groups.
After the data score is obtained, it can be judged whether it is higher than the first score threshold. If it is, the data quality requirement is met and the standard data can be stored in groups.
Optionally, multiple score thresholds can also be stored in the electronic device, and comparing the data score against them determines the quality grade of the standard data. For example, a second score threshold higher than the first score threshold can be set, and it can be judged whether the data score is higher than the second score threshold. If it is, the data meets the quality requirement and its quality grade is excellent; if the data score is higher than the first score threshold but not higher than the second, the data meets the quality requirement and its quality grade is good.
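The scoring and grading steps can be sketched as follows, using the deductions given above (1 point per format error, 0.5 points per unfilled item). The 100-point starting score, both score thresholds, and the template patterns are assumptions made for illustration.

```python
import re

FORMAT_ERROR_PENALTY = 1.0     # from the text: 1 point per data format error
MISSING_ITEM_PENALTY = 0.5     # from the text: 0.5 points per unfilled item
FULL_SCORE = 100.0             # hypothetical starting score
FIRST_SCORE_THRESHOLD = 90.0   # hypothetical
SECOND_SCORE_THRESHOLD = 97.0  # hypothetical, higher than the first

# Hypothetical pre-stored data template: item -> required format.
TEMPLATE = {"patient_id": r"P\d+", "visit_date": r"\d{4}-\d{2}-\d{2}"}


def score(records, template=TEMPLATE) -> float:
    total = FULL_SCORE
    for rec in records:
        for field, pattern in template.items():
            value = rec.get(field)
            if value in (None, ""):
                total -= MISSING_ITEM_PENALTY
            elif re.fullmatch(pattern, str(value)) is None:
                total -= FORMAT_ERROR_PENALTY
    return total


def grade(s: float):
    if s > SECOND_SCORE_THRESHOLD:
        return "excellent"
    if s > FIRST_SCORE_THRESHOLD:
        return "good"
    return None  # does not meet the data quality requirement


def group_by_grade(batches):
    """Group batches of standard data by quality grade, dropping failing batches."""
    groups = {}
    for batch in batches:
        g = grade(score(batch))
        if g is not None:
            groups.setdefault(g, []).append(batch)
    return groups
```

Under these hypothetical settings, a batch with one malformed visit date and one blank patient ID scores 98.5 and is graded excellent.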
The data can be grouped by quality grade or by data type; this is not limited here.
By assessing the quality grade of the data, the data can be organized according to its quality, the state of data quality can be understood at a glance, and subsequent data processing steps can be performed with reference to it.
Optionally, the method further includes generating a data detection report, which includes the detection time of the raw data, the data duplication rate, the data qualification rate, and/or the content of the target data processed according to the normalization rule.
The data duplication rate is the proportion of duplicate data in the target data, and the data qualification rate is the proportion of standard data in the target data. The data detection report clearly reflects data quality, provides an effective reference for data acquisition and processing, and helps users maintain the data and improve the data acquisition and processing system.
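A sketch of the two rates in the data detection report, computed from counts that the earlier steps would produce; the function name and report keys are assumptions.

```python
from datetime import datetime
from typing import Optional


def detection_report(total: int, duplicates: int, qualified: int,
                     detected_at: Optional[datetime] = None) -> dict:
    """Summarize one detection run. Both rates are fractions of the target data."""
    if total <= 0:
        raise ValueError("target data is empty")
    return {
        "detected_at": (detected_at or datetime.now()).isoformat(timespec="seconds"),
        "duplication_rate": duplicates / total,   # duplicate records / target records
        "qualification_rate": qualified / total,  # standard records / target records
    }
```

For example, 10 duplicate records and 180 standard records out of 200 target records give a duplication rate of 0.05 and a qualification rate of 0.9.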
In the embodiments of the present application, raw data and the data label it carries are obtained, and the raw data is desensitized to obtain target data. The normalization rule corresponding to the data label is then obtained, and the target data is processed according to it, including: detecting whether any item in the target data is vacant and, if so, generating a report containing the vacant items and outputting it to prompt completion of those items; detecting the similarity of items in the target data and, if it is higher than the first threshold, judging the items to be duplicate items and deleting the data of one of them; detecting whether each item field in the target data matches the template field and, if not, converting the unmatched item field into a target field that matches the template field according to the format rule; and, if an unmatched item field is detected to contain erroneous data, generating an error record, determining the item category of the item field, searching the correction database corresponding to that category for correction data matching the item field, and outputting prompt information containing the correction data. Standard data that satisfies the integrity check rule and the format rule is thereby obtained for subsequent storage or processing, which improves the efficiency and accuracy of data processing.
Referring to Fig. 3, which is a schematic structural diagram of an electronic device 300 provided by an embodiment of the present application, the electronic device 300 includes an acquisition module 310, a desensitization module 320, and a data processing module 330, where:
the acquisition module 310 is configured to obtain raw data and the data label carried by the raw data.
The raw data in the embodiments of the present application may be medical insurance data. Medical insurance generally refers to basic medical insurance, a social security system established to compensate workers for economic losses caused by disease risk. A medical insurance fund is built from contributions by employers and individuals, and when an insured person falls ill and incurs medical expenses, the medical insurance institution provides a certain amount of economic compensation. This process involves processing large volumes of medical insurance data.
The raw data may be data in documents of various types generated by document-editing software, for example data in WORD or EXCEL files.
The electronic device can communicate with a terminal device and receive raw data from it. The acquisition module 310 can perform the above steps immediately after the raw data is received; alternatively, a preset execution time can be stored in the electronic device, i.e., the user can set the data processing time in advance, and the acquisition module 310 then performs the above steps at that preset time.
To process data quickly and in a standardized way, the embodiments of the present application classify the raw data. Raw data from different sources may carry different data labels, by which the raw data can be classified and processed. The data labels can be assigned by data type: for example, different hospitals report data in different formats, which must be normalized to build a database. Each format corresponds to one data label, so the data format can be determined from the data label.
The desensitization module 320 is configured to desensitize the raw data to obtain target data.
Data desensitization in the embodiments of the present application refers to transforming certain sensitive information according to desensitization rules so that privacy-sensitive data is reliably protected. Where customer security data or commercially sensitive data is involved, real data is transformed before being provided for testing, without violating system rules; personal information such as ID card numbers, mobile phone numbers, card numbers, and customer numbers must be desensitized. Data desensitization is one of the database security technologies, which mainly include database vulnerability scanning, database encryption, database firewalls, data desensitization, and database security auditing systems. Database security risks include database dumping, database "brushing", and credential stuffing.
Desensitization rules can be preset in the embodiments of the present application, i.e., certain specific types of data are designated for desensitization. The desensitization module 320 can set the desensitization rules according to actual conditions and select the data types to be desensitized. The main desensitization rules in the embodiments of the present application may cover names, ID card numbers, address information, and telephone numbers.
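A minimal masking sketch for the four rule types listed above (name, ID card number, address, telephone number). How many leading and trailing characters each rule keeps is an assumption; the text does not specify the masking pattern.

```python
def keep_ends(text: str, head: int, tail: int, mask: str = "*") -> str:
    """Keep `head` leading and `tail` trailing characters and mask the rest."""
    if len(text) <= head + tail:
        return mask * len(text)  # too short to reveal anything safely
    return text[:head] + mask * (len(text) - head - tail) + text[len(text) - tail:]


# Hypothetical desensitization rules: field -> (chars kept at start, chars kept at end).
RULES = {
    "name": (1, 0),       # 张三 -> 张*
    "id_number": (6, 4),  # keep the region code and the last four characters
    "phone": (3, 4),      # 13812345678 -> 138****5678
    "address": (6, 0),    # keep a short prefix, roughly province and city
}


def desensitize(record: dict, rules: dict = RULES) -> dict:
    """Return a copy of `record` with the sensitive string fields masked."""
    out = dict(record)
    for field, (head, tail) in rules.items():
        if isinstance(out.get(field), str):
            out[field] = keep_ends(out[field], head, tail)
    return out
```

Fields without a rule, such as a fee amount, pass through unchanged, so the desensitized record can still be used for the normalization steps that follow.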
In the medical insurance field, desensitizing data improves data security and protects user privacy.
After the desensitization module 320 has desensitized the data, the specific normalization processing can be performed.
The acquisition module 310 is further configured to obtain the normalization rule corresponding to the data label.
The electronic device can store the correspondence between data labels and normalization rules, i.e., data with different labels is processed with different normalization rules. From the data label, the acquisition module 310 can determine which normalization rule to use, obtain that rule, and then perform the subsequent steps.
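The correspondence between data labels and normalization rules can be sketched as a registry that maps each label to an ordered list of rule functions. The labels and the two simplified rules below are hypothetical.

```python
def integrity_check(rows):
    """Hold back rows with any blank item (they would be completed before warehousing)."""
    return [r for r in rows if all(v not in (None, "") for v in r.values())]


def exact_dedupe(rows):
    """Drop rows whose items are all identical to an earlier row."""
    seen, out = set(), []
    for r in rows:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out


# Hypothetical labels; in practice the correspondence is configured per data source.
RULES_BY_LABEL = {
    "hospital_a_inpatient": [exact_dedupe, integrity_check],
    "hospital_b_outpatient": [exact_dedupe],
}


def normalize(rows, label):
    """Apply, in order, the normalization rules registered for `label`."""
    if label not in RULES_BY_LABEL:
        raise ValueError(f"no normalization rule registered for label {label!r}")
    for rule in RULES_BY_LABEL[label]:
        rows = rule(rows)
    return rows
```

Keeping the mapping in data rather than in branching code means a new hospital format only needs a new label entry, not a change to `normalize`.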
The data processing module 330 is configured to process the target data according to the normalization rule to obtain standard data that satisfies the integrity check rule and the format rule.
Because the raw data comes from different sources, the data transcoding process can be tailored to each source.
Specifically, normalization rules for standardizing data are stored in advance in the electronic device. There can be multiple rules, and the correspondence between data labels and normalization rules is stored as well. Through this correspondence, the data processing module 330 obtains the normalization rule corresponding to the data label carried by the raw data and then performs the transcoding process using that rule.
Data cleaning (data cleansing) is the process of re-examining and verifying data, with the aim of deleting duplicate information, correcting existing errors, and ensuring data consistency.
The normalization rule may include an integrity check rule for checking the integrity of the target data: it can detect missing or omitted data, and such data can be completed according to the normalization rule.
The normalization rule may include a duplicate check rule for detecting duplicate data in the target data, which can then be deleted.
The normalization rule may include a format rule, according to which the format of the target data is unified.
The normalization rule may also include handling of erroneous data. Errors can arise because the business system is not robust and writes input directly to the back-end database without validating it, resulting in, for example, numeric data entered as full-width digits, a carriage return after string data, an incorrect date format, or an out-of-range date. This kind of data should also be classified. Problems such as double-byte characters or invisible characters before or after the data can only be found by writing SQL statements, after which the customer is asked to correct the data in the business system before it is extracted again. Errors such as an incorrect date format or an out-of-range date cause ETL jobs to fail; they can be picked out of the business system database with SQL and sent to the responsible business department with a deadline for correction, after which the data is extracted again.
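The text locates these problems with SQL in the business system database. As an application-side alternative (an assumption, not the described procedure), the Python sketch below flags line breaks, surrounding whitespace, invisible control or format characters, and full-width characters, and shows one possible cleanup based on NFKC normalization.

```python
import unicodedata


def char_problems(value: str) -> list:
    """List input-quality problems of the kinds described above."""
    problems = []
    if "\r" in value or "\n" in value:
        problems.append("line break in value")
    if value != value.strip():
        problems.append("leading/trailing whitespace")
    # Cc = control characters, Cf = format characters such as zero-width space.
    if any(unicodedata.category(c) in ("Cc", "Cf") and c not in "\r\n" for c in value):
        problems.append("invisible character")
    if any(unicodedata.east_asian_width(c) == "F" for c in value):
        problems.append("full-width character")
    return problems


def clean(value: str) -> str:
    """NFKC folds full-width forms to their ASCII equivalents; then drop
    format characters and trim surrounding whitespace."""
    folded = unicodedata.normalize("NFKC", value)
    return "".join(c for c in folded if unicodedata.category(c) != "Cf").strip()
```

Full-width punctuation is legitimate in Chinese free text, so the full-width check is best limited to numeric and code fields.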
The data processing module 330 processes the target data according to the normalization rule, mainly through the integrity check and the format rule, and can thereby obtain standard data that satisfies the integrity check rule and the format rule.
Optionally, the data processing module 330 includes a first processing unit 331 configured to:
detect whether any item in the target data is vacant and, if so, generate a report containing the vacant items and output the report to prompt completion of the vacant items.
When the normalization rule obtained for the data label includes an integrity check rule, the first processing unit 331 can check the integrity of the target data by detecting whether any item in the target data is vacant. A vacancy means that the data contains an item that has not been filled in. Such data mainly lacks information that should be present: for example, a supplier name, a branch company name, or a customer's region is missing, or a master table in the business system cannot be matched with its detail table. Such incomplete data can be filtered out and written into the data warehouse only after it has been completed according to the integrity check rule. Specifically, if vacancies exist, the first processing unit 331 can collate the vacant items and generate a report containing them. The report can be output to prompt the user to complete the vacant items; for example, it may record that the admission time item has no data (was not filled in). The report can also be sent to the source of the raw data with a request to supplement the vacant items. Optionally, some missing values can be derived from this data source or from other data sources, replacing each missing value with an average, maximum, or minimum value, or with a more complex probability estimate, so as to clean the data. The integrity check rule can therefore also include a completion rule by which the vacant items are filled in. For example, if the admission time in the target data is not filled in, but the electronic device can obtain other related data of the user, such as the user's electronic medical record, the electronic device can read the admission time recorded in that medical record and fill it into the target data. The electronic device can also generate a completion record so that the completed information can be queried.
Optionally, the data processing module 330 includes a second processing unit 332 configured to:
detect the similarity of items in the target data and, if the similarity is higher than the first threshold, judge the items to be duplicate items and either merge their data or delete the data of one of them.
When the normalization rule includes a duplicate check rule, the second processing unit 332 can detect duplicate data in the target data, achieving data deduplication.
Optionally, records in a database whose attribute values are identical are regarded as duplicate records. Whether two records are equal is detected by checking whether their attribute values are equal, and equal records are merged into one record (i.e., merge/purge). The second processing unit 332 can obtain the similarity of items in the target data by comparing the fields in the data. A first threshold can also be stored in the electronic device; when the similarity is higher than the first threshold, the second processing unit 332 can judge the items to be duplicate items and then delete the data of the duplicate items. For example, with the first threshold set to 98%, if the second processing unit 332 detects two items in the target data whose similarity is 98.7%, it deletes the data of one of them, or merges the data of the duplicate items into a single copy, and retains the analysis record, which is stored in the electronic device.
Optionally, the data processing module 330 includes a third processing unit 333 configured to:
detect whether each item field in the target data matches the template field and, if not, convert the unmatched item field, according to the format rule, into a target field that matches the template field.
Specifically, the third processing unit 333 can identify possible erroneous values or outliers by statistical analysis, for example deviation analysis, to find values that do not follow the expected distribution or regression equation. It can also check data values against a simple rule base (common-sense rules, business-specific rules, etc.), or detect and clean data using constraints between different attributes or external data.
Multiple template fields can be stored in the electronic device. For each type of data, the third processing unit 333 can compare the item fields in the target data with the template field: if an item field conforms to the format of the template field, it matches; otherwise it does not. For example, suppose a visit date is March 11, 2017, the visit date in the target data is formatted as "2017.3.11", and the format rule for visit dates is "YYYY (year)-MM (month)-DD (day)" (the template field). The third processing unit 333 can automatically identify the visit date "2017.3.11", compare it with the template field, detect that its format does not match, and convert it into the matching format "2017-03-11", thereby unifying the data format.
In the embodiments of the present application, deduplicating massive volumes of medical insurance data saves storage space. Deleting duplicate data can greatly reduce the number of storage media required and thus reduce cost. It can even make disk-based storage systems cheaper than tape libraries while providing better performance. Storage systems that support data deduplication are therefore particularly suitable for data backup.
Data deduplication can also improve write performance. Disk write performance is limited, typically around 100 MB/s for sequential writes. If deduplication is performed as data is written, part of the data never needs to be written to disk, which improves write performance.
Data deduplication can also save network bandwidth. If deduplication is performed on the client, only new data is transmitted to the storage system, which reduces the volume of data sent over the network and thus saves bandwidth.
Optionally, the data processing module 330 further includes a correction unit 334 configured to:
if an item field that does not match the template field is detected to contain erroneous data, generate an error record;
determine the item category of the item field, search the correction database for correction data that corresponds to the item category and matches the item field, and output prompt information containing the correction data.
In some cases, an item field fails to match the template field not because of a format problem but because the input data is wrong. For example, again for visit-date data, suppose the visit date is March 11, 2017, but the visit date recorded in the target data is "2017.3.111". The correction unit 334 can detect that the value of the target data is out of range: "111" would be read as a day of the month, but it exceeds the normal date range (a month has at most 31 days), so the correction unit 334 can determine that the item field contains erroneous data. Because the correct value cannot be determined from such erroneous data, an error record can be generated, recorded in the report for the user to review, and output as a prompt, which facilitates correcting the data and obtaining accurate data.
Some erroneous data can be corrected automatically. A correction database can be stored in the electronic device, containing correction data for different item categories. The correction unit 334 can determine the item category of the item field, search the correction database for correction data that corresponds to that category and matches the item field, and then output prompt information containing the correction data, suggesting that the user correct the erroneous data.
For example, if the department field for a transfer between departments in the target data is filled in incorrectly (for example, it contains a miswritten character), the correction unit 334 can determine that the item category of the transfer department is hospitalization-course information and obtain the correction data for that category from the correction database. This correction data includes all departments, so by recognizing the erroneous data the correction unit 334 can determine the correction data, i.e., the correct department name, such as dermatology or neurosurgery.
After the target data has been processed through the above steps, standard data that satisfies the integrity check rule and the format rule is obtained.
Optionally, the electronic device 300 further includes a scoring module 340 configured to:
after the standard data that satisfies the integrity check rule and the format rule has been obtained, obtain the pre-stored data template corresponding to the data label and compare the standard data with it to obtain a data score;
judge whether the data score is higher than the first score threshold and, if so, store the standard data in groups.
Specifically, the scoring module 340 can judge whether the standard data meets data quality requirements. Pre-stored data templates for different data can be stored in the electronic device, and the scoring module 340 can use the data label to determine which pre-stored data template applies to the data. A pre-stored data template may include a data structure template, a data format template, and the like, as well as the corresponding data scoring rules, for example deducting 1 point for each data format error and 0.5 points for each unfilled data item. After obtaining the pre-stored data template, the scoring module 340 can compare the standard data with it and score the standard data to obtain the data score. All target data uploaded in one batch can be compared and scored together, or the data can be compared and scored in groups.
After obtaining the data score, the scoring module 340 determines whether the data score is higher than the first score threshold. If it is, the data meets the data quality requirement, and the standard data can be stored in groups.
Optionally, the electronic device may also store multiple score thresholds, and the scoring module 340 can determine the quality grade of the standard data by comparing the data score with them. For example, a second score threshold higher than the first score threshold may be set. The scoring module 340 determines whether the data score is higher than the second score threshold: if it is, the data meets the data quality requirement and its quality grade is excellent; if the data score is higher than the first score threshold but not higher than the second score threshold, the data meets the data quality requirement and its quality grade is good.
The data may be grouped by data quality grade or by data type; this is not limited here.
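The two-threshold grading and grouping by quality grade can be sketched as follows. The threshold values 90 and 95 are hypothetical (the description only requires the second threshold to be higher than the first), and the batch identifiers are illustrative.

```python
# Hypothetical threshold values; the description only requires SECOND > FIRST.
FIRST_THRESHOLD = 90.0
SECOND_THRESHOLD = 95.0


def quality_grade(score, first=FIRST_THRESHOLD, second=SECOND_THRESHOLD):
    """Map a data score to 'excellent', 'good', or None (requirement not met)."""
    if score > second:
        return "excellent"
    if score > first:
        return "good"
    return None  # not higher than the first threshold


def group_for_storage(scored_batches):
    """Group (batch_id, score) pairs by quality grade; failing batches are returned separately."""
    groups = {"excellent": [], "good": []}
    rejected = []
    for batch_id, score in scored_batches:
        grade = quality_grade(score)
        if grade is None:
            rejected.append(batch_id)
        else:
            groups[grade].append(batch_id)
    return groups, rejected


print(group_for_storage([("a", 98.5), ("b", 92.0), ("c", 80.0)]))
```

The comparisons are strict, matching the "higher than" wording: a score exactly equal to the second threshold is graded good, and one exactly equal to the first threshold fails.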
Because the scoring module 340 assesses the data quality grade, the data can be organized according to its quality, users can see the data quality at a glance, and subsequent data processing steps can take the data quality into account.
Optionally, the electronic device further includes a reporting module 350 for generating a data detection report. The data detection report includes the data detection time of the raw data, the data duplication rate, the data qualification rate, and/or the content of the processing applied to the target data according to the normalization rule. The data duplication rate is the proportion of duplicate data in the target data, and the data qualification rate is the proportion of the target data that is standard data satisfying the completeness check rule and the format rule.
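The two rates in the data detection report are simple proportions of the target data. The sketch below assumes the counts are already known from the earlier steps; the `DetectionReport` structure, its field names, and the use of ISO 8601 timestamps are hypothetical choices, not taken from the description.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DetectionReport:
    detected_at: str           # data detection time (ISO 8601)
    duplication_rate: float    # duplicate records / target records
    qualification_rate: float  # standard records / target records
    processing_notes: list = field(default_factory=list)


def build_report(total, duplicates, qualified, notes=None, detected_at=None):
    """Build a data detection report from record counts.

    total:      number of records in the target data
    duplicates: number of those records identified as duplicates
    qualified:  number that are standard data (passed completeness and format rules)
    notes:      optional descriptions of the processing applied under the normalization rule
    """
    if total <= 0:
        raise ValueError("total must be positive")
    when = detected_at or datetime.now(timezone.utc)
    return DetectionReport(
        detected_at=when.isoformat(),
        duplication_rate=duplicates / total,
        qualification_rate=qualified / total,
        processing_notes=list(notes or []),
    )
```

For example, 200 target records containing 10 duplicates and 180 standard records give a duplication rate of 0.05 and a qualification rate of 0.9.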
The data detection report clearly reflects the data quality, provides a useful reference for data collection and processing, and helps users maintain the data and improve their data collection and data processing systems.
According to specific implementations of the embodiments of this application, steps 101 to 104 and steps 201 to 208 of the data processing methods shown in Fig. 1 and Fig. 2 may be performed by the modules of the electronic device 300 shown in Fig. 3. For example, steps 101 to 104 in Fig. 1 may be performed by the acquisition module 310, the desensitization module 320, and the data processing module 330 shown in Fig. 3, with the acquisition module 310 obtaining both the raw data and the normalization rule.
With the electronic device 300 of the embodiments of this application, the electronic device 300 obtains raw data and the data label carried by the raw data, desensitizes the raw data to obtain target data, obtains the normalization rule corresponding to the data label (the normalization rule including a completeness check rule and a format rule), and processes the target data according to the normalization rule to obtain standard data that satisfies the completeness check rule and the format rule. Standardizing the data in this way facilitates subsequent data storage or processing and can improve the efficiency and accuracy of data processing.
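The overall flow (obtain raw data and its data label, desensitize, look up the normalization rule by label, then apply the completeness check rule and the format rule) can be sketched end to end. Everything concrete here is hypothetical: the `inpatient_record` label, the field names, masking that keeps only the first and last characters, and a format rule limited to date conversion. It is a sketch of the described steps, not the patented implementation.

```python
from datetime import datetime

# Hypothetical normalization rules keyed by data label. Field names, the
# masking scheme and the accepted date formats are illustrative assumptions.
RULES = {
    "inpatient_record": {
        "required": ["patient_name", "id_number", "admission_date", "department"],
        "date_fields": ["admission_date"],
        "input_date_formats": ["%Y/%m/%d", "%Y.%m.%d", "%Y%m%d"],
        "template_date_format": "%Y-%m-%d",
    }
}
SENSITIVE_FIELDS = ("patient_name", "id_number")


def desensitize(record):
    """Mask sensitive fields, keeping only their first and last characters."""
    masked = dict(record)
    for key in SENSITIVE_FIELDS:
        value = masked.get(key)
        if not value:
            continue  # leave absent or empty values for the completeness check
        value = str(value)
        if len(value) > 2:
            masked[key] = value[0] + "*" * (len(value) - 2) + value[-1]
        else:
            masked[key] = "*" * len(value)
    return masked


def check_completeness(record, required):
    """Completeness check rule: list required items that are missing or empty."""
    return [item for item in required if not record.get(item)]


def apply_format_rule(record, rule):
    """Format rule: rewrite date fields in the template format."""
    out = dict(record)
    target_fmt = rule["template_date_format"]
    for name in rule["date_fields"]:
        value = out.get(name)
        if not value:
            continue
        for fmt in [target_fmt] + rule["input_date_formats"]:
            try:
                out[name] = datetime.strptime(value, fmt).strftime(target_fmt)
                break
            except ValueError:
                continue
        else:
            raise ValueError(f"{name}: unrecognized date {value!r}")
    return out


def process(raw_record, data_label):
    """Desensitize, look up the rule by data label, then apply both rules.

    Returns (standard_record, report); standard_record is None when items are missing.
    """
    target = desensitize(raw_record)
    rule = RULES[data_label]
    missing = check_completeness(target, rule["required"])
    if missing:
        # Report of vacant items, output as a prompt to complete them.
        return None, {"missing_items": missing}
    return apply_format_rule(target, rule), {"missing_items": []}
```

Desensitization runs before any rule is applied, in the order given above, so the later steps see only masked identifiers.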
Referring to Fig. 4, Fig. 4 is a schematic structural diagram of another electronic device disclosed in the embodiments of this application. As shown in Fig. 4, the electronic device 400 includes a processor 401 and a memory 402. The electronic device 400 may further include a bus 403 through which the processor 401 and the memory 402 are interconnected. The bus 403 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, the bus is drawn as a single thick line in Fig. 4; this does not mean that there is only one bus or only one type of bus. The electronic device 400 may further include an input/output device 404, which may include a display screen such as a liquid crystal display. The memory 402 is used to store one or more programs containing instructions, and the processor 401 is used to call the instructions stored in the memory 402 to perform some or all of the method steps of the embodiments of Fig. 1 and Fig. 2.
It should be understood that, in the embodiments of this application, the processor 401 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.
The input device 402 may include a touchpad, a fingerprint sensor (for collecting the user's fingerprint information and fingerprint orientation information), a microphone, and the like, and the output device 403 may include a display (such as an LCD), a speaker, and the like.
The memory 404 may include read-only memory and random access memory, and provides instructions and data to the processor 401. Part of the memory 404 may also include non-volatile random access memory. For example, the memory 404 may also store device type information.
With the electronic device 400 of the embodiments of this application, the electronic device 400 obtains raw data and the data label carried by the raw data, desensitizes the raw data to obtain target data, obtains the normalization rule corresponding to the data label (the normalization rule including a completeness check rule and a format rule), and processes the target data according to the normalization rule to obtain standard data that satisfies the completeness check rule and the format rule. Standardizing the data in this way facilitates subsequent data storage or processing and can improve the efficiency and accuracy of data processing.
An embodiment of this application further provides a computer storage medium that stores a computer program for electronic data exchange. The computer program causes a computer to perform some or all of the steps of any data processing method described in the above method embodiments.
In the above embodiments, each embodiment is described with its own emphasis; for parts not described in detail in one embodiment, refer to the related descriptions of other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into modules is only a division by logical function, and other divisions are possible in an actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, apparatuses, or modules, and may be electrical or take other forms.
Modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed across multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
If the integrated module is implemented as a software functional module and sold or used as an independent product, it may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present invention in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied as a software product. The computer software product is stored in a memory and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods of the embodiments of the present invention. The foregoing memory includes various media capable of storing program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.

Claims (10)

1. A data processing method, characterized in that the method comprises:
obtaining raw data and a data label carried by the raw data;
performing desensitization on the raw data to obtain target data;
obtaining a normalization rule corresponding to the data label, the normalization rule comprising a completeness check rule and a format rule; and
processing the target data according to the normalization rule to obtain standard data that satisfies the completeness check rule and the format rule.
2. The method according to claim 1, characterized in that, where the normalization rule comprises the completeness check rule, processing the target data according to the normalization rule comprises:
detecting whether any item in the target data is vacant, and if so, generating a report containing the vacant items and outputting the report to prompt that the vacant items be completed.
3. The method according to claim 2, characterized in that the normalization rule further comprises a duplicate check rule, and processing the target data according to the normalization rule further comprises:
detecting the similarity of items in the target data, and if the similarity is higher than a first threshold, determining that the items are duplicate items and either merging the data of the duplicate items or deleting the data of one of the duplicate items.
4. The method according to claim 2 or 3, characterized in that, where the normalization rule comprises the format rule, processing the target data according to the normalization rule further comprises:
detecting whether item fields in the target data match template fields, and if not, converting each unmatched item field, according to the format rule, into a target field that matches the template field.
5. The method according to claim 4, characterized in that the method further comprises:
if it is detected that an item field that does not match the template field contains erroneous data, generating an error record; and
determining the item category of the item field, searching a correction database for correction data that corresponds to the item category and matches the item field, and outputting prompt information containing the correction data.
6. The method according to any one of claims 1-5, characterized in that, after obtaining the standard data that satisfies the completeness check rule and the format rule, the method further comprises:
obtaining a pre-stored data template corresponding to the data label, and comparing the standard data with the pre-stored data template to obtain a data score; and
determining whether the data score is higher than a first score threshold, and if so, storing the standard data in groups.
7. The method according to any one of claims 1-6, characterized in that, after obtaining the standard data that satisfies the completeness check rule and the format rule, the method further comprises:
generating a data detection report, the data detection report comprising the data detection time of the raw data, a data duplication rate, a data qualification rate, and/or the content of the processing applied to the target data according to the normalization rule, wherein the data duplication rate is the proportion of duplicate data in the target data, and the data qualification rate is the proportion of the target data that is the standard data.
8. An electronic device, characterized by comprising an acquisition module, a desensitization module, and a data processing module, wherein:
the acquisition module is configured to obtain raw data and a data label carried by the raw data;
the desensitization module is configured to perform desensitization on the raw data to obtain target data;
the acquisition module is further configured to obtain a normalization rule corresponding to the data label, the normalization rule comprising a completeness check rule and a format rule; and
the data processing module is configured to process the target data according to the normalization rule to obtain standard data that satisfies the completeness check rule and the format rule.
9. An electronic device, characterized by comprising a processor, an input device, an output device, and a memory that are connected to one another, wherein the memory is configured to store a computer program comprising program instructions, and the processor is configured to call the program instructions to perform the method according to any one of claims 1-7.
10. A computer storage medium, characterized in that the computer storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method according to any one of claims 1-7.
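The duplicate check rule of claim 3 can be illustrated with a minimal sketch. The claim does not say how similarity is measured, so this code rests on several assumptions. Records are serialized in a fixed key order and compared with `difflib.SequenceMatcher.ratio()`, the value 0.9 for the first threshold is hypothetical, and the merge policy keeps the earlier record's non-empty values.

```python
import difflib

FIRST_THRESHOLD = 0.9  # hypothetical value for the "first threshold" of claim 3


def record_text(record):
    """Serialize a record in a stable key order for comparison."""
    return "|".join(f"{k}={record[k]}" for k in sorted(record))


def similarity(a, b):
    """Similarity of two records in [0, 1] based on their serialized text."""
    return difflib.SequenceMatcher(None, record_text(a), record_text(b)).ratio()


def merge(existing, new):
    """Merge duplicates: keep existing non-empty values, fill gaps from the new record."""
    merged = dict(new)
    merged.update({k: v for k, v in existing.items() if v not in (None, "")})
    return merged


def deduplicate(records, threshold=FIRST_THRESHOLD):
    """Collapse near-duplicate records using pairwise O(n^2) comparison."""
    kept = []
    for rec in records:
        for i, existing in enumerate(kept):
            if similarity(rec, existing) > threshold:
                kept[i] = merge(existing, rec)
                break
        else:
            kept.append(rec)
    return kept
```

Pairwise comparison is adequate for small batches. For large data sets a blocking key (for example, grouping by a patient identifier first) would usually be needed, which the claim leaves open.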
CN201811323448.1A 2018-11-07 2018-11-07 A kind of data processing method, electronic equipment and computer storage medium Pending CN109522746A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811323448.1A CN109522746A (en) 2018-11-07 2018-11-07 A kind of data processing method, electronic equipment and computer storage medium

Publications (1)

Publication Number Publication Date
CN109522746A true CN109522746A (en) 2019-03-26

Family

ID=65774328

Country Status (1)

Country Link
CN (1) CN109522746A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106776951A (en) * 2016-12-02 2017-05-31 航天星图科技(北京)有限公司 One kind cleaning contrast storage method
CN108153793A (en) * 2016-12-02 2018-06-12 航天星图科技(北京)有限公司 A kind of original data processing method
CN108446391A (en) * 2018-03-23 2018-08-24 万帮充电设备有限公司 Processing method, device, electronic equipment and the computer-readable medium of data

Cited By (45)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110109998A (en) * 2019-05-17 2019-08-09 贵州数据宝网络科技有限公司 Data trade intelligence integration system
CN110109998B (en) * 2019-05-17 2023-05-30 贵州数据宝网络科技有限公司 Intelligent data transaction integration system
CN110263024A (en) * 2019-05-20 2019-09-20 平安普惠企业管理有限公司 Data processing method, terminal device and computer storage medium
CN110263016A (en) * 2019-05-20 2019-09-20 平安普惠企业管理有限公司 Data processing method, terminal device and computer storage medium
CN110263024B (en) * 2019-05-20 2023-08-22 重庆盛本亚信息技术有限公司 Data processing method, terminal device and computer storage medium
CN110414579A (en) * 2019-07-18 2019-11-05 北京信远通科技有限公司 Metadata schema closes mark property inspection method and device, storage medium
CN112347749A (en) * 2019-08-06 2021-02-09 南通深南电路有限公司 Data processing method, data processing equipment and data processing system
CN110737689A (en) * 2019-10-10 2020-01-31 广东省科技基础条件平台中心 Data standard conformance detection method, device, system and storage medium
CN111177176A (en) * 2019-11-18 2020-05-19 腾讯科技(深圳)有限公司 Data detection method, device and storage medium
CN111177176B (en) * 2019-11-18 2023-05-16 腾讯科技(深圳)有限公司 Data detection method, device and storage medium
CN111061733B (en) * 2019-12-10 2024-01-19 北京明略软件***有限公司 Data processing method, device, electronic equipment and computer readable storage medium
CN111061733A (en) * 2019-12-10 2020-04-24 北京明略软件***有限公司 Data processing method and device, electronic equipment and computer readable storage medium
CN111026744A (en) * 2019-12-11 2020-04-17 新奥数能科技有限公司 Data management method and device based on energy station system model framework
CN111159479A (en) * 2019-12-31 2020-05-15 上海亿保健康管理有限公司 Data processing method, device and equipment
CN111291050A (en) * 2020-01-21 2020-06-16 北京工业大数据创新中心有限公司 Method and device for processing data standard of equipment
CN111291031A (en) * 2020-01-22 2020-06-16 北京明略软件***有限公司 Data correction method and device
WO2021174812A1 (en) * 2020-03-02 2021-09-10 平安科技(深圳)有限公司 Data cleaning method and apparatus for profile, and medium and electronic device
CN111400296A (en) * 2020-03-16 2020-07-10 北京大学深圳医院 Kidney pathology immunofluorescence data processing method and device and related equipment
CN111597177A (en) * 2020-05-14 2020-08-28 重庆农村商业银行股份有限公司 Data governance method for improving data quality
CN111612007A (en) * 2020-05-19 2020-09-01 黑龙江工业学院 English second-level braille conversion system based on image acquisition and correction
CN112102098A (en) * 2020-08-12 2020-12-18 泰康保险集团股份有限公司 Data processing method and device, electronic equipment and storage medium
CN112102098B (en) * 2020-08-12 2023-10-27 泰康保险集团股份有限公司 Data processing method, device, electronic equipment and storage medium
CN111984987B (en) * 2020-09-01 2024-04-02 上海梅斯医药科技有限公司 Method, device, system and medium for desensitizing and restoring electronic medical records
CN111984987A (en) * 2020-09-01 2020-11-24 上海梅斯医药科技有限公司 Method, device, system and medium for desensitization and reduction of electronic medical record
CN112466199A (en) * 2020-11-26 2021-03-09 联盛(厦门)彩印有限公司 Automatic typesetting method, system, equipment and storage medium for electronic tag hang tag
CN112613764A (en) * 2020-12-25 2021-04-06 北京知因智慧科技有限公司 Data processing method and device and electronic equipment
CN112633736A (en) * 2020-12-30 2021-04-09 上海魔橙网络科技有限公司 Risk monitoring method, system and device based on block chain system
CN112650806A (en) * 2020-12-30 2021-04-13 邦邦汽车销售服务(北京)有限公司 ERP system docking accessory data standardization method and device and storage medium
CN113223726A (en) * 2021-04-23 2021-08-06 武汉大学 Visualized interactive system for data treatment mode and treatment result in medical big data
CN113138980A (en) * 2021-05-13 2021-07-20 南方医科大学皮肤病医院 Data processing method, device, terminal and storage medium
CN113312887A (en) * 2021-06-10 2021-08-27 中国汽车工程研究院股份有限公司 Digital processing method and system for vehicle detection report
CN113407564A (en) * 2021-06-18 2021-09-17 浙江非线数联科技股份有限公司 Data processing method and system
CN113626461A (en) * 2021-08-10 2021-11-09 平安国际智慧城市科技股份有限公司 Information searching method, terminal device and computer readable storage medium
CN113626461B (en) * 2021-08-10 2024-02-13 深圳平安智慧医健科技有限公司 Information searching method, terminal device and computer readable storage medium
CN113779343A (en) * 2021-09-18 2021-12-10 北京锐安科技有限公司 Mass data processing method, device, medium and electronic equipment
CN113824717A (en) * 2021-09-18 2021-12-21 北京天融信网络安全技术有限公司 Configuration checking method and device
CN113961549A (en) * 2021-09-22 2022-01-21 李凤杰 Medical data integration method and system based on data warehouse
CN114491177A (en) * 2022-02-15 2022-05-13 北京百度网讯科技有限公司 Information determination method, model training method, model determination device and electronic equipment
CN114676229A (en) * 2022-04-20 2022-06-28 国网安徽省电力有限公司滁州供电公司 Technical improvement major repair project file management system and management method
CN114676229B (en) * 2022-04-20 2023-01-24 国网安徽省电力有限公司滁州供电公司 Technical improvement major repair project file management system and management method
TWI824927B (en) * 2023-01-17 2023-12-01 中華電信股份有限公司 Data synthesis system with differential privacy protection, method and computer readable medium thereof
CN116561795A (en) * 2023-04-26 2023-08-08 合芯科技(苏州)有限公司 Data parallel desensitization processing method
CN116561795B (en) * 2023-04-26 2024-04-16 合芯科技(苏州)有限公司 Data parallel desensitization processing method
CN116757647B (en) * 2023-08-17 2023-12-22 广东南方电信规划咨询设计院有限公司 Intelligent verification method and device for exploration data
CN116757647A (en) * 2023-08-17 2023-09-15 广东南方电信规划咨询设计院有限公司 Intelligent verification method and device for exploration data

Similar Documents

Publication Publication Date Title
CN109522746A (en) A kind of data processing method, electronic equipment and computer storage medium
CN101878461B (en) Method and system for analysis of system for matching data records
Maleki et al. A comprehensive literature review of the rank reversal phenomenon in the analytic hierarchy process
CN109542965A (en) A kind of data processing method, electronic equipment and storage medium
CN105868373B (en) Method and device for processing key data of power business information system
US20140249865A1 (en) Claims analytics engine
US10560484B2 (en) Managing access in one or more computing systems
CN109285076A (en) Intelligent core protects processing method, server and storage medium
CN109858740A (en) Appraisal procedure, device, computer equipment and the storage medium of business risk
CN109597805A (en) A kind of data processing method, electronic equipment and storage medium
CN104756106A (en) Characterizing data sources in a data storage system
US20210112101A1 (en) Data set and algorithm validation, bias characterization, and valuation
CN112860769B (en) Energy planning data management system
CN110263016A (en) Data processing method, terminal device and computer storage medium
US20160125025A1 (en) Most likely classification code
US20230092559A1 (en) Systems and methods for unstructured data processing
CN109522301A (en) A kind of data processing method, electronic equipment and storage medium
US20140067443A1 (en) Business process transformation recommendation generation
CN110175276A (en) Infringing information acquisition methods, device, computer equipment and storage medium
CN112786124B (en) Problem troubleshooting method and device, storage medium and equipment
CN109947797B (en) Data inspection device and method
CN111858236B (en) Knowledge graph monitoring method and device, computer equipment and storage medium
CN111695077A (en) Asset information pushing method, terminal equipment and readable storage medium
CN114722789B (en) Data report integrating method, device, electronic equipment and storage medium
US10891268B2 (en) Methods and system for determining a most reliable record

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220527

Address after: 518000 China Aviation Center 2901, No. 1018, Huafu Road, Huahang community, Huaqiang North Street, Futian District, Shenzhen, Guangdong Province

Applicant after: Shenzhen Ping An medical and Health Technology Service Co.,Ltd.

Address before: Room 12G, Block H, 666 Beijing East Road, Huangpu District, Shanghai 200000

Applicant before: PING AN MEDICAL AND HEALTHCARE MANAGEMENT Co.,Ltd.