CN106126741B - A secure and trusted power grid information work system based on big data - Google Patents

A secure and trusted power grid information work system based on big data

Info

Publication number
CN106126741B
Authority
CN
China
Prior art keywords
data
submodule
attribute
quality
credible
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610524803.6A
Other languages
Chinese (zh)
Other versions
CN106126741A (en
Inventor
陈祖斌
谢铭
胡继军
翁小云
袁勇
邓戈锋
莫英红
谢菁
张鹏
唐玲丽
黄连月
曾明霏
杭聪
贺冠博
王海
黎新
何钟柱
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Boao Zongheng Network Technology Co ltd
NANJING KEERTE ELECTRIC POWER TECHNOLOGY CO.,LTD.
Original Assignee
Guangxi Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangxi Power Grid Co Ltd filed Critical Guangxi Power Grid Co Ltd
Priority to CN201610524803.6A priority Critical patent/CN106126741B/en
Publication of CN106126741A publication Critical patent/CN106126741A/en
Application granted granted Critical
Publication of CN106126741B publication Critical patent/CN106126741B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30 Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31 User authentication
    • G06F21/32 User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6227 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database where protection concerns the structure of data, e.g. records, types, queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/70 Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer
    • G06F21/78 Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer to assure secure storage of data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00 Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03 Data mining

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Water Supply & Treatment (AREA)
  • Human Resources & Organizations (AREA)
  • Public Health (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Bioethics (AREA)
  • Computational Linguistics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a secure and trusted power grid information work system based on big data. Building on features such as protection functions, authentication and integrity measurement, the system architecture establishes a reliability assessment mechanism and a trust-relationship transfer mechanism. The system comprises a data quality management module, a useful data mining module, an identity verification module and a credibility evaluation module. The data quality management module comprises a data description submodule, a data quality detection submodule and a data quality grading management submodule. The useful data mining module comprises a data preprocessing submodule, a useful data construction submodule, a useful data correction submodule and a useful data layered mining submodule. The identity verification module comprises a fingerprint recognition submodule and an alarm submodule.

Description

A secure and trusted power grid information work system based on big data
Technical field
The present invention relates to the field of big data, and in particular to a secure and trusted power grid information work system based on big data.
Background
Big data refers to data sets whose content cannot be captured, managed and processed with conventional software tools within an acceptable time. The research and application of big data has become an indispensable field of modern information research.
Among the data currently in use, a substantial portion is published by an administrator and modified by the administrator in response to user suggestions or the administrator's own needs. How to manage the quality of this mass of information, mine it, and quickly and effectively find the useful information in it is a problem in urgent need of a solution.
Trusted computing is the widespread use, in computing and communication systems, of trusted computing platforms supported by hardware security modules in order to improve the overall security of the system. Information security has four aspects: device security, data security, content security and behavior security. Behavior security covers the confidentiality, integrity and authenticity of behavior. Trusted computing arose to address behavior security.
On the one hand, power grid information must be open to the public and subject to oversight. On the other hand, if information is accessed without identity verification, the normal operation of the power grid can be affected and security risks arise. How to ensure the security of power grid information while disclosing it fully has not yet been solved effectively.
Summary of the invention
In view of the above problems, the present invention provides a secure and trusted power grid information work system based on big data.
The object of the present invention is achieved by the following technical solution:
A secure and trusted power grid information work system based on big data, characterized in that it comprises a data quality management module, a useful data mining module, an identity verification module and a credibility evaluation module, wherein the data quality management module comprises a data description submodule, a data quality detection submodule and a data quality grading management submodule, the useful data mining module comprises a data preprocessing submodule, a useful data construction submodule, a useful data correction submodule and a useful data layered mining submodule, and the identity verification module comprises a fingerprint recognition submodule and an alarm submodule;
(1) Data description submodule
Data are described by introducing both the attributes of the data themselves and the attribute of the data influencers. The attributes of the data themselves are data size, creation date, number of pictures contained and related data amount, where the related data amount is the sum of the other data that the current data point to and the other data that point to the current data. The attribute of the data influencers is represented by the influencer network clustering coefficient, which is obtained as follows:
A data influencer description network is built. For each datum, the influencers comprise multiple users and one administrator, and each influencer is a node. A user may browse the data and may also suggest modifications to them; the administrator may modify the data on his or her own initiative or in response to user suggestions.
The influencer network clustering coefficient is then defined as:
In the formula, σ1 is the influence factor applied each time a user browses the data, and m is the total number of times users browse; σ2 is the influence factor applied each time a user makes a modification suggestion, and l is the total number of user suggestions; σ3 is the influence factor applied each time the administrator modifies the data on his or her own initiative, and σ4 is the influence factor applied each time the administrator modifies the data in response to a user suggestion; δ1 and δ2 are the weights of σ3 and σ4 respectively, and n is the total number of administrator modifications. The user modification frequency coefficient represents user satisfaction with the data; the larger the coefficient, the more frequently users modify the data;
(2) Data quality detection submodule
Data quality is evaluated with a "three-level evaluation model": the data are first divided into three classes by data size, and the quality of each datum is then evaluated from all of its attributes other than data size. The method is as follows:
The sample data are divided into high-quality, medium-quality and low-quality data. If the data size is greater than threshold T1, the datum is high-quality; if the data size is greater than threshold T2 but less than T1, it is medium-quality; if the data size is less than T2, it is low-quality. Here T1 > T2, and T1 and T2 take values in [1 KB, 1 MB]. The high-quality and low-quality data are further divided into grades. All attributes other than data size form a vector; the mean of each attribute in each grade is computed from the sample data, giving a mean vector for each grade. A new datum is represented by the vector X = (x1, …, xN) and the mean vector of a given grade by Y = (y1, …, yN), where N is the number of attributes other than data size. The similarity of the two vectors is given by the similarity function R(X, Y):
The smaller the value of R(X, Y), the greater the similarity, and vice versa. For each datum, the similarity to the mean vector of each grade is computed in order to determine its quality grade;
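The three-level model above can be sketched in Python as follows. The size thresholds follow the text. The formula for R(X, Y) is not reproduced in this text, so `relative_error` is only an assumed stand-in in which a smaller value means greater similarity. The names `size_tier` and `nearest_grade`, and the handling of a size exactly equal to a threshold, are illustrative choices rather than part of the patent.

```python
KB = 1024
MB = 1024 * KB


def size_tier(size_bytes, t1, t2):
    """Classify a datum as 'high', 'medium' or 'low' quality by its size."""
    # The text requires T1 > T2 with both thresholds in [1 KB, 1 MB].
    if not (KB <= t2 < t1 <= MB):
        raise ValueError("expected 1 KB <= T2 < T1 <= 1 MB")
    if size_bytes > t1:
        return "high"
    if size_bytes > t2:
        return "medium"
    # Assumption: a size exactly equal to a threshold falls into the lower tier.
    return "low"


def relative_error(x, y):
    """Assumed stand-in for R(X, Y): mean relative error against the mean vector."""
    terms = [abs(a - b) / abs(b) for a, b in zip(x, y) if b != 0]
    return sum(terms) / len(x)


def nearest_grade(x, grade_means, distance=relative_error):
    """Return the grade whose mean vector is most similar (smallest distance)."""
    return min(grade_means, key=lambda grade: distance(x, grade_means[grade]))


# Hypothetical thresholds and grade mean vectors, for illustration only.
T1, T2 = 512 * KB, 64 * KB
grade_means = {"grade_1": (10, 2, 4), "grade_2": (1, 9, 30)}
tier = size_tier(600 * KB, T1, T2)
grade = nearest_grade((10, 2, 5), grade_means)
```

Passing the distance function as a parameter keeps the grading step independent of the exact form of R(X, Y).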
(3) Data quality grading management submodule
After the data quality detection submodule has divided the data into quality grades, the data are managed by grade;
(4) Fingerprint recognition submodule
Accessing power grid information data requires a fingerprint to be entered and matched against the fingerprints in the fingerprint database; only a person who passes fingerprint recognition can access the power grid information data;
(5) Alarm submodule
If fingerprint recognition fails, the power grid information cannot be accessed and the system raises an alarm.
Preferably, the system is characterized in that:
(1) Data preprocessing submodule
The data are divided into different fields. The data field required is determined from the user's request, and the high-grade data among the high-quality data in that field are selected with the above three-level evaluation model to form a new data table K;
(2) Useful data construction submodule
In the preprocessed data, each data field contains different classes. A correlation coefficient P is introduced to select the useful data classes:
In the formula, Zs is the number of bidirectional links between data in a class of the new data table K, that is, for data A and B, A points to B and B also points to A; Z is the related data amount in a class of data table K, and N is the total number of data in a class;
(3) Useful data correction submodule
In use, useful data are affected by two factors, malicious damage and user voting. The correlation coefficient corrected for these two factors is P′. A threshold T is set, with T ∈ (0, 0.1]; if P′ > T, the class is useful data. When no qualifying useful data can be obtained from the high-quality data, the medium-quality and then the low-quality data are searched in turn. After all data have been searched, if the maximum P′ obtained is less than T, or if the maximum P′ is greater than T but the absolute value of its difference from T is less than a set value C, then either no useful data can be found or the useful data found are less relevant than expected, and a prompt is automatically sent to the administrator to modify or add related data. C = T/5 is used;
(4) Useful data layered mining submodule
Data table K is first scanned. Let the maximum and minimum of P′ be P′max and P′min. Data table K is divided into a number of non-overlapping regions, the number being given by an expression in which int is the rounding function, and each region is mined separately for its local frequent itemsets. Using the Apriori property, the local frequent itemsets are then joined to obtain the global candidate itemsets. K is scanned again to count the actual support of each candidate and determine the global frequent itemsets;
In the useful data correction submodule, the correction for malicious damage and user voting is:
P′ = P × (1 − Y) × (1 + H)
In the formula, Y is the probability that the data suffer malicious damage, and H is the proportion of voting users among all users.
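As a worked illustration of the correction formula and the threshold rule, the following Python sketch computes P′ and decides whether the administrator should be prompted. The function names and the input values are hypothetical. The rule itself (T ∈ (0, 0.1], C = T/5, prompt when the largest P′ is below T or exceeds it by less than C) follows the text; treating a P′ exactly equal to T as not useful is an assumption.

```python
def corrected_coefficient(p, y, h):
    """P' = P * (1 - Y) * (1 + H), with Y and H both given as fractions in [0, 1]."""
    if not (0.0 <= y <= 1.0 and 0.0 <= h <= 1.0):
        raise ValueError("Y and H must lie in [0, 1]")
    return p * (1.0 - y) * (1.0 + h)


def review_classes(corrected, t):
    """Return (useful class names, whether to prompt the administrator).

    `corrected` maps a class name to its corrected coefficient P'.
    """
    if not (0.0 < t <= 0.1):
        raise ValueError("T must lie in (0, 0.1]")
    c = t / 5.0  # set value C from the text
    useful = [name for name, p in corrected.items() if p > t]
    best = max(corrected.values(), default=0.0)
    # Prompt when nothing exceeds T, or when the best class exceeds T by less than C.
    notify = (not useful) or (best - t < c)
    return useful, notify


# Hypothetical inputs: P = 0.1, 20 % damage probability, 50 % of users voting.
p_prime = corrected_coefficient(0.1, 0.2, 0.5)
useful, notify = review_classes({"class_a": p_prime, "class_b": 0.03}, t=0.05)
```

With these inputs P′ is about 0.12, which clears T = 0.05 by more than C = 0.01, so `class_a` is kept and no prompt is sent.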
Preferably, the credibility evaluation module comprises the following submodules:
Submodule 1: defines the credibility attributes used by the evaluation module. Credibility attributes are hierarchical and can be decomposed downward into sub-attributes;
Submodule 2: for each credibility attribute or sub-attribute, extracts its evaluation indices, which evaluate the attribute or sub-attribute from different aspects;
Submodule 3: for each credibility attribute or sub-attribute, defines its evaluation criterion. The criterion has four levels: excellent, good, medium and poor. It is based on the evaluation indices, that is, the combination of index values determines which level the attribute or sub-attribute has reached;
Submodule 4: determines the credibility grading standard of the module. The standard has five grades and is derived from the evaluation conclusions of the individual credibility attributes;
Submodule 5: before a credibility evaluation is carried out, forms different credibility evaluation templates according to the emphasis of the assessment and carries out the evaluation on the basis of the template, so that the evaluation is more targeted and its results more accurate.
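One possible data structure for the five submodules is sketched below in Python. The text fixes only the shape of the scheme: hierarchical attributes, evaluation indices, four evaluation levels, five credibility grades and emphasis-specific templates. Everything else here is an assumption made for illustration, including index scores on a 0 to 1 scale, averaging as the combination rule, equal-width cut-offs, and a template expressed as per-attribute weights.

```python
from dataclasses import dataclass, field

LEVELS = ("poor", "medium", "good", "excellent")  # the four levels named in the text


@dataclass
class CredibilityAttribute:
    name: str
    indices: dict = field(default_factory=dict)   # evaluation index -> score in [0, 1] (assumed scale)
    children: list = field(default_factory=list)  # sub-attributes

    def score(self):
        # Assumed combination rule: average own index scores and sub-attribute scores.
        values = list(self.indices.values()) + [c.score() for c in self.children]
        return sum(values) / len(values)

    def level(self):
        # Assumed cut-offs: four equal-width bands over [0, 1].
        return LEVELS[min(int(self.score() * 4), 3)]


def module_grade(attributes, template):
    """Return a credibility grade from 1 to 5.

    `template` maps attribute name -> weight and plays the role of an
    evaluation template that shifts the emphasis of the assessment.
    """
    total = sum(template[a.name] for a in attributes)
    weighted = sum(template[a.name] * a.score() for a in attributes) / total
    return min(int(weighted * 5), 4) + 1


# Hypothetical attributes and index scores.
integrity = CredibilityAttribute("integrity", {"hash_check": 0.9, "audit": 0.7})
identity = CredibilityAttribute("identity", {"fingerprint": 0.5})
security = CredibilityAttribute("security", children=[integrity, identity])
grade = module_grade([integrity, identity], {"integrity": 1.0, "identity": 1.0})
```

A different template, for example one weighting `integrity` three times as heavily as `identity`, changes the module grade without changing the per-attribute levels.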
The beneficial effects of the present invention are as follows. The network clustering coefficient is introduced to describe the data, taking into account both the attributes of the data themselves and the attribute of the data influencers, which improves classification accuracy; introducing the user modification frequency coefficient reduces manual intervention, so that data quality is detected efficiently. The three-level evaluation model saves storage space and improves computational efficiency. The new similarity function amplifies the effect of large relative errors, making the quality grading more accurate. The data correction submodule corrects the correlation coefficient to overcome the influence of malicious damage and user voting on the data. Association rule mining based on region partitioning is combined with the classification of useful data, so layered mining is needed only in the data tables classified by the three-level model; mining moves to the next data table only when the current one holds no satisfactory data, which greatly reduces the amount of computation, and because the mining is tied to the useful data classes it is more purposeful. The identity verification module effectively safeguards power grid information. Designated data storage areas are protected, preventing an adversary from carrying out certain types of physical access. All code executed on the computing platform is given the ability to prove that it is running in an untampered environment. In a broad sense, the trusted computing platform gives network users a wider security environment: it treats security from the perspective of the security architecture, ensures a secure execution environment for the user, and moves beyond passive defense by patching.
Brief description of the drawings
The invention is further described with reference to the accompanying drawing. The embodiment shown in the drawing does not limit the invention in any way, and a person of ordinary skill in the art can derive other drawings from it without creative effort.
Fig. 1 is the architecture diagram of the power grid information security work system based on big data.
Reference numerals: data quality management module - 1; useful data mining module - 2; identity verification module - 3; credibility evaluation module - 4; data description submodule - 11; data quality detection submodule - 12; data quality grading management submodule - 13; data preprocessing submodule - 21; useful data construction submodule - 22; useful data correction submodule - 23; useful data layered mining submodule - 24; fingerprint recognition submodule - 31; alarm submodule - 32.
Detailed description of the embodiments
The invention is further described with reference to the following embodiments.
Embodiment 1:
As shown in Fig. 1, a secure and trusted power grid information work system based on big data comprises a data quality management module 1, a useful data mining module 2, an identity verification module 3 and a credibility evaluation module 4. The data quality management module 1 comprises a data description submodule 11, a data quality detection submodule 12 and a data quality grading management submodule 13. The useful data mining module 2 comprises a data preprocessing submodule 21, a useful data construction submodule 22, a useful data correction submodule 23 and a useful data layered mining submodule 24. The identity verification module 3 comprises a fingerprint recognition submodule 31 and an alarm submodule 32.
(1) Data description submodule 11:
Data are described by introducing both the attributes of the data themselves and the attribute of the data influencers. The attributes of the data themselves are data size, creation date, number of pictures contained and related data amount, where the related data amount is the sum of the other data that the current data point to and the other data that point to the current data. The attribute of the data influencers is represented by the influencer network clustering coefficient, which is obtained as follows:
A data influencer description network is built. For each datum, the influencers comprise multiple users and one administrator, and each influencer is a node. A user may browse the data and may also suggest modifications to them; the administrator may modify the data on his or her own initiative or in response to user suggestions.
The influencer network clustering coefficient is then defined as:
In the formula, σ1 is the influence factor applied each time a user browses the data, and m is the total number of times users browse; σ2 is the influence factor applied each time a user makes a modification suggestion, and l is the total number of user suggestions; σ3 is the influence factor applied each time the administrator modifies the data on his or her own initiative, and σ4 is the influence factor applied each time the administrator modifies the data in response to a user suggestion; δ1 and δ2 are the weights of σ3 and σ4 respectively, and n is the total number of administrator modifications. The user modification frequency coefficient represents user satisfaction with the data; the larger the coefficient, the more frequently users modify the data.
(2) Data quality detection submodule 12:
Data quality is evaluated with a "three-level evaluation model": the data are first divided into three classes by data size, and the quality of each datum is then evaluated from all of its attributes other than data size. The method is as follows:
The sample data are divided into high-quality, medium-quality and low-quality data. If the data size is greater than threshold T1, the datum is high-quality; if the data size is greater than threshold T2 but less than T1, it is medium-quality; if the data size is less than T2, it is low-quality. Here T1 > T2, and T1 and T2 take values in [1 KB, 1 MB]. The high-quality and low-quality data are further divided into grades. All attributes other than data size form a vector; the mean of each attribute in each grade is computed from the sample data, giving a mean vector for each grade. A new datum is represented by the vector X = (x1, …, xN) and the mean vector of a given grade by Y = (y1, …, yN), where N is the number of attributes other than data size. The similarity of the two vectors is given by the similarity function R(X, Y):
The smaller the value of R(X, Y), the greater the similarity, and vice versa. For each datum, the similarity to the mean vector of each grade is computed in order to determine its quality grade.
(3) Data quality grading management submodule 13:
After the data quality detection submodule has divided the data into quality grades, the data are managed by grade.
(4) Fingerprint recognition submodule 31:
Accessing power grid information data requires a fingerprint to be entered and matched against the fingerprints in the fingerprint database; only a person who passes fingerprint recognition can access the power grid information data.
(5) Alarm submodule 32:
If fingerprint recognition fails, the power grid information cannot be accessed and the system raises an alarm.
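The interaction between the fingerprint recognition submodule 31 and the alarm submodule 32 can be sketched as a simple access gate. This Python sketch is illustrative only. Real fingerprint matching is a fuzzy comparison of biometric templates, whereas here an exact digest comparison stands in for it, and the names `template_digest`, `request_access` and `raise_alarm` are hypothetical.

```python
import hashlib
import hmac


def template_digest(template: bytes) -> str:
    """Stand-in for a fingerprint template: a SHA-256 digest of the raw sample."""
    return hashlib.sha256(template).hexdigest()


def request_access(sample: bytes, fingerprint_base: dict, raise_alarm):
    """Return the matching user name, or raise an alarm and return None.

    `fingerprint_base` maps user name -> enrolled digest.
    `raise_alarm` is any callable that accepts a message string.
    """
    digest = template_digest(sample)
    for user, enrolled in fingerprint_base.items():
        # Constant-time comparison of the two digests.
        if hmac.compare_digest(digest, enrolled):
            return user
    raise_alarm("fingerprint not recognised; access to grid information denied")
    return None


# Hypothetical enrolment and two access attempts.
base = {"operator": template_digest(b"ridge-pattern-1")}
alarms = []
granted = request_access(b"ridge-pattern-1", base, alarms.append)
denied = request_access(b"unknown-pattern", base, alarms.append)
```

Injecting `raise_alarm` as a callable keeps the gate independent of how the alarm is delivered.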
Preferably, the system is characterized in that:
(1) Data preprocessing submodule 21:
The data are divided into different fields. The data field required is determined from the user's request, and the high-grade data among the high-quality data in that field are selected with the above three-level evaluation model to form a new data table K.
(2) Useful data construction submodule 22:
In the preprocessed data, each data field contains different classes. A correlation coefficient P is introduced to select the useful data classes:
In the formula, Zs is the number of bidirectional links between data in a class of the new data table K, that is, for data A and B, A points to B and B also points to A; Z is the related data amount in a class of data table K, and N is the total number of data in a class.
(3) Useful data correction submodule 23:
In use, useful data are affected by two factors, malicious damage and user voting. The correlation coefficient corrected for these two factors is P′. A threshold T is set, with T ∈ (0, 0.1]; if P′ > T, the class is useful data. When no qualifying useful data can be obtained from the high-quality data, the medium-quality and then the low-quality data are searched in turn. After all data have been searched, if the maximum P′ obtained is less than T, or if the maximum P′ is greater than T but the absolute value of its difference from T is less than a set value C, then either no useful data can be found or the useful data found are less relevant than expected, and a prompt is automatically sent to the administrator to modify or add related data. With C = T/5, the range of data that triggers a prompt increases by 5%, while the amount of computation increases by 3.7%.
(4) Useful data layered mining submodule 24:
Data table K is first scanned. Let the maximum and minimum of P′ be P′max and P′min. Data table K is divided into a number of non-overlapping regions, the number being given by an expression in which int is the rounding function, and each region is mined separately for its local frequent itemsets. Using the Apriori property, the local frequent itemsets are then joined to obtain the global candidate itemsets. K is scanned again to count the actual support of each candidate and determine the global frequent itemsets.
In the useful data correction submodule 23, the correction for malicious damage and user voting is:
P′ = P × (1 − Y) × (1 + H)
In the formula, Y is the probability that the data suffer malicious damage, and H is the proportion of voting users among all users.
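The layered mining step of submodule 24 (mine each region locally, merge the local results into global candidates, rescan the whole table for actual support) can be sketched in Python as below. The expression that fixes the number of regions from P′max and P′min is not reproduced in this text, so `n_regions` is simply passed in. Representing table K as a list of item tuples, the brute-force local miner and the parameter names are all assumptions made for illustration.

```python
import math
from itertools import combinations


def local_frequent_itemsets(rows, min_count, max_size):
    """Brute-force frequent itemsets of one region (adequate for small regions)."""
    counts = {}
    for row in rows:
        items = sorted(set(row))
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[combo] = counts.get(combo, 0) + 1
    return {itemset for itemset, c in counts.items() if c >= min_count}


def partition_mine(rows, n_regions, min_support, max_size=3):
    """Return {itemset: global count} for itemsets with support >= min_support."""
    if not rows:
        raise ValueError("the data table is empty")
    size = math.ceil(len(rows) / n_regions)
    regions = [rows[i:i + size] for i in range(0, len(rows), size)]

    # Pass 1: local frequent itemsets of each non-overlapping region.
    # Any globally frequent itemset is frequent in at least one region,
    # so the union of the local results is a complete candidate set.
    candidates = set()
    for region in regions:
        local_min = max(1, math.ceil(min_support * len(region)))
        candidates |= local_frequent_itemsets(region, local_min, max_size)

    # Pass 2: rescan the whole table to count the actual support of each candidate.
    global_min = math.ceil(min_support * len(rows))
    result = {}
    for cand in candidates:
        count = sum(1 for row in rows if set(cand) <= set(row))
        if count >= global_min:
            result[cand] = count
    return result


# Hypothetical table K with six rows, split into two regions.
table_k = [("a", "b"), ("a", "b", "c"), ("a", "c"), ("b", "c"), ("a", "b"), ("c",)]
frequent = partition_mine(table_k, n_regions=2, min_support=0.5)
```

In this example `("a", "c")` is locally frequent in the first region but appears in only two of six rows overall, so the second pass discards it.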
Preferably, the credibility evaluation module 4 comprises the following submodules:
Submodule 1: defines the credibility attributes used by the evaluation module. Credibility attributes are hierarchical and can be decomposed downward into sub-attributes;
Submodule 2: for each credibility attribute or sub-attribute, extracts its evaluation indices, which evaluate the attribute or sub-attribute from different aspects;
Submodule 3: for each credibility attribute or sub-attribute, defines its evaluation criterion. The criterion has four levels: excellent, good, medium and poor. It is based on the evaluation indices, that is, the combination of index values determines which level the attribute or sub-attribute has reached;
Submodule 4: determines the credibility grading standard of the module. The standard has five grades and is derived from the evaluation conclusions of the individual credibility attributes;
Submodule 5: before a credibility evaluation is carried out, forms different credibility evaluation templates according to the emphasis of the assessment and carries out the evaluation on the basis of the template, so that the evaluation is more targeted and its results more accurate.
In this embodiment, the network clustering coefficient is introduced to describe the data, taking into account both the attributes of the data themselves and the attribute of the data influencers, which improves classification accuracy; introducing the user modification frequency coefficient reduces manual intervention, so that data quality is detected efficiently. The three-level evaluation model saves storage space and improves computational efficiency. The new similarity function amplifies the effect of large relative errors, making the quality grading more accurate. The data correction submodule corrects the correlation coefficient to overcome the influence of malicious damage and user voting on the data; with C = T/5, the range of data that triggers a prompt increases by 5%, while the amount of computation increases by 3.7%. Association rule mining based on region partitioning is combined with the classification of useful data, so layered mining is needed only in the data tables classified by the three-level model; mining moves to the next data table only when the current one holds no satisfactory data, which greatly reduces the amount of computation, and because the mining is tied to the useful data classes it is more purposeful. The identity verification module effectively safeguards power grid information. All code executed on the computing platform is given the ability to prove that it is running in an untampered environment. In a broad sense, the trusted computing platform gives network users a wider security environment: it treats security from the perspective of the security architecture, ensures a secure execution environment for the user, and moves beyond passive defense by patching.
Embodiment 2:
A kind of electric network information secure and trusted work system based on big data as shown in Figure 1, including data quality management Module 1, useful data excavates module 2, authentication module 3 and credible evaluation module 4, and wherein quality management module 1 includes number According to description submodule 11, quality testing submodule 12 and data quality grading management submodule 13, useful data excavates mould Block 2 includes that data prediction submodule 21, useful data builds submodule 22, useful data amendment submodule 23 and useful data Layer digging submodule 24, authentication module 3 includes fingerprint recognition submodule 31 and alarm submodule 32.
(1) Data description submodule 11:
Data are described by combining the attributes of the data itself with the attributes of the data influencers. The attributes of the data itself are represented by data size, creation date, number of pictures contained, and related data amount, where the related data amount is the sum of the number of other data items that the current data item points to and the number of other data items that point to the current data item. The attributes of the data influencers are represented by the influencer network clustering coefficient, which is obtained as follows:
A data influencer description network is built. For each data item, the influencers comprise multiple users and one manager, and each influencer represents one node. A user may browse the data and may also propose modifications to the data, while the manager may modify the data either independently or according to user suggestions.
The influencer network clustering coefficient is then defined as:
In the formula, σ1 is the influence factor applied each time a user browses the data, and m is the total number of user browses; σ2 is the influence factor applied each time a user proposes a modification, and l is the total number of user suggestions; σ3 is the influence factor applied each time the manager modifies the data independently, and σ4 is the influence factor applied each time the manager modifies the data according to a user suggestion; δ1 and δ2 are the weights of σ3 and σ4 respectively, and n is the total number of manager modifications. The formula also contains a user modification frequency coefficient, which represents the users' satisfaction with the data: the larger the coefficient, the more frequently users modify the data.
(2) Data quality detection submodule 12:
Data quality is evaluated with a "three-level evaluation model": the data are first divided into three classes according to data size, and the quality of each data item is then evaluated by combining all of its attributes other than data size. The specific method is as follows:
The sample data are divided into high-quality, medium-quality, and low-quality data. If the data size is greater than threshold T1, the data item is high-quality data; if the data size is greater than threshold T2 but less than threshold T1, it is medium-quality data; if the data size is less than threshold T2, it is low-quality data, where T1 > T2 and T1 and T2 take values in [1KB, 1MB]. The high-quality and low-quality data are further divided into different grades. All attributes of the data other than data size are assembled into a vector, the mean of each data attribute is computed for each grade from the sample data, and a corresponding mean vector is established for each grade. A new data item is represented by the vector X = (x1, …, xN) and the mean vector of a given grade by Y = (y1, …, yN), where N is the number of attributes of the data other than data size. The similarity of the two vectors is expressed by the similarity function R(X, Y):
The smaller the value of R(X, Y), the greater the similarity, and vice versa. For each data item, the similarity to the mean vector of each grade is computed in order to determine its quality grade.
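The two-stage procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the values of `T1` and `T2` are arbitrary picks from [1KB, 1MB], the handling of sizes exactly equal to a threshold is an assumption, all function names are hypothetical, and because the similarity function R(X, Y) is not reproduced in this text, `relative_error_score` is only a stand-in (a sum of squared relative errors) chosen because a smaller value means greater similarity.

```python
from typing import Dict, Sequence

KB = 1024
T1 = 512 * KB  # assumed upper threshold, within [1KB, 1MB]
T2 = 64 * KB   # assumed lower threshold, T1 > T2


def size_class(size_bytes: int) -> str:
    """First stage: classify a data item by size alone.

    Sizes exactly equal to a threshold fall into the lower class here;
    the text does not specify the boundary case.
    """
    if size_bytes > T1:
        return "high"
    if size_bytes > T2:
        return "medium"
    return "low"


def relative_error_score(x: Sequence[float], y: Sequence[float]) -> float:
    """Stand-in for R(X, Y) (assumption): sum of squared relative errors.

    Smaller means more similar. Every component of y must be non-zero.
    """
    return sum(((xi - yi) / yi) ** 2 for xi, yi in zip(x, y))


def assign_grade(x: Sequence[float],
                 grade_means: Dict[str, Sequence[float]]) -> str:
    """Second stage: pick the grade whose mean vector is most similar to x."""
    return min(grade_means,
               key=lambda g: relative_error_score(x, grade_means[g]))


# Hypothetical mean vectors for two grades (N = 2 attributes).
grade_means = {"grade_1": [10.0, 4.0], "grade_2": [2.0, 1.0]}
print(size_class(700 * KB))                   # high
print(assign_grade([9.0, 5.0], grade_means))  # grade_1
```

Squaring the relative error is one simple way to make large deviations dominate the score; any function with that property could be substituted for the stand-in.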
(3) Data quality grading management submodule 13:
After passing through the data quality detection submodule, the data are divided into different quality grades and are managed by grade accordingly.
(4) Fingerprint identification submodule 31:
Accessing the power grid information data requires entering a fingerprint, which is matched against the fingerprints in the fingerprint database; only persons who pass fingerprint identification can access the power grid information data.
(5) Alarm submodule 32:
If fingerprint identification fails, the power grid information cannot be accessed and the system raises an alarm.
Preferably, the system is characterized as follows:
(1) Data preprocessing submodule 21:
The data are divided into different domains. The data domain required by the client is determined from the user's request, and the three-level evaluation model described above is used to screen out the high-quality, high-grade data in that domain, which form a new data table K.
(2) Useful data construction submodule 22:
In the preprocessed data, each data domain contains different classes. A correlation coefficient P is introduced to screen for useful data classes:
In the formula, Zs is the number of bidirectional pointers among the data in one class of the new data table K, that is, pairs of data items A and B for which A points to B and B also points to A; Z is the related data amount in one class of data table K; and N is the total number of data items in one class.
(3) Useful data correction submodule 23:
While in use, useful data can be affected by two factors, artificial destruction and user voting; the correlation coefficient corrected for these two factors is P′. A threshold T is also set, with T ∈ (0, 0.1]; if P′ > T, the class is useful data. When no qualifying useful data can be obtained from the high-quality data, the medium-quality data and then the low-quality data are searched in turn. After all data have been searched, if the maximum value of P′ finally obtained is less than T, or if the maximum value of P′ is greater than T but the absolute value of its difference from threshold T is less than a set value C, this indicates that no useful data can be found, or that useful data can be found but their degree of correlation is already below expectation; in that case a prompt is automatically sent to the manager to modify or add related data. With C = T/5, the range of data that triggers a prompt increases by 5%, while the amount of computation increases by 3.7%.
(4) Useful data layered mining submodule 24:
Data table K is first scanned. Let the maximum and minimum values of P′ be P′max and P′min respectively. Data table K is divided into a number of non-overlapping regions, where the region count is computed with the rounding function int, and local frequent itemsets are mined from each region. Then, using the Apriori property, the local frequent itemsets are joined to obtain the global candidate itemsets. Finally, K is scanned again to count the actual support of each candidate itemset and thereby determine the global frequent itemsets.
In the useful data correction submodule 23, the specific formula used to correct for artificial destruction and user voting is:
P′ = P × (1 − Y) × (1 + H)
In the formula, Y is the probability that the data are subject to artificial destruction, and H is the proportion of voting users among all users.
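The correction formula and the prompting rule of submodule 23 can be combined in a short Python sketch. It is an illustration under stated assumptions rather than the patented implementation: the function names and the sample values of P, Y, H, T, and C are hypothetical, and a maximum P′ exactly equal to T is treated as a prompt case because the text does not specify it.

```python
from typing import Iterable


def corrected_coefficient(p: float, y: float, h: float) -> float:
    """P' = P * (1 - Y) * (1 + H), as given in the text."""
    return p * (1.0 - y) * (1.0 + h)


def needs_manager_prompt(p_primes: Iterable[float], t: float, c: float) -> bool:
    """Return True when the manager should be prompted.

    The prompt fires when the largest P' is below T, or when it exceeds T
    by less than C (useful data exist but are weaker than expected).
    """
    best = max(p_primes)
    return best < t or abs(best - t) < c


T = 0.08    # assumed threshold, within (0, 0.1]
C = T / 5   # set value used in the text

p_prime = corrected_coefficient(p=0.08, y=0.1, h=0.25)  # about 0.09
print(needs_manager_prompt([p_prime, 0.03], T, C))  # True: 0.09 exceeds T by less than C
print(needs_manager_prompt([0.2], T, C))            # False: well above T
```

A smaller C narrows the band above T in which a prompt is issued, which matches the trade-off the embodiments describe between the prompted data range and the amount of computation.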
Preferably, the credible evaluation module 4 comprises the following submodules:
Submodule 1: defines the credible attributes used for module evaluation; the credible attributes are classified, and a credible attribute may be decomposed downward into sub-attributes;
Submodule 2: for each credible attribute or sub-attribute, extracts its evaluation indicators, with which the credible attribute or sub-attribute can be evaluated from different aspects;
Submodule 3: for each credible attribute or sub-attribute, defines its evaluation criteria, which are divided into four levels: excellent, good, fair, and poor. The evaluation criteria are based on the evaluation indicators; that is, the combination of values of the evaluation indicators determines which level of the evaluation criteria the credible attribute or sub-attribute has reached;
Submodule 4: determines the credible grading standard of the module; the credible grading standard is divided into five levels and is derived from the evaluation conclusions of the individual credible attributes;
Submodule 5: before a credible evaluation is carried out, forms different credible evaluation templates according to the emphasis of the assessment and carries out the credible evaluation based on the selected template, so that the evaluation is more targeted and its results more accurate.
In this embodiment, a network clustering coefficient is introduced to describe the data, so that both the attributes of the data itself and the attributes of the data influencers are taken into account, which improves classification accuracy; introducing the user modification frequency coefficient reduces manual intervention and achieves efficient detection of data quality. The three-level evaluation model saves storage space and improves computational efficiency. The new similarity function amplifies the effect of larger relative errors, making the quality grading more rigorous and accurate. The data correction submodule corrects the correlation coefficient and thereby largely overcomes the influence of artificial destruction and user voting on the data; with C = T/6, the range of data that triggers a prompt increases by 4%, while the amount of computation increases by 3.3%. Association rule mining based on region division is combined with the classification of useful data, so layered mining only needs to be performed on the data tables produced by the three-level classification, and the next data table is mined only when the current one contains no data that meets the requirements; this greatly reduces the amount of computation, and because the mining is tied to the useful-data classes it is more purposeful. The authentication module effectively safeguards the security of the power grid information. All code executed on the computing platform is given the ability to prove that it runs in an untampered environment; in a broad sense, the trusted computing platform provides network users with a broader security environment, addresses security from the perspective of the security architecture, ensures a secure execution environment for users, and moves beyond passive, patch-based defense.
Embodiment 3:
As shown in Figure 1, a big-data-based secure and trusted working system for power grid information comprises a data quality management module 1, a useful data mining module 2, an authentication module 3, and a credible evaluation module 4. The data quality management module 1 comprises a data description submodule 11, a data quality detection submodule 12, and a data quality grading management submodule 13; the useful data mining module 2 comprises a data preprocessing submodule 21, a useful data construction submodule 22, a useful data correction submodule 23, and a useful data layered mining submodule 24; the authentication module 3 comprises a fingerprint identification submodule 31 and an alarm submodule 32.
(1) Data description submodule 11:
Data are described by combining the attributes of the data itself with the attributes of the data influencers. The attributes of the data itself are represented by data size, creation date, number of pictures contained, and related data amount, where the related data amount is the sum of the number of other data items that the current data item points to and the number of other data items that point to the current data item. The attributes of the data influencers are represented by the influencer network clustering coefficient, which is obtained as follows:
A data influencer description network is built. For each data item, the influencers comprise multiple users and one manager, and each influencer represents one node. A user may browse the data and may also propose modifications to the data, while the manager may modify the data either independently or according to user suggestions.
The influencer network clustering coefficient is then defined as:
In the formula, σ1 is the influence factor applied each time a user browses the data, and m is the total number of user browses; σ2 is the influence factor applied each time a user proposes a modification, and l is the total number of user suggestions; σ3 is the influence factor applied each time the manager modifies the data independently, and σ4 is the influence factor applied each time the manager modifies the data according to a user suggestion; δ1 and δ2 are the weights of σ3 and σ4 respectively, and n is the total number of manager modifications. The formula also contains a user modification frequency coefficient, which represents the users' satisfaction with the data: the larger the coefficient, the more frequently users modify the data.
(2) Data quality detection submodule 12:
Data quality is evaluated with a "three-level evaluation model": the data are first divided into three classes according to data size, and the quality of each data item is then evaluated by combining all of its attributes other than data size. The specific method is as follows:
The sample data are divided into high-quality, medium-quality, and low-quality data. If the data size is greater than threshold T1, the data item is high-quality data; if the data size is greater than threshold T2 but less than threshold T1, it is medium-quality data; if the data size is less than threshold T2, it is low-quality data, where T1 > T2 and T1 and T2 take values in [1KB, 1MB]. The high-quality and low-quality data are further divided into different grades. All attributes of the data other than data size are assembled into a vector, the mean of each data attribute is computed for each grade from the sample data, and a corresponding mean vector is established for each grade. A new data item is represented by the vector X = (x1, …, xN) and the mean vector of a given grade by Y = (y1, …, yN), where N is the number of attributes of the data other than data size. The similarity of the two vectors is expressed by the similarity function R(X, Y):
The smaller the value of R(X, Y), the greater the similarity, and vice versa. For each data item, the similarity to the mean vector of each grade is computed in order to determine its quality grade.
(3) Data quality grading management submodule 13:
After passing through the data quality detection submodule, the data are divided into different quality grades and are managed by grade accordingly.
(4) Fingerprint identification submodule 31:
Accessing the power grid information data requires entering a fingerprint, which is matched against the fingerprints in the fingerprint database; only persons who pass fingerprint identification can access the power grid information data.
(5) Alarm submodule 32:
If fingerprint identification fails, the power grid information cannot be accessed and the system raises an alarm.
Preferably, the system is characterized as follows:
(1) Data preprocessing submodule 21:
The data are divided into different domains. The data domain required by the client is determined from the user's request, and the three-level evaluation model described above is used to screen out the high-quality, high-grade data in that domain, which form a new data table K.
(2) Useful data construction submodule 22:
In the preprocessed data, each data domain contains different classes. A correlation coefficient P is introduced to screen for useful data classes:
In the formula, Zs is the number of bidirectional pointers among the data in one class of the new data table K, that is, pairs of data items A and B for which A points to B and B also points to A; Z is the related data amount in one class of data table K; and N is the total number of data items in one class.
(3) Useful data correction submodule 23:
While in use, useful data can be affected by two factors, artificial destruction and user voting; the correlation coefficient corrected for these two factors is P′. A threshold T is also set, with T ∈ (0, 0.1]; if P′ > T, the class is useful data. When no qualifying useful data can be obtained from the high-quality data, the medium-quality data and then the low-quality data are searched in turn. After all data have been searched, if the maximum value of P′ finally obtained is less than T, or if the maximum value of P′ is greater than T but the absolute value of its difference from threshold T is less than a set value C, this indicates that no useful data can be found, or that useful data can be found but their degree of correlation is already below expectation; in that case a prompt is automatically sent to the manager to modify or add related data. With C = T/5, the range of data that triggers a prompt increases by 5%, while the amount of computation increases by 3.7%.
(4) Useful data layered mining submodule 24:
Data table K is first scanned. Let the maximum and minimum values of P′ be P′max and P′min respectively. Data table K is divided into a number of non-overlapping regions, where the region count is computed with the rounding function int, and local frequent itemsets are mined from each region. Then, using the Apriori property, the local frequent itemsets are joined to obtain the global candidate itemsets. Finally, K is scanned again to count the actual support of each candidate itemset and thereby determine the global frequent itemsets.
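The three mining steps above (split K into regions, mine each region locally, rescan K to confirm the candidates) can be sketched in Python. This is a simplified illustration, not the patented implementation: rows of K are modeled as sets of item labels, `num_regions` is passed in as a parameter because the region-count expression is not reproduced in this text, local mining is done by brute-force enumeration up to `max_len` items instead of a full Apriori pass, and all names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Set


def frequent_itemsets(rows: List[Set[str]], min_support: float,
                      max_len: int = 3) -> Set[FrozenSet[str]]:
    """Brute-force mining: itemsets whose support in `rows` >= min_support."""
    if not rows:
        return set()
    items = sorted(set().union(*rows))
    result: Set[FrozenSet[str]] = set()
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            count = sum(1 for row in rows if itemset <= row)
            if count / len(rows) >= min_support:
                result.add(itemset)
    return result


def partition_mine(table: List[Set[str]], num_regions: int,
                   min_support: float, max_len: int = 3) -> Set[FrozenSet[str]]:
    """Region-based mining of table K (num_regions must be >= 1)."""
    if not table:
        return set()
    # Step 1: split K into contiguous, non-overlapping regions.
    size = -(-len(table) // num_regions)  # ceiling division
    regions = [table[i:i + size] for i in range(0, len(table), size)]
    # Step 2: the union of local frequent itemsets is the global candidate
    # set, since a globally frequent itemset is frequent in some region.
    candidates: Set[FrozenSet[str]] = set()
    for region in regions:
        candidates |= frequent_itemsets(region, min_support, max_len)
    # Step 3: rescan K to count the actual support of each candidate.
    total = len(table)
    return {c for c in candidates
            if sum(1 for row in table if c <= row) / total >= min_support}


table_k = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b", "c"}]
result = partition_mine(table_k, num_regions=2, min_support=0.5)
print(frozenset({"a", "b", "c"}) in result)  # False: locally frequent, globally not
```

In the example, the itemset {a, b, c} is frequent in the first region but appears in only one of four rows overall, so the final rescan removes it.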
In the useful data correction submodule 23, the specific formula used to correct for artificial destruction and user voting is:
P′ = P × (1 − Y) × (1 + H)
In the formula, Y is the probability that the data are subject to artificial destruction, and H is the proportion of voting users among all users.
Preferably, the credible evaluation module 4 comprises the following submodules:
Submodule 1: defines the credible attributes used for module evaluation; the credible attributes are classified, and a credible attribute may be decomposed downward into sub-attributes;
Submodule 2: for each credible attribute or sub-attribute, extracts its evaluation indicators, with which the credible attribute or sub-attribute can be evaluated from different aspects;
Submodule 3: for each credible attribute or sub-attribute, defines its evaluation criteria, which are divided into four levels: excellent, good, fair, and poor. The evaluation criteria are based on the evaluation indicators; that is, the combination of values of the evaluation indicators determines which level of the evaluation criteria the credible attribute or sub-attribute has reached;
Submodule 4: determines the credible grading standard of the module; the credible grading standard is divided into five levels and is derived from the evaluation conclusions of the individual credible attributes;
Submodule 5: before a credible evaluation is carried out, forms different credible evaluation templates according to the emphasis of the assessment and carries out the credible evaluation based on the selected template, so that the evaluation is more targeted and its results more accurate.
In this embodiment, a network clustering coefficient is introduced to describe the data, so that both the attributes of the data itself and the attributes of the data influencers are taken into account, which improves classification accuracy; introducing the user modification frequency coefficient reduces manual intervention and achieves efficient detection of data quality. The three-level evaluation model saves storage space and improves computational efficiency. The new similarity function amplifies the effect of larger relative errors, making the quality grading more rigorous and accurate. The data correction submodule corrects the correlation coefficient and thereby largely overcomes the influence of artificial destruction and user voting on the data; with C = T/7, the range of data that triggers a prompt increases by 3.5%, while the amount of computation increases by 3%. Association rule mining based on region division is combined with the classification of useful data, so layered mining only needs to be performed on the data tables produced by the three-level classification, and the next data table is mined only when the current one contains no data that meets the requirements; this greatly reduces the amount of computation, and because the mining is tied to the useful-data classes it is more purposeful. The authentication module effectively safeguards the security of the power grid information. All code executed on the computing platform is given the ability to prove that it runs in an untampered environment; in a broad sense, the trusted computing platform provides network users with a broader security environment, addresses security from the perspective of the security architecture, ensures a secure execution environment for users, and moves beyond passive, patch-based defense.
Embodiment 4:
As shown in Figure 1, a big-data-based secure and trusted working system for power grid information comprises a data quality management module 1, a useful data mining module 2, an authentication module 3, and a credible evaluation module 4. The data quality management module 1 comprises a data description submodule 11, a data quality detection submodule 12, and a data quality grading management submodule 13; the useful data mining module 2 comprises a data preprocessing submodule 21, a useful data construction submodule 22, a useful data correction submodule 23, and a useful data layered mining submodule 24; the authentication module 3 comprises a fingerprint identification submodule 31 and an alarm submodule 32.
(1) Data description submodule 11:
Data are described by combining the attributes of the data itself with the attributes of the data influencers. The attributes of the data itself are represented by data size, creation date, number of pictures contained, and related data amount, where the related data amount is the sum of the number of other data items that the current data item points to and the number of other data items that point to the current data item. The attributes of the data influencers are represented by the influencer network clustering coefficient, which is obtained as follows:
A data influencer description network is built. For each data item, the influencers comprise multiple users and one manager, and each influencer represents one node. A user may browse the data and may also propose modifications to the data, while the manager may modify the data either independently or according to user suggestions.
The influencer network clustering coefficient is then defined as:
In the formula, σ1 is the influence factor applied each time a user browses the data, and m is the total number of user browses; σ2 is the influence factor applied each time a user proposes a modification, and l is the total number of user suggestions; σ3 is the influence factor applied each time the manager modifies the data independently, and σ4 is the influence factor applied each time the manager modifies the data according to a user suggestion; δ1 and δ2 are the weights of σ3 and σ4 respectively, and n is the total number of manager modifications. The formula also contains a user modification frequency coefficient, which represents the users' satisfaction with the data: the larger the coefficient, the more frequently users modify the data.
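The related data amount used by the data description submodule is a simple count over the pointer relationships between data items. The Python sketch below shows one way to compute it, assuming the pointers are available as a mapping from each data item to the set of items it points to; the mapping, the item labels, and the function name are all hypothetical.

```python
from typing import Dict, Set


def related_data_amount(item: str, links: Dict[str, Set[str]]) -> int:
    """Number of other items `item` points to plus number of other items pointing to it."""
    # Self-references are excluded because the definition speaks of "other" data.
    outgoing = len(links.get(item, set()) - {item})
    incoming = sum(1 for source, targets in links.items()
                   if source != item and item in targets)
    return outgoing + incoming


# Hypothetical pointer graph: A -> B, A -> C, B -> A, D -> A.
links = {"A": {"B", "C"}, "B": {"A"}, "C": set(), "D": {"A"}}
print(related_data_amount("A", links))  # 4 (points to B and C; pointed to by B and D)
print(related_data_amount("C", links))  # 1 (pointed to by A)
```

Because the definition is a sum of the two directions, an item that both points to and is pointed to by the current item contributes twice, as B does for A in the example.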
(2) Data quality detection submodule 12:
Data quality is evaluated with a "three-level evaluation model": the data are first divided into three classes according to data size, and the quality of each data item is then evaluated by combining all of its attributes other than data size. The specific method is as follows:
The sample data are divided into high-quality, medium-quality, and low-quality data. If the data size is greater than threshold T1, the data item is high-quality data; if the data size is greater than threshold T2 but less than threshold T1, it is medium-quality data; if the data size is less than threshold T2, it is low-quality data, where T1 > T2 and T1 and T2 take values in [1KB, 1MB]. The high-quality and low-quality data are further divided into different grades. All attributes of the data other than data size are assembled into a vector, the mean of each data attribute is computed for each grade from the sample data, and a corresponding mean vector is established for each grade. A new data item is represented by the vector X = (x1, …, xN) and the mean vector of a given grade by Y = (y1, …, yN), where N is the number of attributes of the data other than data size. The similarity of the two vectors is expressed by the similarity function R(X, Y):
The smaller the value of R(X, Y), the greater the similarity, and vice versa. For each data item, the similarity to the mean vector of each grade is computed in order to determine its quality grade.
(3) Data quality grading management submodule 13:
After passing through the data quality detection submodule, the data are divided into different quality grades and are managed by grade accordingly.
(4) Fingerprint identification submodule 31:
Accessing the power grid information data requires entering a fingerprint, which is matched against the fingerprints in the fingerprint database; only persons who pass fingerprint identification can access the power grid information data.
(5) Alarm submodule 32:
If fingerprint identification fails, the power grid information cannot be accessed and the system raises an alarm.
Preferably, the system is characterized as follows:
(1) Data preprocessing submodule 21:
The data are divided into different domains. The data domain required by the client is determined from the user's request, and the three-level evaluation model described above is used to screen out the high-quality, high-grade data in that domain, which form a new data table K.
(2) Useful data construction submodule 22:
In the preprocessed data, each data domain contains different classes. A correlation coefficient P is introduced to screen for useful data classes:
In the formula, Zs is the number of bidirectional pointers among the data in one class of the new data table K, that is, pairs of data items A and B for which A points to B and B also points to A; Z is the related data amount in one class of data table K; and N is the total number of data items in one class.
(3) Useful data correction submodule 23:
While in use, useful data can be affected by two factors, artificial destruction and user voting; the correlation coefficient corrected for these two factors is P′. A threshold T is also set, with T ∈ (0, 0.1]; if P′ > T, the class is useful data. When no qualifying useful data can be obtained from the high-quality data, the medium-quality data and then the low-quality data are searched in turn. After all data have been searched, if the maximum value of P′ finally obtained is less than T, or if the maximum value of P′ is greater than T but the absolute value of its difference from threshold T is less than a set value C, this indicates that no useful data can be found, or that useful data can be found but their degree of correlation is already below expectation; in that case a prompt is automatically sent to the manager to modify or add related data. With C = T/5, the range of data that triggers a prompt increases by 5%, while the amount of computation increases by 3.7%.
(4) Useful data layered mining submodule 24:
Data table K is first scanned. Let the maximum and minimum values of P′ be P′max and P′min respectively. Data table K is divided into a number of non-overlapping regions, where the region count is computed with the rounding function int, and local frequent itemsets are mined from each region. Then, using the Apriori property, the local frequent itemsets are joined to obtain the global candidate itemsets. Finally, K is scanned again to count the actual support of each candidate itemset and thereby determine the global frequent itemsets.
In the useful data correction submodule 23, the specific formula used to correct for artificial destruction and user voting is:
P′ = P × (1 − Y) × (1 + H)
In the formula, Y is the probability that the data are subject to artificial destruction, and H is the proportion of voting users among all users.
Preferably, the credible evaluation module 4 comprises the following submodules:
Submodule 1: defines the credible attributes used for module evaluation; the credible attributes are classified, and a credible attribute may be decomposed downward into sub-attributes;
Submodule 2: for each credible attribute or sub-attribute, extracts its evaluation indicators, with which the credible attribute or sub-attribute can be evaluated from different aspects;
Submodule 3: for each credible attribute or sub-attribute, defines its evaluation criteria, which are divided into four levels: excellent, good, fair, and poor. The evaluation criteria are based on the evaluation indicators; that is, the combination of values of the evaluation indicators determines which level of the evaluation criteria the credible attribute or sub-attribute has reached;
Submodule 4: determines the credible grading standard of the module; the credible grading standard is divided into five levels and is derived from the evaluation conclusions of the individual credible attributes;
Submodule 5: before a credible evaluation is carried out, forms different credible evaluation templates according to the emphasis of the assessment and carries out the credible evaluation based on the selected template, so that the evaluation is more targeted and its results more accurate.
In this embodiment, a network clustering coefficient is introduced to describe the data, so that both the attributes of the data itself and the attributes of the data influencers are taken into account, which improves classification accuracy; introducing the user modification frequency coefficient reduces manual intervention and achieves efficient detection of data quality. The three-level evaluation model saves storage space and improves computational efficiency. The new similarity function amplifies the effect of larger relative errors, making the quality grading more rigorous and accurate. The data correction submodule corrects the correlation coefficient and thereby largely overcomes the influence of artificial destruction and user voting on the data; with C = T/8, the range of data that triggers a prompt increases by 3%, while the amount of computation increases by 2.7%. Association rule mining based on region division is combined with the classification of useful data, so layered mining only needs to be performed on the data tables produced by the three-level classification, and the next data table is mined only when the current one contains no data that meets the requirements; this greatly reduces the amount of computation, and because the mining is tied to the useful-data classes it is more purposeful. The authentication module effectively safeguards the security of the power grid information. All code executed on the computing platform is given the ability to prove that it runs in an untampered environment; in a broad sense, the trusted computing platform provides network users with a broader security environment, addresses security from the perspective of the security architecture, ensures a secure execution environment for users, and moves beyond passive, patch-based defense.
Embodiment 5:
As shown in Figure 1, a big-data-based secure and trusted working system for power grid information comprises a data quality management module 1, a useful data mining module 2, an authentication module 3, and a credible evaluation module 4. The data quality management module 1 comprises a data description submodule 11, a data quality detection submodule 12, and a data quality grading management submodule 13; the useful data mining module 2 comprises a data preprocessing submodule 21, a useful data construction submodule 22, a useful data correction submodule 23, and a useful data layered mining submodule 24; the authentication module 3 comprises a fingerprint identification submodule 31 and an alarm submodule 32.
(1) Data description submodule 11:
Data is described by both the attributes of the data itself and the attributes of its influencers. The attributes of the data itself are the data size, the creation date, the number of pictures contained and the related data amount, where the related data amount is the sum of the other data items that the current data points to and the other data items that point to the current data. The attributes of the data influencers are represented by the influencer network clustering coefficient $\bar{K}$, which is obtained as follows:
Build a data influencer description network. For each data item, the influencers comprise multiple users and one manager, and each influencer is represented by one node. Users can browse the data and can also propose modifications to it; the manager can modify the data either on their own initiative or according to user suggestions.
The influencer network clustering coefficient $\bar{K}$ is then defined as:
$$\bar{K} = \frac{m\sigma_1 + l\sigma_2 + n(\delta_1 \times \sigma_3 + \delta_2 \times \sigma_4)}{m + l + n} \times \left[1 - \left(\frac{m - l}{m}\right)^3\right]$$
In the formula, $\sigma_1$ is the influence factor applied each time a user browses the data, and $m$ is the total number of user views; $\sigma_2$ is the influence factor applied each time a user proposes a modification, and $l$ is the total number of user suggestions; $\sigma_3$ is the influence factor applied each time the manager modifies the data on their own initiative, and $\sigma_4$ is the influence factor applied each time the manager modifies the data according to a user suggestion; $\delta_1$ and $\delta_2$ are the weights of $\sigma_3$ and $\sigma_4$ respectively, and $n$ is the total number of manager modifications. The factor $1 - \left(\frac{m - l}{m}\right)^3$ is the user modification frequency coefficient, which represents users' satisfaction with the data: the larger the coefficient, the more frequently users modify the data.
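As a rough illustration, the clustering coefficient can be computed directly from the counts and influence factors. The sketch below assumes the reading in which the averaged influence is multiplied by the user modification frequency coefficient 1 - ((m - l)/m)^3; the function name and the sample values are hypothetical, not taken from the patent.

```python
def influencer_clustering_coefficient(m, l, n, sigma1, sigma2, sigma3, sigma4,
                                      delta1, delta2):
    """Influencer network clustering coefficient for one data item.

    m, l, n: total user views, user suggestions, manager modifications.
    sigma1..sigma4: influence factors; delta1, delta2: weights of sigma3, sigma4.
    Assumption: the frequency coefficient multiplies the whole averaged term.
    """
    if m <= 0:
        raise ValueError("m (total user views) must be positive")
    # Weighted influence of all recorded actions, averaged over their count.
    weighted = m * sigma1 + l * sigma2 + n * (delta1 * sigma3 + delta2 * sigma4)
    mean_influence = weighted / (m + l + n)
    # User modification frequency coefficient: grows as l approaches m.
    modification_frequency = 1 - ((m - l) / m) ** 3
    return mean_influence * modification_frequency


# Hypothetical example: 100 views, 10 suggestions, 5 manager modifications.
k_bar = influencer_clustering_coefficient(
    m=100, l=10, n=5,
    sigma1=0.1, sigma2=0.5, sigma3=1.0, sigma4=0.8,
    delta1=0.6, delta2=0.4,
)
```

When no user has ever suggested a change (l = 0), the frequency coefficient is 0 and so is the result; when every view produced a suggestion (l = m), the coefficient is 1 and the result equals the averaged influence.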
(2) Quality detection submodule 12:
Data quality is evaluated with a "three-level evaluation model": data is first divided into three classes by data size, and its quality is then evaluated using all attributes other than data size. The specific method is as follows:
Sample data is divided into high-quality data, medium-quality data and low-quality data. If the data size is greater than the threshold $T_1$, the data is high-quality data; if the data size is greater than the threshold $T_2$ but less than $T_1$, the data is medium-quality data; if the data size is less than $T_2$, the data is low-quality data. Here $T_1 > T_2$, and $T_1$ and $T_2$ take values in [1KB, 1MB]. High-quality data and low-quality data are further divided into different grades. All other attributes of the data are assembled into a vector, the mean of each attribute is calculated for each grade from the sample data, and a corresponding mean vector is established for each grade. A new data item is represented by the vector $X = (x_1, \ldots, x_N)$ and the mean vector of a grade by $Y = (y_1, \ldots, y_N)$, where $N$ is the number of attributes of the data other than data size. The similarity of the two vectors is given by the similarity function $R(X, Y)$:
$$R(X, Y) = \sum_{i=1}^{N} \left|\frac{x_i - y_i}{x_i}\right|^2 + \sum_{i=1}^{N} \left|\frac{x_i - y_i}{y_i}\right|^2$$
The smaller the value of $R(X, Y)$, the greater the similarity, and vice versa. For each data item, the similarity to the mean vector of every grade is calculated in order to determine its quality grade.
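The two stages of the three-level evaluation model can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names, the example thresholds and the grade names are hypothetical, sizes exactly equal to a threshold are assigned to the lower class (the text does not specify this case), and all vector components are assumed to be non-zero so that the relative errors are defined.

```python
def size_class(size_bytes, t1, t2):
    """First stage: coarse class from data size, with t1 > t2."""
    if size_bytes > t1:
        return "high"
    if size_bytes > t2:
        return "medium"
    return "low"


def similarity(x, y):
    """R(X, Y): sum of squared relative errors, taken relative to x and to y.

    Smaller values mean the two vectors are more similar.
    """
    rel_to_x = sum(((a - b) / a) ** 2 for a, b in zip(x, y))
    rel_to_y = sum(((a - b) / b) ** 2 for a, b in zip(x, y))
    return rel_to_x + rel_to_y


def assign_grade(x, grade_means):
    """Second stage: pick the grade whose mean vector gives the smallest R."""
    return min(grade_means, key=lambda grade: similarity(x, grade_means[grade]))


# Hypothetical thresholds inside [1KB, 1MB] and two hypothetical grades.
T1, T2 = 512 * 1024, 10 * 1024
means = {"grade_1": [1.0, 2.0], "grade_2": [5.0, 9.0]}
grade = assign_grade([1.1, 2.2], means)  # closest to grade_1
```

Because each relative error is squared, one attribute that is far off contributes much more to R than several attributes that are slightly off, which is the amplification effect the description attributes to the similarity function.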
(3) Data quality grading management submodule 13:
After passing through the quality detection submodule, data is divided into different quality grades and is managed in tiers according to its grade.
(4) Fingerprint recognition submodule 31:
Accessing power grid information data requires entering a fingerprint, which is matched against the fingerprints in the fingerprint database; only persons who pass fingerprint recognition can access the power grid information data.
(5) Alarm submodule 32:
If fingerprint recognition fails, the power grid information cannot be accessed and the system raises an alarm.
Preferably, the useful data mining module 2 comprises the following submodules:
(1) Data preprocessing submodule 21:
Data is divided into different fields. The data field the user needs is determined from the user's request, and the three-level evaluation model described above is used to select the high-quality, high-grade data in that field, which forms a new data table K.
(2) Useful data construction submodule 22:
After preprocessing, each data field contains different categories. A correlation coefficient $P$ is introduced to screen the useful data categories:
$$P = \frac{\frac{Z_s}{Z} - \rho}{1 - \rho}$$
In the formula, $Z_s$ is the number of bidirectional links among the data in one category of the new data table K, that is, for data items A and B, A points to B and B also points to A; $Z$ is the related data amount in that category of data table K; and $\rho$ is defined in terms of $N$, the total number of data items in the category.
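A small sketch of the correlation coefficient follows, using the form P = (Z_s/Z - ρ)/(1 - ρ) given in claim 1. Since the expression for ρ is not available, the sketch takes ρ as an input; the function name and the sample numbers are hypothetical.

```python
def correlation_coefficient(z_s, z, rho):
    """Correlation coefficient P for one category of data table K.

    z_s: number of bidirectional links in the category.
    z:   related data amount in the category.
    rho: baseline term, passed in because its formula is not given here.
    """
    if z <= 0:
        raise ValueError("z (related data amount) must be positive")
    if rho == 1:
        raise ValueError("rho must differ from 1")
    return (z_s / z - rho) / (1 - rho)


# Hypothetical category: 6 of 10 related links are bidirectional, rho = 0.2.
p = correlation_coefficient(z_s=6, z=10, rho=0.2)
```

Read this way, P rescales the share of bidirectional links so that a share equal to ρ maps to 0 and a share of 1 maps to 1.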
(3) Useful data correction submodule 23:
During use, useful data is affected by two factors, human sabotage and user voting; the correlation coefficient corrected for these two factors is $P'$. A threshold $T$ is also set, with $T \in (0, 0.1]$; if $P' > T$, the category is useful data. When no qualifying useful data can be obtained from the high-quality data, the medium-quality data and then the low-quality data are searched in turn. After all data has been searched, if the final maximum of $P'$ is less than $T$, or the maximum of $P'$ is greater than $T$ but the absolute value of its difference from $T$ is less than a set value $C$, then either no useful data can be found, or useful data can be found but its correlation is already below expectation; in that case a prompt is automatically sent to the manager to modify or add related data. With $C = T/5$, the range of data that triggers a prompt increases by 5%, while the amount of computation increases by 3.7%.
(4) Useful data layered mining submodule 24:
First scan data table K. Let the maximum and minimum values of $P'$ be $P'_{max}$ and $P'_{min}$ respectively. Data table K is partitioned into non-overlapping regions, the number of which is computed from $P'_{max}$ and $P'_{min}$ using the rounding function int, and the local frequent itemsets of each region are mined. Then, using the Apriori property, the local frequent itemsets are joined to obtain the global candidate itemsets. Finally, K is scanned again to count the actual support of each candidate itemset and thereby determine the global frequent itemsets.
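The layered mining step has the general shape of partition-based frequent itemset mining, which can be sketched as below. This is an illustration under stated assumptions, not the patented procedure: the number of regions is passed in as a parameter rather than derived from the P' range, rows are split sequentially, local itemsets are merged by set union, and all names are hypothetical.

```python
from itertools import combinations
from math import ceil


def local_frequent_itemsets(rows, min_support):
    """Level-wise Apriori inside one region; returns a set of frozensets."""
    n = len(rows)
    if n == 0:
        return set()

    def support(itemset):
        return sum(1 for row in rows if itemset <= row) / n

    items = {item for row in rows for item in row}
    current = {frozenset([i]) for i in items
               if support(frozenset([i])) >= min_support}
    frequent = set(current)
    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
        frequent |= current
        k += 1
    return frequent


def partitioned_mining(table, num_regions, min_support):
    """Mine each region locally, merge the candidates, then rescan the table."""
    rows = [frozenset(r) for r in table]
    if not rows:
        return {}
    size = ceil(len(rows) / num_regions)
    regions = [rows[i:i + size] for i in range(0, len(rows), size)]
    candidates = set()
    for region in regions:
        candidates |= local_frequent_itemsets(region, min_support)
    # Second scan of the whole table: keep candidates that are globally frequent.
    result = {}
    for c in candidates:
        count = sum(1 for row in rows if c <= row)
        if count / len(rows) >= min_support:
            result[c] = count
    return result


# Hypothetical table of four rows split into two regions.
table = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b", "c"}]
frequent = partitioned_mining(table, num_regions=2, min_support=0.5)
```

The second scan matters because an itemset can be frequent inside one region without being frequent overall; in the example, {a, b, c} is a local candidate from the first region but is dropped after the global count.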
In the useful data correction submodule 23, the specific formula for the correction according to human sabotage and user voting is:
$$P' = P \times (1 - Y) \times (1 + H)$$
In the formula, $Y$ is the probability that the data is subject to human sabotage, and $H$ is the proportion of voting users in the total number of users.
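The correction formula and the threshold rule of the useful data correction submodule can be combined in a short sketch. It reflects one interpretation of the text: tiers are searched in the order high, medium, low; the search stops at the first tier that contains a category with P' > T; and the prompt condition is checked against the largest P' seen so far. The function names, the tier layout and the sample scores are hypothetical.

```python
def corrected_coefficient(p, y, h):
    """P' = P * (1 - Y) * (1 + H)."""
    return p * (1 - y) * (1 + h)


def search_useful_categories(tiers, t, c=None):
    """Search quality tiers in order and decide whether to prompt the manager.

    tiers: list of (tier_name, {category: corrected P'}) ordered high to low.
    t:     threshold T in (0, 0.1].
    c:     margin C; defaults to T/5 as in this embodiment.
    Returns (tier_name or None, useful categories, prompt_manager flag).
    """
    if not 0 < t <= 0.1:
        raise ValueError("T must lie in (0, 0.1]")
    if c is None:
        c = t / 5
    max_p = float("-inf")
    found_tier, useful = None, {}
    for tier_name, scores in tiers:
        if scores:
            max_p = max(max_p, max(scores.values()))
        useful = {cat: p for cat, p in scores.items() if p > t}
        if useful:
            found_tier = tier_name
            break
    # Prompt when nothing qualifies, or when the best match is only marginal.
    prompt_manager = max_p < t or abs(max_p - t) < c
    return found_tier, useful, prompt_manager


# Hypothetical scores: nothing useful in the high tier, one hit in the medium tier.
tiers = [("high", {"a": 0.01}), ("medium", {"b": 0.2})]
result = search_useful_categories(tiers, t=0.05)
```

A smaller C narrows the band of marginal matches that trigger a prompt, which is the trade-off the embodiments explore with C = T/5, T/8 and T/9.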
Preferably, the credibility evaluation module 4 comprises the following submodules:
Submodule 1: define the credibility attributes for the evaluation module; credibility attributes are hierarchical, and an attribute can be decomposed downward into sub-attributes;
Submodule 2: for each credibility attribute or sub-attribute, extract its evaluation indicators, which are used to evaluate the attribute or sub-attribute from different aspects;
Submodule 3: for each credibility attribute or sub-attribute, define its evaluation criteria. The criteria have four levels: excellent, good, medium and poor. The criteria are based on the evaluation indicators, that is, the combination of indicator values determines which level of the criteria the attribute or sub-attribute has reached;
Submodule 4: determine the credibility grading standard of the module; the credibility grade has five levels and is derived from the evaluation conclusions of the individual credibility attributes;
Submodule 5: before a credibility evaluation is carried out, form different credibility evaluation templates according to the focus of the assessment, and carry out the evaluation based on the chosen template, so that the evaluation is more targeted and its results are more accurate.
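The five submodules suggest a simple data structure: hierarchical attributes carrying indicators, a four-level rating per attribute, and a five-level grade for the module derived under a template. The sketch below is purely illustrative. The text does not give the scoring rules, so the indicator scale of [0, 1], the averaging, the cut points, the template weights and every name here are assumptions.

```python
from dataclasses import dataclass, field

# Four evaluation levels from the text, ordered from worst to best.
LEVELS = ["poor", "medium", "good", "excellent"]


@dataclass
class CredibilityAttribute:
    name: str
    indicators: dict                      # indicator name -> score in [0, 1] (assumed scale)
    sub_attributes: list = field(default_factory=list)

    def score(self):
        """Average of own indicator scores and sub-attribute scores (assumption)."""
        values = list(self.indicators.values())
        values += [sub.score() for sub in self.sub_attributes]
        if not values:
            raise ValueError(f"attribute {self.name!r} has nothing to evaluate")
        return sum(values) / len(values)

    def level(self):
        """Map the score to one of the four levels using hypothetical cut points."""
        s = self.score()
        if s >= 0.85:
            return "excellent"
        if s >= 0.7:
            return "good"
        if s >= 0.5:
            return "medium"
        return "poor"


def module_credibility_grade(attributes, template):
    """Five-level grade (1 lowest, 5 highest) under a weighting template.

    template: attribute name -> weight, expressing the focus of the assessment.
    """
    total_weight = sum(template.get(a.name, 0.0) for a in attributes)
    if total_weight <= 0:
        raise ValueError("template assigns no weight to the given attributes")
    weighted = sum(template.get(a.name, 0.0) * LEVELS.index(a.level())
                   for a in attributes) / total_weight
    # weighted lies in [0, 3]; stretch it onto the five grades 1..5.
    return 1 + round(weighted / 3 * 4)


# Hypothetical attributes and an equal-weight template.
integrity = CredibilityAttribute("integrity", {"hash_check": 0.9, "audit": 0.9})
availability = CredibilityAttribute("availability", {"uptime": 0.4})
grade = module_credibility_grade(
    [integrity, availability], {"integrity": 1.0, "availability": 1.0})
```

Changing the template weights changes which attributes dominate the grade, which is how a template can reflect a different assessment focus without touching the attribute definitions.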
In this embodiment, a network clustering coefficient is introduced to describe the data, taking into account both the attributes of the data itself and the attributes of the data influencers, which improves classification accuracy; the user modification frequency coefficient reduces manual intervention, so that data quality can be detected efficiently. The three-level evaluation model saves storage space and improves computational efficiency. The new similarity function amplifies the effect of large relative errors, making quality grading more rigorous and accurate. The data correction submodule corrects the correlation coefficient and can largely offset the influence of human sabotage and user voting on the data; with C = T/9, the range of data that triggers a prompt increases by 2.7%, while the amount of computation increases by 2.5%. Region-partition-based association rule mining is combined with the classification of useful data, so layered mining only needs to run on the data tables produced by the three-level classification, and the next data table is mined only when the current one contains no data that meets the requirements; this greatly reduces the amount of computation, and because the mining is tied to the useful data classification it is more purposeful. The identity authentication module effectively safeguards power grid information security. All code executing on the computing platform is given the ability to prove that it runs in an untampered environment. In a broad sense, the trusted computing platform provides network users with a more comprehensive security environment: it addresses security problems from the perspective of the security architecture, guarantees a secure execution environment for users, and moves beyond the passive-defense, patch-after-the-fact approach.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention and not to limit its scope of protection. Although the present invention has been explained in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that the technical solution of the present invention may be modified or equivalently substituted without departing from its substance and scope.

Claims (1)

1. A power grid information security and trusted work system based on big data, characterized in that it comprises a data quality management module, a useful data mining module, an identity authentication module and a credibility evaluation module, wherein the data quality management module comprises a data description submodule, a quality detection submodule and a data quality grading management submodule; the useful data mining module comprises a data preprocessing submodule, a useful data construction submodule, a useful data correction submodule and a useful data layered mining submodule; and the identity authentication module comprises a fingerprint recognition submodule and an alarm submodule;
(1) Data description submodule
Data is described by both the attributes of the trusted data itself and the attributes of its influencers. The attributes of the data itself are the data size, the creation date, the number of pictures contained and the related data amount, where the related data amount is the sum of the other data items that the current data points to and the other data items that point to the current data. The attributes of the data influencers are represented by the influencer network clustering coefficient $\bar{K}$, which is obtained as follows:
Build a data influencer description network. For each data item, the influencers comprise multiple users and one manager, and each influencer is represented by one node. Users can browse the data and can also propose modifications to it; the manager can modify the data either on their own initiative or according to user suggestions.
The influencer network clustering coefficient $\bar{K}$ is then defined as:
$$\bar{K} = \frac{m\sigma_1 + l\sigma_2 + n(\delta_1 \times \sigma_3 + \delta_2 \times \sigma_4)}{m + l + n} \times \left[1 - \left(\frac{m - l}{m}\right)^3\right]$$
In the formula, $\sigma_1$ is the influence factor applied each time a user browses the data, and $m$ is the total number of user views; $\sigma_2$ is the influence factor applied each time a user proposes a modification, and $l$ is the total number of user suggestions; $\sigma_3$ is the influence factor applied each time the manager modifies the data on their own initiative, and $\sigma_4$ is the influence factor applied each time the manager modifies the data according to a user suggestion; $\delta_1$ and $\delta_2$ are the weights of $\sigma_3$ and $\sigma_4$ respectively, and $n$ is the total number of manager modifications; the factor $1 - \left(\frac{m - l}{m}\right)^3$ is the user modification frequency coefficient, which represents users' satisfaction with the data, a larger coefficient indicating that users modify the data more frequently;
(2) Quality detection submodule
The quality of trusted data is evaluated with a "three-level evaluation model": data is first divided into three classes by data size, and its quality is then evaluated using all attributes other than data size. The specific method is as follows:
Sample data is divided into high-quality data, medium-quality data and low-quality data. If the data size is greater than the threshold $T_1$, the data is high-quality data; if the data size is greater than the threshold $T_2$ but less than $T_1$, the data is medium-quality data; if the data size is less than $T_2$, the data is low-quality data. Here $T_1 > T_2$, and $T_1$ and $T_2$ take values in [1KB, 1MB]. High-quality data and low-quality data are further divided into different grades. All other attributes of the data are assembled into a vector, the mean of each attribute is calculated for each grade from the sample data, and a corresponding mean vector is established for each grade. A new data item is represented by the vector $X = (x_1, \ldots, x_N)$ and the mean vector of a grade by $Y = (y_1, \ldots, y_N)$, where $N$ is the number of attributes of the data other than data size. The similarity of the two vectors is given by the similarity function $R(X, Y)$:
$$R(X, Y) = \sum_{i=1}^{N} \left|\frac{x_i - y_i}{x_i}\right|^2 + \sum_{i=1}^{N} \left|\frac{x_i - y_i}{y_i}\right|^2$$
The smaller the value of $R(X, Y)$, the greater the similarity, and vice versa; for each data item, the similarity to the mean vector of every grade is calculated in order to determine its quality grade;
(3) Data quality grading management submodule
After passing through the quality detection submodule, trusted data is divided into different quality grades and is managed in tiers according to its grade;
(4) Fingerprint recognition submodule
Accessing power grid information data requires entering a fingerprint, which is matched against the fingerprints in the fingerprint database; only persons who pass fingerprint recognition can access the power grid information data;
(5) Alarm submodule
If fingerprint recognition fails, the power grid information cannot be accessed and the system raises an alarm;
The data preprocessing submodule:
Data is divided into different fields. The data field the user needs is determined from the user's request, and the three-level evaluation model described above is used to select the high-quality, high-grade data in that field, which forms a new data table K;
The useful data construction submodule:
After preprocessing, each data field contains different categories. A correlation coefficient $P$ is introduced to screen the useful data categories:
$$P = \frac{\frac{Z_s}{Z} - \rho}{1 - \rho}$$
In the formula, $Z_s$ is the number of bidirectional links among the data in one category of the new data table K, that is, for data items A and B, A points to B and B also points to A; $Z$ is the related data amount in that category of data table K; and $\rho$ is defined in terms of $N$, the total number of data items in the category;
The useful data correction submodule:
During use, useful data is affected by two factors, human sabotage and user voting; the correlation coefficient corrected for these two factors is $P'$. A threshold $T$ is also set, with $T \in (0, 0.1]$; if $P' > T$, the category is useful data. When no qualifying useful data can be obtained from the high-quality data, the medium-quality data and then the low-quality data are searched in turn. After all data has been searched, if the final maximum of $P'$ is less than $T$, or the maximum of $P'$ is greater than $T$ but the absolute value of its difference from $T$ is less than a set value $C$, then either no useful data can be found, or useful data can be found but its correlation is already below expectation; in that case a prompt is automatically sent to the manager to modify or add related data; $C = T/5$ is taken;
The useful data layered mining submodule:
First scan data table K. Let the maximum and minimum values of $P'$ be $P'_{max}$ and $P'_{min}$ respectively. Data table K is partitioned into non-overlapping regions, the number of which is computed from $P'_{max}$ and $P'_{min}$ using the rounding function int, and the local frequent itemsets of each region are mined; then, using the Apriori property, the local frequent itemsets are joined to obtain the global candidate itemsets; finally, K is scanned again to count the actual support of each candidate itemset and thereby determine the global frequent itemsets;
In the useful data correction submodule, the specific formula for the correction according to human sabotage and user voting is:
$$P' = P \times (1 - Y) \times (1 + H)$$
In the formula, $Y$ is the probability that the data is subject to human sabotage, and $H$ is the proportion of voting users in the total number of users;
The credibility evaluation module comprises the following submodules:
Submodule 1: define the credibility attributes for the credibility evaluation module; credibility attributes are hierarchical, and an attribute can be decomposed downward into sub-attributes;
Submodule 2: for each credibility attribute or sub-attribute, extract its evaluation indicators, which are used to evaluate the attribute or sub-attribute from different aspects;
Submodule 3: for each credibility attribute or sub-attribute, define its evaluation criteria; the criteria have four levels: excellent, good, medium and poor; the criteria are based on the evaluation indicators, that is, the combination of indicator values determines which level of the criteria the attribute or sub-attribute has reached;
Submodule 4: determine the credibility grading standard of the credibility evaluation module; the credibility grade has five levels and is derived from the evaluation conclusions of the individual credibility attributes;
Submodule 5: before a credibility evaluation is carried out, form different credibility evaluation templates according to the focus of the assessment, and carry out the evaluation based on the chosen template.
CN201610524803.6A 2016-07-01 2016-07-01 A kind of electric network information secure and trusted work system based on big data Active CN106126741B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610524803.6A CN106126741B (en) 2016-07-01 2016-07-01 A kind of electric network information secure and trusted work system based on big data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610524803.6A CN106126741B (en) 2016-07-01 2016-07-01 A kind of electric network information secure and trusted work system based on big data

Publications (2)

Publication Number Publication Date
CN106126741A CN106126741A (en) 2016-11-16
CN106126741B (en) 2017-05-31

Family

ID=57468733

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610524803.6A Active CN106126741B (en) 2016-07-01 2016-07-01 A kind of electric network information secure and trusted work system based on big data

Country Status (1)

Country Link
CN (1) CN106126741B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110188085A (en) * 2019-04-18 2019-08-30 红云红河烟草(集团)有限责任公司 Quality data model method for building up between a kind of tobacco volume hired car
CN113129481A (en) * 2019-12-31 2021-07-16 广州海英智慧家居科技有限公司 Fingerprint lock control method
CN113129480A (en) * 2019-12-31 2021-07-16 广州海英智慧家居科技有限公司 Fingerprint lock control method for Internet of things
CN113129482A (en) * 2019-12-31 2021-07-16 广州海英智慧家居科技有限公司 Fingerprint lock identification method
CN117592994A (en) * 2020-07-22 2024-02-23 支付宝(杭州)信息技术有限公司 Payment verification method, device, equipment and storage medium
CN112866278B (en) * 2021-02-04 2023-04-07 许昌学院 Computer network information safety protection system based on big data
CN112948837B (en) * 2021-02-23 2023-04-25 国网山东省电力公司电力科学研究院 Power grid information safety and credibility working system based on Internet of things

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN201699728U (en) * 2010-06-17 2011-01-05 宁波电业局 Trusted network management system for electric power real-time system
CN103684793B (en) * 2013-12-25 2017-12-05 国家电网公司 A kind of method based on trust computing enhancing communication security of power distribution network
CN104809244B (en) * 2015-05-15 2018-02-09 成都睿峰科技有限公司 Data digging method and device under a kind of big data environment
CN104918239B (en) * 2015-06-04 2019-01-18 西安交通大学 Safe transmission method based on the cooperation interference of untrusted cognitive user

Also Published As

Publication number Publication date
CN106126741A (en) 2016-11-16


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C41 Transfer of patent application or patent right or utility model
CB03 Change of inventor or designer information

Inventor after: Chen Zubin, Tang Lingli, Huang Lianyue, Zeng Mingfei, Hang Cong, He Guanbo, Wang Hai, Li Xin, He Zhongzhu, Xie Ming, Hu Jijun, Weng Xiaoyun, Yuan Yong, Deng Gefeng, Mo Yinghong, Xie Jing, Zhang Peng

Inventor before: Chen Zubin, Tang Lingli, Huang Lianyue, Zeng Mingfei, Hang Cong, He Guanbo, Wang Hai, Li Xin, Xie Ming, Hu Jijun, Weng Xiaoyun, Yuan Yong, Deng Gefeng, Mo Yinghong, Xie Jing, Zhang Peng

COR Change of bibliographic data
TA01 Transfer of patent application right

Effective date of registration: 20170213

Address after: 530000 Xingning, Nanning District, democratic road, No. 6,

Applicant after: GUANGXI POWER GRID Co.,Ltd.

Address before: 530000 Xingning, Nanning District, democratic road, No. 6,

Applicant before: He Zhongzhu

GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20180214

Address after: 510000 Guangdong Province, Guangzhou high tech Industrial Development Zone, No. 233 science road 231 floor B1B2 building one layer, two layer, three layer, four layer

Patentee after: BOAO ZONGHENG NETWORK TECHNOLOGY Co.,Ltd.

Address before: 530000 Xingning, Nanning District, democratic road, No. 6,

Patentee before: GUANGXI POWER GRID Co.,Ltd.

Effective date of registration: 20180214

Address after: Jiangning District of Nanjing City, Jiangsu province 211111 streets moling Jiangjun Road No. 6

Patentee after: NANJING KEERTE ELECTRIC POWER TECHNOLOGY CO.,LTD.

Address before: 510000 Guangdong Province, Guangzhou high tech Industrial Development Zone, No. 233 science road 231 floor B1B2 building one layer, two layer, three layer, four layer

Patentee before: BOAO ZONGHENG NETWORK TECHNOLOGY Co.,Ltd.

TR01 Transfer of patent right
CP03 Change of name, title or address

Address after: 211100 No.6 Jiangjun Road, moling street, Jiangning District, Nanjing City, Jiangsu Province

Patentee after: Jiangsu kerert Information Technology Co.,Ltd.

Address before: 211111 No.6 Jiangjun Road, moling street, Jiangning District, Nanjing City, Jiangsu Province

Patentee before: NANJING KEERTE ELECTRIC POWER TECHNOLOGY Co.,Ltd.

CP03 Change of name, title or address
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: A trusted work system of power grid information security based on big data

Effective date of registration: 20210113

Granted publication date: 20170531

Pledgee: Bank of Jiangsu Limited by Share Ltd. Nanjing Jiangning branch

Pledgor: Jiangsu kerert Information Technology Co.,Ltd.

Registration number: Y2021980000353

PE01 Entry into force of the registration of the contract for pledge of patent right
PC01 Cancellation of the registration of the contract for pledge of patent right

Date of cancellation: 20220411

Granted publication date: 20170531

Pledgee: Bank of Jiangsu Limited by Share Ltd. Nanjing Jiangning branch

Pledgor: Jiangsu kerert Information Technology Co.,Ltd.

Registration number: Y2021980000353

PC01 Cancellation of the registration of the contract for pledge of patent right
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: A power grid information security and credibility work system based on big data

Effective date of registration: 20230112

Granted publication date: 20170531

Pledgee: Nanjing Branch of Jiangsu Bank Co.,Ltd.

Pledgor: Jiangsu kerert Information Technology Co.,Ltd.

Registration number: Y2023980031060

PE01 Entry into force of the registration of the contract for pledge of patent right
PC01 Cancellation of the registration of the contract for pledge of patent right

Granted publication date: 20170531

Pledgee: Nanjing Branch of Jiangsu Bank Co.,Ltd.

Pledgor: Jiangsu kerert Information Technology Co.,Ltd.

Registration number: Y2023980031060