CN110674529A - Document auditing method and document auditing device based on data security information - Google Patents


Info

Publication number
CN110674529A
Authority
CN
China
Prior art keywords
document
auditing
information
audited
characteristic information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910906142.7A
Other languages
Chinese (zh)
Other versions
CN110674529B (en)
Inventor
杨晓龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Enyike (beijing) Data Technology Co Ltd
Original Assignee
Enyike (beijing) Data Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Enyike (beijing) Data Technology Co Ltd filed Critical Enyike (beijing) Data Technology Co Ltd
Priority to CN201910906142.7A priority Critical patent/CN110674529B/en
Publication of CN110674529A publication Critical patent/CN110674529A/en
Application granted granted Critical
Publication of CN110674529B publication Critical patent/CN110674529B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application provides a document auditing method and a document auditing device based on data security information. The document auditing method comprises: acquiring document information of a document to be audited that is input by a user; determining, from the document information, a plurality of target feature information of the document to be audited under different dimensions; inputting the target feature information into a pre-trained auditing scene and auditing process identification model to obtain an auditing process corresponding to the auditing scene to which the document to be audited belongs; and auditing the document to be audited according to the auditing process. In this way, the complexity of the auditing process is reduced and the document to be audited is audited in a targeted manner according to the matched auditing process, which helps improve auditing accuracy and efficiency and reduce the time and labor cost of auditing.

Description

Document auditing method and document auditing device based on data security information
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a document auditing method and a document auditing apparatus based on data security information.
Background
Since the emergence of information technology, security has been an unavoidable issue, and the scope of the network security industry has continued to expand along with the demand for network security. If an internet service operator does not attach sufficient importance to data security, users' data is exposed to serious security risks.
At present, many process links in various industries require audit and approval. An auditing process related to security problems, in particular, usually develops from local to global, from elementary to advanced, and from simple to complex. Such auditing processes are mostly set by technical personnel based on experience, and the data files to be audited are mostly audited in a uniform manner. To cover every kind of audit, the process therefore tends to be cumbersome and untargeted, and cannot audit different data security problems more accurately.
Disclosure of Invention
In view of this, an object of the present application is to provide a document auditing method and a document auditing device based on data security information, which can identify the auditing scenes corresponding to different documents to be audited and match a corresponding auditing process to each auditing scene. The document to be audited is then audited in a targeted manner according to the matched auditing process while the complexity of the auditing process is reduced, which helps improve auditing accuracy and efficiency and reduce the time and labor cost of auditing.
The embodiment of the application provides a document auditing method based on data security information, which comprises the following steps:
acquiring document information of a document to be audited, which is input by a user;
determining a plurality of target characteristic information of the document to be audited under different dimensions from the document information;
inputting the target characteristic information into a pre-trained auditing scene and auditing process identification model to obtain an auditing process corresponding to the auditing scene of the document to be audited;
and auditing the document to be audited according to the auditing flow.
Further, the determining, from the document information, a plurality of target feature information of the document to be audited under different dimensions includes:
extracting a plurality of characteristic information of the document to be audited under different dimensions from the document information;
performing binning processing on the plurality of feature information, and determining the feature information group to which each piece of feature information belongs;
for each feature information group, calculating an evidence weight value of the feature information group and an information value corresponding to the evidence weight value;
and determining a plurality of target characteristic information of the document to be audited under different dimensions according to the information value.
Further, the evidence weight value and the information value are calculated by the following formulas:
$$\mathrm{WOE}_i = \ln\left(\frac{P_{ig}}{P_{ib}}\right)$$
$$\mathrm{IV}_i = \left(P_{ig} - P_{ib}\right) \cdot \mathrm{WOE}_i$$
wherein $\mathrm{WOE}_i$ is the evidence weight value of the i-th feature information group, $P_{ig}$ is the ratio of the number of responsive feature information in the i-th group to the total feature information, $P_{ib}$ is the ratio of the number of unresponsive feature information in the i-th group to the total feature information, and $\mathrm{IV}_i$ is the information value of the i-th feature information group.
Further, after the document to be audited is audited according to the auditing process, the document auditing method further includes:
judging whether the content in the document to be audited conforms to safety regulations or not based on the auditing result of the document to be audited;
and if the audit result shows that the content in the document to be audited conforms to the safety regulation, auditing is finished, and a notice that the auditing is passed is sent to the user.
Further, after the determining whether the content in the document to be audited conforms to the safety regulations based on the auditing result of the document to be audited, the document auditing method further includes:
if the auditing result shows that the content in the document to be audited has a safety problem which does not accord with the safety regulation, acquiring the safety problem;
and aiming at the safety problem, generating a risk check list and sending the risk check list to the user.
Further, the identification model of the audit scene and the audit process is trained in the following way:
obtaining sample document information of a plurality of sample audit documents and an actual audit process corresponding to an actual use scene to which each sample audit document belongs;
determining a plurality of target sample characteristic information of each sample audit document under different dimensions from each sample document information;
and taking the characteristic information of the target samples as input characteristics, taking the actual auditing process corresponding to each sample auditing document as output characteristics, training the constructed regression analysis model, and obtaining the auditing scene and the auditing process recognition model.
The embodiment of the present application further provides a document auditing device based on data security information, where the document auditing device includes:
the information acquisition module is used for acquiring document information of a document to be audited, which is input by a user;
the determining module is used for determining a plurality of target characteristic information of the document to be audited under different dimensions from the document information acquired by the information acquiring module;
the model determining module is used for inputting the target characteristic information determined by the determining module into a pre-trained auditing scene and auditing process identification model to obtain an auditing process corresponding to the auditing scene to which the document to be audited belongs;
and the auditing module is used for auditing the document to be audited according to the auditing process determined by the model determining module.
Further, the determining module comprises:
the extracting unit is used for extracting a plurality of feature information of the document to be audited under different dimensions from the document information;
a first determining unit, configured to perform binning processing on the plurality of pieces of feature information extracted by the extracting unit, and determine a feature information group to which each piece of feature information belongs;
a calculating unit, configured to calculate, for each feature information group determined by the first determining unit, an evidence weight value of the feature information group and an information value corresponding to the evidence weight value;
and the second determining unit is used for determining a plurality of target characteristic information of the document to be audited under different dimensions according to the information value calculated by the calculating unit.
Further, the calculation unit calculates the weight value of the evidence and the information value by the following formula:
$$\mathrm{WOE}_i = \ln\left(\frac{P_{ig}}{P_{ib}}\right)$$
$$\mathrm{IV}_i = \left(P_{ig} - P_{ib}\right) \cdot \mathrm{WOE}_i$$
wherein $\mathrm{WOE}_i$ is the evidence weight value of the i-th feature information group, $P_{ig}$ is the ratio of the number of responsive feature information in the i-th group to the total feature information, $P_{ib}$ is the ratio of the number of unresponsive feature information in the i-th group to the total feature information, and $\mathrm{IV}_i$ is the information value of the i-th feature information group.
Further, the document auditing device further comprises an auditing result judging module, and the auditing result judging module is used for:
judging whether the content in the document to be audited conforms to safety regulations or not based on the auditing result of the document to be audited;
and if the audit result shows that the content in the document to be audited conforms to the safety regulation, auditing is finished, and a notice that the auditing is passed is sent to the user.
Further, after the determination of whether the content in the document to be audited meets the safety regulations is made based on the audit result of the document to be audited, the audit result determination module is further configured to:
if the auditing result shows that the content in the document to be audited has a safety problem which does not accord with the safety regulation, acquiring the safety problem;
and aiming at the safety problem, generating a risk check list and sending the risk check list to the user.
Furthermore, the auditing device also comprises a model training module, and the model training module trains the auditing scene and auditing process identification model in the following modes:
obtaining sample document information of a plurality of sample audit documents and an actual audit process corresponding to an actual use scene to which each sample audit document belongs;
determining a plurality of target sample characteristic information of each sample audit document under different dimensions from each sample document information;
and taking the characteristic information of the target samples as input characteristics, taking the actual auditing process corresponding to each sample auditing document as output characteristics, training the constructed regression analysis model, and obtaining the auditing scene and the auditing process recognition model.
An embodiment of the present application further provides an electronic device, including: a processor, a memory and a bus, wherein the memory stores machine-readable instructions executable by the processor, the processor and the memory communicate via the bus when the electronic device is running, and the machine-readable instructions are executed by the processor to perform the steps of the document auditing method based on data security information.
Embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program performs the steps of the document auditing method based on data security information as described above.
The document auditing method and the document auditing device based on the data security information, provided by the embodiment of the application, are used for acquiring the document information of a document to be audited, which is input by a user; determining a plurality of target characteristic information of the document to be audited under different dimensions from the document information; inputting the target characteristic information into a pre-trained auditing scene and auditing process identification model to obtain an auditing process corresponding to the auditing scene to which the document to be audited belongs; and auditing the document to be audited according to the auditing flow.
In this way, document information is obtained from the document to be audited, target feature information of the document under different dimensions is determined from the document information, and the target feature information is input into the pre-trained auditing scene and auditing process identification model to obtain the auditing scene to which the document belongs. The auditing process corresponding to that auditing scene is then determined, and the document is audited according to it. Corresponding auditing processes can therefore be formulated for different documents to be audited, and each document can be audited in a more targeted manner while the complexity of the auditing process is reduced, which helps improve auditing accuracy.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.
FIG. 1 is a flowchart of a document auditing method based on data security information according to an embodiment of the present application;
FIG. 2 is a flowchart of a document auditing method based on data security information according to another embodiment of the present application;
FIG. 3 is a schematic structural diagram of a document auditing apparatus based on data security information according to an embodiment of the present application;
FIG. 4 is a second schematic structural diagram of a document auditing apparatus based on data security information according to an embodiment of the present application;
FIG. 5 is a schematic diagram of the structure of the determination module shown in FIG. 3;
FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. Every other embodiment that can be obtained by a person skilled in the art without making creative efforts based on the embodiments of the present application falls within the protection scope of the present application.
First, an application scenario to which the present application is applicable will be described. The method and the device can be applied to the technical field of data processing.
Research shows that, within a company, the auditing process for data security problems usually develops from local to global, from elementary to advanced, and from simple to complex, and that each company's auditing process is established manually. As a result, a company applies one unified auditing process to every data security problem, whatever its nature, so the process is cumbersome and untargeted and cannot audit different data security problems more accurately.
Based on this, the embodiment of the application provides a document auditing method based on data security information, which can identify auditing scenes corresponding to different documents to be audited, match corresponding auditing processes according to the auditing scenes, and finally audit the documents to be audited according to the auditing processes.
Referring to fig. 1, fig. 1 is a flowchart of a document auditing method based on data security information according to an embodiment of the present application. As shown in fig. 1, a document auditing method based on data security information provided in an embodiment of the present application includes:
Step 101, acquiring document information of a document to be audited, which is input by a user.
In this step, after receiving a document uploaded by a user for auditing, the document auditing device acquires the document information from that document.
The document to be audited may be a Word document compiled by the user after a conversation, a completed questionnaire, a filled-in check list, or the like. The document information refers to the content of the document to be audited, whichever of these forms it takes.
Step 102, determining a plurality of target feature information of the document to be audited under different dimensions from the document information.
In this step, feature extraction is performed on the obtained document information to determine a plurality of target feature information of the document to be audited under different dimensions, which can subsequently be input into the pre-built auditing scene and auditing process identification model.
Step 103, inputting the target characteristic information into a pre-trained auditing scene and auditing process identification model to obtain an auditing process corresponding to the auditing scene to which the document to be audited belongs.
In this step, the determined target feature information is input into the pre-trained auditing scene and auditing process identification model. The model determines a score for each piece of target feature information, determines the auditing scene to which the document to be audited belongs according to the scores, and then determines the auditing process corresponding to that auditing scene. For example, suppose a document to be audited from an employee of department A of a company involves information that must be kept confidential. Target feature information is extracted from the input document content and fed to the auditing scene and auditing process identification model, which determines that the auditing scene of the document is "scene a" (assuming that "scene a" must be audited under confidential conditions). The document auditing device can then add "data desensitization" to the auditing process of the document, where "data desensitization" means that the content of the document is encrypted so that it can be audited under confidential conditions. Conversely, if the content of the document to be audited does not need to be audited by department B, "department B" can be skipped and left out when the auditing process is formulated.
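The adjustment of the auditing process described in this example can be sketched as follows. This is a minimal illustration only: the base flow, the step names, the scene label "scene_c" and the rule table are hypothetical and do not come from the application, which obtains the process from a trained model rather than a lookup table.

```python
from typing import Dict, List

# Hypothetical base flow; the application does not enumerate concrete steps.
BASE_FLOW: List[str] = ["department_a_review", "department_b_review", "final_approval"]

# Hypothetical per-scene adjustments.
# "scene_a": content must be audited under confidential conditions,
#            so a "data_desensitization" step is added.
# "scene_c": content does not need to be audited by department B,
#            so that step is skipped.
SCENE_RULES: Dict[str, Dict[str, List[str]]] = {
    "scene_a": {"prepend": ["data_desensitization"], "skip": []},
    "scene_c": {"prepend": [], "skip": ["department_b_review"]},
}


def build_audit_flow(scene: str) -> List[str]:
    """Return the audit flow for a scene; unknown scenes get the base flow."""
    rules = SCENE_RULES.get(scene, {"prepend": [], "skip": []})
    kept = [step for step in BASE_FLOW if step not in rules["skip"]]
    return list(rules["prepend"]) + kept
```

For instance, `build_audit_flow("scene_a")` places the desensitization step ahead of the three base steps, while `build_audit_flow("scene_c")` omits the department B review.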
Therefore, the prepared auditing process can reduce the complexity of the auditing process to the greatest extent, so that the whole auditing process can be more targeted and simplified, the auditing process of enterprises is facilitated to be accelerated, and the auditing time is shortened.
Step 104, auditing the document to be audited according to the auditing process.
In this step, the document to be audited is audited according to the auditing process formulated for its auditing scene.
The document auditing method based on the data security information provided by the embodiment of the application obtains the document information of a document to be audited, which is input by a user; determining a plurality of target characteristic information of the document to be audited under different dimensions from the document information; inputting the target characteristic information into a pre-trained auditing scene and auditing process identification model to obtain an auditing process corresponding to the auditing scene of the document to be audited; and auditing the document to be audited according to the auditing flow.
In this way, document information is obtained from the document to be audited, target feature information of the document under different dimensions is determined from the document information, the target feature information is input into the pre-trained auditing scene and auditing process identification model to obtain the auditing process corresponding to the auditing scene to which the document belongs, and the document is audited according to that auditing process. The complexity of the auditing process is thereby reduced and the document is audited in a targeted manner according to the matched auditing process, which helps improve auditing accuracy and efficiency and reduce the time and labor cost of auditing.
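Steps 101 to 104 can be read as a small pipeline. The sketch below only shows how the steps chain together; the function names and the toy stand-ins for feature extraction, the identification model and the per-step audit are assumptions made for illustration, not the application's implementation.

```python
from typing import Callable, Dict, List


def audit_document(
    document_info: str,
    extract_features: Callable[[str], List[str]],
    identify_flow: Callable[[List[str]], List[str]],
    run_step: Callable[[str, str], bool],
) -> Dict[str, bool]:
    """Chain steps 102-104 on document information acquired in step 101."""
    features = extract_features(document_info)   # step 102: target features
    flow = identify_flow(features)                # step 103: model -> audit flow
    # step 104: run every step of the matched flow and record its outcome
    return {step: run_step(step, document_info) for step in flow}


# Toy stand-ins (hypothetical), used only to make the sketch runnable.
def toy_extract(text: str) -> List[str]:
    return [kw for kw in ("encryption", "privacy") if kw in text.lower()]


def toy_identify(features: List[str]) -> List[str]:
    prefix = ["data_desensitization"] if "privacy" in features else []
    return prefix + ["final_approval"]


def toy_run(step: str, text: str) -> bool:
    return True  # a real step would inspect the document content


result = audit_document("Privacy agreement and encryption policy",
                        toy_extract, toy_identify, toy_run)
```

Passing the three stages in as callables keeps the orchestration independent of how features are selected or how the model is trained.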
Referring to fig. 2, fig. 2 is a flowchart of a document auditing method based on data security information according to another embodiment of the present application. As shown in fig. 2, a document auditing method based on data security information provided in an embodiment of the present application includes:
Step 201, obtaining document information of a document to be audited, which is input by a user.
Step 202, extracting a plurality of feature information of the document to be audited under different dimensions from the document information.
In this step, feature extraction is performed on the obtained document information to determine a plurality of feature information, under different dimensions, that can represent the characteristics of the document to be audited.
For example, when the feature information extracted from a document to be audited is "data decryption", "privacy protocol" and "network environment", these items respectively represent feature information of the document under three different dimensions: "data acquisition information", "data security information" and "environment security information". The information contained in each dimension is preset and is updated in real time.
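One simple way to picture this dimension-wise extraction is a preset keyword table per dimension. The sketch below assumes plain substring matching, which the application does not specify; the table reuses the three example terms and dimensions from the text, and the function name is hypothetical.

```python
from typing import Dict, List

# Preset contents of each dimension (taken from the example in the text;
# in practice this table would be maintained and updated separately).
DIMENSION_KEYWORDS: Dict[str, List[str]] = {
    "data acquisition information": ["data decryption"],
    "data security information": ["privacy protocol"],
    "environment security information": ["network environment"],
}


def extract_features(document_info: str) -> Dict[str, List[str]]:
    """Return, per dimension, the preset terms found in the document text."""
    text = document_info.lower()
    found: Dict[str, List[str]] = {}
    for dimension, keywords in DIMENSION_KEYWORDS.items():
        hits = [kw for kw in keywords if kw in text]
        if hits:
            found[dimension] = hits
    return found
```

Dimensions with no matching term are simply absent from the result, so downstream steps only see dimensions that the document actually touches.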
Step 203, performing binning processing on the plurality of feature information, and determining the feature information group to which each piece of feature information belongs.
In this step, binning processing is performed on the extracted feature information of the document to be audited: the originally extracted feature information is discretized, and the feature information group to which each discretized piece of feature information belongs is determined.
Binning discretizes continuous data variables. Discretized features are easier to add or remove, which facilitates rapid iteration of the model and reduces the risk of overfitting.
For example, if the acquired feature information of the same dimension is "data encryption" and "data decryption", both are determined after discretization to belong to the feature information group "data encryption, decryption".
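The grouping in this example can be sketched as a lookup from each extracted term to its group. The mapping table and function name below are hypothetical; the application does not state how group membership is decided, so a fixed table is assumed here.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical term-to-group table, following the example in the text.
GROUP_OF: Dict[str, str] = {
    "data encryption": "data encryption, decryption",
    "data decryption": "data encryption, decryption",
}


def bin_features(features: List[str]) -> Dict[str, List[str]]:
    """Assign each feature to its group; unmapped features form their own group."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for feature in features:
        groups[GROUP_OF.get(feature, feature)].append(feature)
    return dict(groups)
```

Calling `bin_features(["data encryption", "data decryption", "privacy protocol"])` puts the first two terms into the shared group and leaves "privacy protocol" in a group of its own.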
Step 204, for each feature information group, calculating an evidence weight value of the feature information group and an information value corresponding to the evidence weight value.
In this step, after a plurality of feature information groups are obtained, the evidence weight value of each feature information group and the information value corresponding to that evidence weight value are calculated.
Further, the evidence weight value and the information value are calculated by the following formulas:
$$\mathrm{WOE}_i = \ln\left(\frac{P_{ig}}{P_{ib}}\right)$$
$$\mathrm{IV}_i = \left(P_{ig} - P_{ib}\right) \cdot \mathrm{WOE}_i$$
wherein $\mathrm{WOE}_i$ represents the evidence weight value of the i-th feature information group, $P_{ig}$ is the ratio of the number of responsive feature information in the i-th group to the total feature information, $P_{ib}$ is the ratio of the number of unresponsive feature information in the i-th group to the total feature information, and $\mathrm{IV}_i$ is the information value of the i-th feature information group.
The responsive feature information is preset according to experience or historical data, so that when the feature information is extracted and grouped, the responsive feature information in each group can be identified automatically.
The unresponsive feature information is all feature information in the i-th feature information group other than the responsive feature information.
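The two formulas can be evaluated directly from per-group counts. The sketch below follows the definitions as worded here, dividing both counts by the total amount of feature information; the more common weight-of-evidence convention divides responsive counts by all responsive items and unresponsive counts by all unresponsive items, so the denominator is an interpretation. The function name and input layout are hypothetical, and every count is assumed to be non-zero so that the logarithm is defined.

```python
import math
from typing import Dict, Tuple


def woe_iv(groups: Dict[str, Tuple[int, int]]) -> Dict[str, Tuple[float, float]]:
    """Map group name -> (WOE_i, IV_i).

    `groups` maps a group name to (responsive_count, unresponsive_count).
    """
    total = sum(resp + unresp for resp, unresp in groups.values())
    result: Dict[str, Tuple[float, float]] = {}
    for name, (resp, unresp) in groups.items():
        p_ig = resp / total      # share of responsive items in group i
        p_ib = unresp / total    # share of unresponsive items in group i
        woe = math.log(p_ig / p_ib)
        result[name] = (woe, (p_ig - p_ib) * woe)
    return result


values = woe_iv({"g1": (3, 1), "g2": (1, 3)})
```

With these counts the total is 8, so group "g1" has P_ig = 0.375 and P_ib = 0.125, giving WOE = ln 3 and IV = 0.25 · ln 3. Group "g2" has the opposite WOE but the same IV, since the two factors of the IV product always share a sign.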
Step 205, determining a plurality of target feature information of the document to be audited under different dimensions according to the information value.
In this step, the feature information groups whose calculated information value exceeds a preset value threshold are selected, and the feature information in those groups is determined as the plurality of target feature information of the document to be audited under different dimensions, which is then input into the auditing scene and auditing process identification model.
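The threshold selection in this step amounts to a filter over the groups. In the sketch below the function name, the input layout and the threshold of 0.1 used in the usage line are hypothetical; the application only says that the threshold is preset.

```python
from typing import Dict, List


def select_target_features(
    group_members: Dict[str, List[str]],
    group_iv: Dict[str, float],
    threshold: float,
) -> List[str]:
    """Keep the features of every group whose information value exceeds the threshold."""
    selected: List[str] = []
    for group, members in group_members.items():
        # Groups without a computed information value are treated as below threshold.
        if group_iv.get(group, 0.0) > threshold:
            selected.extend(members)
    return selected


targets = select_target_features(
    {"g1": ["data encryption", "data decryption"], "g2": ["network environment"]},
    {"g1": 0.27, "g2": 0.05},
    threshold=0.1,
)
```

Here only the members of "g1" are kept, because 0.27 exceeds the assumed threshold and 0.05 does not.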
Step 206, inputting the target characteristic information into a pre-trained auditing scene and auditing process identification model to obtain an auditing process corresponding to the auditing scene to which the document to be audited belongs.
Step 207, auditing the document to be audited according to the auditing process.
The descriptions of step 201, step 206, and step 207 may refer to the descriptions of step 101, step 103, and step 104, and the same technical effect can be achieved, which is not described in detail herein.
Further, after step 207, the document auditing method further includes: judging whether the content in the document to be audited conforms to safety regulations or not based on the auditing result of the document to be audited; and if the audit result shows that the content in the document to be audited conforms to the safety regulation, auditing is finished, and a notice that the auditing is passed is sent to the user.
In this step, after the document to be audited has been audited according to the formulated auditing process, the document auditing device can further judge, from the auditing result, whether the content of the document conforms to the safety regulations. If it does, the audit of the document is completed and a notification is sent to inform the user that the audit has been passed.
The notification may be sent to the user as a document, as an email, or in another form, which is not limited herein.
Further, after step 207, the document auditing method further includes: if the audit result shows that the content of the document to be audited has a safety problem that does not conform to the safety regulations, acquiring the safety problem; and, for the safety problem, generating a risk check list and sending it to the user.
In this step, if the content of the document to be audited has a safety problem that does not conform to the safety regulations, that safety problem is acquired and a risk check list is generated for it. The risk check list includes the risk posed by the safety problem and a description of that risk, as well as the impact the safety problem can cause, the risk level, and suggestions for remedying the safety problem.
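One way to picture the risk check list is as a list of records, one per safety problem. The sketch below is only an illustration under assumptions: the class name `RiskItem`, its field names, the risk-level scale, and the plain-text layout are hypothetical and are not specified in the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RiskItem:
    """One entry of a risk check list (field names are illustrative)."""
    safety_problem: str  # content that does not conform to the safety regulations
    risk: str            # description of the risk the problem poses
    impact: str          # impact the problem can cause
    risk_level: str      # assumed scale, e.g. "low", "medium", "high"
    suggestion: str      # suggested improvement


def format_risk_checklist(items: List[RiskItem]) -> str:
    """Render the check list as plain text, one numbered entry per problem."""
    lines = []
    for number, item in enumerate(items, start=1):
        lines.append(
            f"{number}. [{item.risk_level}] {item.safety_problem}: {item.risk} "
            f"Impact: {item.impact} Suggestion: {item.suggestion}"
        )
    return "\n".join(lines)


# Hypothetical example entry.
checklist = [
    RiskItem(
        safety_problem="User phone numbers stored unencrypted",
        risk="Personal data may leak.",
        impact="Regulatory penalties.",
        risk_level="high",
        suggestion="Encrypt the field at rest.",
    )
]
report = format_risk_checklist(checklist)
```

The rendered text could then be attached to the notification sent to the user.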
Further, the document auditing method trains the audit scene and audit process identification model in the following way:
obtaining sample document information of a plurality of sample audit documents and the actual audit process corresponding to the actual use scene to which each sample audit document belongs; determining, from each piece of sample document information, a plurality of target sample feature information of each sample audit document under different dimensions; and training the constructed regression analysis model with the plurality of target sample feature information as input features and the actual audit process corresponding to each sample audit document as output features, to obtain the audit scene and audit process identification model.
In this step, when the audit scene and audit process identification model is trained, the sample document information of a plurality of sample audit documents and the actual audit process corresponding to the actual use scene of each sample audit document are first obtained. The plurality of target sample feature information needed for model training is then extracted from the sample document information. During training, the target sample feature information is input into the constructed regression analysis model as input features, the actual audit process corresponding to each sample audit document is input as output features, and the model is trained on these input and output features to obtain the audit scene and audit process identification model.
Specifically, to extract the target sample feature information needed for model training, a plurality of sample feature information of the sample audit documents under different dimensions is first extracted from each piece of sample document information, and binning is performed on it to obtain a plurality of sample feature information groups. A sample evidence weight value and a sample information value are then calculated for each sample feature information group, and the plurality of target sample feature information of the sample audit document under different dimensions is determined according to the sample information values.
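The text does not say which binning ("box separation") method is used. As a minimal sketch, assuming a numeric feature and simple equal-width bins, the grouping could look like the following; the function names `equal_width_bins` and `group_by_bin` and the sample values are hypothetical.

```python
from typing import Dict, List


def equal_width_bins(values: List[float], n_bins: int) -> List[int]:
    """Assign each value a bin index in [0, n_bins - 1] using equal-width bins."""
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    low, high = min(values), max(values)
    if high == low:
        # All values are identical, so they share a single bin.
        return [0] * len(values)
    width = (high - low) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - low) / width), n_bins - 1) for v in values]


def group_by_bin(values: List[float], bins: List[int]) -> Dict[int, List[float]]:
    """Collect the values that fall into each bin (one feature information group per bin)."""
    groups: Dict[int, List[float]] = {}
    for value, bin_index in zip(values, bins):
        groups.setdefault(bin_index, []).append(value)
    return groups


# Hypothetical numeric feature extracted from five sample documents.
values = [1.0, 2.0, 3.0, 4.0, 10.0]
bins = equal_width_bins(values, n_bins=3)
groups = group_by_bin(values, bins)
```

Other schemes, such as equal-frequency bins or bins chosen to maximize information value, would fit the description equally well.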
Specifically, in the training process, the regression coefficients, intercept, scale factor, and offset of the linear regression model are trained on the input features and output features, which determines the scoring equation (formula 1) of the linear regression model and yields the trained audit scene and audit process identification model. When the document auditing method is used, the score of each input target feature information is determined by the scoring equation, the total score of the feature information group to which the target feature information belongs is determined from those scores, the audit scene of the document to be audited is determined from the total scores, and the corresponding audit process is matched to the determined audit scene.
where Score is the total score of the i-th feature information group, $WOE_i$ is the evidence weight of the i-th feature information group to which the target feature information belongs, $\beta_i$ is the regression coefficient of the i-th feature information group in the audit scene and audit process identification model, $\alpha$ is the intercept of the model, factor is the scale factor of the model, offset is the offset of the model, and n is the number of all feature information in the i-th feature information group.
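Formula 1 itself is not reproduced in this text. As a hedged sketch only, the code below assumes the form commonly used for scorecards, score = offset + factor × (α + Σ β_i · WOE_i), which uses the same symbols as the definitions above; the patent's actual equation may differ. The function name `scorecard_score` and all numeric values are hypothetical.

```python
from typing import Sequence


def scorecard_score(
    woe: Sequence[float],
    beta: Sequence[float],
    alpha: float,
    factor: float,
    offset: float,
) -> float:
    """Score under the ASSUMED form: offset + factor * (alpha + sum(beta_i * woe_i))."""
    if len(woe) != len(beta):
        raise ValueError("woe and beta must have the same length")
    linear_term = alpha + sum(b * w for b, w in zip(beta, woe))
    return offset + factor * linear_term


# Hypothetical trained parameters and evidence weights for two groups.
score = scorecard_score(
    woe=[0.5, -0.25],
    beta=[1.0, 2.0],
    alpha=0.5,
    factor=20.0,
    offset=600.0,
)
print(score)  # 610.0
```

In this example the two weighted evidence terms cancel (0.5 and -0.5), so the score is offset + factor × α = 600 + 20 × 0.5.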
The document auditing method based on data security information provided by the embodiment of the application acquires the document information of a document to be audited that is input by a user; extracts a plurality of feature information of the document to be audited under different dimensions from the document information; performs binning on the plurality of feature information and determines the feature information group to which each feature information belongs; calculates, for each feature information group, an evidence weight value and the information value corresponding to the evidence weight value; determines a plurality of target feature information of the document to be audited under different dimensions according to the information values; inputs the plurality of target feature information into a pre-trained audit scene and audit process identification model to obtain the audit process corresponding to the audit scene to which the document to be audited belongs; and audits the document to be audited according to the audit process.
In this way, the document information of the document to be audited is obtained, a plurality of feature information of the document under different dimensions is determined from the document information, and binning is performed on the feature information to determine the plurality of target feature information that can be input into the audit scene and audit process identification model. The target feature information is input into the pre-trained model to obtain the audit scene corresponding to the document, the audit process corresponding to that audit scene is determined, and the document is audited according to that audit process. Therefore, on the basis of improving the degree of complexity of the audit process, the document to be audited is audited in a targeted manner according to the matched audit process, which helps improve audit accuracy and efficiency and reduce the time and labor costs of auditing.
Referring to fig. 3 to 5, fig. 3 is a schematic structural diagram of a document auditing apparatus based on data security information according to an embodiment of the present application, fig. 4 is a second schematic structural diagram of a document auditing apparatus based on data security information according to an embodiment of the present application, and fig. 5 is a schematic structural diagram of a determination module shown in fig. 3. As shown in fig. 3, the document auditing apparatus 300 based on data security information includes:
the information acquisition module 310 is configured to acquire document information of a document to be audited, which is input by a user;
a determining module 320, configured to determine, from the document information acquired by the information acquiring module 310, a plurality of target feature information of the document to be audited in different dimensions;
a model determining module 330, configured to input the plurality of target feature information determined by the determining module 320 into a pre-trained audit scene and audit process identification model, so as to obtain the audit process corresponding to the audit scene to which the document to be audited belongs;
the auditing module 340 is configured to audit the document to be audited according to the auditing process determined by the model determining module 330.
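The way the four modules hand data to one another can be sketched as a small pipeline. This is an illustrative arrangement under assumptions, not the device's actual implementation: the class name `DocumentAuditDevice`, the callable signatures, and the toy stand-in modules are all hypothetical.

```python
from typing import Callable, List


class DocumentAuditDevice:
    """Wires the four modules together; each module is passed in as a callable."""

    def __init__(
        self,
        acquire: Callable[[str], str],
        determine: Callable[[str], List[float]],
        identify_flow: Callable[[List[float]], str],
        audit: Callable[[str, str], bool],
    ) -> None:
        self.acquire = acquire              # information acquisition module 310
        self.determine = determine          # determining module 320
        self.identify_flow = identify_flow  # model determining module 330
        self.audit = audit                  # auditing module 340

    def run(self, user_input: str) -> bool:
        document_info = self.acquire(user_input)
        target_features = self.determine(document_info)
        audit_flow = self.identify_flow(target_features)
        return self.audit(document_info, audit_flow)


# Toy stand-ins for the real modules (purely hypothetical behavior).
device = DocumentAuditDevice(
    acquire=lambda text: text.strip(),
    determine=lambda info: [float(len(info))],
    identify_flow=lambda feats: "strict" if feats[0] > 20 else "basic",
    audit=lambda info, flow: "password" not in info if flow == "strict" else True,
)
result = device.run("  quarterly report, no secrets  ")
```

Passing the modules in as callables keeps each one replaceable, which matches the later remark that the division into units is only one logical division.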
Further, as shown in fig. 4, the document auditing apparatus 300 further includes an auditing result determining module 350, where the auditing result determining module 350 is configured to:
judging whether the content in the document to be audited conforms to safety regulations or not based on the auditing result of the document to be audited;
and if the audit result shows that the content in the document to be audited conforms to the safety regulation, auditing is finished, and a notice that the auditing is passed is sent to the user.
Further, after determining whether the content in the document to be audited meets the safety regulations based on the audit result of the document to be audited, the audit result determining module 350 is further configured to:
if the auditing result shows that the content in the document to be audited has a safety problem which does not accord with the safety regulation, acquiring the safety problem;
and aiming at the safety problem, generating a risk check list and sending the risk check list to the user.
Further, as shown in fig. 4, the document auditing apparatus further includes a model training module 360, where the model training module 360 trains the auditing scene and auditing process recognition model by:
obtaining sample document information of a plurality of sample audit documents and an actual audit process corresponding to an actual use scene to which each sample audit document belongs;
determining a plurality of target sample characteristic information of each sample audit document under different dimensions from each sample document information;
and taking the characteristic information of the target samples as input characteristics, taking the actual auditing process corresponding to each sample auditing document as output characteristics, training the constructed regression analysis model, and obtaining the auditing scene and the auditing process recognition model.
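The text only says that a "regression analysis model" is trained. As a minimal sketch, assuming a logistic regression with just two candidate audit processes (a simplification; a real system would likely need more classes), training by batch gradient descent could look like this. The function names and the toy data are hypothetical.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(
    features: List[List[float]],
    labels: List[int],
    lr: float = 0.5,
    epochs: int = 500,
) -> Tuple[List[float], float]:
    """Fit regression coefficients and intercept by batch gradient descent on log loss."""
    n_samples, n_features = len(features), len(features[0])
    beta = [0.0] * n_features
    alpha = 0.0
    for _ in range(epochs):
        grad_beta = [0.0] * n_features
        grad_alpha = 0.0
        for x, y in zip(features, labels):
            error = sigmoid(alpha + sum(b * v for b, v in zip(beta, x))) - y
            grad_alpha += error
            for j, v in enumerate(x):
                grad_beta[j] += error * v
        alpha -= lr * grad_alpha / n_samples
        beta = [b - lr * g / n_samples for b, g in zip(beta, grad_beta)]
    return beta, alpha


# Hypothetical training data: one encoded feature per sample document;
# label 1 stands for a "strict" audit process, label 0 for a "basic" one.
features = [[-1.0], [-0.5], [0.5], [1.0]]
labels = [0, 0, 1, 1]
beta, alpha = train_logistic(features, labels)
```

The fitted `beta` and `alpha` play the role of the regression coefficients and intercept that the description says are learned during training.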
Further, as shown in fig. 5, the determining module 320 includes:
an extracting unit 321, configured to extract, from the document information, multiple pieces of feature information of the document to be audited in different dimensions;
a first determining unit 322, configured to perform binning processing on the plurality of pieces of feature information extracted by the extracting unit 321, and determine a feature information group to which each piece of feature information belongs;
a calculating unit 323, configured to calculate, for each feature information group determined by the first determining unit 322, an evidence weight value of the feature information group and an information value corresponding to the evidence weight value;
a second determining unit 324, configured to determine, according to the information value calculated by the calculating unit 323, a plurality of target feature information of the document to be audited in different dimensions.
Further, the calculating unit 323 calculates the evidence weight value and the information value by the following formulas:
$WOE_i = \ln(P_{ig} / P_{ib})$;
$IV_i = (P_{ig} - P_{ib}) \cdot WOE_i$
where $WOE_i$ is the evidence weight value of the i-th feature information group, $P_{ig}$ is the proportion of the total feature information accounted for by the responding feature information in the i-th group, $P_{ib}$ is the proportion of the total feature information accounted for by the non-responding feature information in the i-th group, and $IV_i$ is the information value of the i-th feature information group.
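The two formulas translate directly into code. The sketch below is an illustration with hypothetical proportions; the function name `woe_and_iv` is not from the text, and the positivity check is an added safeguard because the logarithm is undefined when either proportion is zero.

```python
import math
from typing import Tuple


def woe_and_iv(p_ig: float, p_ib: float) -> Tuple[float, float]:
    """Return (WOE_i, IV_i) for one group from its two proportions."""
    if p_ig <= 0 or p_ib <= 0:
        raise ValueError("both proportions must be positive for the logarithm to exist")
    woe = math.log(p_ig / p_ib)   # WOE_i = ln(P_ig / P_ib)
    iv = (p_ig - p_ib) * woe      # IV_i = (P_ig - P_ib) * WOE_i
    return woe, iv


# Hypothetical proportions for one feature information group.
woe, iv = woe_and_iv(p_ig=0.4, p_ib=0.2)
print(round(woe, 4), round(iv, 4))
```

When the two proportions are equal, the evidence weight and the information value are both zero, meaning the group carries no discriminating information.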
The document auditing device based on the data security information provided by the embodiment of the application acquires the document information of a document to be audited, which is input by a user; determining a plurality of target characteristic information of the document to be audited under different dimensions from the document information; inputting the target characteristic information into a pre-trained auditing scene and auditing process identification model to obtain an auditing process corresponding to the auditing scene of the document to be audited; and auditing the document to be audited according to the auditing flow.
In this way, the document information of the document to be audited is obtained, the target feature information of the document under different dimensions is determined from the document information, the target feature information is input into the pre-trained audit scene and audit process identification model to obtain the audit process corresponding to the audit scene to which the document belongs, and the document is audited according to that audit process. Therefore, on the basis of improving the degree of complexity of the audit process, the document to be audited is audited in a targeted manner according to the matched audit process, which helps improve audit accuracy and efficiency and reduce the time and labor costs of auditing.
Referring to fig. 6, fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. As shown in fig. 6, the electronic device 600 includes a processor 610, a memory 620, and a bus 630.
The memory 620 stores machine-readable instructions executable by the processor 610, when the electronic device 600 runs, the processor 610 communicates with the memory 620 through the bus 630, and when the machine-readable instructions are executed by the processor 610, the steps of the document auditing method based on data security information in the method embodiments shown in fig. 1 and fig. 2 may be executed.
An embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the document auditing method based on data security information in the method embodiments shown in fig. 1 and fig. 2 may be executed.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that the above embodiments are only specific embodiments of the present application, used to illustrate its technical solutions rather than to limit them, and the protection scope of the present application is not limited to them. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that, within the technical scope disclosed in the present application, anyone skilled in the art can still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features. Such modifications, changes, or substitutions do not depart from the spirit and scope of the embodiments of the present application and are intended to be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A document auditing method based on data security information is characterized by comprising the following steps:
acquiring document information of a document to be audited, which is input by a user;
determining a plurality of target characteristic information of the document to be audited under different dimensions from the document information;
inputting the target characteristic information into a pre-trained auditing scene and auditing process identification model to obtain an auditing process corresponding to the auditing scene of the document to be audited;
and auditing the document to be audited according to the auditing flow.
2. The document auditing method according to claim 1, wherein the determining, from the document information, a plurality of target feature information of the document to be audited in different dimensions includes:
extracting a plurality of characteristic information of the document to be audited under different dimensions from the document information;
performing binning processing on the plurality of characteristic information, and determining the characteristic information group to which each characteristic information belongs;
for each characteristic information group, calculating an evidence weight value of the characteristic information group and an information value corresponding to the evidence weight value;
and determining a plurality of target characteristic information of the document to be audited under different dimensions according to the information value.
3. The document auditing method according to claim 2, wherein the evidence weight value and the information value are calculated by the following formulas:
$WOE_i = \ln(P_{ig} / P_{ib})$;
$IV_i = (P_{ig} - P_{ib}) \cdot WOE_i$
where $WOE_i$ is the evidence weight value of the i-th characteristic information group, $P_{ig}$ is the proportion of the total characteristic information accounted for by the responding characteristic information in the i-th group, $P_{ib}$ is the proportion of the total characteristic information accounted for by the non-responding characteristic information in the i-th group, and $IV_i$ is the information value of the i-th characteristic information group.
4. The document auditing method according to claim 1, wherein, after the auditing of the document to be audited according to the auditing process, the document auditing method further comprises:
judging whether the content in the document to be audited conforms to safety regulations or not based on the auditing result of the document to be audited;
and if the audit result shows that the content in the document to be audited conforms to the safety regulation, auditing is finished, and a notice that the auditing is passed is sent to the user.
5. The document auditing method according to claim 4, wherein, after the judging of whether the content in the document to be audited conforms to safety regulations based on the auditing result of the document to be audited, the document auditing method further comprises:
if the auditing result shows that the content in the document to be audited has a safety problem which does not accord with the safety regulation, acquiring the safety problem;
and aiming at the safety problem, generating a risk check list and sending the risk check list to the user.
6. The document auditing method according to claim 1, wherein the auditing scene and auditing process identification model is trained by:
obtaining sample document information of a plurality of sample audit documents and an actual audit process corresponding to an actual use scene to which each sample audit document belongs;
determining a plurality of target sample characteristic information of each sample audit document under different dimensions from each sample document information;
and training the constructed regression analysis model by taking the plurality of target sample characteristic information as input characteristics and the actual auditing process corresponding to each sample auditing document as output characteristics, to obtain the auditing scene and auditing process identification model.
7. A document auditing device based on data security information, characterized in that the document auditing device comprises:
the information acquisition module is used for acquiring document information of a document to be audited, which is input by a user;
the determining module is used for determining a plurality of target characteristic information of the document to be audited under different dimensions from the document information acquired by the information acquiring module;
the model determining module is used for inputting the target characteristic information determined by the determining module into a pre-trained auditing scene and auditing process identification model to obtain an auditing process corresponding to the auditing scene to which the document to be audited belongs;
and the auditing module is used for auditing the document to be audited according to the auditing process determined by the model determining module.
8. The document auditing device according to claim 7, wherein the determining module comprises:
the extracting unit is used for extracting a plurality of feature information of the document to be audited under different dimensions from the document information;
a first determining unit, configured to perform binning processing on the plurality of pieces of feature information extracted by the extracting unit, and determine a feature information group to which each piece of feature information belongs;
a calculating unit, configured to calculate, for each feature information group determined by the first determining unit, an evidence weight value of the feature information group and an information value corresponding to the evidence weight value;
and the second determining unit is used for determining a plurality of target characteristic information of the document to be audited under different dimensions according to the information value calculated by the calculating unit.
9. An electronic device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating over the bus when the electronic device is operating, the machine-readable instructions when executed by the processor performing the steps of the data security information based document auditing method according to any one of claims 1 to 6.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the data security information-based document auditing method according to any one of claims 1 to 6.
CN201910906142.7A 2019-09-24 2019-09-24 Document auditing method and document auditing device based on data security information Active CN110674529B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910906142.7A CN110674529B (en) 2019-09-24 2019-09-24 Document auditing method and document auditing device based on data security information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910906142.7A CN110674529B (en) 2019-09-24 2019-09-24 Document auditing method and document auditing device based on data security information

Publications (2)

Publication Number Publication Date
CN110674529A true CN110674529A (en) 2020-01-10
CN110674529B CN110674529B (en) 2021-07-27

Family

ID=69077418

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910906142.7A Active CN110674529B (en) 2019-09-24 2019-09-24 Document auditing method and document auditing device based on data security information

Country Status (1)

Country Link
CN (1) CN110674529B (en)

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101447038A (en) * 2007-11-26 2009-06-03 鸿富锦精密工业(深圳)有限公司 System and method for auditing proposal improvement
CN107633380A (en) * 2017-08-30 2018-01-26 北京明朝万达科技股份有限公司 The task measures and procedures for the examination and approval and system of a kind of anti-data-leakage system
CN107886306A (en) * 2017-11-24 2018-04-06 网易(杭州)网络有限公司 Document approvals method, medium, device and computing device
US20180205546A1 (en) * 2016-12-31 2018-07-19 Assetvault Limited Systems, methods, apparatuses for secure management of legal documents
CN108334954A (en) * 2018-01-22 2018-07-27 中国平安人寿保险股份有限公司 Construction method, device, storage medium and the terminal of Logic Regression Models
CN108763948A (en) * 2018-03-16 2018-11-06 北京明朝万达科技股份有限公司 A kind of automatic measures and procedures for the examination and approval of file and system of data-oriented anti-disclosure system
CN109101574A (en) * 2018-07-18 2018-12-28 北京明朝万达科技股份有限公司 A kind of the task measures and procedures for the examination and approval and system of anti-data-leakage system
CN109165907A (en) * 2018-07-02 2019-01-08 北京天辰信科技有限公司 A kind of document approvals method and system
CN109523225A (en) * 2018-10-12 2019-03-26 平安科技(深圳)有限公司 A kind of measure of managing contract, system and terminal device
CN109636624A (en) * 2018-10-29 2019-04-16 平安医疗健康管理股份有限公司 Generation method, device, equipment and the storage medium of air control audit model
CN109829692A (en) * 2019-01-17 2019-05-31 深圳壹账通智能科技有限公司 Contract trial method, apparatus, equipment and storage medium based on artificial intelligence
CN109871200A (en) * 2017-12-04 2019-06-11 星际空间(天津)科技发展有限公司 One kind being used for rapid build business approval systems approach
US20190228153A1 (en) * 2015-09-23 2019-07-25 University Of Florida Research Foundation, Incorporated Malware detection via data transformation monitoring
CN110245571A (en) * 2019-05-20 2019-09-17 深圳壹账通智能科技有限公司 Contract signature checking method, device, computer equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MICHAEL ELLIOTT 等: "An improved process model for internal auditing", 《MANAGERIAL AUDITING JOURNAL》 *
孙鹏璐: "基于BIM的建设项目投资控制研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113538002A (en) * 2020-04-14 2021-10-22 北京沃东天骏信息技术有限公司 Method and device for auditing texts
CN111813399A (en) * 2020-07-23 2020-10-23 平安医疗健康管理股份有限公司 Machine learning-based auditing rule processing method and device and computer equipment
CN111813399B (en) * 2020-07-23 2022-05-31 平安医疗健康管理股份有限公司 Machine learning-based auditing rule processing method and device and computer equipment
CN113779472A (en) * 2021-07-30 2021-12-10 阿里巴巴(中国)有限公司 Content auditing method and device and electronic equipment
CN114169641A (en) * 2021-12-31 2022-03-11 南京星云数字技术有限公司 Client interest rate sensitivity prediction method based on characteristic entropy
CN116071006A (en) * 2022-11-23 2023-05-05 北京航星永志科技有限公司 File auditing system, method, equipment and medium based on blockchain

Also Published As

Publication number Publication date
CN110674529B (en) 2021-07-27

Similar Documents

Publication Publication Date Title
CN110674529B (en) Document auditing method and document auditing device based on data security information
CN110399925B (en) Account risk identification method, device and storage medium
US11941491B2 (en) Methods and apparatus for identifying an impact of a portion of a file on machine learning classification of malicious content
CN106447239B (en) Data release auditing method and device
CN107204960B (en) Webpage identification method and device and server
CN112231484B (en) News comment auditing method, system, device and storage medium
CN109582833B (en) Abnormal text detection method and device
CN109508373B (en) Method and device for calculating enterprise public opinion index and computer readable storage medium
CN104158828B (en) Method and system for identifying suspicious phishing webpages based on a cloud content rule base
CN106446070A (en) Information processing apparatus and method based on patent group
CN109933648B (en) Real user comment distinguishing method and device
US20150286706A1 (en) Forensic system, forensic method, and forensic program
CN107665164A (en) Secure data detection method and device
CN113064973A (en) Text classification method, device, equipment and storage medium
CN111611786B (en) Text similarity calculation method and device
CN106790025B (en) Method and device for detecting link maliciousness
CN110020430B (en) Malicious information identification method, device, equipment and storage medium
KR101566153B1 (en) Forensic system, forensic method, and forensic program
CN108804501B (en) Method and device for detecting effective information
CN114169313A (en) Priority determination method and device, electronic equipment and storage medium
CN107862599B (en) Bank risk data processing method and device, computer equipment and storage medium
CN107085684A (en) Method and device for detecting program performance
CN107832925A (en) Internet content risk evaluation method, device and server
KR20190022430A (en) Systems, methods, electronic devices and storage media for identifying social events based risk events
CN110728585A (en) Authority guaranteeing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant