CN117312904A - Data classification and grading method and related products - Google Patents

Data classification and grading method and related products

Info

Publication number
CN117312904A
Authority
CN
China
Prior art keywords
data
classification
target
data set
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311048703.7A
Other languages
Chinese (zh)
Inventor
张新雨
卢西昌
李华
朱蕙
张琳
李继业
宋袁婧筠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Pacific Insurance Technology Co Ltd
Original Assignee
Pacific Insurance Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Pacific Insurance Technology Co Ltd
Priority to CN202311048703.7A
Publication of CN117312904A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08 Insurance

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Finance (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Accounting & Taxation (AREA)
  • Computational Linguistics (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a data classification and grading method and related products, and relates to the technical field of data processing. In the method, phrases that correspond to fields in a target data set are determined from a target word stock, and data items to be classified and graded are constructed for the target data set from the determined phrases. The data items are then input into a classification and grading prediction model, which outputs the data type and the security level of each data item, so that the data items are classified and graded automatically. Because the classification and grading prediction model replaces manual classification and grading of the data items, the efficiency of security classification and security grading of data is greatly improved.

Description

Data classification and grading method and related products
Technical Field
The application relates to the technical field of big data processing, and in particular to a data classification and grading method and related products.
Background
Data security refers to the process or state of protecting information or information systems from unauthorized access, use, disclosure, disruption, modification, and destruction. Data security governance is not just a security tool or solution; it is a coordinated set of measures, grounded in strategy, business, and application and integrated with security and risk management, that spans from management systems to supporting tools and from the upper-level management architecture to the lower-level technical implementation. Data security governance is also an important application area for artificial intelligence within the overall data governance process.
Because insurance enterprises hold large amounts of data, regulators require them to govern data security strictly and to strengthen privacy protection. At present, most insurance enterprises still classify and grade the data they hold by manual labeling, a traditional approach that is time-consuming, labor-intensive, and inefficient. Against the background of digital transformation, how to establish an efficient data classification and grading method that improves the efficiency of security classification and security grading of data has become an urgent problem.
Disclosure of Invention
In view of the above problems, the application provides a data classification and grading method for classifying and grading data efficiently, thereby improving the efficiency of security classification and security grading of data.
The first aspect of the present application provides a data classification and grading method, including:
determining, from a target word stock, phrases that correspond to fields in a target data set; the target word stock comprises professional words of the industry to which the target data set belongs; a phrase that corresponds to a field in the target data set is a phrase identical to the field or a phrase that overlaps with the field;
constructing data items to be classified and graded for the target data set according to the determined phrases;
taking the data items to be classified and graded as input of a classification and grading prediction model, and obtaining the data types and the security levels of the data items through the classification and grading prediction model; the classification and grading prediction model comprises a classification prediction model and a grading prediction model; the classification prediction model is used for obtaining a data type prediction result of the data items to be classified and graded; the grading prediction model is used for obtaining a security level prediction result of the data items to be classified and graded; the classification and grading prediction model is constructed based on the true correspondence between data types and security levels.
Optionally, the training of the classification and grading prediction model includes:
determining, from the target word stock, phrases that correspond to fields in a sample data set; a phrase that corresponds to a field in the sample data set is a phrase identical to the field or a phrase that overlaps with the field;
constructing sample classification and grading data items for the sample data set according to the determined phrases that correspond to fields in the sample data set;
obtaining a data type prediction result of the sample classification and grading data items by using the sample classification and grading data items and a first model to be trained;
obtaining a security level prediction result of the sample classification and grading data items by using the sample classification and grading data items and a second model to be trained;
and adjusting parameters of the first model to be trained and parameters of the second model to be trained according to the difference between the true correspondence and the predicted correspondence between the data type prediction result and the security level prediction result of the sample classification and grading data items, until training ends and the classification and grading prediction model is obtained.
Optionally, obtaining the data type prediction result of the sample classification and grading data items by using the sample classification and grading data items and the first model to be trained includes:
inputting the sample classification and grading data items into the first model to be trained, and obtaining the data type prediction result of the sample classification and grading data items through the analysis performed on them by the first model to be trained.
Optionally, obtaining the security level prediction result of the sample classification and grading data items by using the sample classification and grading data items and the second model to be trained includes:
inputting the sample classification and grading data items into the second model to be trained, and obtaining the security level prediction result of the sample classification and grading data items through the analysis performed on them by the second model to be trained.
Optionally, the construction of the target word stock includes:
screening out, from an industry data set, target Chinese terms that are identical to terms in a Chinese word stock; the target Chinese terms comprise single words and phrases; the industry data set includes the target data set and the sample data set;
removing special characters from the foreign-language character strings in the industry data set to obtain target foreign-language terms; the target foreign-language terms comprise foreign-language words and foreign-language phrases;
and arranging the target Chinese terms and the target foreign-language terms in descending order of their frequency of occurrence in the industry data set to construct the target word stock.
Optionally, if multiple phrases overlap with a field in the target data set, determining, from the target word stock, the phrase that corresponds to the field includes:
obtaining the sequence number, in the target word stock, of each of the phrases that overlap with the field in the target data set;
and determining the phrase with the smallest sequence number as the phrase used to construct the data items to be classified and graded for the target data set.
A second aspect of the present application provides a data classification and grading apparatus, comprising:
a target phrase screening module, configured to determine, from a target word stock, phrases that correspond to fields in a target data set; the target word stock comprises professional words of the industry to which the target data set belongs; a phrase that corresponds to a field in the target data set is a phrase identical to the field or a phrase that overlaps with the field;
a target data item construction module, configured to construct data items to be classified and graded for the target data set according to the determined phrases;
a result acquisition module, configured to take the data items to be classified and graded as input of a classification and grading prediction model, and to obtain the data types and the security levels of the data items through the classification and grading prediction model; the classification and grading prediction model comprises a classification prediction model and a grading prediction model; the classification prediction model is used for obtaining a data type prediction result of the data items to be classified and graded; the grading prediction model is used for obtaining a security level prediction result of the data items to be classified and graded; the classification and grading prediction model is constructed based on the true correspondence between data types and security levels.
Optionally, the apparatus further comprises a model training module, by which the classification and grading prediction model is trained; the model training module comprises:
a sample phrase screening unit, configured to determine, from the target word stock, phrases that correspond to fields in a sample data set; a phrase that corresponds to a field in the sample data set is a phrase identical to the field or a phrase that overlaps with the field;
a sample data item construction unit, configured to construct sample classification and grading data items for the sample data set according to the determined phrases that correspond to fields in the sample data set;
a data type obtaining unit, configured to obtain a data type prediction result of the sample classification and grading data items by using the sample classification and grading data items and a first model to be trained;
a security level obtaining unit, configured to obtain a security level prediction result of the sample classification and grading data items by using the sample classification and grading data items and a second model to be trained;
and a parameter adjustment unit, configured to adjust parameters of the first model to be trained and parameters of the second model to be trained according to the difference between the true correspondence and the predicted correspondence between the data type prediction result and the security level prediction result of the sample classification and grading data items, until training ends and the classification and grading prediction model is obtained.
A third aspect of the present application provides a computer readable storage medium having stored thereon a computer program which when executed by a processor implements the steps of the method provided by the first aspect.
A fourth aspect of the present application provides an electronic device, comprising:
a memory having a computer program stored thereon;
a processor for executing the computer program in the memory to implement the steps of the method provided in the first aspect.
Compared with the prior art, the application has the following beneficial effects:
According to the data classification and grading method, phrases that correspond to fields in the target data set are determined from the target word stock, and data items to be classified and graded are constructed for the target data set from the determined phrases. The data items are input into the classification and grading prediction model; the classification prediction model yields the data type prediction result of the data items, and the grading prediction model yields their security level prediction result, so that the data items are classified and graded automatically. Because the classification and grading prediction model replaces manual security classification and security grading of the data items, the efficiency of security classification and security grading of data is greatly improved.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application, and a person skilled in the art may obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a data classification and grading method according to an embodiment of the present application;
Fig. 2 is a flowchart of training a classification and grading prediction model according to an embodiment of the present application;
Fig. 3 is a schematic diagram of a first model to be trained according to an embodiment of the present application;
Fig. 4 is a schematic diagram of a second model to be trained according to an embodiment of the present application;
Fig. 5 is a schematic diagram of the data processing of a classification and grading prediction model to be trained according to an embodiment of the present application;
Fig. 6 is a schematic diagram of the working principle of training the classification and grading prediction model to be trained according to an embodiment of the present application;
Fig. 7 is a schematic diagram of a target word stock according to an embodiment of the present application;
Fig. 8 is a schematic structural diagram of a data classification and grading apparatus according to an embodiment of the present application.
Detailed Description
At present, most insurance enterprises still perform security classification and security grading of the large amounts of data they hold by manual labeling, a traditional approach that is time-consuming, labor-intensive, and inefficient. Against the background of digital transformation, how to establish an efficient data classification and grading method and improve the efficiency of security classification and security grading of data has become an urgent problem.
According to the data classification and grading method, phrases that correspond to fields in the target data set are determined from the target word stock, and data items to be classified and graded are constructed for the target data set from the determined phrases. The data items are input into the classification and grading prediction model; the classification prediction model yields the data type prediction result of the data items, and the grading prediction model yields their security level prediction result, so that the data items are classified and graded automatically. Because the classification and grading prediction model replaces manual classification and grading of the data items, the efficiency of security classification and security grading of data is greatly improved.
In order that those skilled in the art may better understand the solution of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art from the present disclosure without creative effort fall within the scope of the present disclosure.
Fig. 1 is a flowchart of a data classification and grading method according to an embodiment of the present application. As shown in Fig. 1, the data classification and grading method includes:
S101, determining, from the target word stock, phrases that correspond to fields in the target data set.
The main purpose of this step is to condense the large amount of information and the many fields contained in the target data set: key words that represent the core content of the data are screened out, the many redundant words are removed, and the screened key words are used to construct data items to be classified and graded. These data items serve as the input of the classification and grading prediction model, which makes it easier to classify and grade the data in the target data set.
The target data set used in this step is a set of data from a particular industry. For example, the target data set may be a set containing user data, business data, and insurance enterprise system data in the insurance industry. The user data in the insurance industry comprises personal user data and enterprise user data. The personal user data comprises information such as user name, date of birth, address, and identity card number; the enterprise user data comprises information such as enterprise name, unified social credit code, legal representative information, equity structure, business field, and senior management information. The business data comprises information such as premium data, policy data, and claim data. The insurance enterprise system data comprises information such as system operation log data, system fault data, and records of the operations that users perform on the insurance enterprise system.
It should be noted that, in actual work, data of the same type and the same meaning in the target data set may be expressed with different wording. For example, user identity information may be represented as "identity card information" or as "identity card number", and enterprise senior management information may be represented as "enterprise senior management structure" or as "enterprise executive structure". Although the keywords take different forms, the type of information in the target data set to which they refer is the same.
In order to convert keywords that appear in different forms in the target data set into keywords of a unified form, the embodiment of the application introduces a target word stock.
In the embodiment of the application, the target word stock is a set of professional words of the industry to which the target data set belongs. For example, if the target data set is an insurance industry data set, the target word stock is a collection of insurance industry terms such as premium, policy, insurance amount, and insurance claim.
It should be emphasized that the order of the words in the target word stock used in this application depends on how frequently the words occur in the industry data set. Words that occur frequently in the industry data set of the insurance industry have smaller sequence numbers and sit near the front of the target word stock; words that occur rarely have larger sequence numbers and sit near the back.
Specifically, assume that "policy" occurs 1000 times in an industry data set of the insurance industry and has sequence number A in the target word stock, while "premium" occurs 800 times and has sequence number B. Then A is smaller than B: if A is 50, B is a number greater than 50.
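The frequency-based ordering described above can be sketched as follows. This is a minimal illustration, not the application's implementation: the function names, the regular expression that decides what counts as a "special character", and the tie-breaking by alphabetical order are all assumptions, and the sketch handles only Latin-script terms (the screening of Chinese terms against a Chinese word stock is omitted).

```python
import re
from collections import Counter

# Assumption: anything other than letters, digits, and whitespace is "special".
SPECIAL_CHARS = re.compile(r"[^A-Za-z0-9\s]")


def clean_term(term: str) -> str:
    """Strip special characters and normalise whitespace and case."""
    return " ".join(SPECIAL_CHARS.sub(" ", term).split()).lower()


def build_word_stock(terms):
    """Map each term to a sequence number; the most frequent term gets 1."""
    counts = Counter(clean_term(t) for t in terms)
    counts.pop("", None)  # discard terms that were nothing but special characters
    # Sort by descending frequency; ties are broken alphabetically (assumption).
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {word: rank for rank, (word, _) in enumerate(ranked, start=1)}


word_stock = build_word_stock(
    ["policy", "premium", "policy", "claim#", "policy", "premium"]
)
print(word_stock)  # {'policy': 1, 'premium': 2, 'claim': 3}
```

As in the example above, the more frequent "policy" receives a smaller sequence number than "premium".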
In this step, determining, from the target word stock, a phrase that corresponds to a field in the target data set means selecting from the target word stock the word that is identical to the field, or a word that overlaps with the field. Here, the fields in the target data set are the key words of the data in the target data set. Words such as policy, premium, and user name mentioned above are fields in this sense.
Specifically, if the target data set has the field "premium", the field is compared one by one with all words in the target word stock to judge whether the word stock contains a word identical to the field "premium" or a word that overlaps with it.
If the target word stock contains the word "premium", identical to the field "premium", the word "premium" is selected.
If the target word stock contains no word identical to the field "premium" but contains multiple words that overlap with it, such as "policy", "insurance amount", and "insurance beneficiary", one of them has to be chosen.
In one possible case, if multiple phrases overlap with a field in the target data set, the phrase that corresponds to the field may be determined as follows.
The sequence number, in the target word stock, of each of the phrases that overlap with the field in the target data set is obtained, and the phrase with the smallest sequence number is determined as the phrase used to construct the data items to be classified and graded for the target data set.
Specifically, the sequence numbers of "policy", "insurance amount", and "insurance beneficiary" in the target word stock are obtained. Suppose the sequence number of "policy" is 20, that of "insurance amount" is 50, and that of "insurance beneficiary" is 100. Because 20 is the smallest, "policy" is taken as the phrase that corresponds to the field "premium" in the target data set.
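The selection rule in step S101 (exact match first, otherwise the overlapping phrase with the smallest sequence number) might be sketched as below. The application does not define "overlap" precisely, so the word-sharing test used here is an assumption, as are the function names and the multi-word English terms chosen so that the terms visibly share a word.

```python
from typing import Dict, Optional


def overlaps(field: str, phrase: str) -> bool:
    """Assumption: two terms overlap when they share at least one word."""
    return bool(set(field.lower().split()) & set(phrase.lower().split()))


def match_phrase(field: str, word_stock: Dict[str, int]) -> Optional[str]:
    """Return the word-stock phrase that corresponds to `field`, if any.

    `word_stock` maps each phrase to its sequence number (smaller = more frequent).
    """
    if field in word_stock:  # identical phrase: select it directly
        return field
    candidates = [p for p in word_stock if overlaps(field, p)]
    if not candidates:
        return None  # no identical or overlapping phrase
    # Several overlapping phrases: keep the one with the smallest sequence number.
    return min(candidates, key=lambda p: word_stock[p])


word_stock = {
    "insurance policy": 20,
    "insurance amount": 50,
    "insurance beneficiary": 100,
}
print(match_phrase("insurance premium", word_stock))  # insurance policy
```

With the sequence numbers 20, 50, and 100 from the example above, the phrase with sequence number 20 is selected.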
S102, constructing data items to be classified and graded for the target data set according to the determined phrases.
The data items to be classified and graded are constructed for the target data set from the phrases, determined in step S101, that correspond to fields in the target data set; the data items are then used as the input data of the classification and grading prediction model. The main purpose of this step is to let a data item stand for multiple fields in the target data set, which reduces the workload of the classification and grading prediction model and improves the efficiency of data classification and grading.
One possible way of constructing the data items to be classified and graded from the determined phrases is to concatenate the determined phrases.
Specifically, if the phrases determined in step S101 are "policy", "business name", and "insurance fraud", they may be joined with a hyphen "-" to form the data item "policy-business name-insurance fraud".
It should be noted that the embodiment of the present application gives this manner of constructing the data items only as an example; the application does not limit other possible manners of constructing the data items to be classified and graded from the determined phrases.
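The hyphen-joining construction of step S102 is simple enough to show directly. The function name and the choice to skip fields for which no phrase was found (represented here as None) are assumptions made for illustration.

```python
def build_data_item(phrases, separator="-"):
    """Join the determined phrases into one data item to be classified and graded."""
    # Assumption: fields with no matching phrase are passed as None and skipped.
    return separator.join(p for p in phrases if p)


item = build_data_item(["policy", "business name", "insurance fraud"])
print(item)  # policy-business name-insurance fraud
```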
S103, taking the data items to be classified and graded as input of a classification and grading prediction model, and obtaining the data types and the security levels of the data items through the classification and grading prediction model.
The classification and grading prediction model adopted in the embodiment of the application comprises a classification prediction model and a grading prediction model. The classification prediction model is used to obtain the data type prediction result of the data items to be classified and graded, and the grading prediction model is used to obtain their security level prediction result. The classification and grading prediction model is constructed based on the true correspondence between data types and security levels. A specific implementation for obtaining the classification and grading prediction model is described in detail in subsequent embodiments of the present application.
The data types in the application include personal basic information data, personal health examination information data, and personal financial information data, as well as enterprise basic information data, enterprise organization architecture data, enterprise operation management data, enterprise financial data, and so on.
The security levels in this application comprise five levels: top secret, confidential, secret, internal disclosure, and external disclosure. It should be noted that the security level of the data may also be expressed in other forms, such as level 1, level 2, level 3, and level 4.
Different types of data correspond to different security levels; that is, a true correspondence exists between data types and security levels. Specifically, personal basic information data may be set to internal disclosure, enterprise operation management data to secret, enterprise financial data to confidential, and so on. Mapping different types of data to different security levels according to actual needs enables effective management and efficient control of data security. The classification and grading prediction model in the application is trained based on this configured true correspondence between data types and security levels.
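The true correspondence between data types and security levels can be pictured as a lookup table. The sketch below uses only the three example pairs given above; the variable and function names are hypothetical, and a real deployment would configure the table according to actual needs.

```python
# Example pairs taken from the description; the table is configurable in practice.
TRUE_CORRESPONDENCE = {
    "personal basic information data": "internal disclosure",
    "enterprise operation management data": "secret",
    "enterprise financial data": "confidential",
}


def matches_true_correspondence(data_type: str, security_level: str) -> bool:
    """Check whether a predicted (data type, security level) pair agrees with the table."""
    return TRUE_CORRESPONDENCE.get(data_type) == security_level


print(matches_true_correspondence("enterprise financial data", "confidential"))  # True
print(matches_true_correspondence("enterprise financial data", "secret"))        # False
```

A check of this kind is one way to picture the "difference between the predicted correspondence and the true correspondence" that drives training, although the application does not state how that difference is computed.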
This step can be described in detail as follows: the data items to be classified and graded of the target data set, constructed in step S102, are input into the classification hierarchical prediction model trained in the application; a data type prediction result of each data item is obtained through the classification prediction model in the classification hierarchical prediction model; a security level prediction result of each data item is obtained through the hierarchical prediction model in the classification hierarchical prediction model; and the data type and security level of the data item are finally obtained.
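The step above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the function name `classify_and_grade`, the two stand-in models, their fixed confidence probabilities, and the sample data item string are all hypothetical. It also assumes that the final data type and security level are simply the labels with the highest confidence probability.

```python
from typing import Callable, Dict, Tuple

# A prediction maps each candidate label to a confidence probability.
Prediction = Dict[str, float]
Model = Callable[[str], Prediction]


def classify_and_grade(item: str, type_model: Model, level_model: Model) -> Tuple[str, str]:
    """Return the most probable data type and security level for one data item."""
    type_pred = type_model(item)    # classification prediction model
    level_pred = level_model(item)  # hierarchical (grading) prediction model
    data_type = max(type_pred, key=type_pred.get)
    security_level = max(level_pred, key=level_pred.get)
    return data_type, security_level


# Hypothetical stand-in models that return fixed confidence probabilities.
def toy_type_model(item: str) -> Prediction:
    return {
        "personal basic information data": 0.76,
        "personal financial information data": 0.22,
        "enterprise financial data": 0.02,
    }


def toy_level_model(item: str) -> Prediction:
    return {"internal disclosure": 0.70, "secret": 0.20, "confidential": 0.10}


# Hypothetical data item in the "data table name-field English name-field Chinese name" form.
result = classify_and_grade("policy_holder-name-姓名", toy_type_model, toy_level_model)
```

In a real system the two stand-in functions would be replaced by the trained classification prediction model and hierarchical prediction model.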
In summary, the embodiments of the present application describe a data classification and grading method that determines, from a target word stock, phrases having a correspondence with fields in a target data set; constructs the data items to be classified and graded of the target data set according to the determined phrases; and inputs those data items into a classification hierarchical prediction model, finally obtaining their data types and security levels through the model's analysis of the data items and of the classification and grading specification. Compared with classifying and grading data by manual labeling, obtaining the data type and security level with the classification hierarchical prediction model greatly improves the working efficiency of data security classification and grading.
Fig. 2 is a flowchart of training a classification hierarchical prediction model according to an embodiment of the present application. As shown in fig. 2, the process of training the classification hierarchical prediction model includes:
S201, determining, from the target word stock, phrases having a correspondence with fields in the sample data set.
The sample data set is a data set having an association relationship with the target data set in step S101. If the target data set is the set described above that contains user data, business data, and insurance enterprise system data in the insurance industry, the sample data set is likewise a set containing user data, business data, and insurance enterprise system data in the insurance industry.
The sample data set differs from the target data set in that the data in the sample data set has already been classified and graded by manual labeling; that is, each piece of data in the sample data set has a corresponding data type and security level, whereas the data in the target data set has not yet been classified and graded. The sample data set is used for training the classification hierarchical prediction model to be trained. The target data set is the set of data to be classified and graded, and this is done using the trained classification hierarchical prediction model.
In this step, the process of determining the phrase having the correspondence with the field in the sample data set from the target word stock is the same as the process of determining the phrase having the correspondence with the field in the target data set from the target word stock in step S101, and the specific content is referred to the description in step S101, and is not repeated here.
S202, constructing sample classification hierarchical data items of the sample data set according to the determined phrase with the corresponding relation with the fields in the sample data set.
The present step is the same as step S102, and the specific content is referred to the description in step S102, and will not be repeated here.
S203, obtaining a data type prediction result of the sample classification grading data item by using the sample classification grading data item and the first model to be trained.
Specifically, the sample classification data item constructed in the step S202 is input into a first model to be trained, and the data type prediction result of the sample classification data item is obtained through analysis of the sample classification data item by the first model to be trained.
Fig. 3 is a schematic diagram of a working process of a first model to be trained according to an embodiment of the present application. As shown in fig. 3, a classification hierarchical data item in the form "data table name-field English name-field Chinese name" is input into the first model to be trained, and the first model to be trained outputs the data type prediction result corresponding to that data item. The data type prediction result comprises a plurality of categories and the confidence probability corresponding to each category. For example, the data type prediction result may be expressed as: category 1, confidence probability 76%; category 2, confidence probability 22%; category 3, confidence probability 2%.
It should be noted that fig. 3 is a schematic diagram illustrating an operation process of the first model to be trained. The expression form of the classified and graded data item is not limited to 'data table name-field English name-field Chinese name'; the expression form of the data type prediction result is not limited to: category 1, confidence probability 76%; category 2, confidence probability 22%; category 3, confidence probability 2%.
S204, obtaining a safety level prediction result of the sample classification data item by using the sample classification data item and the second model to be trained.
Specifically, the sample classification and classification data item constructed in the step S202 is input to the second model to be trained, and the safety level prediction result of the sample classification and classification data item is obtained through analysis of the sample classification and classification data item by the second model to be trained.
Fig. 4 is a schematic diagram of a working process of a second model to be trained according to an embodiment of the present application. As shown in fig. 4, a classification hierarchical data item in the form "data table name-field English name-field Chinese name" is input into the second model to be trained, and the second model to be trained outputs the security level prediction result corresponding to that data item. The security level prediction result includes a plurality of levels and the confidence probability corresponding to each level. For example, the security level prediction result may be expressed as: level 1, confidence probability 70%; level 2, confidence probability 20%; level 3, confidence probability 10%.
It should be noted that fig. 4 only exemplarily shows the working process of the second model to be trained. The expression form of the classification hierarchical data item is not limited to "data table name-field English name-field Chinese name"; the expression form of the security level prediction result is not limited to: level 1, confidence probability 70%; level 2, confidence probability 20%; level 3, confidence probability 10%.
Fig. 5 is a schematic diagram of a data processing process of a classification hierarchical prediction model to be trained according to an embodiment of the application. Next, taking the first model to be trained as an example, the processing performed on a classification hierarchical data item after it is input into the first model to be trained is introduced. It should be noted that, for the processing performed by the second model to be trained, reference may be made to the description here.
As shown in fig. 5, the first model to be trained includes seven parts: an input interface, an embedding layer, a convolution layer, a pooling layer, a screening layer, a fully connected layer and an output interface.
The classification hierarchical data item consisting of the data table name, the field English name and the field Chinese name in fig. 4 is input into the first model to be trained through the input interface.
The embedding layer is used for vectorizing the classification hierarchical data item, formed by the data table name, the field English name and the field Chinese name, that is input into the first model to be trained, and for generating an initial vector matrix of the sample classification hierarchical data item, so that the sample classification hierarchical data item is mapped from semantic space to vector space.
The convolution layer is used for performing a convolution operation on the initial vector matrix to obtain a convolution feature vector matrix.
The pooling layer is used for reducing the dimension of the convolution feature vector matrix and extracting features from it, thereby removing redundant information, selecting the more discriminative features and generating a global feature vector matrix.
The screening layer is used for processing the global feature vector matrix by random inactivation (dropout), which alleviates overfitting during training of the first model to be trained.
The fully connected layer, combined with a softmax function, classifies the global feature vector matrix processed by the screening layer and converts it into a semantic expression.
And the output interface is used for outputting a data type prediction result obtained from the first model to be trained.
It should be noted that this section only takes the processing procedure of the first model to be trained on the data as an example, and simply introduces the processing procedure of the classification and classification prediction model to be trained on the data; i.e. the processing of data by the neural network model. Because the processing of data by the neural network model is a mature technology, the detailed process of processing the data by the classification hierarchical prediction model to be trained is not specifically developed in the application.
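The layer sequence described above can be illustrated with a small forward pass written in plain Python. This is a sketch under stated assumptions rather than the model of the application: the tokens, embedding size, kernel width, number of kernels, number of classes and all weights are hypothetical and randomly initialised, the ReLU activation is an added assumption, and a real system would normally use a deep learning framework.

```python
import math
import random


def embed(tokens, table):
    """Embedding layer: map each token to its vector (rows of the initial vector matrix)."""
    return [table[t] for t in tokens]


def conv1d(matrix, kernels, width):
    """Convolution layer: slide each kernel over `width` consecutive token vectors, then ReLU."""
    feature_maps = []
    for kernel in kernels:  # each kernel is a flat list of length width * embedding_dim
        fmap = []
        for i in range(len(matrix) - width + 1):
            window = [x for row in matrix[i:i + width] for x in row]
            fmap.append(max(0.0, sum(w * x for w, x in zip(kernel, window))))
        feature_maps.append(fmap)
    return feature_maps


def global_max_pool(feature_maps):
    """Pooling layer: keep only the strongest response of each feature map."""
    return [max(fmap) for fmap in feature_maps]


def dropout(vector, rate, rng, training):
    """Screening layer: random inactivation, applied only while training."""
    if not training:
        return list(vector)
    return [0.0 if rng.random() < rate else v / (1.0 - rate) for v in vector]


def dense_softmax(vector, weights, biases):
    """Fully connected layer followed by softmax: one probability per class."""
    logits = [sum(w * v for w, v in zip(row, vector)) + b for row, b in zip(weights, biases)]
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
tokens = ["policy", "holder", "name"]  # hypothetical tokenised data item
table = {t: [rng.uniform(-1, 1) for _ in range(4)] for t in tokens}    # embedding dim 4
kernels = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(3)]   # 3 kernels, width 2
weights = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(2)]   # 2 output classes
biases = [0.0, 0.0]

features = global_max_pool(conv1d(embed(tokens, table), kernels, width=2))
probs = dense_softmax(dropout(features, 0.5, rng, training=False), weights, biases)
```

The resulting `probs` list plays the role of the confidence probabilities in the data type prediction result; the same structure could serve as the second model to be trained, with security levels as output classes.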
S205, adjusting parameters in the first model to be trained and parameters in the second model to be trained according to the difference between the predicted correspondence, formed by the data type prediction result and the security level prediction result of the sample classification hierarchical data item, and the real correspondence, until training is finished and the classification hierarchical prediction model is obtained.
Specifically, according to this difference, the parameters in the first model to be trained and the parameters in the second model to be trained are adjusted using a mean square error function or a cross entropy loss function until training is finished and the classification hierarchical prediction model is obtained. Since using a mean square error function or a cross entropy loss function to adjust the parameters of a model to be trained is a mature technology, it is not described in detail in the present application.
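For readers unfamiliar with the two loss functions named above, the following sketch shows how each could be computed for a single sample from predicted confidence probabilities and a manually labeled class. The function names and the one-hot target encoding are assumptions made for illustration; the application does not specify how the loss is formed from the predicted and real correspondences.

```python
import math


def cross_entropy(probs, true_index):
    """Cross entropy loss for one sample given predicted class probabilities."""
    return -math.log(max(probs[true_index], 1e-12))  # clamp to avoid log(0)


def mean_squared_error(probs, true_index):
    """Mean square error between the probabilities and a one-hot target."""
    target = [1.0 if i == true_index else 0.0 for i in range(len(probs))]
    return sum((p - t) ** 2 for p, t in zip(probs, target)) / len(probs)


# Confidence probabilities from the illustrative prediction result: 76%, 22%, 2%.
type_probs = [0.76, 0.22, 0.02]
loss_if_correct = cross_entropy(type_probs, 0)   # labeled type is category 1
loss_if_wrong = cross_entropy(type_probs, 2)     # labeled type is category 3
mse_if_correct = mean_squared_error(type_probs, 0)
```

Both losses are smaller when the labeled class receives a higher confidence probability, which is what drives the parameter adjustment during training.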
For ease of understanding, the training process of the classification hierarchical prediction model to be trained is further described from a global point of view. Fig. 6 is a schematic diagram of the principle of training the classification hierarchical prediction model to be trained according to an embodiment of the present application.
As can be seen from fig. 6, first, the sample data set is matched against the target word stock, and the phrases having a correspondence with the fields in the sample data set are determined from the target word stock, thereby generating sample classification hierarchical data items. The sample classification hierarchical data items are input into the classification hierarchical prediction model to be trained, and the data type prediction result and the security level prediction result of each sample classification hierarchical data item are obtained.
Then, a mutual verification mechanism judges whether the correspondence between the data type prediction result and the security level prediction result obtained by the model to be trained is the same as the real correspondence between data type and security level.
If the two are the same, the data type and the security level of the sample classification hierarchical data item are directly output.
If the two are different, the data type and security level to output for the sample classification hierarchical data item are determined according to how the data type prediction result and the security level prediction result differ; or the obtained data type prediction result, the security level prediction result and the corresponding sample classification hierarchical data item are fed back to the target word stock, the phrase having a correspondence with the field corresponding to that data item is determined from the target word stock again, and the classification hierarchical prediction model to be trained is retrained, thereby improving its training precision.
It should be noted that the mutual verification mechanism is an operation mechanism that verifies whether the correspondence between the data type in the data type prediction result and the security level in the security level prediction result is the same as the real correspondence between data type and security level and, if the two are different, further judges the degree of difference between them. The mutual verification mechanism helps identify sample classification hierarchical data items with poor classification and grading results during training; these items are then fed back to the target word stock, keywords are extracted again, and the model to be trained is retrained, so as to obtain a classification hierarchical prediction model with better classification and grading performance.
Optionally, when the predicted correspondence differs from the real correspondence, the following method may be used either to determine the data type and security level to output for the sample classification hierarchical data item, or to decide to retrain the classification hierarchical prediction model to be trained.
Step one: sort the data type prediction results obtained in step S203 by confidence probability from high to low, and select the N data types with the highest confidence probability to form a data type file to be judged, where N is an odd number greater than or equal to 3.
Step two: obtain the security level corresponding to each data type in the data type file to be judged according to the real correspondence between data type and security level.
Step three: sort the security level prediction results obtained in step S204 by confidence probability from high to low, and select the N security levels with the highest confidence probability to form a security level file to be judged.
Step four: compare the security levels obtained in step two with the security levels obtained in step three. If the proportion of identical security levels is greater than 50%, select the data type with the highest confidence probability in step S203 and the security level with the highest confidence probability in step S204 as the data type and security level of the sample classification hierarchical data item. If the proportion of identical security levels is less than 50%, retrain the classification hierarchical prediction model to be trained.
Specifically, suppose the data type file to be judged, generated according to the method described in step one, includes five data types: personal basic material, personal property data, personal health physiological data, personal identity information and personal biometric information. The personal basic material includes information such as personal name, birthday, sex, ethnicity and telephone number; the personal property data includes information such as bank accounts, credit records, credit information and transaction records; the personal health physiological data includes information such as hospitalization records, medication records, family medical history and nursing records; the personal identity information includes information such as an identity card number, a passport, an employee card and a social security card; the personal biometric information includes information such as personal genes, fingerprints and facial recognition features.
According to the preset real correspondence between data types and security levels, the security level of the personal basic material is internal disclosure, that of the personal property data is secret, that of the personal health physiological data is secret, that of the personal identity information is confidential, and that of the personal biometric information is secret; the security levels corresponding to the data types in the data type file to be judged are therefore internal disclosure, secret, secret, confidential and secret.
Suppose further that the security level file to be judged, generated according to the method described in step three, includes: internal disclosure, secret, confidential, external disclosure, and external disclosure.
The security levels corresponding to the data types in the data type file to be judged in step two, namely internal disclosure, secret, secret, confidential and secret, are compared with the security levels included in the security level file to be judged in step three, namely internal disclosure, secret, confidential, external disclosure and external disclosure.
Since three of the five security levels in the two files are the same, that is, the proportion of identical levels is greater than 50%, the data type with the highest confidence probability in step S203 and the security level with the highest confidence probability in step S204 are selected as the data type and security level of the sample classification hierarchical data item; namely, personal basic material is selected as the data type of the sample classification hierarchical data item, and internal disclosure is selected as its security level.
If three of the five security levels in the two files are different, the proportion of differing levels is greater than 50%. In that case the data type and security level of the sample classification hierarchical data item are not output; the data item is fed back to the target word stock, and the classification hierarchical prediction model to be trained is retrained.
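The four steps and the worked example can be sketched as below. The text does not state exactly how the two files are compared, so this sketch assumes one possible reading: the number of identical levels is the multiset overlap of the two lists, divided by N. The function name `mutual_verify`, the `TRUE_LEVEL` dictionary layout and the second (failing) list of levels are hypothetical; the type-to-level mapping and the first list of predicted levels are taken from the example above.

```python
from collections import Counter

# Real correspondence between data types and security levels, as in the example above.
TRUE_LEVEL = {
    "personal basic material": "internal disclosure",
    "personal property data": "secret",
    "personal health physiological data": "secret",
    "personal identity information": "confidential",
    "personal biometric information": "secret",
}


def mutual_verify(ranked_types, ranked_levels, true_level):
    """Both inputs are sorted by confidence, highest first, and have odd length N >= 3.

    Returns (data_type, security_level, ratio); the first two are None when the
    check fails and the item should be fed back for retraining.
    """
    n = len(ranked_types)
    if n < 3 or n % 2 == 0 or len(ranked_levels) != n:
        raise ValueError("both lists must have the same odd length N >= 3")
    expected = [true_level[t] for t in ranked_types]           # step two
    # Step four (assumed reading): count levels shared by both lists.
    shared = sum((Counter(expected) & Counter(ranked_levels)).values())
    ratio = shared / n
    if ratio > 0.5:
        return ranked_types[0], ranked_levels[0], ratio
    return None, None, ratio


ranked_types = list(TRUE_LEVEL)  # step one: the five types, highest confidence first
ranked_levels = ["internal disclosure", "secret", "confidential",
                 "external disclosure", "external disclosure"]  # step three
passed = mutual_verify(ranked_types, ranked_levels, TRUE_LEVEL)

# Hypothetical failing case: only one level is shared with the expected levels.
failed = mutual_verify(ranked_types,
                       ["external disclosure", "external disclosure", "confidential",
                        "external disclosure", "external disclosure"],
                       TRUE_LEVEL)
```

Under this reading the example yields three shared levels out of five, so the item passes and is labeled personal basic material with security level internal disclosure. Because N is odd, the ratio can never equal exactly 50%.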
In summary, steps S201 to S205 describe in detail the process of training the classification hierarchical prediction model to be trained with the sample data set to obtain the classification hierarchical prediction model. By using the classification hierarchical prediction model instead of manual labeling to classify and grade the data in the target data set, the method greatly improves the efficiency of data classification and grading, and improves the efficiency and strength of data management and data control in insurance enterprises.
Optionally, the following method may be used to construct a target word stock containing the professional words of the industry in which the target data set is located.
Screening out target Chinese characters which are the same as Chinese characters in a Chinese word segmentation word stock from an industry data set; removing special characters in the foreign language character strings from the industry data set to obtain a target foreign language; and arranging the target Chinese characters and the target foreign language according to the sequence of the occurrence frequency in the industry data set from high to low to obtain the target word stock.
In the following, taking the insurance industry as an example, a method for constructing a target word stock in the insurance industry is specifically described.
The industry data set contains words frequently used by the insurance industry, and words in the industry data set comprise different forms such as Chinese and foreign language. For the insurance industry, foreign language is generally english and english abbreviations.
Firstly, extracting Chinese in an industry data set by using a Chinese word segmentation word stock to obtain target Chinese characters.
The Chinese word segmentation word stock is a common Chinese character division extraction tool, and the common Chinese word segmentation word stock comprises a Jieba word stock or an IK Analyzer word stock. Inputting the data in the industry data set into a Chinese word stock, and extracting the Chinese characters from the industry data set by the Chinese word stock through a Chinese word segmentation technology to obtain the target Chinese characters. It should be noted that the target Chinese characters include both individual Chinese characters and Chinese character phrases. Because the Chinese word segmentation word stock is a mature technology for extracting Chinese characters in the industry data set, the detailed step of extracting data in the industry data set by using the Jieba word stock or the IK Analyzer word stock or other common word stocks is not specifically introduced in the application.
Then, special characters in the foreign language character strings are removed from the industry data set to obtain the target foreign language.
Specifically, special characters in the English strings of the insurance industry data set, such as underscores, spaces and other special symbols, are removed to obtain English words and abbreviations of English words. For example, suppose the insurance industry data set contains the string "insurance-index insurance claims decision-insurance fraud"; after the special characters such as "-" in the string are removed, the target English terms "insurance", "index fraud", "insurance claims decision" and "insurance fraud" are obtained.
And finally, arranging the obtained target Chinese characters and the obtained target foreign language according to the sequence from high to low in the frequency of occurrence in the industrial data set, and constructing a target word stock.
Fig. 7 is a schematic diagram of a target word stock provided in an embodiment of the present application. As shown in fig. 7, the word "case" appears 984 times in the industry data set; the word "policy" appears 8892 times in the industry data set; another entry appears 681 times in the industry data set; and so on. After each target Chinese term and each target English term is obtained, the terms are arranged in order of their frequency of occurrence in the industry data set, from high to low, to finally obtain the target word stock.
The target word stock obtained by this method contains the common professional words of the insurance industry, arranged in order of frequency of occurrence in the industry data set from high to low. That is, as described in step S101, words with a higher frequency of occurrence in the industry data set are placed toward the front of the target word stock and have smaller sequence numbers; words with a lower frequency of occurrence are placed toward the back of the target word stock and have larger sequence numbers.
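The construction described above can be sketched as follows. This is a simplified illustration, not the procedure of the application: real Chinese word segmentation (for example with a Jieba or IK Analyzer word stock) is replaced here by a lookup of whole Chinese character runs in a small hypothetical vocabulary, English terms are obtained by splitting on every non-letter character, and the function name, sample records and tie-breaking rule are assumptions.

```python
import re
from collections import Counter


def build_word_stock(records, chinese_vocab):
    """Return terms ordered by descending frequency; list index + 1 is the sequence number."""
    counts = Counter()
    for record in records:
        # Target Chinese: keep Chinese runs that also appear in the segmentation vocabulary.
        for run in re.findall(r"[\u4e00-\u9fff]+", record):
            if run in chinese_vocab:
                counts[run] += 1
        # Target foreign language: split on anything that is not an ASCII letter.
        for word in re.split(r"[^A-Za-z]+", record):
            if word:
                counts[word.lower()] += 1
    # Sort by frequency, high to low; break ties alphabetically so the order is stable.
    return [term for term, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]


# Hypothetical industry data set records and segmentation vocabulary.
records = ["policy_no 保单", "policy-holder 保单", "claim_case 案件", "policy"]
word_stock = build_word_stock(records, chinese_vocab={"保单", "案件"})
```

Here "policy" occurs three times and "保单" twice, so they receive sequence numbers 1 and 2, and every term that occurs once follows them.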
When a field in the target data set coincides with several phrases in the target word stock, the arrangement sequence number of each of those phrases in the target word stock is obtained, and the phrase with the smallest arrangement sequence number is determined as the phrase used to construct the data item to be classified and graded of the target data set. This is because the arrangement sequence number reflects how often the phrase occurs in the industry data set: the smaller the sequence number of a phrase in the target word stock, the higher its frequency of occurrence in the industry data set. When both a high-frequency phrase and a low-frequency phrase coincide with a field in the target data set, the high-frequency phrase is generally more likely to be the phrase the field actually refers to. This arrangement helps improve the accuracy of data classification and grading.
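A minimal sketch of this selection rule is given below. It assumes that a phrase "coincides" with a field when the phrase is identical to the field or occurs inside it as a substring; the application does not define the matching rule at this level of detail, and the function name and sample word stock are hypothetical.

```python
def select_phrase(field, word_stock):
    """Return (sequence_number, phrase) for the matching phrase with the smallest number.

    `word_stock` is ordered by descending frequency, so the first match found
    is the one with the smallest arrangement sequence number.
    """
    for sequence_number, phrase in enumerate(word_stock, start=1):
        if phrase == field or phrase in field:  # identical, or coincides with part of the field
            return sequence_number, phrase
    return None  # no phrase in the word stock corresponds to this field


# Hypothetical target word stock, already sorted by frequency from high to low.
word_stock = ["policy", "case", "holder"]
best = select_phrase("policy_holder", word_stock)
exact = select_phrase("case", word_stock)
missing = select_phrase("premium", word_stock)
```

For the field "policy_holder", both "policy" and "holder" match, and "policy" is chosen because its sequence number is smaller.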
Based on the method provided by the foregoing embodiment, correspondingly, the present application further provides a data classification grading device. Specific implementations of the apparatus are described below with reference to the examples and figures.
Referring to fig. 8, the structure of a data classification and grading device according to an embodiment of the present application is shown. The apparatus 800 shown in fig. 8 includes:
A target phrase screening module 801, configured to determine phrases having a corresponding relationship with fields in the target data set from the target word stock; the target word stock comprises professional words of the industry in which the target data set is located; the phrase with the corresponding relation with the field in the target data set is the phrase same as the field in the target data set or the phrase coincident with the field in the target data set;
a target data item construction module 802, configured to construct a hierarchical data item to be classified of the target data set according to the determined phrase;
a result obtaining module 803, configured to take the to-be-classified data item as an input of a classification prediction model, and obtain a data type and a security level of the to-be-classified data item through the classification prediction model; the classification hierarchical prediction model comprises a classification prediction model and a hierarchical prediction model; the classification prediction model is used for obtaining a data type prediction result of the classified data item to be classified; the grading prediction model is used for obtaining a safety grade prediction result of the grading data item to be classified; the classification hierarchical prediction model is constructed based on the true correspondence of data types and security levels.
Optionally, the apparatus 800 further comprises a model training module, and the classification hierarchical prediction model is trained by the model training module; the model training module comprises:
the sample phrase screening unit is used for determining phrases with corresponding relations with fields in a sample data set from the target word stock; the phrase with the corresponding relation with the fields in the sample data set is the phrase same as the fields in the sample data set or the phrase coincident with the fields in the sample data set;
a sample data item construction unit, configured to construct a sample classification hierarchical data item of the sample data set according to the determined phrase having a correspondence with a field in the sample data set;
the data type obtaining unit is used for obtaining a data type prediction result of the sample classification grading data item by utilizing the sample classification grading data item and a first model to be trained;
the safety level obtaining unit is used for obtaining a safety level prediction result of the sample classification data item by utilizing the sample classification data item and the second model to be trained;
and the parameter adjustment unit is used for adjusting the parameters in the first model to be trained and the parameters in the second model to be trained according to the difference between the predicted correspondence, formed by the data type prediction result and the security level prediction result of the sample classification hierarchical data item, and the real correspondence, until training is finished and the classification hierarchical prediction model is obtained.
Optionally, the apparatus 800 further includes a target phrase construction module, including:
the Chinese character screening unit is used for screening target Chinese characters which are the same as the Chinese characters in the Chinese word stock from the industry data set; the target Chinese characters comprise single words and phrases; the industry data set includes the target data set and a sample data set;
the foreign language screening unit is used for removing special characters in the foreign language character strings from the industry data set to obtain target foreign language; the target foreign language comprises foreign language words and foreign language phrases;
and the word stock construction unit is used for arranging the target Chinese characters and the target foreign language according to the sequence of the occurrence frequency in the industry data set from high to low to construct the target word stock.
Optionally, the target phrase screening module 801 includes:
a sequence number obtaining unit, configured to obtain, when a plurality of phrases in the target word stock coincide with a field in the target data set, the arrangement sequence number of each of those phrases in the target word stock;
and the phrase determining unit is used for determining the phrase with the smallest arrangement sequence number as the phrase for constructing the data item to be classified and graded of the target data set.
Based on the data classification and classification method and apparatus provided in the foregoing embodiments, correspondingly, the present application further provides a computer readable storage medium having a computer program stored thereon, where the program, when executed by a processor, implements some or all of the steps in the data classification and classification method mentioned above.
Based on the data classification and classification method and device provided by the foregoing embodiments, the present application further provides an electronic device, including:
a memory having a computer program stored thereon;
and a processor for executing the computer program in the memory to implement the data classification and grading method provided in the foregoing embodiments.
It should be noted that the embodiments in this specification are described in a progressive manner; identical and similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, since the device embodiments are substantially similar to the method embodiments, they are described relatively briefly, and reference is made to the description of the method embodiments for the relevant points. The device embodiments described above are merely illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the solution without creative effort.
The foregoing is merely one specific embodiment of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive of within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A data classification and grading method, the method comprising:
determining, from a target word stock, phrases that correspond to fields in a target data set; wherein the target word stock comprises specialized terms of the industry to which the target data set belongs; and a phrase that corresponds to a field in the target data set is a phrase identical to the field or a phrase that coincides with the field;
constructing data items to be classified and graded for the target data set according to the determined phrases;
taking the data items to be classified and graded as input to a classification and grading prediction model, and obtaining the data types and security levels of the data items through the classification and grading prediction model; wherein the classification and grading prediction model comprises a classification prediction model and a grading prediction model; the classification prediction model is used to obtain a data type prediction result for the data items to be classified and graded; the grading prediction model is used to obtain a security level prediction result for the data items to be classified and graded; and the classification and grading prediction model is constructed based on the real correspondence between data types and security levels.
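As a non-limiting illustration, the three steps of claim 1 might be sketched as follows. The function names, the toy word stock, and the lookup-table stand-ins for the two sub-models are hypothetical and are not taken from the application; in particular, treating a phrase that "coincides with" a field as a phrase contained in the field is an assumption of this sketch.

```python
from typing import Callable, Dict, List, Optional, Tuple


def match_phrase(field: str, word_stock: List[str]) -> Optional[str]:
    """Return the word-stock phrase that corresponds to a field, if any.

    An identical phrase is preferred. Otherwise the first phrase contained in
    the field is used; treating "coincident" as substring containment is an
    assumption of this sketch.
    """
    if field in word_stock:
        return field
    for phrase in word_stock:
        if phrase in field:
            return phrase
    return None


def classify_and_grade(
    fields: List[str],
    word_stock: List[str],
    type_model: Callable[[str], str],
    level_model: Callable[[str], int],
) -> Dict[str, Tuple[str, int]]:
    """Map each matched field to a (data type, security level) pair."""
    results: Dict[str, Tuple[str, int]] = {}
    for field in fields:
        item = match_phrase(field, word_stock)  # data item to be classified and graded
        if item is None:
            continue  # no corresponding phrase, so no data item is built
        results[field] = (type_model(item), level_model(item))
    return results


# Hypothetical stand-ins for the trained classification and grading sub-models.
TYPE_BY_ITEM = {"premium": "financial", "name": "personal"}
LEVEL_BY_ITEM = {"premium": 2, "name": 3}

result = classify_and_grade(
    fields=["premium", "insured_name", "xyz"],
    word_stock=["policy holder name", "premium", "name"],
    type_model=lambda item: TYPE_BY_ITEM.get(item, "other"),
    level_model=lambda item: LEVEL_BY_ITEM.get(item, 1),
)
```

In this sketch the field "xyz" has no corresponding phrase and is therefore skipped; how unmatched fields are handled is not specified in the claim.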
2. The method of claim 1, wherein training the classification and grading prediction model comprises:
determining, from the target word stock, phrases that correspond to fields in a sample data set; wherein a phrase that corresponds to a field in the sample data set is a phrase identical to the field or a phrase that coincides with the field;
constructing sample classification and grading data items of the sample data set according to the determined phrases that correspond to fields in the sample data set;
obtaining a data type prediction result for the sample classification and grading data items by using the sample classification and grading data items and a first model to be trained;
obtaining a security level prediction result for the sample classification and grading data items by using the sample classification and grading data items and a second model to be trained;
and adjusting parameters of the first model to be trained and parameters of the second model to be trained according to the difference between the real correspondence and the predicted correspondence between the data type prediction result and the security level prediction result of the sample classification and grading data items, until training is complete, to obtain the classification and grading prediction model.
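By way of example only, the joint training of claim 2 might be sketched as below. The claim does not specify the model architecture, the loss function, or the stopping criterion, so everything here is an assumption: each model to be trained is a tiny softmax classifier over bag-of-words features, both models are updated with a cross-entropy gradient step whenever the predicted (data type, security level) pair differs from the real pair, and training ends once an epoch produces no mismatches. The class name `LinearClassifier`, the function `train_jointly`, and the sample data are hypothetical.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class LinearClassifier:
    """Tiny softmax classifier over bag-of-words features (illustrative only)."""

    def __init__(self, vocab: List[str], n_classes: int) -> None:
        self.index = {w: i for i, w in enumerate(vocab)}
        self.w = [[0.0] * len(vocab) for _ in range(n_classes)]

    def features(self, item: str) -> List[float]:
        x = [0.0] * len(self.index)
        for tok in item.split():
            if tok in self.index:
                x[self.index[tok]] = 1.0
        return x

    def predict_proba(self, item: str) -> List[float]:
        x = self.features(item)
        return softmax([sum(wi * xi for wi, xi in zip(row, x)) for row in self.w])

    def predict(self, item: str) -> int:
        p = self.predict_proba(item)
        return p.index(max(p))

    def sgd_step(self, item: str, label: int, lr: float) -> None:
        """One cross-entropy gradient step toward the given label."""
        x = self.features(item)
        p = self.predict_proba(item)
        for c, row in enumerate(self.w):
            grad = p[c] - (1.0 if c == label else 0.0)
            for j, xj in enumerate(x):
                row[j] -= lr * grad * xj


def train_jointly(samples: List[Tuple[str, Tuple[int, int]]],
                  type_model: LinearClassifier,
                  level_model: LinearClassifier,
                  epochs: int = 200, lr: float = 0.5) -> None:
    for _ in range(epochs):
        mismatches = 0
        for item, real_pair in samples:
            predicted_pair = (type_model.predict(item), level_model.predict(item))
            if predicted_pair != real_pair:  # predicted vs. real correspondence
                mismatches += 1
                type_model.sgd_step(item, real_pair[0], lr)
                level_model.sgd_step(item, real_pair[1], lr)
        if mismatches == 0:  # assumed stopping criterion
            break


# Hypothetical samples: (data item, (data type id, security level id)).
samples = [("customer name", (0, 1)), ("premium amount", (1, 0)),
           ("customer id number", (0, 1)), ("claim amount", (1, 0))]
vocab = sorted({tok for item, _ in samples for tok in item.split()})
type_model = LinearClassifier(vocab, n_classes=2)
level_model = LinearClassifier(vocab, n_classes=2)
train_jointly(samples, type_model, level_model)
```

Tying the update to a mismatch of the whole pair, rather than to each prediction separately, is one possible reading of "the difference between the predicted correspondence and the real correspondence"; other joint losses would fit the claim language equally well.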
3. The method according to claim 2, wherein obtaining a data type prediction result for the sample classification and grading data items by using the sample classification and grading data items and a first model to be trained comprises:
inputting the sample classification and grading data items into the first model to be trained, and obtaining the data type prediction result for the sample classification and grading data items through analysis of the sample classification and grading data items by the first model to be trained.
4. The method according to claim 2 or 3, wherein obtaining a security level prediction result for the sample classification and grading data items by using the sample classification and grading data items and a second model to be trained comprises:
inputting the sample classification and grading data items into the second model to be trained, and obtaining the security level prediction result for the sample classification and grading data items through analysis of the sample classification and grading data items by the second model to be trained.
5. The method of claim 1, wherein constructing the target word stock comprises:
screening out, from an industry data set, target Chinese characters that are the same as Chinese characters in a Chinese word stock; wherein the target Chinese characters comprise single characters and phrases; and the industry data set comprises the target data set and a sample data set;
removing special characters from foreign-language character strings in the industry data set to obtain target foreign-language text; wherein the target foreign-language text comprises foreign-language words and foreign-language phrases;
and arranging the target Chinese characters and the target foreign-language text in descending order of their frequency of occurrence in the industry data set to construct the target word stock.
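For illustration only, the construction of the target word stock in claim 5 might be sketched as follows. The function name `build_target_word_stock`, the sample field names, and the reference Chinese word stock are hypothetical. The sketch also assumes that "special characters" means any character other than ASCII letters and digits, that foreign-language text is lowercased, and that ties in frequency are broken alphabetically; none of these details is stated in the claim.

```python
import re
from collections import Counter
from typing import Iterable, List, Set


def build_target_word_stock(industry_fields: Iterable[str],
                            chinese_word_stock: Set[str]) -> List[str]:
    """Return terms ordered from most to least frequent in the industry data set."""
    counts: Counter = Counter()
    for field in industry_fields:
        # Chinese: keep only characters/phrases found in the reference word stock.
        for term in chinese_word_stock:
            counts[term] += field.count(term)
        # Foreign language: replace special characters with spaces, then keep
        # the individual words and, if there are several, the whole phrase.
        cleaned = re.sub(r"[^A-Za-z0-9]+", " ", field).strip().lower()
        words = cleaned.split()
        counts.update(words)
        if len(words) > 1:
            counts[" ".join(words)] += 1
    # Descending frequency; alphabetical tie-break is an assumption.
    return sorted((t for t, n in counts.items() if n > 0),
                  key=lambda t: (-counts[t], t))


word_stock = build_target_word_stock(
    ["保单号", "保单_status", "policy_no", "policy#status", "客户姓名"],
    {"保单", "姓名", "客户"},
)
```

The position of a term in the returned list plays the role of its sequence number, so more frequent terms receive smaller sequence numbers.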
6. The method of claim 1, wherein, if a plurality of phrases coincide with a field in the target data set, determining, from the target word stock, the phrase that corresponds to the field in the target data set comprises:
respectively obtaining the sequence numbers, in the target word stock, of the plurality of phrases that coincide with the field in the target data set;
and determining the phrase with the smallest sequence number as the phrase for constructing the data item to be classified and graded of the target data set.
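As a minimal sketch of the selection rule in claim 6, assuming that sequence numbers start at 1 and that a phrase "coincides with" a field when it is contained in the field (both assumptions, as is the function name `select_by_sequence_number`):

```python
from typing import List, Optional, Tuple


def select_by_sequence_number(field: str,
                              word_stock: List[str]) -> Optional[Tuple[int, str]]:
    """Return (sequence number, phrase) for the best-ranked coinciding phrase."""
    candidates = [
        (seq, phrase)
        for seq, phrase in enumerate(word_stock, start=1)
        if phrase in field
    ]
    # Sequence numbers are unique, so min() compares on the number alone.
    return min(candidates) if candidates else None


chosen = select_by_sequence_number(
    "policy_status_code", ["policy", "status", "policy status"]
)
```

Here both "policy" and "status" coincide with the field, and "policy" is chosen because its sequence number is smaller. Since the word stock is ordered by descending frequency, this rule favors the phrase that is more common in the industry data set.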
7. A data classification and grading apparatus, the apparatus comprising:
a target phrase screening module, configured to determine, from a target word stock, phrases that correspond to fields in a target data set; wherein the target word stock comprises specialized terms of the industry to which the target data set belongs; and a phrase that corresponds to a field in the target data set is a phrase identical to the field or a phrase that coincides with the field;
a target data item construction module, configured to construct data items to be classified and graded for the target data set according to the determined phrases;
a result acquisition module, configured to take the data items to be classified and graded as input to a classification and grading prediction model, and to obtain the data types and security levels of the data items through the classification and grading prediction model; wherein the classification and grading prediction model comprises a classification prediction model and a grading prediction model; the classification prediction model is used to obtain a data type prediction result for the data items to be classified and graded; the grading prediction model is used to obtain a security level prediction result for the data items to be classified and graded; and the classification and grading prediction model is constructed based on the real correspondence between data types and security levels.
8. The apparatus of claim 7, further comprising a model training module, the classification and grading prediction model being trained by the model training module; wherein the model training module comprises:
a sample phrase screening unit, configured to determine, from the target word stock, phrases that correspond to fields in a sample data set; wherein a phrase that corresponds to a field in the sample data set is a phrase identical to the field or a phrase that coincides with the field;
a sample data item construction unit, configured to construct sample classification and grading data items of the sample data set according to the determined phrases that correspond to fields in the sample data set;
a data type obtaining unit, configured to obtain a data type prediction result for the sample classification and grading data items by using the sample classification and grading data items and a first model to be trained;
a security level obtaining unit, configured to obtain a security level prediction result for the sample classification and grading data items by using the sample classification and grading data items and a second model to be trained;
and a parameter adjustment unit, configured to adjust parameters of the first model to be trained and parameters of the second model to be trained according to the difference between the real correspondence and the predicted correspondence between the data type prediction result and the security level prediction result of the sample classification and grading data items, until training is complete, to obtain the classification and grading prediction model.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the steps of the method according to any one of claims 1-6.
10. An electronic device, comprising:
a memory having a computer program stored thereon;
a processor for executing the computer program in the memory to implement the steps of the method of any one of claims 1-6.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311048703.7A CN117312904A (en) 2023-08-18 2023-08-18 Data classification and classification method and related products

Publications (1)

Publication Number Publication Date
CN117312904A true CN117312904A (en) 2023-12-29

Family

ID=89254266

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311048703.7A Pending CN117312904A (en) 2023-08-18 2023-08-18 Data classification and classification method and related products

Country Status (1)

Country Link
CN (1) CN117312904A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117688616A (en) * 2024-02-04 2024-03-12 广东省计算技术应用研究所 Information security processing method, device, equipment and storage medium based on big data
CN117688616B (en) * 2024-02-04 2024-05-28 广东省计算技术应用研究所 Information security processing method, device, equipment and storage medium based on big data

Similar Documents

Publication Publication Date Title
US10558746B2 (en) Automated cognitive processing of source agnostic data
US10951658B2 (en) IT compliance and request for proposal (RFP) management
US8738552B2 (en) Method and system for classifying documents
US20180075138A1 (en) Electronic document management using classification taxonomy
US8577823B1 (en) Taxonomy system for enterprise data management and analysis
CN111767716B (en) Method and device for determining enterprise multi-level industry information and computer equipment
CN108922633A (en) A kind of disease name standard convention method and canonical system
Chen et al. Web question answering with neurosymbolic program synthesis
CN106096005A (en) A kind of rubbish mail filtering method based on degree of depth study and system
CN110334343B (en) Method and system for extracting personal privacy information in contract
JP2023553121A (en) Document classification using domain-specific natural language processing models
US11507901B1 (en) Apparatus and methods for matching video records with postings using audiovisual data processing
CN117312904A (en) Data classification and classification method and related products
Xu et al. Exploiting lists of names for named entity identification of financial institutions from unstructured documents
Li et al. Recovering traceability links in requirements documents
Xia et al. Automated extraction of abac policies from natural-language documents in healthcare systems
Wang et al. Detecting coreferent entities in natural language requirements
CN106326472B (en) One kind investigation information integrity verification method
Garrido et al. Icix: A semantic information extraction architecture
Vitório et al. Building a Relevance Feedback Corpus for Legal Information Retrieval in the Real-Case Scenario of the Brazilian Chamber of Deputies
Ağduk et al. Classification of news texts from different languages with machine learning algorithms
RU2802549C1 (en) Method and system for depersonalization of confidential data
RU2804747C1 (en) Method and system for depersonalization of confidential data
US20240143632A1 (en) Extracting information from documents using automatic markup based on historical data
Sun et al. ETIP: a lengthy nested NER problem for Chinese insurance policy analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination