CN117555983A

CN117555983A - Auxiliary secret setting method and system based on machine learning

Info

Publication number: CN117555983A
Application number: CN202310418025.2A
Authority: CN
Inventors: 王杰辉; 王光磊; 范世杰
Original assignee: Beijing Shengkewo Technology Development Co ltd
Current assignee: Beijing Shengkewo Technology Development Co ltd
Priority date: 2023-04-19
Filing date: 2023-04-19
Publication date: 2024-02-13
Anticipated expiration: 2043-04-19
Also published as: CN117555983B

Abstract

The invention discloses a machine-learning-based auxiliary security classification (secret determination) method and system in the technical field of data processing. The method comprises: acquiring the file type of a target file to be classified; performing text preprocessing on the target file to obtain a target keyword set comprising a plurality of keywords; looking up the keywords in a classification keyword database to obtain a plurality of classification keywords, and calculating their occurrence probabilities; constructing, based on machine learning, a file auxiliary classification analysis model from the file classification data of a preset past time range; inputting the occurrence probabilities into the corresponding target auxiliary classification module to obtain a classification analysis result set; and obtaining a target classification level and performing auxiliary classification of the target file. The invention addresses the poor classification accuracy and low degree of intelligence of the prior art, providing intelligent assistance for classification and improving its accuracy and security.

Description

Auxiliary secret setting method and system based on machine learning

Technical Field

The invention relates to the technical field of data processing, and in particular to a machine-learning-based auxiliary security classification (secret determination) method and system.

Background

With the rapid development of the economy and of science and technology, data has become increasingly valuable, and the security of data files has become a pressing problem. Accurately identifying the classification level of data can greatly improve the efficiency of confidentiality work and help ensure data security.

However, when core files are sorted, the scope and level of classification are usually determined manually. This is inefficient, and as files accumulate and the data volume grows, the accuracy of manual classification no longer meets requirements. The prior art therefore suffers from poor classification accuracy and a low degree of intelligence.

Disclosure of Invention

The present application provides a machine-learning-based auxiliary security classification method and system to solve the technical problems of poor classification accuracy and a low degree of intelligence in the prior art.

In view of the above, the present application provides the following machine-learning-based auxiliary classification method and system.

In a first aspect, the present application provides a machine-learning-based auxiliary classification method, comprising:

acquiring the file type of a target file to be classified;

performing text preprocessing on the target file to obtain a target keyword set comprising a plurality of keywords;

looking up the keywords in a classification keyword database to obtain a plurality of classification keywords, and calculating a plurality of occurrence probabilities of the classification keywords in the target keyword set;

constructing, based on machine learning, a file auxiliary classification analysis model from the file classification data within a preset past time range, wherein the model comprises a plurality of auxiliary classification modules corresponding to a plurality of sample file types;

inputting the occurrence probabilities, according to the file type, into the corresponding target auxiliary classification module to obtain a classification analysis result set, wherein each auxiliary classification module comprises a plurality of auxiliary classification units; and

obtaining a target classification level from the classification analysis result set, and performing auxiliary classification of the target file.

In a second aspect, the present application provides a machine-learning-based auxiliary classification system, comprising:

a file type acquisition module, configured to acquire the file type of a target file to be classified;

a keyword acquisition module, configured to perform text preprocessing on the target file to obtain a target keyword set comprising a plurality of keywords;

an occurrence probability calculation module, configured to look up the keywords in a classification keyword database to obtain a plurality of classification keywords, and to calculate a plurality of occurrence probabilities of the classification keywords in the target keyword set;

a classification model construction module, configured to construct, based on machine learning, a file auxiliary classification analysis model from the file classification data within a preset past time range, wherein the model comprises a plurality of auxiliary classification modules corresponding to a plurality of sample file types;

a classification analysis module, configured to input the occurrence probabilities, according to the file type, into the corresponding target auxiliary classification module to obtain a classification analysis result set, wherein each auxiliary classification module comprises a plurality of auxiliary classification units; and

an auxiliary classification module, configured to obtain a target classification level from the classification analysis result set and to perform auxiliary classification of the target file.

One or more technical solutions provided in the present application have at least the following technical effects or advantages:

According to the present application, the file type of the target file to be classified is obtained, and text preprocessing is performed on the target file to obtain a target keyword set comprising a plurality of keywords. The keywords are looked up in a classification keyword database to obtain a plurality of classification keywords, and the occurrence probability of each classification keyword in the target keyword set is calculated. A file auxiliary classification analysis model, comprising a plurality of auxiliary classification modules corresponding to a plurality of sample file types, is constructed based on machine learning from the file classification data within a preset past time range. The occurrence probabilities are then input, according to the file type, into the corresponding target auxiliary classification module, each such module comprising a plurality of auxiliary classification units, to obtain a classification analysis result set, from which the target classification level is obtained. This improves the accuracy and efficiency of auxiliary classification and helps ensure file security.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a schematic flow chart of a machine-learning-based auxiliary classification method according to an embodiment of the present application;

Fig. 2 is a schematic flow chart of obtaining the plurality of occurrence probabilities in the machine-learning-based auxiliary classification method according to an embodiment of the present application;

Fig. 3 is a schematic flow chart of constructing the file auxiliary classification analysis model in the machine-learning-based auxiliary classification method according to an embodiment of the present application;

Fig. 4 is a schematic structural diagram of a machine-learning-based auxiliary classification system according to an embodiment of the present application.

Reference numerals: file type acquisition module 11, keyword acquisition module 12, occurrence probability calculation module 13, classification model construction module 14, classification analysis module 15, and auxiliary classification module 16.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It will be apparent that the described embodiments are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present application based on the embodiments herein.

It should be noted that the terms "comprises" and "comprising", and any variations thereof, are intended to cover a non-exclusive inclusion: a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to the steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, system, article, or apparatus.

Example 1

As shown in Fig. 1, the present application provides a machine-learning-based auxiliary classification method, comprising:

Step S100: acquiring the file type of a target file to be classified;

Further, step S100 in this embodiment comprises:

Step S110: acquiring the data type, file theme, and file attribution of the target file;

Step S120: taking the data type, file theme, and file attribution as the file type.

Specifically, to keep files secure during transmission and storage, each file is protected according to its classification level, and machine learning is used to make classification efficient and accurate. The target file to be classified is any file whose classification level needs to be determined, for example an official document or an order contract. A file type is a category obtained by summarizing the properties shared by multiple files; grouping files by type allows them to be sorted quickly.

Specifically, the data type, file theme, and file attribution of the target file are extracted and together form its file type. The data type describes the format of the data in the file, such as TXT, HTML, Word, Excel, or PDF. The file theme is the main subject of the file, such as a purchase contract or an entrustment contract. The file attribution records the legal or natural person who signed or drafted the target file, such as the cooperating client to which it belongs. Acquiring these three items provides reliable data about the file to be classified for the subsequent classification analysis.
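
The three-part file type of steps S110 and S120 can be represented as a small immutable record. The sketch below is a hypothetical illustration: the class name `FileType`, the function `get_file_type`, and the metadata key names are assumptions, not part of the disclosure. Making the record frozen makes it hashable, so it can later serve as the index of a per-type keyword sub-database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileType:
    """Hypothetical composite file type (steps S110-S120)."""

    data_type: str    # format of the data, e.g. "PDF" or "Word"
    theme: str        # main subject, e.g. "purchase contract"
    attribution: str  # owning or drafting party, e.g. a cooperating client


def get_file_type(metadata: dict) -> FileType:
    """Build the file type from a metadata dict; the key names are assumed."""
    return FileType(
        data_type=metadata["data_type"],
        theme=metadata["theme"],
        attribution=metadata["attribution"],
    )


ft = get_file_type(
    {"data_type": "PDF", "theme": "purchase contract", "attribution": "Client A"}
)
sub_databases = {ft: {"radar", "price"}}  # a frozen dataclass can be a dict key
```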

Step S200: performing text preprocessing on the target file to obtain a target keyword set comprising a plurality of keywords;

Specifically, to make auxiliary classification of the target file efficient and accurate, the target file is first preprocessed, and keywords are then extracted from the preprocessed text to obtain a target keyword set comprising a plurality of keywords. Preferably, preprocessing applies stop-word removal, repeated-word removal, Chinese word segmentation, part-of-speech tagging, and similar steps to the text of the target file so that it is ready for keyword extraction. Keywords are then extracted from the part-of-speech-tagged text, and the extracted results form the target keyword set. The target keyword set is the vocabulary capturing the key information of the target file; its keywords are words or phrases that reflect the file's subject, key content, confidential information, and the like. The target keyword set provides the objects to be matched in the subsequent classification analysis.
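
The preprocessing of step S200 can be sketched as below. This is a minimal illustration under stated assumptions: splitting on non-word characters stands in for Chinese word segmentation (which needs a dedicated segmenter), the stop-word list and the toy tagger `pos_tag` are hypothetical, and only tokens tagged as nouns are kept as keywords. Unlike the repeated-word removal listed above, the sketch keeps repeated tokens, because step S340 later counts occurrences.

```python
import re

# Hypothetical stop-word list; a real system would use a curated Chinese list.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "for", "is", "on"}


def segment(text: str) -> list[str]:
    """Stand-in for Chinese word segmentation: split on non-word characters."""
    return [t.lower() for t in re.findall(r"\w+", text)]


def pos_tag(token: str) -> str:
    """Toy part-of-speech tagger (assumption): numbers -> 'm', all else -> 'n'."""
    return "m" if token.isdigit() else "n"


def extract_keywords(text: str, keep_tags=frozenset({"n"})) -> list[str]:
    """Segment, drop stop words, tag, and keep tokens whose tag is in keep_tags.

    Repeated tokens are kept so that occurrence counts remain available.
    """
    tokens = [t for t in segment(text) if t not in STOP_WORDS]
    return [t for t in tokens if pos_tag(t) in keep_tags]


kw = extract_keywords("The purchase contract for the radar: contract price 300")
```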

Step S300: looking up the plurality of keywords in a classification keyword database to obtain a plurality of classification keywords, and calculating a plurality of occurrence probabilities of the classification keywords in the target keyword set;

Further, as shown in Fig. 2, step S300 in this embodiment further comprises:

Step S310: performing text preprocessing on the classified files of the plurality of sample file types within the preset past time range to obtain a plurality of sample classification keyword sets;

Step S320: constructing a plurality of classification keyword sub-databases from the sample classification keyword sets, using the sample file types as index elements;

Step S330: looking up the plurality of keywords in the classification keyword sub-database corresponding to the file type, and taking the keywords found in it as the classification keywords;

Step S340: calculating, for each classification keyword, the ratio of its number of occurrences in the target keyword set to the total number of keyword occurrences, to obtain the plurality of occurrence probabilities.

Specifically, the classification keyword database stores the keywords that were used to determine classification levels in the classified files of the various sample file types within the preset past time range. It consists of a plurality of classification keyword sub-databases in one-to-one correspondence with the sample file types, each file type having its own sub-database, which keeps keyword lookup accurate and efficient.

Specifically, the sample classification keyword sets are obtained as follows: the classified files of each sample file type within the preset past time range undergo text preprocessing (stop-word removal, repeated-word removal, Chinese word segmentation, part-of-speech tagging, and the like), keywords are extracted from the preprocessed files, and the keywords collected for each sample file type form that type's sample classification keyword set. The preset past time range is a period before the time at which the target file is classified; it is set by staff and is not limited here, and may be, for example, the past year. Using the sample file types as index elements, the sample classification keyword sets are divided by file type into the classification keyword sub-databases, which together constitute the classification keyword database.
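
Steps S310 and S320 amount to grouping historical classification keywords by sample file type. The sketch assumes each historical classified file has already been reduced to a `(file_type, keywords)` pair, for example by preprocessing like step S200; the function name `build_keyword_database` and the sample data are hypothetical. Sets are used because only membership matters for the lookup in step S330.

```python
from collections import defaultdict


def build_keyword_database(sample_files):
    """Group sample classification keywords by sample file type (S310-S320).

    sample_files: iterable of (file_type, keywords) pairs, one per classified
    sample file from the preset past time range.
    Returns {file_type: set of classification keywords} (one sub-database each).
    """
    database = defaultdict(set)
    for file_type, keywords in sample_files:
        database[file_type].update(keywords)
    return dict(database)


db = build_keyword_database([
    ("contract", ["price", "radar", "delivery"]),
    ("contract", ["radar", "warranty"]),
    ("report", ["budget", "personnel"]),
])
```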

Specifically, each keyword in the target keyword set is used as an index into the classification keyword sub-database corresponding to the file type, to determine whether that sub-database contains the keyword. The keywords that are matched successfully are taken as the classification keywords, which are used to determine the classification level of the content of the target file. This removes words that contribute little to auxiliary classification.

Specifically, the number of occurrences of each classification keyword in the target keyword set is counted, and these counts are summed to obtain the total number of keyword occurrences; the occurrence probability of each classification keyword is the ratio of its count to this total. The occurrence probabilities reflect how likely each classification keyword is to appear when classification is performed. Obtaining the classification keywords provides the basis for subsequent intelligent classification, and calculating their occurrence probabilities quantifies them, giving a reliable basis for determining the classification level associated with the keywords.
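
Steps S330 and S340 can be sketched as a membership filter followed by a normalized count. Following the detailed description above, the denominator is the summed occurrence count of the matched classification keywords, so the probabilities sum to 1; the function name and the empty-result behaviour are assumptions. The target keywords are assumed to retain repeats, otherwise every count would be 1.

```python
from collections import Counter


def occurrence_probabilities(target_keywords, sub_database):
    """Match keywords against the file type's sub-database (S330) and
    return {classification_keyword: occurrence probability} (S340).

    Returns an empty dict when no keyword matches.
    """
    counts = Counter(k for k in target_keywords if k in sub_database)
    total = sum(counts.values())  # summed occurrences of the matched keywords
    if total == 0:
        return {}
    return {k: c / total for k, c in counts.items()}


probs = occurrence_probabilities(
    ["purchase", "contract", "radar", "contract", "price", "radar", "radar"],
    {"radar", "price", "warranty"},
)
```

Here "radar" occurs three times and "price" once among the matched keywords, giving 0.75 and 0.25.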

Step S400: constructing, based on machine learning, a file auxiliary classification analysis model from the file classification data within the preset past time range, wherein the model comprises a plurality of auxiliary classification modules corresponding to the plurality of sample file types;

Further, as shown in Fig. 3, step S400 in this embodiment further comprises:

Step S410: acquiring, from the file classification data of a first sample file type within the preset past time range, a plurality of sample first keyword occurrence probability sets and a plurality of sample first classification levels;

Step S420: constructing a first auxiliary classification module corresponding to the first sample file type from the sample first keyword occurrence probability sets and the sample first classification levels;

Step S430: constructing the other auxiliary classification modules in the same way to obtain the file auxiliary classification analysis model.

Specifically, the file classification data within the preset past time range is extracted from the database records of files classified during that period. It reflects the basic information of those files and of their classification process, including keyword occurrence probabilities and classification levels. The preset past time range is a past period set by staff. The file auxiliary classification analysis model is a machine-learning model for intelligent analysis in auxiliary file classification; it comprises a plurality of auxiliary classification modules, each corresponding to one sample file type.

Specifically, one sample file type is selected at random from the plurality of sample file types as the first sample file type, the first object of the classification analysis. Using keyword occurrence probabilities and classification levels as indexes, the corresponding records are retrieved from the file classification data to obtain a plurality of sample first keyword occurrence probability sets and a plurality of sample first classification levels, in one-to-one correspondence. The classification levels may, for example, comprise five grades, 1 to 5, where a higher grade indicates a higher degree of confidentiality.
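
A neural network needs fixed-length input, whereas an occurrence probability set is keyed by keyword. One plausible encoding, assumed here and not stated in the disclosure, orders the features by a fixed vocabulary (for example the sorted keywords of the file type's sub-database) with 0.0 for absent keywords. The sketch collects the `(vector, level)` pairs of step S410 for one sample file type; the record keys `file_type`, `probabilities` and `level` are hypothetical, and levels are checked against the example range 1 to 5.

```python
def to_feature_vector(probabilities, vocabulary):
    """Map {keyword: probability} onto a fixed-length vector (assumed encoding)."""
    return [probabilities.get(word, 0.0) for word in vocabulary]


def training_pairs(records, file_type, vocabulary):
    """Collect (feature_vector, level) pairs for one sample file type (S410)."""
    pairs = []
    for record in records:
        if record["file_type"] != file_type:
            continue
        level = record["level"]
        if not 1 <= level <= 5:  # example five-grade scale
            raise ValueError(f"classification level out of range: {level}")
        pairs.append((to_feature_vector(record["probabilities"], vocabulary), level))
    return pairs


vocabulary = ["price", "radar", "warranty"]
records = [
    {"file_type": "contract", "probabilities": {"radar": 0.75, "price": 0.25}, "level": 4},
    {"file_type": "report", "probabilities": {"budget": 1.0}, "level": 2},
]
pairs = training_pairs(records, "contract", vocabulary)
```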

Specifically, the first auxiliary classification module is a functional module for intelligently analysing the classification levels of files of the first sample file type. It corresponds to the first sample file type and is trained using the sample first keyword occurrence probability sets and the sample first classification levels as construction data; its input is a keyword occurrence probability set and its output is a classification level. Auxiliary classification modules for the other sample file types are constructed in the same way and are combined with the first auxiliary classification module to form the file auxiliary classification analysis model.

Further, step S420 in this embodiment further comprises:

Step S421: randomly selecting, with replacement, P groups of data from the sample first keyword occurrence probability sets and the corresponding sample first classification levels as a first construction data set, where P is a positive integer smaller than the number of sample first keyword occurrence probability sets;

Step S422: constructing a first auxiliary classification unit of the first auxiliary classification module using the first construction data set;

Step S423: randomly selecting, again with replacement, P groups of data from the sample first keyword occurrence probability sets and the sample first classification levels as a second construction data set, and constructing a second auxiliary classification unit;

Step S424: continuing in this way to construct a plurality of auxiliary classification units, thereby obtaining the first auxiliary classification module.

Further, step S422 in this embodiment further comprises:

Step S422-1: constructing the first auxiliary classification unit based on a BP (back-propagation) neural network, whose input is a first keyword occurrence probability set and whose output is a first classification level;

Step S422-2: labelling and dividing the first construction data set into a first training set, a first validation set, and a first test set;

Step S422-3: performing supervised training of the first auxiliary classification unit with the first training set, updating its network parameters by error back-propagation until a convergence condition is reached, and then validating and testing it with the first validation set and the first test set, to obtain a first auxiliary classification unit whose accuracy meets a preset condition.

Specifically, P groups of data are selected at random, with replacement, from the sample first keyword occurrence probability sets and the sample first classification levels, where P is a positive integer smaller than the number of sample first keyword occurrence probability sets, preferably two-thirds of that number. Because sampling is with replacement, a construction data set may contain repeated groups, and the construction data sets of different units are not identical. Units with different behaviour are therefore obtained, and combining them for auxiliary classification yields higher accuracy.
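
The sampling of steps S421 and S423 is ordinary bootstrap sampling. A minimal sketch, assuming the `(vector, level)` pairs are held in a list: `random.Random.choices` draws with replacement, and P is set to about two-thirds of the sample count N, clamped so that 1 <= P < N as the text requires. The function name and the seed parameter are illustrative.

```python
import random


def bootstrap_sets(pairs, n_units, ratio=2 / 3, seed=None):
    """Draw n_units construction data sets of P pairs each, with replacement."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two samples so that 1 <= P < N")
    # P is a positive integer smaller than N, preferably about 2/3 of N.
    p = max(1, min(n - 1, round(n * ratio)))
    rng = random.Random(seed)
    return [rng.choices(pairs, k=p) for _ in range(n_units)]


sets = bootstrap_sets(list(range(9)), n_units=10, seed=0)
```

With nine samples, each of the ten construction data sets holds six draws, and repeats within a set are expected.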

Specifically, the first auxiliary classification unit is obtained by training a network built on a BP neural network framework with the first construction data set as training data; its input is a first keyword occurrence probability set and its output is a first classification level. The first construction data set is labelled and divided in a set proportion, and the resulting parts are marked for training, validation, and testing, giving the first training set, the first validation set, and the first test set.

Specifically, the first auxiliary classification unit is trained under supervision with the first training set; during training its network parameters are updated by error back-propagation until the output converges, giving a trained unit. The keyword occurrence probability sets in the first validation set are then input into the converged unit to obtain predicted classification levels, which are compared with the labelled classification levels of those samples. The proportion of matches is taken as the accuracy. If the accuracy meets the preset condition, i.e., reaches the preset accuracy, validation passes; otherwise, more training data is acquired and the unit is trained further. Finally, the first test set is input into the unit to evaluate how well it performs on unseen data, and the test passes when this performance meets the preset condition.
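
Step S422 can be illustrated with a deliberately small pure-Python BP network. This is a sketch, not the disclosed network: the layer sizes, learning rate, sigmoid hidden layer with softmax output, 60/20/20 split, the fixed epoch count standing in for the convergence condition, and the 0.8 threshold standing in for the preset accuracy are all assumptions. `BPUnit`, `accuracy` and `train_unit` are hypothetical names.

```python
import math
import random


def _sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class BPUnit:
    """One-hidden-layer back-propagation network mapping a vector to levels 1..n_levels."""

    def __init__(self, n_in, n_hidden=8, n_levels=5, lr=0.5, seed=0):
        rng = random.Random(seed)
        self.n_levels, self.lr = n_levels, lr
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_levels)]
        self.b2 = [0.0] * n_levels

    def _forward(self, x):
        h = [_sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(self.w1, self.b1)]
        z = [sum(w * hj for w, hj in zip(row, h)) + b for row, b in zip(self.w2, self.b2)]
        m = max(z)  # subtract the max for a numerically stable softmax
        e = [math.exp(v - m) for v in z]
        s = sum(e)
        return h, [v / s for v in e]

    def predict(self, x):
        _, p = self._forward(x)
        return p.index(max(p)) + 1  # classification levels are 1-based

    def train_step(self, x, level):
        h, p = self._forward(x)
        # Output error for softmax + cross-entropy: probabilities minus one-hot target.
        dz = [pk - (1.0 if k == level - 1 else 0.0) for k, pk in enumerate(p)]
        # Propagate the error back to the hidden layer before w2 is updated.
        dh = [sum(dz[k] * self.w2[k][j] for k in range(self.n_levels)) * h[j] * (1.0 - h[j])
              for j in range(len(h))]
        for k in range(self.n_levels):
            for j in range(len(h)):
                self.w2[k][j] -= self.lr * dz[k] * h[j]
            self.b2[k] -= self.lr * dz[k]
        for j, row in enumerate(self.w1):
            for i in range(len(row)):
                row[i] -= self.lr * dh[j] * x[i]
            self.b1[j] -= self.lr * dh[j]


def accuracy(unit, data):
    """Share of samples whose predicted level equals the labelled level."""
    return sum(unit.predict(x) == y for x, y in data) / len(data) if data else 0.0


def train_unit(data, n_in, epochs=200, min_acc=0.8, seed=0):
    """60/20/20 split, fixed-epoch training, then validation and test accuracy."""
    data = list(data)
    random.Random(seed).shuffle(data)
    a, b = int(0.6 * len(data)), int(0.8 * len(data))
    train, val, test = data[:a], data[a:b], data[b:]
    unit = BPUnit(n_in, seed=seed)
    for _ in range(epochs):
        for x, y in train:
            unit.train_step(x, y)
    val_acc, test_acc = accuracy(unit, val), accuracy(unit, test)
    return unit, val_acc, test_acc, val_acc >= min_acc and test_acc >= min_acc
```

If the returned flag is false, the description above calls for acquiring more training data and training the unit again.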

Specifically, the second auxiliary classification unit is constructed by again selecting P groups of data at random, with replacement, as a second construction data set. Repeating the sampling yields a plurality of construction data sets, from which a plurality of auxiliary classification units, for example 10, are constructed. The first auxiliary classification unit, the second auxiliary classification unit, and the further units are connected in parallel to form the first auxiliary classification module.
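
The parallel arrangement of step S424 can be sketched as a list of independently trained units that all receive the same input. To keep the block self-contained, `majority_unit` is a trivial stand-in that merely returns the most common level in its data set; in the described method each unit would be a BP network trained as in step S422. All names and the toy data are hypothetical.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[Sequence[float], int]        # (feature vector, classification level)
Unit = Callable[[Sequence[float]], int]   # trained unit: vector -> level
TrainFn = Callable[[List[Pair]], Unit]    # builds one unit from one data set


def build_module(construction_sets: List[List[Pair]], train_fn: TrainFn) -> List[Unit]:
    """Train one unit per construction data set; the units sit side by side."""
    return [train_fn(data_set) for data_set in construction_sets]


def majority_unit(data_set: List[Pair]) -> Unit:
    """Illustrative stand-in for a BP unit: predicts its data set's most common level."""
    level = Counter(y for _, y in data_set).most_common(1)[0][0]
    return lambda x: level


construction_sets = [
    [([0.1, 0.9], 4), ([0.2, 0.8], 4), ([0.9, 0.1], 1)],
    [([0.9, 0.1], 1), ([0.8, 0.2], 1), ([0.2, 0.8], 4)],
]
module = build_module(construction_sets, majority_unit)
results = [unit([0.5, 0.5]) for unit in module]  # every unit sees the same input
```

Because the two construction sets differ, the two units disagree, which is exactly the diversity the bootstrap step is meant to produce.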

Step S500: inputting the plurality of occurrence probabilities, according to the file type, into the corresponding target auxiliary classification module to obtain a classification analysis result set, wherein each auxiliary classification module comprises a plurality of auxiliary classification units; and

Step S600: obtaining a target classification level from the classification analysis result set, and performing auxiliary classification of the target file.

Further, step S600 in this embodiment further comprises:

Step S610: obtaining the occurrence probability of each classification level in the classification analysis result set;

Step S620: selecting the classification level with the largest occurrence probability as the target classification level; if two or more classification levels share the largest occurrence probability, calculating the mean of those levels and rounding it to obtain the target classification level.

Specifically, using the file type as an index, the occurrence probabilities are input into the auxiliary classification units of the target auxiliary classification module corresponding to that file type, and the units' outputs form the classification analysis result set, with one analysis result per auxiliary classification unit. The classification analysis result set thus contains the classification levels determined from the classification keywords. The target auxiliary classification module is the module that performs auxiliary classification analysis on the target file and comprises a plurality of auxiliary classification units.
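
Step S500 can be sketched as a dictionary lookup keyed by file type followed by one prediction per unit. The model layout `{file_type: [unit, ...]}`, the function name `analyse`, and the constant lambda units are illustrative assumptions; any callable mapping a feature vector to a level, such as a trained BP unit, fits the same shape.

```python
from typing import Callable, Dict, Hashable, List, Sequence

Unit = Callable[[Sequence[float]], int]


def analyse(model: Dict[Hashable, List[Unit]], file_type: Hashable,
            features: Sequence[float]) -> List[int]:
    """Route features to the module for file_type; collect one level per unit."""
    try:
        module = model[file_type]
    except KeyError:
        raise KeyError(f"no auxiliary classification module for {file_type!r}") from None
    return [unit(features) for unit in module]


model = {
    "contract": [lambda x: 3, lambda x: 3, lambda x: 4],
    "report": [lambda x: 1],
}
result_set = analyse(model, "contract", [0.25, 0.75, 0.0])
```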

Specifically, once the classification analysis result set is obtained, the classification levels in it are extracted to determine the target classification level, according to which auxiliary classification of the target file is performed. Because the construction data of the auxiliary classification units differ, their analysis results may also differ. The occurrence probability of each classification level in the result set is therefore determined, preferably as the ratio of the number of occurrences of that level to the total number of results, and the level with the highest occurrence probability is selected as the target classification level.

Preferably, when at least two secret levels tie for the highest occurrence probability, the average of those secret levels is calculated and rounded, and the rounded result is taken as the target secret level. The target secret level is the level at which the target file is to be kept secret. This achieves the technical effect of intelligently assisting secret determination for the target file and improving its accuracy.
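Steps S610 and S620 amount to a majority vote with a mean-based tie-break. The sketch below assumes secret levels are encoded as integers (a larger number meaning a stricter level) and that "rounding" means round half up; the description does not fix either choice, and `target_secret_level` is a hypothetical name. `Fraction` keeps the probabilities and the mean exact so ties are detected reliably.

```python
import math
from collections import Counter
from fractions import Fraction


def target_secret_level(unit_results: list[int]) -> int:
    """Combine per-unit secret levels into one target secret level."""
    if not unit_results:
        raise ValueError("empty secret determination analysis result set")
    counts = Counter(unit_results)
    total = len(unit_results)
    # S610: occurrence probability = occurrences of a level / number of results
    probs = {level: Fraction(n, total) for level, n in counts.items()}
    best = max(probs.values())
    tied = sorted(level for level, p in probs.items() if p == best)
    if len(tied) == 1:
        return tied[0]
    # S620 tie-break: mean of the tied levels, rounded half up (assumed rule)
    mean = Fraction(sum(tied), len(tied))
    return math.floor(mean + Fraction(1, 2))
```

For example, `target_secret_level([2, 3, 3, 2, 4])` ties levels 2 and 3 at 0.4 each; their mean 2.5 rounds half up to 3. Python's built-in `round` would give 2 here (round half to even), which is why the rounding rule is made explicit.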

In summary, the embodiments of the present application have at least the following technical effects:

according to the method, the file type of the target file requiring secret determination is obtained as the basis for secret determination. Text preprocessing such as duplicate removal is then performed on the target file to extract keywords, and the extracted keywords are analyzed to obtain a plurality of secret keywords, which reduces the keyword dimension and improves the accuracy of auxiliary secret determination. The occurrence probabilities of the plurality of secret keywords are calculated, and a file auxiliary secret determination analysis model is constructed from historical data, which provides a basis for intelligent auxiliary secret determination and improves its accuracy. Using the file type as an index, the plurality of occurrence probabilities are input into the corresponding target auxiliary secret determination module to obtain a secret determination analysis result set, from which the target secret level is obtained and auxiliary secret determination is performed on the target file. This improves the accuracy and efficiency of file secret determination and further safeguards the security of file contents.

Example two

Based on the same inventive concept as the machine learning-based auxiliary secret determination method in the foregoing embodiment, and as shown in fig. 4, the present application provides a machine learning-based auxiliary secret determination system. The system comprises:

a file type obtaining module 11, configured to obtain the file type of a target file requiring secret determination;

a keyword obtaining module 12, configured to perform text preprocessing on the target file to obtain a target keyword set comprising a plurality of keywords;

an occurrence probability calculation module 13, configured to input the plurality of keywords into a secret keyword database for indexing to obtain a plurality of secret keywords, and to calculate a plurality of occurrence probabilities of the plurality of secret keywords in the target keyword set;

a secret determination model construction module 14, configured to construct a machine learning-based file auxiliary secret determination analysis model from file secret determination data within a past preset time range, wherein the file auxiliary secret determination analysis model comprises a plurality of auxiliary secret determination modules corresponding to a plurality of sample file types;

a secret determination analysis module 15, configured to input, according to the file type, the plurality of occurrence probabilities into the corresponding target auxiliary secret determination module to obtain a secret determination analysis result set, wherein each auxiliary secret determination module comprises a plurality of auxiliary secret determination units; and

an auxiliary secret determination module 16, configured to obtain a target secret level from the secret determination analysis result set and to perform auxiliary secret determination on the target file.

Further, the system further comprises:

the file attribution obtaining unit is used for obtaining the data type, the file theme and the file attribution of the target file;

the file type setting unit is used for taking the data type, the file theme and the file attribution as the file type.
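Since the file type combines three attributes and then serves as the index for selecting a module, it can be represented as a composite key. The sketch below is hypothetical: `FileType` and `select_module` are illustrative names, and it assumes the three attributes are plain strings that must match exactly, which the description does not specify.

```python
from collections.abc import Mapping
from typing import NamedTuple, TypeVar

M = TypeVar("M")  # whatever object represents an auxiliary secret determination module


class FileType(NamedTuple):
    """Composite file type: (data type, file theme, file attribution)."""
    data_type: str
    theme: str
    attribution: str


def select_module(modules: Mapping[FileType, M], file_type: FileType) -> M:
    """Look up the auxiliary secret determination module indexed by file type."""
    try:
        return modules[file_type]
    except KeyError:
        raise KeyError(f"no auxiliary secret determination module for {file_type}") from None
```

A `NamedTuple` is hashable and compares field by field, so it can be used directly as a dictionary key.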

Further, the system further comprises:

the text preprocessing unit is used for performing text preprocessing on the secret files of the plurality of sample file types within the past preset time range to obtain a plurality of sample secret keyword sets;

the sub-database construction unit is used for constructing a plurality of secret keyword sub-databases from the plurality of sample secret keyword sets, using the plurality of sample file types as index elements;

the secret keyword obtaining unit is used for inputting the plurality of keywords into the secret keyword sub-database corresponding to the file type for indexing, and taking the keywords that also appear in that sub-database as the plurality of secret keywords;

the occurrence probability obtaining unit is used for calculating, for each of the plurality of secret keywords, the ratio of its number of occurrences in the target keyword set to the number of occurrences of all keywords, thereby obtaining the plurality of occurrence probabilities.
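A minimal Python sketch of the sub-database construction, keyword indexing and probability calculation, under stated assumptions: each sub-database is a plain set keyed by a file type string, the target keyword list keeps repeated occurrences (if preprocessing removed every duplicate, each probability would reduce to 1 divided by the number of keywords), and both function names are hypothetical.

```python
from collections import Counter


def build_sub_databases(samples: list[tuple[str, set[str]]]) -> dict[str, set[str]]:
    """Merge sample secret keyword sets into one sub-database per sample file type."""
    db: dict[str, set[str]] = {}
    for file_type, keywords in samples:
        db.setdefault(file_type, set()).update(keywords)
    return db


def secret_keyword_probabilities(target_keywords: list[str],
                                 sub_databases: dict[str, set[str]],
                                 file_type: str) -> dict[str, float]:
    """Occurrence probability of each secret keyword within the target keywords."""
    sub_db = sub_databases[file_type]
    counts = Counter(target_keywords)
    total = sum(counts.values())  # occurrences of all keywords
    if total == 0:
        return {}
    # keep only keywords also present in the sub-database (the secret keywords)
    return {kw: n / total for kw, n in counts.items() if kw in sub_db}
```

With sub-database `{"budget", "salary", "contract"}` for `"finance"`, the target keywords `["budget", "contract", "budget", "meeting"]` give `{"budget": 0.5, "contract": 0.25}`; `"meeting"` counts toward the total but is not a secret keyword.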

Further, the system further comprises:

the sample secret level obtaining unit is used for obtaining a plurality of sample first keyword occurrence probability sets and a plurality of sample first secret levels from the file secret determination data of a first sample file type within the past preset time range;

the first auxiliary module construction unit is used for constructing a first auxiliary secret determination module corresponding to the first sample file type from the plurality of sample first keyword occurrence probability sets and the plurality of sample first secret levels;

the auxiliary secret determination analysis model obtaining unit is used for continuing to construct a plurality of other auxiliary secret determination modules in the same way, thereby obtaining the file auxiliary secret determination analysis model.
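The per-file-type construction can be sketched as filtering historical records to the time window, grouping them by sample file type, and building one module per group. The record layout, the `since` cut-off date standing in for the past preset time range, and the `build_module` callback are illustrative assumptions, not details from the description.

```python
from collections import defaultdict
from collections.abc import Callable, Iterable, Sequence
from datetime import date
from typing import Any

# Hypothetical record layout:
# (decision date, sample file type, keyword occurrence probability vector, secret level)
Record = tuple[date, str, Sequence[float], int]
Sample = tuple[Sequence[float], int]


def build_analysis_model(records: Iterable[Record], since: date,
                         build_module: Callable[[list[Sample]], Any]) -> dict[str, Any]:
    """Build one auxiliary secret determination module per sample file type."""
    grouped: dict[str, list[Sample]] = defaultdict(list)
    for decided_on, file_type, probs, level in records:
        if decided_on >= since:  # keep only data within the past preset time range
            grouped[file_type].append((probs, level))
    return {ft: build_module(samples) for ft, samples in grouped.items()}
```

Passing `len` as `build_module` simply counts the in-range samples per file type, which is a convenient way to check the grouping before plugging in real module construction.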

Further, the system further comprises:

the first construction data set setting unit is used for randomly selecting, with replacement, P groups of data from the plurality of sample first keyword occurrence probability sets and the plurality of sample first secret levels as a first construction data set, wherein P is a positive integer smaller than the number of sample first keyword occurrence probability sets;

the first auxiliary unit is used for constructing a first auxiliary secret determination unit within the first auxiliary secret determination module using the first construction data set;

the second auxiliary unit is used for randomly selecting, again with replacement, P groups of data from the plurality of sample first keyword occurrence probability sets and the plurality of sample first secret levels as a second construction data set, and constructing a second auxiliary secret determination unit from it;

the first auxiliary secret determination module obtaining unit is used for continuing to construct a plurality of auxiliary secret determination units within the first auxiliary secret determination module, thereby obtaining the first auxiliary secret determination module.
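Drawing P groups with replacement for each unit resembles bootstrap sampling as used in bagging. The sketch below is a minimal illustration under assumptions: `build_secret_module` and its `train_unit` callback are hypothetical, and the number of units is a free parameter because the description does not state it.

```python
import random
from collections.abc import Callable, Sequence
from typing import Optional, TypeVar

S = TypeVar("S")  # one (keyword occurrence probability set, secret level) sample
U = TypeVar("U")  # one trained auxiliary secret determination unit


def build_secret_module(samples: Sequence[S], p: int, n_units: int,
                        train_unit: Callable[[list[S]], U],
                        seed: Optional[int] = None) -> list[U]:
    """Build n_units units, each trained on P samples drawn with replacement."""
    if not 0 < p < len(samples):
        raise ValueError("P must be a positive integer smaller than the number of samples")
    rng = random.Random(seed)
    units = []
    for _ in range(n_units):
        # random.choices samples with replacement, so a sample may repeat
        construction_set = rng.choices(samples, k=p)
        units.append(train_unit(construction_set))
    return units
```

Because each construction data set is a different random draw, the resulting units differ, which is why their analysis results later have to be combined by vote.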

Further, the system further comprises:

the first auxiliary secret determination unit construction unit is used for constructing the first auxiliary secret determination unit based on a BP neural network, wherein the input data of the first auxiliary secret determination unit is a first keyword occurrence probability set and the output data is a first secret level;

the data set labeling and dividing unit is used for labeling and dividing the first construction data set to obtain a first training set, a first validation set and a first test set;

the supervised training unit is used for performing supervised training on the first auxiliary secret determination unit with the first training set, the first validation set and the first test set, updating the network parameters of the first auxiliary secret determination unit through error back-propagation until a convergence condition is reached, and performing validation and testing to obtain a first auxiliary secret determination unit whose accuracy meets a preset condition.
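The training procedure above can be illustrated with a small NumPy BP network. This is a sketch under assumptions, not the patented configuration: one tanh hidden layer of 8 neurons, a softmax output over secret levels encoded as 0 to K-1, a 70/15/15 split, a loss-improvement threshold as the convergence condition, and 0.8 as the preset accuracy. All of these values and names (`BPUnit`, `train_secret_unit`) are hypothetical.

```python
import numpy as np


def split_dataset(X, y, rng, ratios=(0.7, 0.15, 0.15)):
    """Shuffle labelled samples and divide them into training, validation and test sets."""
    idx = rng.permutation(len(X))
    n_tr, n_va = int(ratios[0] * len(X)), int(ratios[1] * len(X))
    parts = (idx[:n_tr], idx[n_tr:n_tr + n_va], idx[n_tr + n_va:])
    return [(X[p], y[p]) for p in parts]


class BPUnit:
    """One-hidden-layer network trained by error back-propagation."""

    def __init__(self, n_in, n_hidden, n_levels, rng):
        self.W1 = rng.normal(0.0, 0.5, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.5, (n_hidden, n_levels))
        self.b2 = np.zeros(n_levels)

    def _forward(self, X):
        h = np.tanh(X @ self.W1 + self.b1)
        z = h @ self.W2 + self.b2
        z = z - z.max(axis=1, keepdims=True)  # numerical stability for softmax
        p = np.exp(z)
        return h, p / p.sum(axis=1, keepdims=True)

    def predict(self, X):
        return self._forward(X)[1].argmax(axis=1)

    def accuracy(self, X, y):
        return float(np.mean(self.predict(X) == y))

    def fit(self, X, y, lr=0.2, max_epochs=3000, tol=1e-6):
        onehot = np.eye(self.W2.shape[1])[y]
        prev_loss = np.inf
        for _ in range(max_epochs):
            h, p = self._forward(X)
            loss = -np.mean(np.sum(onehot * np.log(p + 1e-12), axis=1))
            if prev_loss - loss < tol:  # assumed convergence condition
                break
            prev_loss = loss
            dz = (p - onehot) / len(X)  # cross-entropy error at the output layer
            dh = (dz @ self.W2.T) * (1.0 - h ** 2)  # error propagated back through tanh
            self.W2 -= lr * (h.T @ dz)
            self.b2 -= lr * dz.sum(axis=0)
            self.W1 -= lr * (X.T @ dh)
            self.b1 -= lr * dh.sum(axis=0)
        return self


def train_secret_unit(X, y, n_levels, min_accuracy=0.8, n_hidden=8, seed=0, attempts=5):
    """Train until validation and test accuracy both meet the preset condition."""
    rng = np.random.default_rng(seed)
    (X_tr, y_tr), (X_va, y_va), (X_te, y_te) = split_dataset(X, y, rng)
    for _ in range(attempts):  # re-initialise and retrain if the condition is not met
        unit = BPUnit(X.shape[1], n_hidden, n_levels, rng).fit(X_tr, y_tr)
        if unit.accuracy(X_va, y_va) >= min_accuracy and unit.accuracy(X_te, y_te) >= min_accuracy:
            return unit
    raise RuntimeError("no unit met the preset accuracy condition")
```

The input rows are assumed to be fixed-length occurrence probability vectors (for example, one entry per keyword in the file type's sub-database), since a BP network needs a fixed input width.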

Further, the system further comprises:

the occurrence probability obtaining unit is used for obtaining the occurrence probabilities of a plurality of secret levels in the secret determination analysis result set;

the target secret level setting unit is used for selecting the secret level with the largest occurrence probability as the target secret level, and, if at least two secret levels tie for the largest occurrence probability, calculating the mean of those secret levels and rounding it to serve as the target secret level.

It should be noted that the sequence of the embodiments of the present application is merely for description, and does not represent the advantages and disadvantages of the embodiments. And the foregoing description has been directed to specific embodiments of this specification. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

The foregoing describes only preferred embodiments of the present application and is not intended to limit the invention to the particular embodiments described.

The specification and drawings are merely exemplary of the application and are to be regarded as covering any and all modifications, variations, combinations, or equivalents that are within the scope of the application. It will be apparent to those skilled in the art that various modifications and variations can be made in the present application without departing from the scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the present application and the equivalents thereof, the present application is intended to cover such modifications and variations.

Claims

1. A machine learning-based auxiliary secret determination method, the method comprising:

acquiring the file type of a target file requiring secret determination;

2. The method of claim 1, wherein obtaining the file type of the target file requiring secret determination comprises:

acquiring the data type, the file theme and the file attribution of the target file;

and taking the data type, the file theme and the file attribution as the file type.

3. The method of claim 1, wherein inputting the plurality of keywords into a secret keyword database for indexing, obtaining a plurality of secret keywords, and calculating to obtain a plurality of occurrence probabilities of the plurality of secret keywords within the target keyword set, comprises:

performing text preprocessing according to the secret files of the plurality of sample file types within the past preset time range to obtain a plurality of sample secret keyword sets;

constructing a plurality of secret keyword sub-databases from the plurality of sample secret keyword sets, using the plurality of sample file types as index elements;

inputting the plurality of keywords into the secret keyword sub-database corresponding to the file type for indexing, and taking the keywords that also appear in that sub-database as the plurality of secret keywords;

and calculating, for each of the plurality of secret keywords, the ratio of its number of occurrences in the target keyword set to the number of occurrences of all keywords, thereby obtaining the plurality of occurrence probabilities.

4. The method of claim 1, wherein constructing a machine learning-based file auxiliary secret determination analysis model from file secret determination data within a past preset time range comprises:

acquiring a plurality of sample first keyword occurrence probability sets and a plurality of sample first secret levels from file secret determination data of a first sample file type within the past preset time range;

constructing a first auxiliary secret determination module corresponding to the first sample file type from the plurality of sample first keyword occurrence probability sets and the plurality of sample first secret levels;

and continuing to construct a plurality of other auxiliary secret determination modules to obtain the file auxiliary secret determination analysis model.

5. The method of claim 4, wherein constructing a first auxiliary secret determination module corresponding to the first sample file type from the plurality of sample first keyword occurrence probability sets and the plurality of sample first secret levels comprises:

randomly selecting, with replacement, P groups of data from the plurality of sample first keyword occurrence probability sets and the plurality of sample first secret levels as a first construction data set, wherein P is a positive integer smaller than the number of sample first keyword occurrence probability sets;

constructing a first auxiliary secret determination unit within the first auxiliary secret determination module using the first construction data set;

randomly selecting, again with replacement, P groups of data from the plurality of sample first keyword occurrence probability sets and the plurality of sample first secret levels as a second construction data set, and constructing a second auxiliary secret determination unit;

and continuing to construct a plurality of auxiliary secret determination units within the first auxiliary secret determination module to obtain the first auxiliary secret determination module.

6. The method of claim 5, wherein constructing a first auxiliary secret determination unit within the first auxiliary secret determination module using the first construction data set comprises:

constructing the first auxiliary secret determination unit based on a BP neural network, wherein the input data of the first auxiliary secret determination unit is a first keyword occurrence probability set and the output data is a first secret level;

labeling and dividing the first construction data set to obtain a first training set, a first validation set and a first test set;

and performing supervised training on the first auxiliary secret determination unit with the first training set, the first validation set and the first test set, updating the network parameters of the first auxiliary secret determination unit through error back-propagation until a convergence condition is reached, and performing validation and testing to obtain a first auxiliary secret determination unit whose accuracy meets a preset condition.

7. The method of claim 1, wherein obtaining a target secret level from the secret determination analysis result set comprises:

obtaining the occurrence probabilities of a plurality of secret levels in the secret determination analysis result set;

and selecting the secret level with the largest occurrence probability as the target secret level, and, if at least two secret levels tie for the largest occurrence probability, calculating the average of those secret levels and rounding it to obtain the target secret level.

8. An auxiliary secret determination system based on machine learning, the system comprising: