CN115169293A - Text steganalysis method, system, device and storage medium - Google Patents

Text steganalysis method, system, device and storage medium

Info

Publication number
CN115169293A
CN115169293A (application CN202211068809.9A)
Authority
CN
China
Prior art keywords
text
graph
analyzed
neural network
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211068809.9A
Other languages
Chinese (zh)
Inventor
付章杰
于琪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology filed Critical Nanjing University of Information Science and Technology
Priority to CN202211068809.9A priority Critical patent/CN115169293A/en
Publication of CN115169293A publication Critical patent/CN115169293A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/151 Transformation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Bioethics (AREA)
  • Data Mining & Analysis (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a text steganalysis method, system, device and storage medium. The method comprises: acquiring a text to be analyzed and inputting it into a pre-trained multi-graph neural network to obtain a network output; if the network output is smaller than a preset threshold value, the text to be analyzed is a normal text, and otherwise it is a steganographic text. When the multi-graph neural network is trained, a logic graph, a semantic graph and a syntactic graph are generated from the texts in a training sample set, the statistical, semantic and syntactic relations among the texts are analyzed, and the three relations are integrated to perform message updating and feature extraction on the texts, obtaining more distinguishable features; this makes up for the failure of sequence models to consider global features in steganalysis and greatly improves the analysis efficiency of the multi-graph neural network. The three updated graphs are fused between graphs to obtain a total graph, and the total graph is pooled to obtain the final representation of the text to be analyzed, so that the final representation contains richer information and the accuracy of text steganalysis is improved.

Description

Text steganalysis method, system, device and storage medium
Technical Field
The invention relates to a text steganalysis method, system, device and storage medium, and belongs to the technical field of encryption.
Background
With the continuous development of the internet, people frequently communicate over it, and the security problems in information transmission cannot be ignored. Lawbreakers hide secret information in text by some steganographic means for covert transmission, which poses serious risks to people's lives, property and social stability. A widely accepted countermeasure is to analyze a text to judge whether it contains secret information. One such approach is neural-network-based text steganalysis, which extracts text features with a neural network and judges whether the text is steganographic according to the different distributions of those features in a high-dimensional semantic space.
At present, methods for performing text steganalysis with neural networks include: extracting features of different scales from the text with convolution kernels of different sizes for judgment; fusing the local and global features extracted by convolutional and recurrent neural networks for analysis; and extracting salient text features with a multi-head attention mechanism for judgment.
Combining the local features and long-distance features extracted by convolutional and recurrent neural networks yields more distinguishable features, but some irrelevant redundant features remain among them, which affects text steganalysis efficiency.
Extracting text saliency features with a multi-head attention mechanism lets the model attend more to suspicious information in the text, and the multi-head operation accelerates feature extraction, improving the efficiency of text steganalysis. However, this approach only focuses on feature relations within the current text and does not consider the global correlation between texts.
Disclosure of Invention
The invention aims to provide a text steganalysis method, system, device and storage medium that solve the problems in the prior art such as low text steganalysis efficiency and failure to consider the global correlation between texts.
In order to achieve the above purpose, the invention adopts the following technical scheme:
a text steganalysis method, comprising:
acquiring a text to be analyzed;
inputting a text to be analyzed into a pre-trained multi-graph neural network to obtain network output, wherein if the network output is smaller than a preset threshold value, the text to be analyzed is a normal text, and otherwise, the text to be analyzed is a steganographic text;
the multi-graph neural network is trained by:
acquiring a training sample set, and converting the texts in the training sample set into word vectors;
in each training iteration, inputting the word vectors into a composition module to generate three graphs, namely a logic graph, a semantic graph and a syntactic graph, and updating the intra-graph information of the three graphs according to the information of each target node and its surrounding nodes;
carrying out inter-graph fusion on the three updated graphs to obtain a total graph;
performing graph pooling on the total graph to obtain a final representation of the text;
inputting the final representation of the text into a classifier to obtain the classifier output;
and, according to the classifier output, updating the multi-graph neural network with the cross entropy function as the loss function, repeating the iterative training until all texts in the training sample set have been used, and obtaining the trained multi-graph neural network.
Preferably, the training sample set consists of a steganographic sample data set and a normal sample data set.
Preferably, in the process of generating the logic graph, the edge weight in the logic graph is calculated by the following formula:
$$e_{ab}^{log} = \log\frac{p(a,b)}{p(a)\,p(b)}$$
where $e_{ab}^{log}$ is the edge weight of the edge between words $a$ and $b$ in the logic graph, $p(a,b)$ is the probability that words $a$ and $b$ co-occur, $p(a)$ is the probability that word $a$ occurs in the corpus, and $p(b)$ is the probability that word $b$ occurs in the corpus.
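For illustration, a minimal sketch of how such PMI-style edge weights could be computed over sliding windows; the window size, whitespace tokenization and the positive-PMI filter are assumptions for this sketch, not specified by the patent:

```python
import math
from collections import Counter
from itertools import combinations

def pmi_edge_weights(texts, window=5):
    """Sketch of PMI-style logic-graph edge weights.

    p(a) and p(b) are estimated as word-occurrence frequencies over
    sliding windows and p(a, b) as the pair co-occurrence frequency;
    the edge weight is log(p(a,b) / (p(a) * p(b))).
    """
    word_count, pair_count, n_windows = Counter(), Counter(), 0
    for text in texts:
        tokens = text.split()
        for i in range(max(1, len(tokens) - window + 1)):
            win = set(tokens[i:i + window])
            n_windows += 1
            word_count.update(win)
            pair_count.update(frozenset(p) for p in combinations(sorted(win), 2))
    weights = {}
    for pair, n_ab in pair_count.items():
        a, b = tuple(pair)
        pmi = math.log((n_ab / n_windows) /
                       ((word_count[a] / n_windows) * (word_count[b] / n_windows)))
        if pmi > 0:  # keeping only positive-PMI edges is common practice, assumed here
            weights[(a, b)] = pmi
    return weights
```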
Preferably, in the process of generating the semantic graph, the edge weight in the semantic graph is calculated by the following formula:
$$e_{ab}^{sem} = \frac{N_{sem}(a,b)}{N_{win}(a,b)}$$
where $e_{ab}^{sem}$ is the edge weight of the edge between words $a$ and $b$ in the semantic graph, $N_{sem}(a,b)$ is the number of sliding windows in which words $a$ and $b$ have a semantic relation, and $N_{win}(a,b)$ is the number of sliding windows in which words $a$ and $b$ occur simultaneously.
Preferably, in the process of generating the syntactic graph, the edge weight in the syntactic graph is calculated by the following formula:
$$e_{ab}^{syn} = \frac{N_{syn}(a,b)}{N_{win}(a,b)}$$
where $e_{ab}^{syn}$ is the edge weight of the edge between words $a$ and $b$ in the syntactic graph, $N_{syn}(a,b)$ is the number of sliding windows in which words $a$ and $b$ have a syntactic relation, and $N_{win}(a,b)$ is the number of sliding windows in which words $a$ and $b$ occur simultaneously.
Preferably, the intra-graph information updating of the logic graph, the semantic graph and the syntactic graph includes:
for any target node $a$ in any graph, collecting information from its surrounding nodes by:
$$m_a = \max_{c \in \mathcal{N}(a)} \left( e_{ca}\, x_c \right)$$
where $m_a$ denotes the collected information, $\max$ takes the maximum value of each dimension over the surrounding node information, $\mathcal{N}(a)$ is the set of nodes connected to the target node $a$, $e_{ca}$ is the edge weight between node $c$ and the target node, and $x_c$ is the word vector of word $c$;
aggregating the collected information with the target node itself by:
$$\tilde{x}_a = b\, x_a + (1-b)\, m_a$$
where $\tilde{x}_a$ is the aggregated word vector of word $a$ and $b \in [0,1]$ indicates the extent to which the node's own information is retained.
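A sketch of this collect-and-aggregate update under the reconstruction above; treating the retention degree b as a fixed scalar is an assumption made for illustration (the network may learn it):

```python
import numpy as np

def update_nodes(x, edges, retain=0.5):
    """One intra-graph update round: max-pool collection then gated aggregation.

    x      -- (n_nodes, dim) array of word vectors
    edges  -- dict mapping node index -> list of (neighbour index, edge weight)
    retain -- plays the role of b, the degree to which a node keeps its own info
    """
    new_x = x.copy()
    for a, neighbours in edges.items():
        if not neighbours:
            continue
        # collection: dimension-wise maximum over weighted neighbour vectors
        m = np.max(np.stack([w * x[c] for c, w in neighbours]), axis=0)
        # aggregation: blend the collected message with the node itself
        new_x[a] = retain * x[a] + (1 - retain) * m
    return new_x
```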
preferably, the expression of the penalty function is:
Figure 837334DEST_PATH_IMAGE020
wherein the content of the first and second substances,y i a prediction tag that represents a sample of the sample,p i a prediction tag that represents a sample is provided,Nis the number of samples.
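This is the standard binary cross-entropy; a minimal NumPy sketch of the expression, assuming $y_i \in \{0, 1\}$ and $p_i$ the predicted probability:

```python
import numpy as np

def cross_entropy(y_true, p_pred, eps=1e-12):
    """Binary cross-entropy over N samples, matching the formula above."""
    p = np.clip(p_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
```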
A text steganalysis system comprising:
a text acquisition module: the method comprises the steps of obtaining a text to be analyzed;
the text steganalysis module: the method comprises the steps that a text to be analyzed is input into a pre-trained multi-graph neural network to obtain network output, if the network output is smaller than a preset threshold value, the text to be analyzed is a normal text, and if not, the text to be analyzed is a steganographic text;
the text steganalysis module comprises a network training unit and is used for training the multi-graph neural network by the following method:
acquiring a training sample set, and converting the texts in the training sample set into word vectors;
in each training iteration, inputting the word vectors into a composition module to generate three graphs, namely a logic graph, a semantic graph and a syntactic graph, and updating the intra-graph information of the three graphs according to the information of each target node and its surrounding nodes;
carrying out inter-graph fusion on the three updated graphs to obtain a total graph;
performing graph pooling on the total graph to obtain a final representation of the text;
inputting the final representation of the text into a classifier to obtain the classifier output;
and, according to the classifier output, updating the multi-graph neural network with the cross entropy function as the loss function, repeating the iterative training until all texts in the training sample set have been used, and obtaining the trained multi-graph neural network.
A text steganalysis device comprises a processor and a storage medium;
the storage medium is to store instructions;
the processor is configured to operate according to the instructions to perform the steps of any of the above methods.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any of the above.
Compared with the prior art, the invention has the following beneficial effects:
according to the text steganalysis method, the text steganalysis system, the text steganalysis device and the storage medium, steganalysis is carried out on a text to be analyzed through a multi-graph neural network trained in advance, a composition module in the multi-graph neural network is utilized to generate a logic diagram, a semantic diagram and a syntactic diagram, statistical relations, semantic relations and syntactic relations among the texts are analyzed, the three relations are integrated to carry out message updating and feature extraction on the text, features with higher distinguishing degree are obtained, the defect that a sequence model does not consider global features in steganalysis is made up, and the analysis efficiency of the multi-graph neural network is greatly improved; and performing inter-graph fusion on the updated logic diagram, the semantic meaning and the syntactic diagram to obtain a total diagram, pooling the total diagram to obtain a final representation of the text to be analyzed, so that the final representation contains richer information, and the accuracy of steganalysis of the text is improved.
Drawings
Fig. 1 is a flowchart of a text steganalysis method according to an embodiment of the present invention;
FIG. 2 is a second flowchart of a text steganalysis method according to an embodiment of the present invention;
fig. 3 is a schematic diagram of updating the information in the graph according to an embodiment of the present invention.
Detailed Description
The present invention is further described below with reference to the accompanying drawings. The following examples are only for clearly illustrating the technical solutions of the present invention and should not be taken as limiting its scope.
Example 1
As shown in fig. 1, a text steganalysis method provided in an embodiment of the present invention includes:
s1, obtaining a text to be analyzed.
The text to be analyzed is received through a communication receiving terminal.
And S2, inputting the text to be analyzed into the pre-trained multi-graph neural network to obtain the network output, wherein if the network output is smaller than a preset threshold value, the text to be analyzed is a normal text, and otherwise it is a steganographic text.
The multi-graph neural network is trained in advance, a training sample set is formed by a steganographic sample data set and a normal sample data set, and after each training, parameters of the multi-graph neural network are updated by taking a cross entropy function as a loss function until a text in the training sample set is used completely, so that the trained multi-graph neural network is obtained.
In this embodiment, the specific process of training is as follows:
6000 steganographic samples generated by the RNN-stega steganography method are used as the steganographic sample data set, and 6000 normal samples collected from real scenes are used as the normal sample data set; together they form a training sample set containing 12000 texts.
The training sample set containing 12000 texts, $T = \{t_1, t_2, \ldots, t_{12000}\}$, is input into the embedding layer of the multi-graph neural network, which converts the texts into word vectors to obtain the set of word vectors $X = \{x_1, x_2, \ldots, x_{12000}\}$.
The set of word vectors $X$ is input into the composition module of the multi-graph neural network to generate three graphs $\{G_{log}, G_{sem}, G_{syn}\}$, namely a logic graph, a semantic graph and a syntactic graph. Each graph is represented as $G = (V, E)$, where $V$ denotes the word nodes and $E$ denotes the edge weights.
The edge weights in the logic graph are calculated by the following formula:
$$e_{ab}^{log} = \log\frac{p(a,b)}{p(a)\,p(b)}$$
where $e_{ab}^{log}$ is the edge weight of the edge between words $a$ and $b$ in the logic graph, $p(a,b)$ is the probability that words $a$ and $b$ co-occur, $p(a)$ is the probability that word $a$ occurs in the corpus, and $p(b)$ is the probability that word $b$ occurs in the corpus.
The edge weights in the semantic graph are calculated by the following formula:
$$e_{ab}^{sem} = \frac{N_{sem}(a,b)}{N_{win}(a,b)}$$
where $e_{ab}^{sem}$ is the edge weight of the edge between words $a$ and $b$ in the semantic graph, $N_{sem}(a,b)$ is the number of sliding windows in which words $a$ and $b$ have a semantic relation, and $N_{win}(a,b)$ is the number of sliding windows in which words $a$ and $b$ occur simultaneously.
The edge weights in the syntactic graph are calculated by the following formula:
$$e_{ab}^{syn} = \frac{N_{syn}(a,b)}{N_{win}(a,b)}$$
where $e_{ab}^{syn}$ is the edge weight of the edge between words $a$ and $b$ in the syntactic graph, $N_{syn}(a,b)$ is the number of sliding windows in which words $a$ and $b$ have a syntactic relation, and $N_{win}(a,b)$ is the number of sliding windows in which words $a$ and $b$ occur simultaneously.
The intra-graph information of the three graphs is updated separately. Take the updating of a target node in a single graph as an example (as shown in fig. 3, node a is the target node, and all nodes connected to a by solid lines are its surrounding nodes); the target node updating process comprises two steps: collection and aggregation.
First, for any target node $a$ in any graph, information is collected from its surrounding nodes by the following formula:
$$m_a = \max_{c \in \mathcal{N}(a)} \left( e_{ca}\, x_c \right)$$
where $m_a$ denotes the collected information, $\max$ takes the maximum value of each dimension over the surrounding node information, $\mathcal{N}(a)$ is the set of nodes connected to the target node $a$, $e_{ca}$ is the edge weight between node $c$ and the target node, and $x_c$ is the word vector of word $c$;
then the collected information is aggregated with the target node itself by the following formula:
$$\tilde{x}_a = b\, x_a + (1-b)\, m_a$$
where $\tilde{x}_a$ is the aggregated word vector of word $a$ and $b \in [0,1]$ indicates the extent to which the node's own information is retained.
the three graphs finally obtain the updating results as follows:
Figure 874801DEST_PATH_IMAGE037
in order to enable the obtained text to contain richer information, the updated results of the three pictures are subjected to inter-picture fusion to obtain a general picture containing logic, semantic and syntactic relations among the texts:
Figure DEST_PATH_IMAGE038
and performing graph pooling operation on the general graph to obtain a final representation of the text:
Figure 552032DEST_PATH_IMAGE039
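Since the patent does not spell out the fusion and pooling operators, the sketch below uses assumed choices for illustration: element-wise averaging of the three graphs' node features for inter-graph fusion (the graphs share the same word nodes), and mean pooling over nodes:

```python
import numpy as np

def fuse_and_pool(h_logic, h_semantic, h_syntactic):
    """Inter-graph fusion and graph pooling (assumed operators).

    Each argument is an (n_nodes, dim) array of updated node features
    for the same word-node set; returns the final text representation z.
    """
    h_total = (h_logic + h_semantic + h_syntactic) / 3.0  # fusion (assumption)
    z = h_total.mean(axis=0)                              # graph pooling (assumption)
    return z
```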
the final representation of the text is input to a classifier:
Figure DEST_PATH_IMAGE040
output of the classifierpThe text is judged whether the text contains the secret information or not by the following method, wherein the value is a numerical value between 0 and 1: we set a threshold for it asηWhen is coming into contact with
Figure 477263DEST_PATH_IMAGE041
When the text is considered as containing the ciphertext, when
Figure DEST_PATH_IMAGE042
The text is considered to be normal text.
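A one-line illustration of this threshold decision; the default value of η is an assumption, not given by the patent:

```python
def judge(p: float, eta: float = 0.5) -> str:
    """Threshold decision on the classifier output p in [0, 1]."""
    return "steganographic text" if p > eta else "normal text"
```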
The expression for the loss function is:
$$L = -\frac{1}{N}\sum_{i=1}^{N}\left[ y_i \log p_i + (1-y_i)\log(1-p_i) \right]$$
where $y_i$ denotes the true label of sample $i$, $p_i$ denotes the predicted label of sample $i$, and $N$ is the number of samples.
After the multi-graph neural network is trained, the text to be analyzed is input into it to obtain the network output; if the network output is smaller than the preset threshold value, the text to be analyzed is a normal text, and otherwise it is a steganographic text.
The text steganalysis method provided by the embodiment of the invention can be represented by the flowchart shown in fig. 2: the text to be analyzed is input into the pre-trained multi-graph neural network, which performs intra-graph information updating, inter-graph information fusion (i.e., the three updated graphs are fused into the total graph) and graph pooling; the pooled result is input into the classifier to obtain the network output. If the network output is smaller than the preset threshold value, the text to be analyzed is a normal text; otherwise it is a steganographic text. This completes the steganalysis of the text to be analyzed.
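Putting the stages of fig. 2 together, a hedged end-to-end sketch of inference; all method names (embed, build_graphs, update, fuse, pool, classify) are hypothetical stand-ins for the modules described above, not an API defined by the patent:

```python
def steganalyse(text, network, eta=0.5):
    """End-to-end inference flow of fig. 2 (assumed interface)."""
    x = network.embed(text)                        # text -> word vectors
    g_log, g_sem, g_syn = network.build_graphs(x)  # composition module
    g_log, g_sem, g_syn = (network.update(g) for g in (g_log, g_sem, g_syn))
    z = network.pool(network.fuse(g_log, g_sem, g_syn))  # total graph -> z
    p = network.classify(z)                        # output in [0, 1]
    return "normal text" if p < eta else "steganographic text"
```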
Example 2
The embodiment of the invention provides a text steganalysis system, which comprises:
a text acquisition module: the method comprises the steps of obtaining a text to be analyzed;
the text steganalysis module: the method comprises the steps that a text to be analyzed is input into a pre-trained multi-graph neural network to obtain network output, if the network output is smaller than a preset threshold value, the text to be analyzed is a normal text, and if not, the text to be analyzed is a steganographic text;
the text steganalysis module comprises a network training unit and is used for training the multi-graph neural network by the following method:
acquiring a training sample set, and converting the texts in the training sample set into word vectors;
in each training iteration, inputting the word vectors into a composition module to generate three graphs, namely a logic graph, a semantic graph and a syntactic graph, and updating the intra-graph information of the three graphs according to the information of each target node and its surrounding nodes;
carrying out inter-graph fusion on the three updated graphs to obtain a total graph;
performing graph pooling on the total graph to obtain a final representation of the text;
inputting the final representation of the text into a classifier to obtain the classifier output;
and, according to the classifier output, updating the multi-graph neural network with the cross entropy function as the loss function, repeating the iterative training until all texts in the training sample set have been used, and obtaining the trained multi-graph neural network.
Example 3
The embodiment of the invention provides a text steganalysis device, which comprises a processor and a storage medium;
the storage medium is used for storing instructions;
the processor is configured to operate in accordance with the instructions to perform the steps of the method of:
acquiring a text to be analyzed;
inputting a text to be analyzed into a pre-trained multi-graph neural network to obtain network output, wherein if the network output is smaller than a preset threshold value, the text to be analyzed is a normal text, and otherwise, the text to be analyzed is a steganographic text;
the multi-graph neural network is trained by:
acquiring a training sample set, and converting the texts in the training sample set into word vectors;
in each training iteration, inputting the word vectors into a composition module to generate three graphs, namely a logic graph, a semantic graph and a syntactic graph, and updating the intra-graph information of the three graphs according to the information of each target node and its surrounding nodes;
carrying out inter-graph fusion on the three updated graphs to obtain a total graph;
performing graph pooling on the total graph to obtain a final representation of the text;
inputting the final representation of the text into a classifier to obtain the classifier output;
and, according to the classifier output, updating the multi-graph neural network with the cross entropy function as the loss function, repeating the iterative training until all texts in the training sample set have been used, and obtaining the trained multi-graph neural network.
Example 4
An embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the following method are implemented:
acquiring a text to be analyzed;
inputting a text to be analyzed into a pre-trained multi-graph neural network to obtain network output, wherein if the network output is smaller than a preset threshold value, the text to be analyzed is a normal text, and otherwise, the text to be analyzed is a steganographic text;
the multi-graph neural network is trained by:
acquiring a training sample set, and converting the texts in the training sample set into word vectors;
in each training iteration, inputting the word vectors into a composition module to generate three graphs, namely a logic graph, a semantic graph and a syntactic graph, and updating the intra-graph information of the three graphs according to the information of each target node and its surrounding nodes;
carrying out inter-graph fusion on the three updated graphs to obtain a total graph;
performing graph pooling on the total graph to obtain a final representation of the text;
inputting the final representation of the text into a classifier to obtain the classifier output;
and, according to the classifier output, updating the multi-graph neural network with the cross entropy function as the loss function, repeating the iterative training until all texts in the training sample set have been used, and obtaining the trained multi-graph neural network.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, it is possible to make various improvements and modifications without departing from the technical principle of the present invention, and those improvements and modifications should be considered as the protection scope of the present invention.

Claims (10)

1. A text steganalysis method, comprising:
acquiring a text to be analyzed;
inputting a text to be analyzed into a pre-trained multi-graph neural network to obtain network output, wherein if the network output is smaller than a preset threshold value, the text to be analyzed is a normal text, and otherwise, the text to be analyzed is a steganographic text;
the multi-graph neural network is trained by:
acquiring a training sample set, and converting the texts in the training sample set into word vectors;
in each training iteration, inputting the word vectors into a composition module to generate three graphs, namely a logic graph, a semantic graph and a syntactic graph, and updating the intra-graph information of the three graphs according to the information of each target node and its surrounding nodes;
carrying out inter-graph fusion on the three updated graphs to obtain a total graph;
performing graph pooling on the total graph to obtain a final representation of the text;
inputting the final representation of the text into a classifier to obtain the classifier output;
and, according to the classifier output, updating the multi-graph neural network with the cross entropy function as the loss function, repeating the iterative training until all texts in the training sample set have been used, and obtaining the trained multi-graph neural network.
2. The method according to claim 1, wherein the training sample set comprises a steganographic sample set and a normal sample set.
3. The method of claim 1, wherein in the process of generating the logic graph, the edge weight in the logic graph is calculated by the following formula:
$$e_{ab}^{log} = \log\frac{p(a,b)}{p(a)\,p(b)}$$
where $e_{ab}^{log}$ is the edge weight of the edge between words $a$ and $b$ in the logic graph, $p(a,b)$ is the probability that words $a$ and $b$ co-occur, $p(a)$ is the probability that word $a$ occurs in the corpus, and $p(b)$ is the probability that word $b$ occurs in the corpus.
4. The method of claim 1, wherein in the process of generating the semantic graph, the edge weight in the semantic graph is calculated by the following formula:
$$e_{ab}^{sem} = \frac{N_{sem}(a,b)}{N_{win}(a,b)}$$
where $e_{ab}^{sem}$ is the edge weight of the edge between words $a$ and $b$ in the semantic graph, $N_{sem}(a,b)$ is the number of sliding windows in which words $a$ and $b$ have a semantic relation, and $N_{win}(a,b)$ is the number of sliding windows in which words $a$ and $b$ occur simultaneously.
5. The method of claim 1, wherein in the process of generating the syntactic graph, the edge weight in the syntactic graph is calculated by the following formula:
$$e_{ab}^{syn} = \frac{N_{syn}(a,b)}{N_{win}(a,b)}$$
where $e_{ab}^{syn}$ is the edge weight of the edge between words $a$ and $b$ in the syntactic graph, $N_{syn}(a,b)$ is the number of sliding windows in which words $a$ and $b$ have a syntactic relation, and $N_{win}(a,b)$ is the number of sliding windows in which words $a$ and $b$ occur simultaneously.
6. The method of claim 1, wherein the intra-graph information updating of the logic graph, the semantic graph and the syntactic graph comprises:
for any target node $a$ in any graph, collecting information from its surrounding nodes by:
$$m_a = \max_{c \in \mathcal{N}(a)} \left( e_{ca}\, x_c \right)$$
where $m_a$ denotes the collected information, $\max$ takes the maximum value of each dimension over the surrounding node information, $\mathcal{N}(a)$ is the set of nodes connected to the target node $a$, $e_{ca}$ is the edge weight between node $c$ and the target node, and $x_c$ is the word vector of word $c$;
aggregating the collected information with the target node itself by:
$$\tilde{x}_a = b\, x_a + (1-b)\, m_a$$
where $\tilde{x}_a$ is the aggregated word vector of word $a$ and $b \in [0,1]$ indicates the extent to which the node's own information is retained.
7. The method of claim 1, wherein the expression of the loss function is:
$$L = -\frac{1}{N}\sum_{i=1}^{N}\left[ y_i \log p_i + (1-y_i)\log(1-p_i) \right]$$
where $y_i$ denotes the true label of sample $i$, $p_i$ denotes the predicted label of sample $i$, and $N$ is the number of samples.
8. A text steganalysis system comprising:
a text acquisition module: the method comprises the steps of obtaining a text to be analyzed;
the text steganalysis module: the system comprises a text to be analyzed, a pre-trained multi-graph neural network and a network output device, wherein the text to be analyzed is input into the pre-trained multi-graph neural network to obtain the network output, if the network output is smaller than a preset threshold value, the text to be analyzed is a normal text, and otherwise, the text to be analyzed is a steganographic text;
the text steganalysis module comprises a network training unit and is used for training the multi-graph neural network by the following method:
acquiring a training sample set, and converting the texts in the training sample set into word vectors;
in each training iteration, inputting the word vectors into a composition module to generate three graphs, namely a logic graph, a semantic graph and a syntactic graph, and updating the intra-graph information of the three graphs according to the information of each target node and its surrounding nodes;
carrying out inter-graph fusion on the three updated graphs to obtain a total graph;
performing graph pooling on the total graph to obtain a final representation of the text;
inputting the final representation of the text into a classifier to obtain the classifier output;
and, according to the classifier output, updating the multi-graph neural network with the cross entropy function as the loss function, repeating the iterative training until all texts in the training sample set have been used, and obtaining the trained multi-graph neural network.
9. A text steganalysis device is characterized by comprising a processor and a storage medium;
the storage medium is used for storing instructions;
the processor is configured to operate according to the instructions to perform the steps of the method according to any one of claims 1 to 7.
10. Computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, carries out the steps of a method according to any one of claims 1 to 7.
CN202211068809.9A 2022-09-02 2022-09-02 Text steganalysis method, system, device and storage medium Pending CN115169293A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211068809.9A CN115169293A (en) 2022-09-02 2022-09-02 Text steganalysis method, system, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211068809.9A CN115169293A (en) 2022-09-02 2022-09-02 Text steganalysis method, system, device and storage medium

Publications (1)

Publication Number Publication Date
CN115169293A true CN115169293A (en) 2022-10-11

Family

ID=83482220

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211068809.9A Pending CN115169293A (en) 2022-09-02 2022-09-02 Text steganalysis method, system, device and storage medium

Country Status (1)

Country Link
CN (1) CN115169293A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115952528A (en) * 2023-03-14 2023-04-11 南京信息工程大学 Multi-scale combined text steganography method and system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110110318A (en) * 2019-01-22 2019-08-09 清华大学 Text Stego-detection method and system based on Recognition with Recurrent Neural Network
CN114048314A (en) * 2021-11-11 2022-02-15 长沙理工大学 Natural language steganalysis method
CN114528374A (en) * 2022-01-19 2022-05-24 浙江工业大学 Movie comment emotion classification method and device based on graph neural network

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110110318A (en) * 2019-01-22 2019-08-09 清华大学 Text Stego-detection method and system based on Recognition with Recurrent Neural Network
CN114048314A (en) * 2021-11-11 2022-02-15 长沙理工大学 Natural language steganalysis method
CN114528374A (en) * 2022-01-19 2022-05-24 浙江工业大学 Movie comment emotion classification method and device based on graph neural network

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115952528A (en) * 2023-03-14 2023-04-11 南京信息工程大学 Multi-scale combined text steganography method and system

Similar Documents

Publication Publication Date Title
CN111831790B (en) False news identification method based on low threshold integration and text content matching
CN110781668B (en) Text information type identification method and device
CN111737511B (en) Image description method based on self-adaptive local concept embedding
CN110956037B (en) Multimedia content repeated judgment method and device
CN111078876A (en) Short text classification method and system based on multi-model integration
CN113032001B (en) Intelligent contract classification method and device
CN110502742A (en) A kind of complexity entity abstracting method, device, medium and system
CN110909224A (en) Sensitive data automatic classification and identification method and system based on artificial intelligence
CN107145568A (en) A kind of quick media event clustering system and method
CN115037543A (en) Abnormal network flow detection method based on bidirectional time convolution neural network
CN116150651A (en) AI-based depth synthesis detection method and system
CN112492606A (en) Classification and identification method and device for spam messages, computer equipment and storage medium
CN115169293A (en) Text steganalysis method, system, device and storage medium
CN114254077A (en) Method for evaluating integrity of manuscript based on natural language
CN110929506A (en) Junk information detection method, device and equipment and readable storage medium
CN115314268B (en) Malicious encryption traffic detection method and system based on traffic fingerprint and behavior
CN116881408A (en) Visual question-answering fraud prevention method and system based on OCR and NLP
CN111464687A (en) Strange call request processing method and device
CN114881012A (en) Article title and content intelligent rewriting system and method based on natural language processing
CN115438645A (en) Text data enhancement method and system for sequence labeling task
CN116662557A (en) Entity relation extraction method and device in network security field
CN113626603A (en) Text classification method and device
CN112632229A (en) Text clustering method and device
CN112035670A (en) Multi-modal rumor detection method based on image emotional tendency
CN117786427B (en) Vehicle type main data matching method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20221011