CN115169293A - Text steganalysis method, system, device and storage medium - Google Patents

Text steganalysis method, system, device and storage medium

Info

Publication number
CN115169293A
CN115169293A (application CN202211068809.9A)
Authority
CN
China
Prior art keywords
text
graph
analyzed
neural network
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211068809.9A
Other languages
Chinese (zh)
Inventor
付章杰
于琪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology filed Critical Nanjing University of Information Science and Technology
Priority to CN202211068809.9A priority Critical patent/CN115169293A/en
Publication of CN115169293A publication Critical patent/CN115169293A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/151 Transformation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Bioethics (AREA)
  • Data Mining & Analysis (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a text steganalysis method, system, device and storage medium. The method comprises: acquiring a text to be analyzed and inputting it into a pre-trained multi-graph neural network to obtain a network output; if the network output is smaller than a preset threshold value, the text to be analyzed is a normal text, and otherwise it is a steganographic text. When the multi-graph neural network is trained, a logic graph, a semantic graph and a syntactic graph are generated from the texts in a training sample set, the statistical, semantic and syntactic relations among the texts are analyzed, and the three relations are integrated to perform message updating and feature extraction on the texts, obtaining more distinguishable features; this makes up for the failure of sequence models to consider global features in steganalysis and greatly improves the analysis efficiency of the multi-graph neural network. The three updated graphs are fused between graphs to obtain a total graph, and the total graph is pooled to obtain the final representation of the text to be analyzed, so that the final representation contains richer information and the accuracy of text steganalysis is improved.

Description

Text steganalysis method, system, device and storage medium
Technical Field
The invention relates to a text steganalysis method, system, device and storage medium, and belongs to the technical field of encryption.
Background
With the continuous development of the internet, people frequently communicate over it, and the security problems in information transmission cannot be ignored. Lawbreakers hide secret information in text by some steganographic means for covert transmission, which poses serious risks to people's lives, property and social stability. A widely accepted countermeasure is to analyze a text to judge whether it contains secret information. One such approach is neural-network-based text steganalysis, which extracts text features with a neural network and judges whether the text is steganographic according to the different distributions of those features in a high-dimensional semantic space.
At present, methods for performing text steganalysis with neural networks include: extracting features of different scales from the text with convolution kernels of different sizes for judgment; fusing the local and global features extracted by convolutional and recurrent neural networks for analysis; and extracting salient text features with a multi-head attention mechanism for judgment.
Combining the local features and long-distance features extracted by convolutional and recurrent neural networks yields more distinguishable features, but some irrelevant redundant features remain among them, which affects text steganalysis efficiency.
Extracting text saliency features with a multi-head attention mechanism lets the model attend more to suspicious information in the text, and the multi-head operation accelerates feature extraction, improving the efficiency of text steganalysis. However, this approach only focuses on feature relations within the current text and does not consider the global correlation between texts.
Disclosure of Invention
The invention aims to provide a text steganalysis method, system, device and storage medium that solve the problems in the prior art such as low text steganalysis efficiency and failure to consider the global correlation between texts.
In order to achieve the above purpose, the invention adopts the following technical scheme:
a text steganalysis method, comprising:
acquiring a text to be analyzed;
inputting a text to be analyzed into a pre-trained multi-graph neural network to obtain network output, wherein if the network output is smaller than a preset threshold value, the text to be analyzed is a normal text, and otherwise, the text to be analyzed is a steganographic text;
the multi-graph neural network is trained by:
acquiring a training sample set, and converting the texts in the training sample set into word vectors;
in each training iteration, inputting the word vectors into a composition module to generate three graphs, namely a logic graph, a semantic graph and a syntactic graph, and updating the intra-graph information of the three graphs according to the information of each target node and its surrounding nodes;
carrying out inter-graph fusion on the three updated graphs to obtain a total graph;
performing graph pooling on the total graph to obtain a final representation of the text;
inputting the final representation of the text into a classifier to obtain the classifier output;
and, according to the classifier output, updating the multi-graph neural network with the cross entropy function as the loss function, repeating the iterative training until all texts in the training sample set have been used, and obtaining the trained multi-graph neural network.
Preferably, the training sample set consists of a steganographic sample data set and a normal sample data set.
Preferably, in the process of generating the logic graph, the edge weight in the logic graph is calculated by the following formula:
$$e_{ab}^{log} = \log\frac{p(a,b)}{p(a)\,p(b)}$$
where $e_{ab}^{log}$ is the edge weight of the edge between words $a$ and $b$ in the logic graph, $p(a,b)$ is the probability that words $a$ and $b$ co-occur, $p(a)$ is the probability that word $a$ occurs in the corpus, and $p(b)$ is the probability that word $b$ occurs in the corpus.
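For illustration, a minimal sketch of how such PMI-style edge weights could be computed over sliding windows; the window size, whitespace tokenization and the positive-PMI filter are assumptions for this sketch, not specified by the patent:

```python
import math
from collections import Counter
from itertools import combinations

def pmi_edge_weights(texts, window=5):
    """Sketch of PMI-style logic-graph edge weights.

    p(a) and p(b) are estimated as word-occurrence frequencies over
    sliding windows and p(a, b) as the pair co-occurrence frequency;
    the edge weight is log(p(a,b) / (p(a) * p(b))).
    """
    word_count, pair_count, n_windows = Counter(), Counter(), 0
    for text in texts:
        tokens = text.split()
        for i in range(max(1, len(tokens) - window + 1)):
            win = set(tokens[i:i + window])
            n_windows += 1
            word_count.update(win)
            pair_count.update(frozenset(p) for p in combinations(sorted(win), 2))
    weights = {}
    for pair, n_ab in pair_count.items():
        a, b = tuple(pair)
        pmi = math.log((n_ab / n_windows) /
                       ((word_count[a] / n_windows) * (word_count[b] / n_windows)))
        if pmi > 0:  # keeping only positive-PMI edges is common practice, assumed here
            weights[(a, b)] = pmi
    return weights
```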
Preferably, in the process of generating the semantic graph, the edge weight in the semantic graph is calculated by the following formula:
$$e_{ab}^{sem} = \frac{N_{sem}(a,b)}{N_{win}(a,b)}$$
where $e_{ab}^{sem}$ is the edge weight of the edge between words $a$ and $b$ in the semantic graph, $N_{sem}(a,b)$ is the number of sliding windows in which words $a$ and $b$ have a semantic relation, and $N_{win}(a,b)$ is the number of sliding windows in which words $a$ and $b$ occur simultaneously.
Preferably, in the process of generating the syntactic graph, the edge weight in the syntactic graph is calculated by the following formula:
$$e_{ab}^{syn} = \frac{N_{syn}(a,b)}{N_{win}(a,b)}$$
where $e_{ab}^{syn}$ is the edge weight of the edge between words $a$ and $b$ in the syntactic graph, $N_{syn}(a,b)$ is the number of sliding windows in which words $a$ and $b$ have a syntactic relation, and $N_{win}(a,b)$ is the number of sliding windows in which words $a$ and $b$ occur simultaneously.
Preferably, the intra-graph information updating of the logic graph, the semantic graph and the syntactic graph includes:
for any target node $a$ in any graph, collecting information from its surrounding nodes by:
$$m_a = \max_{c \in \mathcal{N}(a)} \left( e_{ca}\, x_c \right)$$
where $m_a$ denotes the collected information, $\max$ takes the maximum value of each dimension over the surrounding node information, $\mathcal{N}(a)$ is the set of nodes connected to the target node $a$, $e_{ca}$ is the edge weight between node $c$ and the target node, and $x_c$ is the word vector of word $c$;
aggregating the collected information with the target node itself by:
$$\tilde{x}_a = b\, x_a + (1-b)\, m_a$$
where $\tilde{x}_a$ is the aggregated word vector of word $a$ and $b \in [0,1]$ indicates the extent to which the node's own information is retained.
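A sketch of this collect-and-aggregate update under the reconstruction above; treating the retention degree b as a fixed scalar is an assumption made for illustration (the network may learn it):

```python
import numpy as np

def update_nodes(x, edges, retain=0.5):
    """One intra-graph update round: max-pool collection then gated aggregation.

    x      -- (n_nodes, dim) array of word vectors
    edges  -- dict mapping node index -> list of (neighbour index, edge weight)
    retain -- plays the role of b, the degree to which a node keeps its own info
    """
    new_x = x.copy()
    for a, neighbours in edges.items():
        if not neighbours:
            continue
        # collection: dimension-wise maximum over weighted neighbour vectors
        m = np.max(np.stack([w * x[c] for c, w in neighbours]), axis=0)
        # aggregation: blend the collected message with the node itself
        new_x[a] = retain * x[a] + (1 - retain) * m
    return new_x
```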
preferably, the expression of the penalty function is:
Figure 837334DEST_PATH_IMAGE020
wherein the content of the first and second substances,y i a prediction tag that represents a sample of the sample,p i a prediction tag that represents a sample is provided,Nis the number of samples.
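This is the standard binary cross-entropy; a minimal NumPy sketch of the expression, assuming $y_i \in \{0, 1\}$ and $p_i$ the predicted probability:

```python
import numpy as np

def cross_entropy(y_true, p_pred, eps=1e-12):
    """Binary cross-entropy over N samples, matching the formula above."""
    p = np.clip(p_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
```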
A text steganalysis system comprising:
a text acquisition module: the method comprises the steps of obtaining a text to be analyzed;
the text steganalysis module: the method comprises the steps that a text to be analyzed is input into a pre-trained multi-graph neural network to obtain network output, if the network output is smaller than a preset threshold value, the text to be analyzed is a normal text, and if not, the text to be analyzed is a steganographic text;
the text steganalysis module comprises a network training unit and is used for training the multi-graph neural network by the following method:
acquiring a training sample set, and converting the texts in the training sample set into word vectors;
in each training iteration, inputting the word vectors into a composition module to generate three graphs, namely a logic graph, a semantic graph and a syntactic graph, and updating the intra-graph information of the three graphs according to the information of each target node and its surrounding nodes;
carrying out inter-graph fusion on the three updated graphs to obtain a total graph;
performing graph pooling on the total graph to obtain a final representation of the text;
inputting the final representation of the text into a classifier to obtain the classifier output;
and, according to the classifier output, updating the multi-graph neural network with the cross entropy function as the loss function, repeating the iterative training until all texts in the training sample set have been used, and obtaining the trained multi-graph neural network.
A text steganalysis device comprises a processor and a storage medium;
the storage medium is to store instructions;
the processor is configured to operate according to the instructions to perform the steps of any of the above methods.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any of the above.
Compared with the prior art, the invention has the following beneficial effects:
according to the text steganalysis method, the text steganalysis system, the text steganalysis device and the storage medium, steganalysis is carried out on a text to be analyzed through a multi-graph neural network trained in advance, a composition module in the multi-graph neural network is utilized to generate a logic diagram, a semantic diagram and a syntactic diagram, statistical relations, semantic relations and syntactic relations among the texts are analyzed, the three relations are integrated to carry out message updating and feature extraction on the text, features with higher distinguishing degree are obtained, the defect that a sequence model does not consider global features in steganalysis is made up, and the analysis efficiency of the multi-graph neural network is greatly improved; and performing inter-graph fusion on the updated logic diagram, the semantic meaning and the syntactic diagram to obtain a total diagram, pooling the total diagram to obtain a final representation of the text to be analyzed, so that the final representation contains richer information, and the accuracy of steganalysis of the text is improved.
Drawings
Fig. 1 is a flowchart of a text steganalysis method according to an embodiment of the present invention;
FIG. 2 is a second flowchart of a text steganalysis method according to an embodiment of the present invention;
fig. 3 is a schematic diagram of updating the information in the graph according to an embodiment of the present invention.
Detailed Description
The present invention is further described below with reference to the accompanying drawings. The following examples are only for clearly illustrating the technical solutions of the present invention and should not be taken as limiting its scope.
Example 1
As shown in fig. 1, a text steganalysis method provided in an embodiment of the present invention includes:
s1, obtaining a text to be analyzed.
The text to be analyzed is received through a communication receiving terminal.
And S2, inputting the text to be analyzed into the pre-trained multi-graph neural network to obtain the network output, wherein if the network output is smaller than a preset threshold value, the text to be analyzed is a normal text, and otherwise it is a steganographic text.
The multi-graph neural network is trained in advance, a training sample set is formed by a steganographic sample data set and a normal sample data set, and after each training, parameters of the multi-graph neural network are updated by taking a cross entropy function as a loss function until a text in the training sample set is used completely, so that the trained multi-graph neural network is obtained.
In this embodiment, the specific process of training is as follows:
6000 steganographic samples generated by the RNN-stega steganography method are used as the steganographic sample data set, and 6000 normal samples collected from real scenes are used as the normal sample data set; together they form a training sample set containing 12000 texts.
The training sample set containing 12000 texts, $T = \{t_1, t_2, \ldots, t_{12000}\}$, is input into the embedding layer of the multi-graph neural network, which converts the texts into word vectors to obtain the set of word vectors $X = \{x_1, x_2, \ldots, x_{12000}\}$.
The set of word vectors $X$ is input into the composition module of the multi-graph neural network to generate three graphs $\{G_{log}, G_{sem}, G_{syn}\}$, namely a logic graph, a semantic graph and a syntactic graph. Each graph is represented as $G = (V, E)$, where $V$ denotes the word nodes and $E$ denotes the edge weights.
The edge weights in the logic graph are calculated by the following formula:
$$e_{ab}^{log} = \log\frac{p(a,b)}{p(a)\,p(b)}$$
where $e_{ab}^{log}$ is the edge weight of the edge between words $a$ and $b$ in the logic graph, $p(a,b)$ is the probability that words $a$ and $b$ co-occur, $p(a)$ is the probability that word $a$ occurs in the corpus, and $p(b)$ is the probability that word $b$ occurs in the corpus.
The edge weights in the semantic graph are calculated by the following formula:
$$e_{ab}^{sem} = \frac{N_{sem}(a,b)}{N_{win}(a,b)}$$
where $e_{ab}^{sem}$ is the edge weight of the edge between words $a$ and $b$ in the semantic graph, $N_{sem}(a,b)$ is the number of sliding windows in which words $a$ and $b$ have a semantic relation, and $N_{win}(a,b)$ is the number of sliding windows in which words $a$ and $b$ occur simultaneously.
The edge weights in the syntactic graph are calculated by the following formula:
$$e_{ab}^{syn} = \frac{N_{syn}(a,b)}{N_{win}(a,b)}$$
where $e_{ab}^{syn}$ is the edge weight of the edge between words $a$ and $b$ in the syntactic graph, $N_{syn}(a,b)$ is the number of sliding windows in which words $a$ and $b$ have a syntactic relation, and $N_{win}(a,b)$ is the number of sliding windows in which words $a$ and $b$ occur simultaneously.
The intra-graph information of the three graphs is updated separately. Take the updating of a target node in a single graph as an example (as shown in fig. 3, node a is the target node, and all nodes connected to a by solid lines are its surrounding nodes); the target node updating process comprises two steps: collection and aggregation.
First, for any target node $a$ in any graph, information is collected from its surrounding nodes by the following formula:
$$m_a = \max_{c \in \mathcal{N}(a)} \left( e_{ca}\, x_c \right)$$
where $m_a$ denotes the collected information, $\max$ takes the maximum value of each dimension over the surrounding node information, $\mathcal{N}(a)$ is the set of nodes connected to the target node $a$, $e_{ca}$ is the edge weight between node $c$ and the target node, and $x_c$ is the word vector of word $c$;
then the collected information is aggregated with the target node itself by the following formula:
$$\tilde{x}_a = b\, x_a + (1-b)\, m_a$$
where $\tilde{x}_a$ is the aggregated word vector of word $a$ and $b \in [0,1]$ indicates the extent to which the node's own information is retained.
the three graphs finally obtain the updating results as follows:
Figure 874801DEST_PATH_IMAGE037
in order to enable the obtained text to contain richer information, the updated results of the three pictures are subjected to inter-picture fusion to obtain a general picture containing logic, semantic and syntactic relations among the texts:
Figure DEST_PATH_IMAGE038
and performing graph pooling operation on the general graph to obtain a final representation of the text:
Figure 552032DEST_PATH_IMAGE039
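Since the patent does not spell out the fusion and pooling operators, the sketch below uses assumed choices for illustration: element-wise averaging of the three graphs' node features for inter-graph fusion (the graphs share the same word nodes), and mean pooling over nodes:

```python
import numpy as np

def fuse_and_pool(h_logic, h_semantic, h_syntactic):
    """Inter-graph fusion and graph pooling (assumed operators).

    Each argument is an (n_nodes, dim) array of updated node features
    for the same word-node set; returns the final text representation z.
    """
    h_total = (h_logic + h_semantic + h_syntactic) / 3.0  # fusion (assumption)
    z = h_total.mean(axis=0)                              # graph pooling (assumption)
    return z
```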
the final representation of the text is input to a classifier:
Figure DEST_PATH_IMAGE040
output of the classifierpThe text is judged whether the text contains the secret information or not by the following method, wherein the value is a numerical value between 0 and 1: we set a threshold for it asηWhen is coming into contact with
Figure 477263DEST_PATH_IMAGE041
When the text is considered as containing the ciphertext, when
Figure DEST_PATH_IMAGE042
The text is considered to be normal text.
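A one-line illustration of this threshold decision; the default value of η is an assumption, not given by the patent:

```python
def judge(p: float, eta: float = 0.5) -> str:
    """Threshold decision on the classifier output p in [0, 1]."""
    return "steganographic text" if p > eta else "normal text"
```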
The expression for the loss function is:
$$L = -\frac{1}{N}\sum_{i=1}^{N}\left[ y_i \log p_i + (1-y_i)\log(1-p_i) \right]$$
where $y_i$ denotes the true label of sample $i$, $p_i$ denotes the predicted label of sample $i$, and $N$ is the number of samples.
After the multi-graph neural network is trained, the text to be analyzed is input into it to obtain the network output; if the network output is smaller than the preset threshold value, the text to be analyzed is a normal text, and otherwise it is a steganographic text.
The text steganalysis method provided by the embodiment of the invention can be represented by the flowchart shown in fig. 2: the text to be analyzed is input into the pre-trained multi-graph neural network, which performs intra-graph information updating, inter-graph information fusion (i.e., the three updated graphs are fused into the total graph) and graph pooling; the pooled result is input into the classifier to obtain the network output. If the network output is smaller than the preset threshold value, the text to be analyzed is a normal text; otherwise it is a steganographic text. This completes the steganalysis of the text to be analyzed.
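Putting the stages of fig. 2 together, a hedged end-to-end sketch of inference; all method names (embed, build_graphs, update, fuse, pool, classify) are hypothetical stand-ins for the modules described above, not an API defined by the patent:

```python
def steganalyse(text, network, eta=0.5):
    """End-to-end inference flow of fig. 2 (assumed interface)."""
    x = network.embed(text)                        # text -> word vectors
    g_log, g_sem, g_syn = network.build_graphs(x)  # composition module
    g_log, g_sem, g_syn = (network.update(g) for g in (g_log, g_sem, g_syn))
    z = network.pool(network.fuse(g_log, g_sem, g_syn))  # total graph -> z
    p = network.classify(z)                        # output in [0, 1]
    return "normal text" if p < eta else "steganographic text"
```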
Example 2
The embodiment of the invention provides a text steganalysis system, which comprises:
a text acquisition module: the method comprises the steps of obtaining a text to be analyzed;
the text steganalysis module: the method comprises the steps that a text to be analyzed is input into a pre-trained multi-graph neural network to obtain network output, if the network output is smaller than a preset threshold value, the text to be analyzed is a normal text, and if not, the text to be analyzed is a steganographic text;
the text steganalysis module comprises a network training unit and is used for training the multi-graph neural network by the following method:
acquiring a training sample set, and converting the texts in the training sample set into word vectors;
in each training iteration, inputting the word vectors into a composition module to generate three graphs, namely a logic graph, a semantic graph and a syntactic graph, and updating the intra-graph information of the three graphs according to the information of each target node and its surrounding nodes;
carrying out inter-graph fusion on the three updated graphs to obtain a total graph;
performing graph pooling on the total graph to obtain a final representation of the text;
inputting the final representation of the text into a classifier to obtain the classifier output;
and, according to the classifier output, updating the multi-graph neural network with the cross entropy function as the loss function, repeating the iterative training until all texts in the training sample set have been used, and obtaining the trained multi-graph neural network.
Example 3
The embodiment of the invention provides a text steganalysis device, which comprises a processor and a storage medium;
the storage medium is used for storing instructions;
the processor is configured to operate in accordance with the instructions to perform the steps of the method of:
acquiring a text to be analyzed;
inputting a text to be analyzed into a pre-trained multi-graph neural network to obtain network output, wherein if the network output is smaller than a preset threshold value, the text to be analyzed is a normal text, and otherwise, the text to be analyzed is a steganographic text;
the multi-graph neural network is trained by:
acquiring a training sample set, and converting the texts in the training sample set into word vectors;
in each training iteration, inputting the word vectors into a composition module to generate three graphs, namely a logic graph, a semantic graph and a syntactic graph, and updating the intra-graph information of the three graphs according to the information of each target node and its surrounding nodes;
carrying out inter-graph fusion on the three updated graphs to obtain a total graph;
performing graph pooling on the total graph to obtain a final representation of the text;
inputting the final representation of the text into a classifier to obtain the classifier output;
and, according to the classifier output, updating the multi-graph neural network with the cross entropy function as the loss function, repeating the iterative training until all texts in the training sample set have been used, and obtaining the trained multi-graph neural network.
Example 4
An embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the following method are implemented:
acquiring a text to be analyzed;
inputting a text to be analyzed into a pre-trained multi-graph neural network to obtain network output, wherein if the network output is smaller than a preset threshold value, the text to be analyzed is a normal text, and otherwise, the text to be analyzed is a steganographic text;
the multi-graph neural network is trained by:
acquiring a training sample set, and converting the texts in the training sample set into word vectors;
in each training iteration, inputting the word vectors into a composition module to generate three graphs, namely a logic graph, a semantic graph and a syntactic graph, and updating the intra-graph information of the three graphs according to the information of each target node and its surrounding nodes;
carrying out inter-graph fusion on the three updated graphs to obtain a total graph;
performing graph pooling on the total graph to obtain a final representation of the text;
inputting the final representation of the text into a classifier to obtain the classifier output;
and, according to the classifier output, updating the multi-graph neural network with the cross entropy function as the loss function, repeating the iterative training until all texts in the training sample set have been used, and obtaining the trained multi-graph neural network.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, it is possible to make various improvements and modifications without departing from the technical principle of the present invention, and those improvements and modifications should be considered as the protection scope of the present invention.

Claims (10)

1. A text steganalysis method, comprising:
acquiring a text to be analyzed;
inputting a text to be analyzed into a pre-trained multi-graph neural network to obtain network output, wherein if the network output is smaller than a preset threshold value, the text to be analyzed is a normal text, and otherwise, the text to be analyzed is a steganographic text;
the multi-graph neural network is trained by:
acquiring a training sample set, and converting the texts in the training sample set into word vectors;
in each training iteration, inputting the word vectors into a composition module to generate three graphs, namely a logic graph, a semantic graph and a syntactic graph, and updating the intra-graph information of the three graphs according to the information of each target node and its surrounding nodes;
carrying out inter-graph fusion on the three updated graphs to obtain a total graph;
performing graph pooling on the total graph to obtain a final representation of the text;
inputting the final representation of the text into a classifier to obtain the classifier output;
and, according to the classifier output, updating the multi-graph neural network with the cross entropy function as the loss function, repeating the iterative training until all texts in the training sample set have been used, and obtaining the trained multi-graph neural network.
2. The method according to claim 1, wherein the training sample set comprises a steganographic sample set and a normal sample set.
3. The method of claim 1, wherein in the process of generating the logic graph, the edge weight in the logic graph is calculated by the following formula:
$$e_{ab}^{log} = \log\frac{p(a,b)}{p(a)\,p(b)}$$
where $e_{ab}^{log}$ is the edge weight of the edge between words $a$ and $b$ in the logic graph, $p(a,b)$ is the probability that words $a$ and $b$ co-occur, $p(a)$ is the probability that word $a$ occurs in the corpus, and $p(b)$ is the probability that word $b$ occurs in the corpus.
4. The method of claim 1, wherein in the process of generating the semantic graph, the edge weight in the semantic graph is calculated by the following formula:
$$e_{ab}^{sem} = \frac{N_{sem}(a,b)}{N_{win}(a,b)}$$
where $e_{ab}^{sem}$ is the edge weight of the edge between words $a$ and $b$ in the semantic graph, $N_{sem}(a,b)$ is the number of sliding windows in which words $a$ and $b$ have a semantic relation, and $N_{win}(a,b)$ is the number of sliding windows in which words $a$ and $b$ occur simultaneously.
5. The method of claim 1, wherein in the process of generating the syntactic graph, the edge weight in the syntactic graph is calculated by the following formula:
$$e_{ab}^{syn} = \frac{N_{syn}(a,b)}{N_{win}(a,b)}$$
where $e_{ab}^{syn}$ is the edge weight of the edge between words $a$ and $b$ in the syntactic graph, $N_{syn}(a,b)$ is the number of sliding windows in which words $a$ and $b$ have a syntactic relation, and $N_{win}(a,b)$ is the number of sliding windows in which words $a$ and $b$ occur simultaneously.
6. The method of claim 1, wherein the intra-graph information updating of the logic graph, the semantic graph and the syntactic graph comprises:
for any target node $a$ in any graph, collecting information from its surrounding nodes by:
$$m_a = \max_{c \in \mathcal{N}(a)} \left( e_{ca}\, x_c \right)$$
where $m_a$ denotes the collected information, $\max$ takes the maximum value of each dimension over the surrounding node information, $\mathcal{N}(a)$ is the set of nodes connected to the target node $a$, $e_{ca}$ is the edge weight between node $c$ and the target node, and $x_c$ is the word vector of word $c$;
aggregating the collected information with the target node itself by:
$$\tilde{x}_a = b\, x_a + (1-b)\, m_a$$
where $\tilde{x}_a$ is the aggregated word vector of word $a$ and $b \in [0,1]$ indicates the extent to which the node's own information is retained.
7. The method of claim 1, wherein the expression of the loss function is:
$$L = -\frac{1}{N}\sum_{i=1}^{N}\left[ y_i \log p_i + (1-y_i)\log(1-p_i) \right]$$
where $y_i$ denotes the true label of sample $i$, $p_i$ denotes the predicted label of sample $i$, and $N$ is the number of samples.
8. A text steganalysis system comprising:
a text acquisition module: the method comprises the steps of obtaining a text to be analyzed;
the text steganalysis module: the system comprises a text to be analyzed, a pre-trained multi-graph neural network and a network output device, wherein the text to be analyzed is input into the pre-trained multi-graph neural network to obtain the network output, if the network output is smaller than a preset threshold value, the text to be analyzed is a normal text, and otherwise, the text to be analyzed is a steganographic text;
the text steganalysis module comprises a network training unit and is used for training the multi-graph neural network by the following method:
acquiring a training sample set, and converting the texts in the training sample set into word vectors;
in each training iteration, inputting the word vectors into a composition module to generate three graphs, namely a logic graph, a semantic graph and a syntactic graph, and updating the intra-graph information of the three graphs according to the information of each target node and its surrounding nodes;
carrying out inter-graph fusion on the three updated graphs to obtain a total graph;
performing graph pooling on the total graph to obtain a final representation of the text;
inputting the final representation of the text into a classifier to obtain the classifier output;
and, according to the classifier output, updating the multi-graph neural network with the cross entropy function as the loss function, repeating the iterative training until all texts in the training sample set have been used, and obtaining the trained multi-graph neural network.
9. A text steganalysis device is characterized by comprising a processor and a storage medium;
the storage medium is used for storing instructions;
the processor is configured to operate according to the instructions to perform the steps of the method according to any one of claims 1 to 7.
10. Computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, carries out the steps of a method according to any one of claims 1 to 7.
CN202211068809.9A 2022-09-02 2022-09-02 Text steganalysis method, system, device and storage medium Pending CN115169293A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211068809.9A CN115169293A (en) 2022-09-02 2022-09-02 Text steganalysis method, system, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211068809.9A CN115169293A (en) 2022-09-02 2022-09-02 Text steganalysis method, system, device and storage medium

Publications (1)

Publication Number Publication Date
CN115169293A true CN115169293A (en) 2022-10-11

Family

ID=83482220

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211068809.9A Pending CN115169293A (en) 2022-09-02 2022-09-02 Text steganalysis method, system, device and storage medium

Country Status (1)

Country Link
CN (1) CN115169293A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115952528A (en) * 2023-03-14 2023-04-11 南京信息工程大学 Multi-scale combined text steganography method and system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110110318A (en) * 2019-01-22 2019-08-09 清华大学 Text Stego-detection method and system based on Recognition with Recurrent Neural Network
CN114048314A (en) * 2021-11-11 2022-02-15 长沙理工大学 Natural language steganalysis method
CN114528374A (en) * 2022-01-19 2022-05-24 浙江工业大学 Movie comment emotion classification method and device based on graph neural network

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110110318A (en) * 2019-01-22 2019-08-09 清华大学 Text Stego-detection method and system based on Recognition with Recurrent Neural Network
CN114048314A (en) * 2021-11-11 2022-02-15 长沙理工大学 Natural language steganalysis method
CN114528374A (en) * 2022-01-19 2022-05-24 浙江工业大学 Movie comment emotion classification method and device based on graph neural network

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115952528A (en) * 2023-03-14 2023-04-11 南京信息工程大学 Multi-scale combined text steganography method and system

Similar Documents

Publication Publication Date Title
CN111831790B (en) False news identification method based on low threshold integration and text content matching
CN110781668B (en) Text information type identification method and device
CN111737511B (en) Image description method based on self-adaptive local concept embedding
CN110956037B (en) Multimedia content repeated judgment method and device
CN111078876A (en) Short text classification method and system based on multi-model integration
CN113032001B (en) Intelligent contract classification method and device
CN110502742A (en) A kind of complexity entity abstracting method, device, medium and system
CN110909224A (en) Sensitive data automatic classification and identification method and system based on artificial intelligence
CN107145568A (en) A kind of quick media event clustering system and method
CN115037543A (en) Abnormal network flow detection method based on bidirectional time convolution neural network
CN116150651A (en) AI-based depth synthesis detection method and system
CN112492606A (en) Classification and identification method and device for spam messages, computer equipment and storage medium
CN115169293A (en) Text steganalysis method, system, device and storage medium
CN114254077A (en) Method for evaluating integrity of manuscript based on natural language
CN110929506A (en) Junk information detection method, device and equipment and readable storage medium
CN115314268B (en) Malicious encryption traffic detection method and system based on traffic fingerprint and behavior
CN116881408A (en) Visual question-answering fraud prevention method and system based on OCR and NLP
CN111464687A (en) Strange call request processing method and device
CN114881012A (en) Article title and content intelligent rewriting system and method based on natural language processing
CN115438645A (en) Text data enhancement method and system for sequence labeling task
CN116662557A (en) Entity relation extraction method and device in network security field
CN113626603A (en) Text classification method and device
CN112632229A (en) Text clustering method and device
CN112035670A (en) Multi-modal rumor detection method based on image emotional tendency
CN117786427B (en) Vehicle type main data matching method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20221011