CN108509775B - Malicious PNG image identification method based on machine learning - Google Patents

Info

Publication number
CN108509775B
Authority
CN
China
Prior art keywords
image
png
picture
steganography
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810128524.7A
Other languages
Chinese (zh)
Other versions
CN108509775A (en
Inventor
杨悉瑜
翁健
魏林锋
杨悉琪
潘冰
张悦
李明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jinan University
Original Assignee
Jinan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jinan University
Priority to CN201810128524.7A
Publication of CN108509775A
Application granted
Publication of CN108509775B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/0021 Image watermarking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/10 Protecting distributed programs or content, e.g. vending or licensing of copyrighted material; Digital rights management [DRM]
    • G06F21/16 Program or content traceability, e.g. by watermarking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/566 Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/20 Image enhancement or restoration using local operators
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/03 Indexing scheme relating to G06F21/50, monitoring users, programs or devices to maintain the integrity of platforms
    • G06F2221/031 Protect user input by software means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2201/00 General purpose image data processing
    • G06T2201/005 Image watermarking
    • G06T2201/0065 Extraction of an embedded watermark; Reliable detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Multimedia (AREA)
  • Technology Law (AREA)
  • Computing Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Virology (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a malicious PNG image identification method based on machine learning, which belongs to the technical field of cyberspace security. A PNG image feature library and a digital steganography recognition model are first established. At the server side, each request to upload a picture file is examined: feature matching identification is carried out against the PNG image feature library to preliminarily identify whether the PNG picture is legal; if it is legal, the digital steganography recognition model is called to detect whether the PNG picture contains hidden information; if the picture is illegal or contains hidden information, the upload is refused. At the client, PNG picture format file data is monitored during webpage transmission: feature matching identification is carried out against the PNG image feature library; if the data is legal, the digital steganography recognition model is called to detect whether the PNG picture contains hidden information; if the data is illegal or contains hidden information, access to the picture resource is forbidden. The invention can thus prohibit the uploading of illegal pictures at the server side and the access to illegal pictures at the client side, thereby strengthening network security.

Description

Malicious PNG image identification method based on machine learning
Technical Field
The invention belongs to the technical field of cyberspace security, and particularly relates to a malicious PNG image identification method based on machine learning.
Background
With the rapid popularization of networks and the rapid development of digital technology, cyberspace security problems have gradually come into public view and are receiving more and more attention.
On the one hand, the browser is the main medium through which people obtain Internet information, and its security cannot be overlooked. In recent years, for reasons such as stricter JavaScript review, more and more web pages have been implanted with web advertisements of all kinds. At best, these induce users to click and visit malicious links; at worst, they bypass computer and network defense systems by attaching malicious software and malicious dynamic link library (DLL) files to web page pictures, directly causing virus infection, information leakage, and other harm to users' personal computers and mobile devices.
On the other hand, incidents in which websites are illegally controlled and large amounts of data are leaked keep emerging. One frequent attack technique is to upload malicious code, such as a one-sentence Trojan, through the file upload function in order to take control of the server, and the harm is considerable. Detecting uploaded malicious code and evading that detection is a never-ending contest between defenders and attackers. In recent years, attackers have begun to upload seemingly legal PNG pictures to evade intrusion detection systems: malicious code is hidden in a forged but format-legal PNG picture by means of digital steganography techniques such as encoding and LSB steganography. Once the upload succeeds, the attacker can remotely control the website server by accessing and parsing the carefully constructed attack payload hidden in the PNG picture, and then carry out more destructive operations, such as stealing the private data of website users or using the remotely controlled server as a puppet machine to launch denial-of-service (DoS) attacks against other servers.
In summary, whether on a client such as a browser or on a server hosting a website, auditing the pictures in web pages to prevent hidden malicious behavior is a problem that urgently needs to be solved. PNG pictures are widely used in web pages because of their small size, lossless compression, and optimization for network transmission and display; they are also good carriers for information hiding and should therefore be a focus of research.
If the server, when processing a user's picture upload request, can efficiently and accurately identify whether the request carries a legal picture and analyze whether the picture uses digital steganography and contains a malicious attack payload, and if the client, when accessing web page resources, can filter the picture resources in the page and prevent picture resources suspected of containing malicious program files from being downloaded automatically, then malicious behavior can be stopped at the source.
To this end, we introduce machine learning techniques and digital steganography techniques to solve this problem.
The application of machine learning technology is spread in various fields of artificial intelligence, and is a core technology of artificial intelligence. At present, the machine learning technology also plays a great role in the network space security field due to the characteristics of autonomous learning, efficient learning and accurate learning.
The implementation of machine learning has an inseparable relationship with three components: an environment, a learning portion, and an execution portion. The environment provides some information to the learning part of the system, the learning part uses the information to modify the knowledge base to improve the efficiency of the system execution part to complete the task, the execution part completes the task according to the knowledge base, and simultaneously feeds back the obtained information to the learning part.
The following describes in detail three factors that influence the design of the machine learning system, taking the identification of PNG images as an example:
Information provided by the environment to the system: the knowledge base stores general principles that guide the actions of the execution part, but the environment provides a wide variety of information to the system. If the quality of the information is high and it differs little from the general principles, the learning part can process it easily. If the system is instead given unorganized, specific information that guides only specific actions, then after obtaining enough data it must discard unnecessary details, summarize and generalize, form general principles for guiding actions, and put them into the knowledge base; the task of the learning part is then relatively heavy and its design relatively difficult.
A knowledge base: knowledge can be represented in various forms, such as the header signature of a PNG image, the storage layout of a PNG image, and the end marker of a PNG image. Each representation has its own characteristics, and the following four aspects should be considered when selecting one:
(1) the expression ability is strong;
(2) the reasoning is easy;
(3) the knowledge base is easy to modify;
(4) the knowledge representation is easily scalable.
An execution section: this is the core of the whole system, because the actions of the execution part are what the learning part aims to improve. In the process of identifying PNG images, the content of the learning part is continuously adjusted according to the identification results so as to improve the accuracy of execution.
Digital steganography is a security technique that embeds secret information into a digital medium without compromising the quality of its carrier. By processing the secret information through the digital steganography technology, the third party can not perceive the existence of the secret information and can not know the content of the secret information. Steganographic carriers include images, audio, video, etc. In recent years, digital steganography has become the focus of information security technology by virtue of the characteristics of changeability, strong secrecy and the like. Because each Web site depends on various multimedia resources, such as audio, video, images and the like, an attacker can hide attack behaviors in the multimedia by applying a digital steganography technology to malicious software and malicious attack loads and can easily bypass anti-malicious software detection, thereby causing greater potential threats.
Taking images as an example of a multimedia resource, classic digital image steganography falls into two categories: steganography based on the spatial domain and steganography based on the transform domain. Spatial-domain steganography mainly includes Least Significant Bit (LSB) steganography. Transform-domain steganography mainly operates on the Discrete Cosine Transform (DCT) coefficients of an image and includes Jsteg steganography, F5 steganography, OutGuess steganography, Model-Based (MB) steganography, and the like.
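As an illustration of the spatial-domain case, the following is a minimal sketch of sequential LSB embedding and extraction on a list of 8-bit gray values. The function names `embed_lsb` and `extract_lsb` and the sample values are hypothetical and are not taken from the patent; real tools typically spread the bits over pixels in a keyed or random order.

```python
def embed_lsb(pixels, bits):
    """Return a copy of `pixels` with `bits` written into the least significant bits."""
    if len(bits) > len(pixels):
        raise ValueError("message is longer than the carrier")
    stego = list(pixels)
    for i, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the message bit.
        stego[i] = (stego[i] & ~1) | bit
    return stego


def extract_lsb(pixels, n_bits):
    """Read back the first `n_bits` least significant bits."""
    return [p & 1 for p in pixels[:n_bits]]


cover = [120, 121, 119, 200, 201, 54, 55, 56]   # hypothetical gray values
message = [1, 0, 1, 1, 0, 0, 1, 0]               # hypothetical secret bits
stego = embed_lsb(cover, message)
```

Each pixel changes by at most one gray level, which is why the modification is hard to see and has to be detected statistically.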
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a malicious PNG image identification method based on machine learning, which adopts a PNG image feature library to carry out feature matching identification, and judges whether the PNG image has hidden information or not by means of a digital steganography identification model, so that uploading of illegal images is prohibited at a server side, access to the illegal images is prohibited at a client side, and network security is enhanced.
The invention is realized by adopting the following technical scheme: a malicious PNG image identification method based on machine learning comprises the following steps:
step one, establishing a PNG image feature library and a digital steganography recognition model through machine learning;
step two, checking all requests for uploading picture files at the server, performing feature matching identification on the PNG picture by contrasting the PNG picture feature library established in the step one, and rejecting the uploading request if an illegal PNG picture format is found; otherwise, the PNG picture is subjected to primary identification, and the step three is carried out;
step three, for the PNG picture format file passing the primary recognition, calling the digital steganography recognition model established in the step one, mining whether the PNG picture has information hiding, and if so, rejecting the uploading request; if not, allowing the uploading request;
step four, monitoring PNG picture format file data in the webpage transmission process at the client, performing feature matching identification on the PNG picture against the PNG image feature library established in step one, and if an illegal PNG picture format is found, forbidding access to the picture resource; otherwise, entering step five;
and step five, calling the digital steganography recognition model established in step one to detect whether the PNG picture contains hidden information; a picture containing hidden information is considered likely to hide malicious information, and access to the picture resource is forbidden.
Preferably, the PNG image feature library established in the step one is as follows: firstly, providing batch PNG images as training set data to be imported into a machine learning system; secondly, a PNG image feature recognition library is established, and the PNG image feature recognition library comprises the following feature information: (1) PNG header feature; (2) PNG end flag IEND block; (3) an IHDR block recording PNG image information; (4) an IDAT block storing actual image data; (5) storing the image redundancy information block; and finally, selecting a support vector machine model for feature learning aiming at the recognition library to complete the recognition and classification of the target.
Preferably, the digital steganography recognition model in the step one is established by combining shallow learning and deep learning: on one hand, a feature library is established based on the steganographic features of a classical steganographic algorithm for feature learning; on the other hand, based on the characteristic that the quality of the image after steganography is liable to change slightly, filtering pretreatment is carried out on the PNG image containing steganography information and the PNG image without steganography information by using a high-pass filter respectively, the image display characteristic is enhanced, the obtained residual image is used as a training set, then a convolutional neural network model is selected for transfer learning, and finally the probability that the digital steganography exists in the image is output.
Preferably, the characteristic library is established based on the steganographic characteristics of the classical steganographic algorithm for characteristic learning, and the method is characterized in that an RS analysis algorithm is selected for supervised learning of PNG images:
firstly, dividing an image input into the model to be trained into a plurality of image blocks of the same size, scanning and arranging each image block into a pixel vector $G = (x_1, x_2, \ldots, x_n)$, and calculating the spatial correlation of each image block using the following formula:
$$f(x_1, x_2, \ldots, x_n) = \sum_{i=1}^{n-1} \left| x_{i+1} - x_i \right|$$
wherein $x_i$ represents the gray value of each pixel; the smaller the value of $f$, the smaller the change in gray value between adjacent pixels and the stronger the spatial correlation of the image block;
then, a non-negative inversion operation is applied to a randomly extracted part of the pixels of each image block, wherein the inversion functions are defined as follows:
$F_1$ denotes the function that swaps the pixel values $2i$ and $2i+1$, i.e.
$$F_1: 0 \leftrightarrow 1,\ 2 \leftrightarrow 3,\ \ldots,\ 254 \leftrightarrow 255$$
$F_{-1}$ denotes the function that swaps the pixel values $2i-1$ and $2i$, i.e.
$$F_{-1}: -1 \leftrightarrow 0,\ 1 \leftrightarrow 2,\ \ldots,\ 255 \leftrightarrow 256$$
$F_0$ denotes the identity relation that leaves pixel values unchanged;
calculating the proportion $R_M$ of image blocks whose value of $f$ increases and the proportion $S_M$ of image blocks whose value of $f$ decreases;
similarly, a non-positive inversion operation is applied to a randomly extracted part of the pixels of each image block, and the proportion $R_{-M}$ of image blocks whose value of $f$ increases and the proportion $S_{-M}$ of image blocks whose value of $f$ decreases are calculated;
if the increase in the degree of disorder caused by non-positive inversion is greater than that caused by non-negative inversion, the PNG image is labeled as having LSB steganography features; otherwise, it is labeled as having no LSB steganography features, and the label is output;
and finally, the input objects and the expected outputs form the training data, a learning model is established, and whether LSB steganography exists in a new PNG image is inferred from the learned model.
Compared with the prior art, the invention has the following beneficial effects: the method introduces machine learning and digital steganography techniques, establishes a PNG image feature library for feature matching identification to preliminarily judge whether a PNG image may hide malicious information, and further judges whether the PNG image contains hidden information by means of a digital steganography recognition model, so that uploading of illegal images is prohibited at the server side, access to illegal images is prohibited at the client side, and network security is enhanced. In the digital steganography recognition model, the RS analysis algorithm is selected for supervised learning on PNG images: whether an image has LSB steganography features is judged by whether the non-negative and non-positive inversion operations of the inversion function increase the degree of disorder of the image to the same extent. A convolutional neural network then performs deep learning to estimate the probability that digital steganography exists in the image. The accuracy is high, and the overall model is simple in design and easy to implement.
Drawings
Fig. 1 is a flowchart of a malicious PNG image identification method based on machine learning according to an embodiment of the present invention;
fig. 2 is a frame diagram of a digital steganography recognition model in a malicious PNG image recognition method based on machine learning according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in detail below with reference to examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention is implemented in two parts, a server side and a client side. When the technical scheme is applied at the server, each request to upload a picture file is recorded and passed in turn, as test set data, through the PNG feature recognition library and the digital steganography recognition model for matching, which effectively suppresses attempts by hackers to control the server by uploading an attack payload. When the technical scheme is applied at the client, each webpage resource containing pictures is recorded and passed in turn, as test set data, through the PNG feature recognition library and the digital steganography recognition model for matching, which stops at the source malicious behavior aimed at controlling the user's equipment.
Firstly, a PNG image feature recognition library is established by training on a large number of PNG images, and a digital steganography recognition model is established, by combining shallow learning and deep learning, from PNG images that have been processed with various digital steganography techniques. In the server environment, the PNG image feature recognition library is used to identify whether a file being uploaded by a client is a PNG image. If the file is confirmed to be a PNG image, it is preliminarily determined to be legal and passed to the next check; if the file does not meet the PNG image format requirements, the upload is considered illegal and the upload request is rejected. After the PNG image is preliminarily determined to be legal, the digital steganography recognition model is used to detect whether the image contains hidden information. If it does, the file is considered a suspected malicious file and the client's upload request is rejected; if it does not, the request is considered free of malicious behavior and the upload is allowed. In the client environment, a real-time monitoring browser plug-in or another real-time monitoring tool monitors the data of web page pictures browsed by the user (in particular, those named as PNG images), and the PNG image feature recognition library is used for image feature recognition. If an abnormal image is found (that is, machine recognition shows a PNG image that does not meet the standard), the user is forbidden to access the picture resource. If no abnormality is found, the digital steganography recognition model is used to detect whether the image contains hidden information; if it does, the user is forbidden to access the picture resource, and if it does not, the user can access the picture resource normally.
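The two-stage decision described above can be sketched as follows. This is only an illustrative outline: `review_png`, `is_legal_png`, and `stego_probability` are hypothetical names, the two checks are passed in as callables standing in for the feature library and the steganography model, and the 0.8 threshold is the value the description gives for the model output.

```python
from typing import Callable


def review_png(data: bytes,
               is_legal_png: Callable[[bytes], bool],
               stego_probability: Callable[[bytes], float],
               threshold: float = 0.8) -> str:
    """Apply the format check first, then the steganography check."""
    if not is_legal_png(data):
        return "reject: illegal PNG format"
    if stego_probability(data) > threshold:
        return "reject: information hiding suspected"
    return "allow"


# Stand-in checks for demonstration only (assumptions, not the trained models).
def signature_only(data: bytes) -> bool:
    return data.startswith(b"\x89PNG\r\n\x1a\n")


def clean_model(data: bytes) -> float:
    return 0.1


def suspicious_model(data: bytes) -> float:
    return 0.93
```

The same function can serve both deployments: on the server a rejection refuses the upload, and on the client it blocks access to the picture resource.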
As shown in fig. 1, the method specifically comprises the following steps:
step one, a PNG image feature library and a digital steganography recognition model are established through machine learning.
For the PNG image feature library, the PNG image format is uniform, so only shallow learning is adopted. Firstly, PNG images are provided in batches as training set data and imported into the machine learning system. Secondly, a PNG image feature recognition library is established, comprising the following feature information: (1) the PNG header signature; (2) the IEND block marking the end of the PNG; (3) the IHDR block recording PNG image information; (4) the IDAT block storing the actual image data; (5) blocks storing redundant image information (e.g., the tEXt block), etc. Finally, feature learning is performed on this manually designed recognition library; since the goal of learning is to recognize and classify the target, a Support Vector Machine (SVM) is selected for supervised learning.
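To make the listed features concrete, here is a minimal sketch that walks the chunk structure of a PNG file and extracts the signature, chunk order, CRC validity, and any bytes trailing the IEND block. It is a hand-written rule check under stated assumptions, not the SVM training the description calls for; the names `png_features`, `looks_like_legal_png`, and `make_chunk`, and the specific legality rule, are hypothetical. Features of this kind could serve as the input vector for a classifier.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type: bytes, payload: bytes) -> bytes:
    """Build one PNG chunk: length, type, payload, CRC-32 over type + payload."""
    crc = zlib.crc32(chunk_type + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + chunk_type + payload + struct.pack(">I", crc)


def png_features(data: bytes) -> dict:
    """Extract structural features from raw file bytes."""
    feats = {"signature_ok": data[:8] == PNG_SIGNATURE,
             "chunks": [], "crc_ok": True, "trailing_bytes": 0}
    if not feats["signature_ok"]:
        return feats
    pos = 8
    while pos + 12 <= len(data):
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        end = pos + 12 + length
        if end > len(data):            # chunk claims more bytes than the file holds
            feats["crc_ok"] = False
            break
        payload = data[pos + 8:end - 4]
        (stored_crc,) = struct.unpack(">I", data[end - 4:end])
        if zlib.crc32(chunk_type + payload) & 0xFFFFFFFF != stored_crc:
            feats["crc_ok"] = False
        feats["chunks"].append(chunk_type.decode("latin-1"))
        pos = end
        if chunk_type == b"IEND":
            break
    feats["trailing_bytes"] = len(data) - pos   # bytes after IEND or an unparsed tail
    return feats


def looks_like_legal_png(feats: dict) -> bool:
    """Assumed rule: valid signature and CRCs, IHDR first, IDAT present, IEND last."""
    chunks = feats["chunks"]
    return (feats["signature_ok"] and feats["crc_ok"] and len(chunks) >= 3
            and chunks[0] == "IHDR" and "IDAT" in chunks
            and chunks[-1] == "IEND" and feats["trailing_bytes"] == 0)


# A 1x1 grayscale PNG built in memory for demonstration.
ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
sample = (PNG_SIGNATURE + make_chunk(b"IHDR", ihdr)
          + make_chunk(b"IDAT", zlib.compress(b"\x00\x00"))
          + make_chunk(b"IEND", b""))
```

Data appended after IEND is a common way to smuggle a payload past naive checks, which is why the sketch counts trailing bytes separately.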
For the digital steganography recognition model, besides the classical steganography algorithms there are algorithms that are variants of the classical ones or are independently designed, and these are difficult to detect; the method therefore combines shallow learning and deep learning:
on one hand, a feature library is established from the steganographic features of classical steganography algorithms for feature learning, where the classical algorithms are spatial-domain algorithms such as Least Significant Bit (LSB) steganography. The RS (Regular and Singular groups) analysis algorithm detects secret information from the change in image smoothness before and after steganography and is highly reliable against random LSB steganography (in which the secret information is embedded into the least significant bits of the image in random order); the RS analysis algorithm is therefore selected to perform supervised learning on PNG images, as follows:
firstly, dividing an image input into the model to be trained into a plurality of image blocks of the same size, scanning each image block in zigzag order and arranging it into a pixel vector $G = (x_1, x_2, \ldots, x_n)$, and calculating the spatial correlation of each image block using the following formula:
$$f(x_1, x_2, \ldots, x_n) = \sum_{i=1}^{n-1} \left| x_{i+1} - x_i \right|$$
wherein $x_i$ represents the gray value of each pixel; the smaller the value of $f$, the smaller the change in gray value between adjacent pixels and the stronger the spatial correlation of the image block.
Then a non-negative inversion ($F_1$ and $F_0$) operation is applied to a randomly selected part of the pixels of each image block, wherein the inversion functions are defined as follows:
$F_1$ denotes the function that swaps the pixel values $2i$ and $2i+1$, i.e.
$$F_1: 0 \leftrightarrow 1,\ 2 \leftrightarrow 3,\ \ldots,\ 254 \leftrightarrow 255$$
$F_{-1}$ denotes the function that swaps the pixel values $2i-1$ and $2i$, i.e.
$$F_{-1}: -1 \leftrightarrow 0,\ 1 \leftrightarrow 2,\ \ldots,\ 255 \leftrightarrow 256$$
$F_0$ denotes the identity relation that leaves pixel values unchanged.
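Calculating the proportion of image blocks whose value of $f$ increases (denoted as $R_M$) and the proportion of image blocks whose value of $f$ decreases (denoted as $S_M$):
$$R_M = \frac{\text{number of image blocks whose } f \text{ increases after non-negative inversion}}{\text{total number of image blocks}}$$
$$S_M = \frac{\text{number of image blocks whose } f \text{ decreases after non-negative inversion}}{\text{total number of image blocks}}$$
$(R_M + S_M \le 1)$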
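Likewise, a non-positive inversion ($F_{-1}$ and $F_0$) operation is applied to a randomly selected part of the pixels of each image block, and the proportion of image blocks whose value of $f$ increases (denoted as $R_{-M}$) and the proportion of image blocks whose value of $f$ decreases (denoted as $S_{-M}$) are calculated:
$$R_{-M} = \frac{\text{number of image blocks whose } f \text{ increases after non-positive inversion}}{\text{total number of image blocks}}$$
$$S_{-M} = \frac{\text{number of image blocks whose } f \text{ decreases after non-positive inversion}}{\text{total number of image blocks}}$$
$(R_{-M} + S_{-M} \le 1)$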
Statistically, if the image has not been subjected to LSB steganography, applying non-negative or non-positive inversion destroys the spatial correlation of the image blocks to the same extent, i.e., increases their degree of disorder equally, so that $R_M \approx R_{-M}$, $S_M \approx S_{-M}$, and $R_M > S_M$, $R_{-M} > S_{-M}$.
Therefore, if the increase in the degree of disorder caused by applying the non-positive inversion operation to the image is larger than the increase caused by applying the non-negative inversion operation, the PNG image is considered very likely to contain LSB steganography, and its label is set to "LSB steganography features present"; otherwise, its label is set to "LSB steganography features absent", and the label is output. Finally, the input objects (PNG images) and the expected outputs (whether LSB steganography features are present) form the training data, a learning model is established, and whether LSB steganography exists in a new PNG image is inferred from the learned model.
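The RS labeling rule above can be sketched in a few lines. This is a simplified illustration under several assumptions: blocks are already flattened into pixel vectors, a fixed mask replaces the random pixel selection, and the comparison uses a hypothetical `tolerance` parameter that the description does not specify. The function names are likewise illustrative.

```python
def smoothness(block):
    """f(x_1..x_n): sum of absolute differences between adjacent pixels."""
    return sum(abs(b - a) for a, b in zip(block, block[1:]))


def flip(value, kind):
    """Apply F_1 (kind=1), F_-1 (kind=-1) or F_0 (kind=0) to one pixel value."""
    if kind == 1:
        return value ^ 1                  # swaps 2i <-> 2i+1
    if kind == -1:
        return ((value + 1) ^ 1) - 1      # swaps 2i-1 <-> 2i
    return value


def rs_proportions(blocks, mask):
    """Return (R, S): shares of blocks whose f increases / decreases after flipping."""
    regular = singular = 0
    for block in blocks:
        before = smoothness(block)
        after = smoothness([flip(v, m) for v, m in zip(block, mask)])
        if after > before:
            regular += 1
        elif after < before:
            singular += 1
    return regular / len(blocks), singular / len(blocks)


def has_lsb_feature(blocks, mask, tolerance=0.05):
    """Label as stego when non-positive flipping adds clearly more disorder."""
    r_pos, s_pos = rs_proportions(blocks, mask)
    r_neg, s_neg = rs_proportions(blocks, [-m for m in mask])
    return (r_neg - s_neg) - (r_pos - s_pos) > tolerance


blocks = [[10, 10, 10, 10], [10, 11, 11, 10]]   # hypothetical 4-pixel blocks
mask = [0, 1, 1, 0]                             # flip only the two middle pixels
```

In this toy input, non-negative flipping smooths the second block while non-positive flipping roughens both, which is the asymmetry the rule looks for. A practical detector would average over many blocks and several random masks.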
On the other hand, because steganography tends to change image quality slightly, a high-pass filter is first used to pre-filter both the PNG images containing steganographic information and the PNG images without it, which enhances these image characteristics; the resulting residual images are used as the training set. A convolutional neural network is well suited to image processing because of its strength in spatial mapping, and transfer learning reduces the amount of data needed to build a neural network when data is insufficient. An improved model based on the Convolutional Neural Network (CNN) of Lionel Pibre et al. is therefore selected for transfer learning. The main idea is as follows:
the convolutional neural network model pre-trained by Lionel Pibre et al. is used as a feature extractor; the last layer of the network is replaced with a new classifier, the weights of the other layers are then fixed, and the whole network is trained.
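The filtering pre-processing mentioned above can be illustrated with a small sketch that computes a residual image by sliding a zero-sum kernel over a grayscale image. The 3x3 kernel here is an assumption chosen for brevity; the description does not state which high-pass filter is used, and `high_pass_residual` is a hypothetical name.

```python
# Assumed zero-sum 3x3 high-pass kernel (not specified in the description).
KERNEL = [[-1, 2, -1],
          [2, -4, 2],
          [-1, 2, -1]]


def high_pass_residual(image, kernel=KERNEL):
    """Slide the kernel over the image ("valid" region only) and return the residual."""
    k = len(kernel)
    rows, cols = len(image), len(image[0])
    residual = []
    for r in range(rows - k + 1):
        row = []
        for c in range(cols - k + 1):
            row.append(sum(kernel[i][j] * image[r + i][c + j]
                           for i in range(k) for j in range(k)))
        residual.append(row)
    return residual


flat = [[50] * 4 for _ in range(4)]                      # smooth region
bumped = [[50, 50, 50], [50, 51, 50], [50, 50, 50]]      # one pixel changed by 1
```

Because the kernel coefficients sum to zero, smooth content is suppressed while a one-level change, such as an LSB modification, is amplified in the residual.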
Referring to fig. 2, the convolutional neural network model structure is as follows:
Input: all pixel values of the processed residual image;
Feature extraction layers: the pre-trained model is used as a feature extractor;
Classifier: comprises a fully connected layer and a classification function (softmax);
Output: the probability that digital steganography exists in the image; when the output probability is greater than 0.8, the image is considered to contain digital steganography.
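The output stage can be sketched as a softmax over two class scores followed by the 0.8 threshold stated above. The two-class layout with index 1 meaning "stego" and the function names are assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def is_stego(logits, threshold=0.8):
    """Assumes logits = [score_clean, score_stego]; flags when P(stego) > threshold."""
    return softmax(logits)[1] > threshold
```

For example, scores of `[0.0, 2.0]` give a stego probability of about 0.88 and are flagged, while `[0.0, 1.0]` give about 0.73 and are not.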
The classifier is constructed using the blind detection method based on Image Quality Metrics (IQM) proposed by Avcibas, specifically as follows:
1. Feature vectors are selected by defining a range of image quality measures, with Analysis of Variance (ANOVA) techniques used to pick out the most discriminative features. Taking the Minkowsky measure as an example, the norm of the dissimilarity between two images can be expressed as the Minkowsky average of the pixel differences, taken first spatially and then chromatically (i.e., over all bands):
$$M_\gamma = \frac{1}{K} \sum_{k=1}^{K} \left\{ \frac{1}{N} \sum_{i,j} \left| C_k(i,j) - \hat{C}_k(i,j) \right|^{\gamma} \right\}^{1/\gamma}$$
where $M_\gamma$ denotes the absolute average error when $\gamma = 1$ and the mean square error when $\gamma = 2$, $C_k(i,j)$ denotes the multispectral component of the normal image at pixel position $(i,j)$ in band $k$, $\hat{C}_k(i,j)$ denotes the corresponding multispectral component of the steganographic image, $K$ denotes the number of bands, and $N$ denotes the total number of image pixels;
2. the selected IQM (Image Quality Metrics) forms a multi-dimensional feature space in which normal images are more distinguishable from stego images;
3. after a proper feature set is selected, a multiple linear regression model is established on a large amount of experimental data, and a classifier for distinguishing normal images from steganographic images is established on the basis of the regression model.
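The three steps above can be sketched as a Minkowsky-type quality measure followed by a least-squares regression classifier. This is an illustration only: the function names, the 0/1 label coding (0 = normal, 1 = steganographic), and the 0.5 decision cut-off are assumptions, and a real feature vector would contain several quality measures rather than one.

```python
import numpy as np


def minkowski_metric(reference, test, gamma):
    """Minkowsky average of pixel differences, spatially then over bands.

    Inputs are arrays of shape (H, W) or (H, W, K); N = H * W pixels.
    """
    ref = np.asarray(reference, dtype=np.float64)
    tst = np.asarray(test, dtype=np.float64)
    if ref.ndim == 2:  # treat a grayscale image as a single band
        ref, tst = ref[..., None], tst[..., None]
    n_pixels = ref.shape[0] * ref.shape[1]
    per_band = (np.abs(ref - tst) ** gamma).sum(axis=(0, 1)) / n_pixels
    return float(np.mean(per_band ** (1.0 / gamma)))


def fit_regression(features, labels):
    """Multiple linear regression with intercept, solved by least squares."""
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, labels.astype(np.float64), rcond=None)
    return coef


def predict_stego(features, coef, cutoff=0.5):
    """Classify as steganographic when the regression score reaches the cutoff."""
    design = np.column_stack([np.ones(len(features)), features])
    return design @ coef >= cutoff
```

For instance, `minkowski_metric(a, b, 1)` gives the mean absolute difference between two grayscale images, and each such measure would become one column of the feature matrix passed to `fit_regression`.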
Step two, checking all requests to upload picture files at the server: the data is first decoded as a preprocessing step, and the PNG picture then undergoes feature matching identification against the PNG picture feature library established in step one. If an illegal PNG picture format is found, the upload request is rejected; otherwise, the PNG picture passes the preliminary identification and the method proceeds to step three.
In this step, the request to upload the picture file is examined, and the examined information includes the following: (1) the file suffix name; (2) the Content-Type declared in the header of the HTTP message; (3) whether the transmitted content is encoded; (4) whether the transmitted content is legitimate.
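The four checks listed above might look like the following on the server side. This is a hedged sketch: the function name `check_upload`, its parameters, the returned reason strings, and the choice of base64 as the only handled encoding are assumptions; the legitimacy check is reduced here to verifying the PNG file signature.

```python
import base64
import binascii

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # fixed 8-byte PNG file header


def check_upload(filename, content_type, body, transfer_encoding=None):
    """Return (accepted, reason) for one upload request."""
    # (1) file suffix name
    if not filename.lower().endswith(".png"):
        return False, "suffix"
    # (2) Content-Type declared in the HTTP header (parameters ignored)
    if content_type.split(";")[0].strip().lower() != "image/png":
        return False, "content-type"
    # (3) decode the body first if it was transmitted encoded
    if transfer_encoding == "base64":
        try:
            body = base64.b64decode(body, validate=True)
        except (binascii.Error, ValueError):
            return False, "encoding"
    # (4) the decoded content must at least start like a PNG file
    if not body.startswith(PNG_SIGNATURE):
        return False, "content"
    return True, "ok"
```

Checking the decoded bytes rather than the declared type matters because both the suffix and the Content-Type header are supplied by the client and can be forged.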
Step three, for a PNG picture format file that has passed the preliminary identification, calling the digital steganography recognition model established in step one to determine whether the PNG picture contains hidden information; if so, the upload request is rejected; if not, the upload request is allowed.
Step four, monitoring PNG picture format file data during web page transmission at the client, for example by means of a real-time monitoring browser plug-in: the data is preprocessed (for example, decoded), and the PNG picture undergoes feature matching identification against the PNG picture feature library established in step one. If an illegal PNG picture format is found, access to the picture resource is forbidden; otherwise, the method proceeds to step five.
In this step, the client monitors PNG image data in web pages specifically for hidden information; pictures that visually lure users toward malicious links are not considered.
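A structural check of the kind the feature matching relies on can be sketched with the standard library alone. The patent trains a support vector machine on the feature library; the rule-based function below is a simplification that only extracts and tests the listed structural features (file header, IHDR, IDAT, IEND, additional blocks). All function names and the feature dictionary layout are hypothetical.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(ctype, payload):
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(ctype + payload)
    return struct.pack(">I", len(payload)) + ctype + payload + struct.pack(">I", crc)


def iter_chunks(data):
    """Yield (type, payload, crc_ok, end_offset) up to and including IEND."""
    pos = len(PNG_SIGNATURE)
    while pos + 12 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        end = pos + 12 + length
        if end > len(data):
            raise ValueError("truncated chunk")
        payload = data[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack(">I", data[end - 4:end])
        yield ctype, payload, crc == zlib.crc32(ctype + payload), end
        if ctype == b"IEND":
            return
        pos = end


def png_structure_features(data):
    """Extract structural features, or None if the file cannot be parsed."""
    if not data.startswith(PNG_SIGNATURE):
        return None
    try:
        chunks = list(iter_chunks(data))
    except ValueError:
        return None
    types = [c[0] for c in chunks]
    iend_end = next((end for t, _, _, end in chunks if t == b"IEND"), None)
    return {
        "ihdr_first": bool(chunks) and types[0] == b"IHDR" and len(chunks[0][1]) == 13,
        "has_idat": b"IDAT" in types,
        "has_iend": iend_end is not None,
        "crc_ok": all(c[2] for c in chunks),
        # A lowercase first letter marks an ancillary (non-critical) chunk.
        "ancillary": [t.decode("latin-1") for t in types if t[0] & 0x20],
        "trailing_bytes": len(data) - iend_end if iend_end is not None else 0,
    }


def is_legal_png(data):
    f = png_structure_features(data)
    return bool(
        f is not None and f["ihdr_first"] and f["has_idat"]
        and f["has_iend"] and f["crc_ok"] and f["trailing_bytes"] == 0
    )
```

Bytes appended after the IEND block are a common way to smuggle a payload inside an otherwise valid picture, which is why `trailing_bytes` is reported separately.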
Step five, as in step three, calling the digital steganography recognition model established in step one to determine whether the PNG picture contains hidden information; a picture that contains hidden information is considered likely to conceal malicious information, and access to the picture resource is forbidden.
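Both the server-side flow (steps two and three) and the client-side flow (steps four and five) follow the same two-stage pattern, which can be summarized as a small gate function. The function name, the returned strings, and the two callables passed in are placeholders for illustration; the 0.8 threshold is the one given for the CNN output.

```python
def screen_png(data, format_is_legal, stego_probability, threshold=0.8):
    """Two-stage screening: structural check first, steganalysis second.

    format_is_legal and stego_probability are caller-supplied callables
    standing in for the feature library match and the recognition model.
    """
    if not format_is_legal(data):
        return "reject: illegal PNG format"
    if stego_probability(data) > threshold:
        return "reject: hidden information suspected"
    return "allow"
```

Running the cheap format check first means the more expensive steganography model is only invoked for files that are structurally valid PNG pictures.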
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (4)

1. A malicious PNG image identification method based on machine learning is characterized by comprising the following steps:
step one, establishing a PNG image feature library and a digital steganography recognition model through machine learning;
step two, checking all requests for uploading picture files at the server, performing feature matching identification on the PNG picture by contrasting the PNG picture feature library established in the step one, and rejecting the uploading request if an illegal PNG picture format is found; otherwise, the PNG picture is subjected to primary identification, and the step three is carried out;
step three, for the PNG picture format file passing the primary recognition, calling the digital steganography recognition model established in the step one, mining whether the PNG picture has information hiding, and if so, rejecting the uploading request; if not, allowing the uploading request;
step four, monitoring PNG picture format file data in the webpage transmission process at the client, performing feature matching identification on the PNG picture by contrasting the PNG picture feature library established in the step one, and if an illegal PNG picture format is found, forbidding to access the picture resource; otherwise, entering the step five;
step five, calling the digital steganography recognition model established in the step one, mining whether the PNG picture has information hiding, regarding the picture with the information hiding, considering that malicious information is possibly hidden, and forbidding to access the picture resource;
in the second step, the request for uploading the picture file is examined, and the examined information comprises the following: (1) the file suffix name; (2) the Content-Type declared in the header of the HTTP message; (3) whether the transmitted content is encoded; (4) whether the transmitted content is legitimate.
2. The machine learning-based malicious PNG image identification method according to claim 1, wherein the PNG image feature library in step one is established by the following process: firstly, a batch of PNG images is imported into a machine learning system as training set data; secondly, a PNG image feature recognition library is established, comprising the following feature information: (1) the PNG file header signature; (2) the PNG end marker, the IEND block; (3) the IHDR block, which records the PNG image information; (4) the IDAT block, which stores the actual image data; (5) blocks storing redundant image information; and finally, a support vector machine model is selected for feature learning on the recognition library to complete the recognition and classification of the target.
3. The malicious PNG image recognition method based on machine learning according to claim 1, wherein the digital steganography recognition model of the first step is established by combining shallow learning and deep learning: on one hand, a feature library is established based on the steganographic features of a classical steganographic algorithm for feature learning; on the other hand, based on the characteristic that the quality of the image after steganography is liable to have slight change, filtering pretreatment is respectively carried out on the PNG image containing steganography information and the PNG image without steganography information by using a high-pass filter, the image display characteristic is enhanced, the obtained residual image is used as a training set, then a convolutional neural network model is selected for transfer learning, and finally the probability that the digital steganography exists in the image is output;
the structure of the convolutional neural network model comprises:
input: the values of all pixels of the preprocessed residual image;
feature extraction layers: a pre-trained model used as a feature extractor;
classifier: comprising a fully connected layer and a classification function;
output: the probability that the image contains digital steganography; when the output probability is greater than 0.8, the image is considered to contain digital steganography;
the classifier is constructed using a blind detection method based on image quality metrics:
selecting a feature vector by defining a plurality of measures of image quality using an analysis of variance technique; the norm of the dissimilarity of the two images is represented by the Minkowsky average of the pixel differences taken spatially and then expressed in chroma:
$$M_\gamma = \frac{1}{K} \sum_{k=1}^{K} \left\{ \frac{1}{N} \sum_{i,j} \left| C_k(i,j) - \hat{C}_k(i,j) \right|^{\gamma} \right\}^{1/\gamma}$$
where $M_\gamma$ denotes the absolute average error when $\gamma = 1$ and the mean square error when $\gamma = 2$, $C_k(i,j)$ denotes the multispectral component of the normal image at pixel position $(i,j)$ in band $k$, $\hat{C}_k(i,j)$ denotes the corresponding multispectral component of the steganographic image, $K$ denotes the number of bands, and $N$ denotes the total number of image pixels;
the selected image quality metrics form a multi-dimensional feature space;
after a proper characteristic set is selected, a multiple linear regression model is established on a large amount of experimental data, and a classifier for distinguishing normal images from steganographic images is established on the basis of the regression model.
4. The machine learning-based malicious PNG image identification method according to claim 3, wherein establishing the feature library for feature learning based on the steganographic features of the classical steganographic algorithm comprises selecting the RS analysis algorithm for supervised learning on the PNG image:
firstly, dividing an image input to the model to be trained into a plurality of image blocks of the same size, scanning each image block into a pixel vector $G = (x_1, x_2, \ldots, x_n)$, and calculating the spatial correlation of each image block using the following formula:
$$f(x_1, x_2, \ldots, x_n) = \sum_{i=1}^{n-1} \left| x_{i+1} - x_i \right|$$
wherein $x_i$ represents the gray value of each pixel; the smaller the value of $f$, the smaller the change in gray value between adjacent pixels and the stronger the spatial correlation of the image block;
then, a non-negative inversion operation is applied to a randomly selected part of the pixels of each image block, wherein the inversion functions are defined as follows:
$F_1$ denotes the function that interchanges the pixel values $2i$ and $2i+1$, i.e.
$$F_1:\ 0 \leftrightarrow 1,\ 2 \leftrightarrow 3,\ \ldots,\ 254 \leftrightarrow 255$$
$F_{-1}$ denotes the function that interchanges the pixel values $2i-1$ and $2i$, i.e.
$$F_{-1}:\ -1 \leftrightarrow 0,\ 1 \leftrightarrow 2,\ \ldots,\ 255 \leftrightarrow 256$$
$F_0$ denotes the identity function, which leaves the pixel value unchanged;
calculating the proportion $R_M$ of image blocks whose spatial correlation increases or the proportion $S_M$ of image blocks whose spatial correlation decreases;
similarly, a non-positive inversion operation is applied to a randomly selected part of the pixels of each image block, and the proportion $R_{-M}$ of image blocks whose spatial correlation increases or the proportion $S_{-M}$ of image blocks whose spatial correlation decreases is calculated;
if the increase in the degree of disorder caused by applying the non-positive inversion operation to the image is larger than the increase in the degree of disorder caused by applying the non-negative inversion operation, setting the label of the PNG image to indicate that the LSB steganography characteristic is present; otherwise, setting the label to indicate that the LSB steganography characteristic is absent, and outputting the label;
and finally, forming training data from the input objects and the expected outputs, establishing a learned model, and inferring whether LSB steganography exists in a new PNG image according to the learned model.
CN201810128524.7A 2018-02-08 2018-02-08 Malicious PNG image identification method based on machine learning Active CN108509775B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810128524.7A CN108509775B (en) 2018-02-08 2018-02-08 Malicious PNG image identification method based on machine learning

Publications (2)

Publication Number Publication Date
CN108509775A CN108509775A (en) 2018-09-07
CN108509775B true CN108509775B (en) 2020-11-13

Family

ID=63375310

Country Status (1)

Country Link
CN (1) CN108509775B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106874936A (en) * 2017-01-17 2017-06-20 腾讯科技(上海)有限公司 Image propagates monitoring method and device
CN107292315A (en) * 2016-04-11 2017-10-24 北京大学 Steganalysis method and hidden information analysis device based on multiple dimensioned LTP features

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4967488B2 (en) * 2006-07-11 2012-07-04 富士通株式会社 Code image processing method, code image processing apparatus, and code image processing program

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
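JPEG Image Steganalysis Improvement Via Image-to-Image Variation Minimization; Chiew Kang Leng et al.; 2008 International Conference on Advanced Computer Theory and Engineering; 20090106; full text *
Research on image steganography detection technology based on sparse coding; Li Yu et al.; Communications Technology (《通信技术》); 20170531; Vol. 50 (No. 5); full text *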

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant