CN113360911A - Malicious code homologous analysis method and device, computer equipment and storage medium - Google Patents

Info

Publication number
CN113360911A
Authority
CN
China
Prior art keywords
malicious code
pixel value
homologous
value matrix
code image
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110830366.1A
Other languages
Chinese (zh)
Inventor
黄娜
薛智慧
余小军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Topsec Technology Co Ltd
Beijing Topsec Network Security Technology Co Ltd
Beijing Topsec Software Co Ltd
Original Assignee
Beijing Topsec Technology Co Ltd
Beijing Topsec Network Security Technology Co Ltd
Beijing Topsec Software Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Abstract

The disclosure relates to the technical field of computer security and provides a malicious code homology analysis method and apparatus, a computer device, and a computer-readable storage medium. The method comprises: acquiring a malicious code image, wherein the malicious code image is obtained from a malicious code file, the malicious code images cover at least two malicious code families, and the family labels corresponding to the at least two malicious code families are different; and inputting the malicious code image into a malicious code homology analysis model to obtain a malicious code homology analysis result, wherein the model comprises at least one convolution structure, a global average pooling layer, and an output layer. Because the global average pooling layer is used in the model in place of a fully connected layer, the complexity and computational cost of the model's network structure are reduced, which improves the efficiency of malicious code homology analysis.

Description

Malicious code homologous analysis method and device, computer equipment and storage medium
Technical Field
The present disclosure relates to the field of computer security technologies, and in particular, to a malicious code homology analysis method and apparatus, a computer device, and a computer-readable storage medium.
Background
Malicious code homology analysis examines the derivation relationships among malicious code samples based on their internal and external characteristics and on the patterns by which malicious code is generated and propagated. It allows an attack source or attacker to be traced and located quickly, and helps avoid missed detections by malicious code detection software.
In the prior art, malicious code files are converted into malicious code images, a spatial pyramid pooling layer is added before the fully connected layer of a neural network model, and the neural network is trained on the malicious code images. This removes the requirement that all input images of the neural network have the same size, and thereby enables homology analysis of malicious code.
However, the prior-art method has a high computational cost, so the efficiency of malicious code homology analysis is low.
Disclosure of Invention
In view of the above, it is necessary to provide a malicious code homology analysis method, apparatus, computer device, and computer-readable storage medium that solve the above technical problem.
The embodiment of the disclosure provides a malicious code homologous analysis method, which comprises the following steps:
acquiring a malicious code image, wherein the malicious code image is obtained from a malicious code file, the malicious code images cover at least two malicious code families, and the family labels corresponding to the at least two malicious code families are different;
inputting the malicious code image into a malicious code homologous analysis model to obtain a malicious code homologous analysis result, wherein the malicious code homologous analysis model comprises: at least one convolution structure, a global average pooling layer, and an output layer.
In one embodiment, the acquiring the malicious code image includes:
acquiring binary streams corresponding to the malicious code files respectively;
dividing the binary stream according to bytes and converting the binary stream into a pixel value matrix;
and acquiring the malicious code image according to the pixel value matrix.
In one embodiment, the segmenting the binary stream into bytes and converting the binary stream into a pixel value matrix includes:
acquiring byte numbers corresponding to the malicious code files respectively;
determining the size of the pixel value matrix according to the byte number;
obtaining decimal values corresponding to each byte respectively;
and determining the pixel value matrix according to the size of the pixel value matrix and the decimal value corresponding to each byte.
In one embodiment, said determining said pixel value matrix size according to said number of bytes comprises:
applying preset processing to the number of bytes to determine the width of the pixel value matrix;
and determining the length of the pixel value matrix according to the ratio of the number of bytes to the width of the pixel value matrix.
In an embodiment, the inputting the malicious code image into a malicious code homology analysis model to obtain a malicious code homology analysis result includes:
acquiring feature vectors respectively corresponding to feature maps, wherein the feature maps are obtained according to the convolution structure;
and inputting the feature vectors respectively corresponding to the feature maps into the output layer to obtain the malicious code homology analysis result.
In one embodiment, the obtaining feature vectors corresponding to the feature maps respectively includes:
inputting the feature map to the global average pooling layer;
and the global average pooling layer is used for averaging the pixel values in the feature map to obtain the average value of the pixel values in the feature map.
In one embodiment, the convolution structure consists of one convolution layer and one max-pooling layer.
The embodiment of the present disclosure provides a malicious code homology analysis device, where the device includes:
a malicious code image acquisition module, configured to acquire a malicious code image, wherein the malicious code image is obtained from a malicious code file, the malicious code images cover at least two malicious code families, and the family labels corresponding to the at least two malicious code families are different;
a malicious code homologous analysis result obtaining module, configured to input the malicious code image into a malicious code homologous analysis model, and obtain a malicious code homologous analysis result, where the malicious code homologous analysis model includes: at least one convolution structure, a global average pooling layer, and an output layer.
The embodiment of the present disclosure provides a computer device, which includes a memory and a processor, where the memory stores a computer program, and the processor implements the steps of the malicious code homology analysis method provided in any embodiment of the present disclosure when executing the computer program.
The embodiment of the disclosure provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the steps of a malicious code homology analysis method provided by any embodiment of the disclosure are implemented.
According to the malicious code homology analysis method, apparatus, computer device, and computer-readable storage medium, a malicious code image is obtained from a malicious code file, the malicious code images cover at least two malicious code families, and the family labels corresponding to the at least two malicious code families are different. The malicious code image is input into a malicious code homology analysis model to obtain a malicious code homology analysis result, wherein the model comprises at least one convolution structure, a global average pooling layer, and an output layer. Because the global average pooling layer is used in the model in place of a fully connected layer, the complexity and computational cost of the model's network structure are reduced, and the efficiency of malicious code homology analysis is improved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
To illustrate the embodiments of the present disclosure or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that those skilled in the art can obtain other drawings from these drawings without inventive effort.
Fig. 1 is a schematic flowchart of a malicious code homology analysis method according to an embodiment of the present disclosure;
Fig. 2 is a schematic flowchart of another malicious code homology analysis method according to an embodiment of the present disclosure;
Fig. 3 is a schematic flowchart of another malicious code homology analysis method according to an embodiment of the present disclosure;
Fig. 4 is a schematic flowchart of another malicious code homology analysis method according to an embodiment of the present disclosure;
Fig. 5 is a schematic structural diagram of a malicious code homology analysis apparatus according to an embodiment of the present disclosure.
Detailed Description
In order that the above objects, features and advantages of the present disclosure may be more clearly understood, aspects of the present disclosure will be further described below. It should be noted that the embodiments and features of the embodiments of the present disclosure may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure, but the present disclosure may be practiced in other ways than those described herein; it is to be understood that the embodiments disclosed in the specification are only a few embodiments of the present disclosure, and not all embodiments.
Malicious code homology analysis examines the derivation relationships among malicious code samples based on their internal and external characteristics and on the patterns by which malicious code is generated and propagated. It allows attack sources or attackers to be traced and located quickly, and also helps avoid missed detections by malicious code detection software.
At present, malicious code files are converted into malicious code images, and a spatial pyramid pooling layer is added before the fully connected layer of a neural network model. This removes the requirement that all input images of the neural network have the same size, and thereby enables homology analysis of malicious code. However, when the prior-art method is used for malicious code homology analysis, the neural network structure is complex and the computational cost is high, so the efficiency of malicious code homology analysis is low.
The present disclosure provides a malicious code homology analysis method, apparatus, device, and computer storage medium. A malicious code image is obtained from a malicious code file, the malicious code images cover at least two malicious code families, and the family labels corresponding to the at least two malicious code families are different. The malicious code image is input into a malicious code homology analysis model to obtain a malicious code homology analysis result, wherein the model comprises at least one convolution structure, a global average pooling layer, and an output layer. Because the global average pooling layer is used in the model in place of a fully connected layer, the complexity and computational cost of the model's network structure are reduced, and the efficiency of malicious code homology analysis is improved.
In an embodiment, as shown in Fig. 1, which is a schematic flowchart of a malicious code homology analysis method provided by an embodiment of the present disclosure, the method includes the following steps:
S101: Acquire a malicious code image.
The malicious code image is obtained from a malicious code file; the malicious code images cover at least two malicious code families, and the family labels corresponding to the at least two malicious code families are different. Malicious code is computer code that serves no legitimate purpose and poses a threat or potential threat to a network or system, such as a computer virus, Trojan horse, computer worm, backdoor, or logic bomb. In the present embodiment there are mainly 9 malicious code families: Ramnit, Lollipop, Kelihos_ver3 (third-generation Kelihos), Vundo, Simda, Tracur, Kelihos_ver1 (first-generation Kelihos), Obfuscator.ACY, and Gatak. Each family corresponds to a different label; illustratively, the 9 families are labeled in this order with the numbers 0-8, but the disclosure is not limited thereto. It should be noted that each malicious code family includes a plurality of malicious code files, and one malicious code file may be mapped to one malicious code image, but the disclosure is not limited thereto.
Specifically, a plurality of malicious code families are extracted from a malicious code library, each family comprising a plurality of malicious code files. The malicious code files included in each family are obtained, and a malicious code image is obtained for each malicious code file.
Illustratively, 9 malicious code families, namely Ramnit, Lollipop, Kelihos_ver3, Vundo, Simda, Tracur, Kelihos_ver1, Obfuscator.ACY, and Gatak, are extracted from the malicious code library. The malicious code files included in each of the 9 families are obtained, and a malicious code image is obtained for each of these files.
S102: Input the malicious code image into a malicious code homology analysis model to obtain a malicious code homology analysis result.
The malicious code homology analysis model comprises at least one convolution structure, a global average pooling layer, and an output layer. The model is obtained by training on malicious code images and is a convolutional neural network, but the disclosure is not limited thereto.
Illustratively, in this embodiment a convolutional neural network model is established as the malicious code homology analysis model. The model may include at least one convolution structure, a global average pooling layer, and an output layer, and the parameters and training weights of each layer are set. Further, the malicious code files included in the 9 labeled malicious code families are obtained, and the malicious code images corresponding to these files are used as the malicious sample set. The sample set is divided into a training set, a validation set, and a test set in the ratio 8:1:1, the convolutional neural network model is trained with the training set, and the trained model is used as the malicious code homology analysis model. Each convolution structure consists of a convolution layer followed by a max-pooling layer.
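The 8:1:1 division described above can be sketched as follows. This is only a minimal illustration: the function name `split_8_1_1`, the fixed shuffle seed, and the toy sample list are assumptions made for the example and do not come from the disclosure, which does not state how samples are shuffled or how remainders are assigned.

```python
import random


def split_8_1_1(samples, seed=0):
    """Shuffle samples and split them 8:1:1 into train/validation/test lists.

    Assumption: any remainder left by integer division goes to the test set.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 8 // 10
    n_val = n // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Hypothetical sample set: (image_id, family_label) pairs with labels 0-8.
samples = [(i, i % 9) for i in range(90)]
train, val, test = split_8_1_1(samples)
print(len(train), len(val), len(test))  # 72 9 9
```

In practice a split that preserves the proportion of each family in every subset may be preferable; the disclosure does not say whether that is done.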
Specifically, a plurality of malicious code families are extracted from a malicious code library, a malicious code image is obtained for each malicious code file included in each family, and the malicious code image is input into the trained malicious code homology analysis model to obtain the malicious code homology analysis result.
Illustratively, 9 malicious code families, namely Ramnit, Lollipop, Kelihos_ver3, Vundo, Simda, Tracur, Kelihos_ver1, Obfuscator.ACY, and Gatak, are extracted from the malicious code library, and a malicious code image is obtained for each malicious code file in each family. The malicious code image is input into the trained malicious code homology analysis model, which outputs the probability that the image belongs to each of the 9 families; the family with the highest probability is selected as the classification result for the image. For example, when the malicious code image corresponding to a malicious code file is input into the model and the probabilities for the 9 families are {0.1, 0, 0.2, 0, 0.5, 0.1, 0, 0, 0.1}, the maximum probability is 0.5 and the corresponding family is Simda, so the malicious code image is classified as Simda. The disclosure is not limited thereto.
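Selecting the family with the highest probability can be sketched as below. The list `FAMILIES` follows the 0-8 labeling order given in this embodiment; the function name `predict_family` is a hypothetical helper introduced only for illustration.

```python
# Family names in the label order 0-8 used in this embodiment.
FAMILIES = [
    "Ramnit", "Lollipop", "Kelihos_ver3", "Vundo", "Simda",
    "Tracur", "Kelihos_ver1", "Obfuscator.ACY", "Gatak",
]


def predict_family(probabilities):
    """Return (label, family name) for the highest probability."""
    if len(probabilities) != len(FAMILIES):
        raise ValueError("expected one probability per family")
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return best, FAMILIES[best]


# Probabilities taken from the example in the text.
probs = [0.1, 0, 0.2, 0, 0.5, 0.1, 0, 0, 0.1]
label, family = predict_family(probs)
print(label, family)  # 4 Simda
```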
In this way, in this embodiment, a malicious code image is obtained from a malicious code file, the malicious code images cover at least two malicious code families, and the family labels corresponding to the at least two malicious code families are different. The malicious code image is input into a malicious code homology analysis model to obtain a malicious code homology analysis result, wherein the model comprises at least one convolution structure, a global average pooling layer, and an output layer. Because the global average pooling layer is used in the model in place of a fully connected layer, the complexity and computational cost of the model's network structure are reduced, and the efficiency of malicious code homology analysis is improved.
Fig. 2 is a schematic flowchart of another malicious code homology analysis method provided in an embodiment of the present disclosure. Based on the embodiment shown in Fig. 1, Fig. 2 describes a possible implementation of S101. As shown in Fig. 2:
S1011: Acquire the binary stream corresponding to each malicious code file.
In this embodiment, the malicious code file is an executable file, that is, a file that can be loaded and executed by an operating system. Executable programs take different forms in different operating system environments. For example, under the Windows system, the executable file may be of a file type such as .exe, .sys, or .com. It should be noted that the malicious code file is stored in the terminal device in binary form, but the disclosure is not limited thereto.
Specifically, malicious code files are acquired from the malicious code library, and the binary stream corresponding to each malicious code file is read.
S1012: Divide the binary stream into bytes and convert it into a pixel value matrix.
In most computer systems, a byte is an 8-bit data unit that stores a value in the range 0-255, that is, 1 byte = 8 bits. Binary content can therefore be divided into segments of 8 bits; for example, "001110001111100101011010" is divided into "00111000", "11111001", and "01011010", where each 8-bit segment corresponds to one byte. The disclosure is not limited thereto.
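As a small worked illustration of this division, the sketch below splits the 24-bit string from the example into 8-bit segments and converts each to its decimal value. The function name `bits_to_byte_values` is a hypothetical helper; an actual implementation would normally read bytes from the file directly rather than handle a bit string.

```python
def bits_to_byte_values(bits):
    """Split a string of '0'/'1' characters into 8-bit segments and
    return the decimal value (0-255) of each segment."""
    if len(bits) % 8 != 0:
        raise ValueError("bit string length must be a multiple of 8")
    return [int(bits[i:i + 8], 2) for i in range(0, len(bits), 8)]


values = bits_to_byte_values("001110001111100101011010")
print(values)  # [56, 249, 90]
```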
Specifically, malicious code files are obtained from the malicious code library, the binary stream corresponding to each malicious code file is read and divided into bytes, the decimal value of each 8-bit segment (one byte) is obtained, and the decimal values are stored in a matrix to give the pixel value matrix.
S1013: Acquire the malicious code image according to the pixel value matrix.
Specifically, the binary stream corresponding to each malicious code file is divided into bytes, the decimal value of each 8-bit segment is obtained and stored in the pixel value matrix, and the resulting pixel value matrix can be visualized as the malicious code image.
In this way, in this embodiment, the binary stream corresponding to each malicious code file is obtained, divided into bytes, and converted into a pixel value matrix, and the malicious code image is obtained from the pixel value matrix. The malicious code image is thus produced directly from the original executable malicious code file.
Fig. 3 is a schematic flowchart of another malicious code homology analysis method provided in an embodiment of the present disclosure. Based on the embodiment shown in Fig. 2, Fig. 3 describes a possible implementation of S1012. As shown in Fig. 3:
S1012a: Acquire the number of bytes corresponding to each malicious code file.
Specifically, the binary stream corresponding to each malicious code file is divided into bytes, each 8-bit segment of the binary stream being one byte, and the number of 8-bit segments obtained for each malicious code file is counted as its number of bytes.
S1012b: Determine the size of the pixel value matrix according to the number of bytes.
Specifically, the size of the pixel value matrix is determined according to the number of bytes.
Optionally, in one possible implementation, preset processing is applied to the number of bytes to determine the width of the pixel value matrix.
It should be noted that, in this embodiment, the preset processing means taking the square root of the number of bytes and rounding the result down to obtain the width of the pixel value matrix, but the disclosure is not limited thereto.
Specifically, the number of bytes of the malicious code file extracted from the malicious code library is determined, its square root is taken, and the result is rounded down to give the width of the pixel value matrix.
Illustratively, a malicious code file is extracted from the malicious code library and the binary stream corresponding to the file is divided into bytes. The file size is determined to be 707 KB, so the number of bytes of the file, that is, its length, is l = 707 × 1024 = 723968 bytes. Taking the square root of the number of bytes l gives
$$\sqrt{l} = \sqrt{723968} \approx 850.86$$
and rounding down determines the width of the pixel value matrix as w = 850, but the disclosure is not limited thereto.
The length of the pixel value matrix is determined according to the ratio of the number of bytes to the width of the pixel value matrix.
Specifically, after the width of the pixel value matrix is determined by taking the square root of the number of bytes of the malicious code file and rounding down, the length of the pixel value matrix is determined by taking the ratio of the number of bytes to the width and rounding up. The size of the pixel value matrix is then given by the determined width and length.
Illustratively, a malicious code file is extracted from the malicious code library and its number of bytes, that is, its length, is determined to be l = 723968 bytes. From the number of bytes l, the width of the pixel value matrix is determined to be w = 850. The ratio of the number of bytes l to the width w is l/w ≈ 851.73, which is rounded up to give the length of the pixel value matrix, d = 852, so the size of the pixel value matrix is 852 × 850. It should be noted that, when the sequence of decimal values corresponding to the bytes is converted into a matrix of this size, any matrix elements left unfilled are padded with 0, but the disclosure is not limited thereto.
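The width and length calculation in this example can be checked with a short sketch. The function name `pixel_matrix_size` is a hypothetical helper; the arithmetic follows the text: the width is the square root of the number of bytes rounded down, and the length is the ratio of the number of bytes to the width rounded up.

```python
import math


def pixel_matrix_size(num_bytes):
    """Return (length, width) of the pixel value matrix for a file of
    num_bytes bytes: width = floor(sqrt(l)), length = ceil(l / width)."""
    if num_bytes <= 0:
        raise ValueError("num_bytes must be positive")
    width = math.isqrt(num_bytes)        # integer square root, rounded down
    length = -(-num_bytes // width)      # integer ceiling division
    return length, width


# The 707 KB file from the example: l = 707 * 1024 = 723968 bytes.
length, width = pixel_matrix_size(707 * 1024)
print(length, width)                     # 852 850
print(length * width - 707 * 1024)       # 232 elements to pad with 0
```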
S1012c: Obtain the decimal value corresponding to each byte.
Specifically, the binary stream corresponding to each malicious code file is divided into bytes, each 8-bit segment being one byte, and the 8-bit binary value of each byte is converted into a decimal value in the range 0-255, where 0 represents black and 255 represents white.
S1012d: Determine the pixel value matrix according to the size of the pixel value matrix and the decimal value corresponding to each byte.
Specifically, the binary stream corresponding to each malicious code file is divided into bytes, the size of the pixel value matrix corresponding to the file and the decimal value corresponding to each byte are determined, and the decimal values are stored in the pixel value matrix, which determines the pixel value matrix corresponding to the malicious code file.
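Steps S1012a to S1012d can be put together in a minimal sketch that turns a byte sequence into a zero-padded pixel value matrix. The function name `bytes_to_pixel_matrix` and the row-major fill order are assumptions made for illustration; the disclosure does not specify the order in which values are stored.

```python
import math


def bytes_to_pixel_matrix(data):
    """Convert a bytes object into a list of rows of pixel values (0-255).

    width = floor(sqrt(n)), length = ceil(n / width); unfilled elements
    are padded with 0. Row-major filling is an assumption of this sketch.
    """
    n = len(data)
    if n == 0:
        raise ValueError("data must not be empty")
    width = math.isqrt(n)
    length = -(-n // width)
    padded = list(data) + [0] * (length * width - n)
    return [padded[r * width:(r + 1) * width] for r in range(length)]


# Toy input of 10 bytes with values 0..9 (a stand-in for a real file).
matrix = bytes_to_pixel_matrix(bytes(range(10)))
for row in matrix:
    print(row)
# [0, 1, 2]
# [3, 4, 5]
# [6, 7, 8]
# [9, 0, 0]
```

For a real file, the input could be obtained with `pathlib.Path(path).read_bytes()`, where `path` is a placeholder for the file location; the resulting matrix can then be rendered as a grayscale image with any imaging library.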
In this way, in this embodiment, the binary stream corresponding to each malicious code file is obtained and divided into bytes, the size of the pixel value matrix corresponding to the file is determined, the pixel value matrix is filled with the decimal value corresponding to each byte, and the malicious code image is obtained from the pixel value matrix. As a result, the characteristic information of the malicious code file is not lost when the file is converted into an image, which improves the accuracy of malicious code analysis.
Fig. 4 is a schematic flowchart of another malicious code homology analysis method provided by an embodiment of the present disclosure. Based on the embodiment shown in Fig. 3, Fig. 4 describes a possible implementation of S102. As shown in Fig. 4:
S1021: Acquire the feature vector corresponding to each feature map.
The feature maps are obtained from the convolution structure.
Specifically, a malicious code image is input into the malicious code homology analysis model. The model is a convolutional neural network comprising at least one convolution structure, each of which consists of a convolution layer followed by a max-pooling layer. The malicious code image serves as the input of the convolution structure, the convolution layer in the convolution structure extracts the feature maps corresponding to the malicious code image, and the feature vector is then obtained from the feature maps output by the last convolution structure.
Optionally, one possible implementation manner is: the feature map is input to a global average pooling layer.
And the global average pooling layer is used for averaging the pixel values in the feature map to obtain the average value of the pixel values in the feature map.
The global average pooling layer replaces the fully connected layer in the malicious code homologous analysis model. It processes the feature map output by the last convolution structure to generate a feature vector without any fully connected operation: all pixel values in each feature map are averaged to obtain one average value per feature map. This solves the problem that the input malicious code image samples differ in size when the malicious code homologous analysis model is trained, reduces the number of parameters to be computed, and speeds up the analysis.
Specifically, a malicious code image is input into a malicious code homologous analysis model, a feature map corresponding to the malicious code image is extracted according to a convolution structure in the malicious code homologous analysis model, further, the feature map output by the last convolution structure is input into a global average pooling layer, and the global average pooling layer performs averaging processing on pixel values in the feature map, so that an average value corresponding to the pixel values in the feature map is obtained, namely, a feature vector corresponding to the feature map.
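The averaging step can be illustrated with a short sketch. It assumes each feature map is a plain 2-D list of numbers, and the function name `global_average_pool` is hypothetical. The point it shows is that the output length depends only on the number of feature maps, not on their height or width.

```python
def global_average_pool(feature_maps):
    """Reduce each 2-D feature map to the mean of its pixel values."""
    vector = []
    for fmap in feature_maps:
        values = [v for row in fmap for v in row]
        vector.append(sum(values) / len(values))
    return vector


# Two 2x2 maps and two 3x3 maps both yield a vector of length two.
small = global_average_pool([[[1, 2], [3, 4]], [[0, 0], [0, 8]]])
large = global_average_pool([[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                             [[2, 2, 2], [2, 2, 2], [2, 2, 2]]])
```

Because the vector length is fixed by the channel count, the output layer that follows can have a fixed input size even when the malicious code images differ in size.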
S1022: Input the feature vectors respectively corresponding to the feature maps into the output layer to obtain the malicious code homologous analysis result.
In this embodiment, the activation function of the output layer adopts a Softmax function, but is not limited thereto, and the disclosure is not particularly limited thereto.
Specifically, a malicious code image is input into the malicious code homologous analysis model, and the feature map corresponding to the malicious code image is extracted by the convolution structure in the model. The feature map output by the last convolution structure is input into the global average pooling layer to obtain the corresponding feature vector, and the feature vector is input into the output layer. The activation function of the output layer yields a probability value for each of a plurality of malicious code families, and the family with the maximum probability value is taken as the malicious code homologous analysis result, so that the malicious code family to which the malicious code image belongs is obtained.
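The final selection step can be sketched as below. The sketch assumes the output layer has already produced one raw score (logit) per family; the names `softmax`, `predict_family`, and the family labels are hypothetical, and the scores are made-up values used only to show the mechanics.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_family(logits, family_labels):
    """Return the family with the highest Softmax probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return family_labels[best], probs


# Hypothetical scores for three hypothetical families.
label, probs = predict_family([1.0, 3.0, 0.5], ["family_a", "family_b", "family_c"])
```

Softmax preserves the ordering of the scores, so the family with the maximum probability is also the family with the maximum raw score.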
In this way, in this embodiment, the global average pooling layer replaces the fully connected layer and processes the feature map output by the last convolution structure to generate a feature vector; that is, all pixel values in each feature map are averaged to obtain one average value per feature map. This solves the problem that the input malicious code image samples differ in size when the malicious code homologous analysis model is trained, reduces the number of parameters to be computed, and speeds up the analysis.
Based on the above embodiments, in some embodiments of the present disclosure, the convolution structure consists of one convolutional layer and one maximum pooling layer. However, the present disclosure is not limited thereto.
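A single convolution structure of this kind can be sketched in plain Python. This is an illustrative sketch under stated assumptions: one input channel, one hand-picked kernel, no padding, stride 1, a 2x2 pooling window, and no activation function, none of which are specified in the text. The function names are hypothetical. As in most deep learning frameworks, the "convolution" here is computed as a cross-correlation.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def max_pool2d(fmap, size=2):
    """Keep the maximum of each non-overlapping size x size window."""
    out_h = len(fmap) // size
    out_w = len(fmap[0]) // size
    return [[max(fmap[r * size + i][c * size + j]
                 for i in range(size) for j in range(size))
             for c in range(out_w)]
            for r in range(out_h)]


def conv_structure(image, kernel):
    """One convolution structure: convolutional layer, then max pooling."""
    return max_pool2d(conv2d_valid(image, kernel))


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, 1]]
feature_map = conv_structure(image, kernel)
```

In a trained network the kernel values are learned and each layer has many kernels, producing one feature map per kernel; stacking several such structures gives the "at least one convolution structure" of the model.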
It should be understood that although the steps in the flowcharts of figs. 1-4 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in figs. 1-4 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 5, a malicious code homologous analysis apparatus is provided, including a malicious code image obtaining module 110 and a malicious code homologous analysis result obtaining module 120. The malicious code image obtaining module 110 is configured to obtain a malicious code image, where the malicious code image is obtained according to a malicious code file, the malicious code image includes at least two malicious code families, and the family markers respectively corresponding to the at least two malicious code families are different. The malicious code homologous analysis result obtaining module 120 is configured to input the malicious code image into a malicious code homologous analysis model and obtain a malicious code homologous analysis result, where the malicious code homologous analysis model includes: at least one convolution structure, a global average pooling layer, and an output layer.
In an embodiment of the present disclosure, the malicious code image obtaining module 110 is specifically configured to obtain binary streams corresponding to malicious code files respectively; dividing the binary stream according to bytes, and converting the binary stream into a pixel value matrix; and acquiring a malicious code image according to the pixel value matrix.
In an embodiment of the present disclosure, the malicious code image obtaining module 110 is further configured to obtain byte numbers corresponding to the malicious code files respectively; determining the size of a pixel value matrix according to the number of bytes; obtaining decimal values corresponding to each byte respectively; and determining a pixel value matrix according to the size of the pixel value matrix and the decimal value corresponding to each byte.
In an embodiment of the present disclosure, the malicious code image obtaining module 110 is further configured to perform preset processing on the number of bytes to determine the width of the pixel value matrix, and to determine the length of the pixel value matrix according to the ratio of the number of bytes to the width of the pixel value matrix.
In an embodiment of the present disclosure, the malicious code homologous analysis result obtaining module 120 is specifically configured to obtain feature vectors respectively corresponding to feature maps, where the feature maps are obtained according to the convolution structure, and to input the feature vectors respectively corresponding to the feature maps into the output layer to obtain the malicious code homologous analysis result.
In an embodiment of the present disclosure, the malicious code homologous analysis result obtaining module 120 is further configured to input a feature map into the global average pooling layer; and the global average pooling layer is used for averaging the pixel values in the feature map to obtain the average value of the pixel values in the feature map.
In an embodiment of the present disclosure, the convolution structure consists of one convolutional layer and one maximum pooling layer.
In the above embodiment, the malicious code image obtaining module 110 is configured to obtain a malicious code image, where the malicious code image is obtained according to a malicious code file, the malicious code image includes at least two malicious code families, and the family markers respectively corresponding to the at least two malicious code families are different. The malicious code homologous analysis result obtaining module 120 is configured to input the malicious code image into a malicious code homologous analysis model and obtain a malicious code homologous analysis result, where the malicious code homologous analysis model includes: at least one convolution structure, a global average pooling layer, and an output layer. Because the global average pooling layer is added to the malicious code homologous analysis model in place of the fully connected layer, the complexity and the amount of computation of the model's network structure are reduced, and the efficiency of malicious code homologous analysis is improved.
For specific limitations of the malicious code homologous analysis apparatus, reference may be made to the above limitations of the malicious code homologous analysis method, which are not repeated here. The modules in the malicious code homologous analysis apparatus may be implemented wholly or partially by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to each module.
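A software-only arrangement of the two modules might look like the sketch below. The class names mirror modules 110 and 120 but are hypothetical, the fixed `width` parameter and the zero-padding of the last row are assumptions, and the model is represented by an arbitrary callable so that the wiring can be shown without a trained network.

```python
from dataclasses import dataclass
from typing import Callable, List

PixelMatrix = List[List[int]]


@dataclass
class ImageObtainingModule:
    """Counterpart of module 110: turns file bytes into a pixel matrix."""
    width: int  # assumption: fixed width supplied by the caller

    def obtain(self, data: bytes) -> PixelMatrix:
        rows = -(-len(data) // self.width)  # ceiling division
        padded = data + bytes(rows * self.width - len(data))  # assumed zero padding
        return [list(padded[r * self.width:(r + 1) * self.width])
                for r in range(rows)]


@dataclass
class AnalysisResultObtainingModule:
    """Counterpart of module 120: feeds the image to the analysis model."""
    model: Callable[[PixelMatrix], str]

    def analyze(self, image: PixelMatrix) -> str:
        return self.model(image)


@dataclass
class HomologousAnalysisApparatus:
    image_module: ImageObtainingModule
    result_module: AnalysisResultObtainingModule

    def run(self, data: bytes) -> str:
        return self.result_module.analyze(self.image_module.obtain(data))


# Stand-in model: a placeholder rule, not a trained CNN.
def stub_model(image: PixelMatrix) -> str:
    return "family_a" if image[0][0] > 127 else "family_b"


apparatus = HomologousAnalysisApparatus(ImageObtainingModule(width=2),
                                        AnalysisResultObtainingModule(stub_model))
result = apparatus.run(b"\xff\x00\x01")
```

Keeping the model behind a callable means the image module and the analysis module can be tested or replaced independently, which matches the statement that each module may be realized in software, hardware, or both.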
In one embodiment, there is provided a computer device comprising a memory storing a computer program and a processor implementing the following steps when the processor executes the computer program: acquiring a malicious code image, wherein the malicious code image is acquired according to a malicious code file, the malicious code image comprises at least two malicious code families, and the corresponding family marks of the at least two malicious code families are different; inputting the malicious code image into a malicious code homologous analysis model to obtain a malicious code homologous analysis result, wherein the malicious code homologous analysis model comprises: at least one convolution structure, a global average pooling layer, and an output layer.
In one embodiment, the processor, when executing the computer program, further performs the steps of: acquiring a malicious code image, comprising: acquiring binary streams corresponding to the malicious code files respectively; dividing the binary stream according to bytes, and converting the binary stream into a pixel value matrix; and acquiring a malicious code image according to the pixel value matrix.
In one embodiment, the processor, when executing the computer program, further performs the steps of: dividing the binary stream by bytes and converting it into a pixel value matrix, comprising: acquiring the number of bytes corresponding to each malicious code file; determining the size of the pixel value matrix according to the number of bytes; obtaining the decimal value corresponding to each byte; and determining the pixel value matrix according to the size of the pixel value matrix and the decimal value corresponding to each byte.
In one embodiment, the processor, when executing the computer program, further performs the steps of: determining the size of the pixel value matrix according to the number of bytes, comprising: performing preset processing on the number of bytes to determine the width of the pixel value matrix; and determining the length of the pixel value matrix according to the ratio of the number of bytes to the width of the pixel value matrix.
In one embodiment, the processor, when executing the computer program, further performs the steps of: inputting the malicious code image into the malicious code homologous analysis model to obtain the malicious code homologous analysis result, comprising: acquiring feature vectors respectively corresponding to the feature maps, wherein the feature maps are obtained according to the convolution structure; and inputting the feature vectors respectively corresponding to the feature maps into the output layer to obtain the malicious code homologous analysis result.
In one embodiment, the processor, when executing the computer program, further performs the steps of: acquiring the feature vectors respectively corresponding to the feature maps, comprising: inputting the feature map into the global average pooling layer, wherein the global average pooling layer averages the pixel values in the feature map to obtain the average value of the pixel values in the feature map.
In one embodiment, when the processor executes the computer program, the convolution structure consists of one convolutional layer and one maximum pooling layer.
In the above embodiment, the malicious code image is obtained according to the malicious code file, the malicious code image includes at least two malicious code families, and the family markers respectively corresponding to the at least two malicious code families are different. The malicious code image is input into the malicious code homologous analysis model to obtain the malicious code homologous analysis result, wherein the malicious code homologous analysis model comprises: at least one convolution structure, a global average pooling layer, and an output layer. Because the global average pooling layer is added to the malicious code homologous analysis model in place of the fully connected layer, the complexity and the amount of computation of the model's network structure are reduced, and the efficiency of malicious code homologous analysis is improved.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of: acquiring a malicious code image, wherein the malicious code image is acquired according to a malicious code file, the malicious code image comprises at least two malicious code families, and the corresponding family marks of the at least two malicious code families are different; inputting the malicious code image into a malicious code homologous analysis model to obtain a malicious code homologous analysis result, wherein the malicious code homologous analysis model comprises: at least one convolution structure, a global average pooling layer, and an output layer.
In one embodiment, the computer program when executed by the processor implements the steps of: acquiring a malicious code image, comprising: acquiring binary streams corresponding to the malicious code files respectively; dividing the binary stream according to bytes, and converting the binary stream into a pixel value matrix; and acquiring a malicious code image according to the pixel value matrix.
In one embodiment, the computer program, when executed by the processor, implements the steps of: dividing the binary stream by bytes and converting it into a pixel value matrix, comprising: acquiring the number of bytes corresponding to each malicious code file; determining the size of the pixel value matrix according to the number of bytes; obtaining the decimal value corresponding to each byte; and determining the pixel value matrix according to the size of the pixel value matrix and the decimal value corresponding to each byte.
In one embodiment, the computer program, when executed by the processor, implements the steps of: determining the size of the pixel value matrix according to the number of bytes, comprising: performing preset processing on the number of bytes to determine the width of the pixel value matrix; and determining the length of the pixel value matrix according to the ratio of the number of bytes to the width of the pixel value matrix.
In one embodiment, the computer program, when executed by the processor, implements the steps of: inputting the malicious code image into the malicious code homologous analysis model to obtain the malicious code homologous analysis result, comprising: acquiring feature vectors respectively corresponding to the feature maps, wherein the feature maps are obtained according to the convolution structure; and inputting the feature vectors respectively corresponding to the feature maps into the output layer to obtain the malicious code homologous analysis result.
In one embodiment, the computer program, when executed by the processor, implements the steps of: acquiring the feature vectors respectively corresponding to the feature maps, comprising: inputting the feature map into the global average pooling layer, wherein the global average pooling layer averages the pixel values in the feature map to obtain the average value of the pixel values in the feature map.
In one embodiment, when the computer program is executed by the processor, the convolution structure consists of one convolutional layer and one maximum pooling layer.
In the above embodiment, the malicious code image is obtained according to the malicious code file, the malicious code image includes at least two malicious code families, and the family markers respectively corresponding to the at least two malicious code families are different. The malicious code image is input into the malicious code homologous analysis model to obtain the malicious code homologous analysis result, wherein the malicious code homologous analysis model comprises: at least one convolution structure, a global average pooling layer, and an output layer. Because the global average pooling layer is added to the malicious code homologous analysis model in place of the fully connected layer, the complexity and the amount of computation of the model's network structure are reduced, and the efficiency of malicious code homologous analysis is improved.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, databases, or other media used in the embodiments provided by the present disclosure may include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory may include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), and the like.
The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of these technical features are described; however, any combination of them should be considered within the scope of this specification as long as it involves no contradiction.
The above embodiments express only several implementations of the present disclosure. Their description is relatively specific and detailed, but should not be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make various changes and modifications without departing from the concept of the present disclosure, and these changes and modifications all fall within the scope of the present disclosure. Therefore, the protection scope of the present disclosure should be subject to the appended claims.

Claims (10)

1. A malicious code homology analysis method is characterized by comprising the following steps:
acquiring a malicious code image, wherein the malicious code image is obtained according to a malicious code file, the malicious code image comprises at least two malicious code families, and the family marks corresponding to the at least two malicious code families are different;
inputting the malicious code image into a malicious code homologous analysis model to obtain a malicious code homologous analysis result, wherein the malicious code homologous analysis model comprises: at least one convolution structure, a global average pooling layer, and an output layer.
2. The method of claim 1, wherein the obtaining the malicious code image comprises:
acquiring binary streams corresponding to the malicious code files respectively;
dividing the binary stream according to bytes and converting the binary stream into a pixel value matrix;
and acquiring the malicious code image according to the pixel value matrix.
3. The method of claim 2, wherein said segmenting said binary stream into a matrix of pixel values by byte comprises:
acquiring byte numbers corresponding to the malicious code files respectively;
determining the size of the pixel value matrix according to the byte number;
obtaining decimal values corresponding to each byte respectively;
and determining the pixel value matrix according to the size of the pixel value matrix and the decimal value corresponding to each byte.
4. The method of claim 3, wherein determining the pixel value matrix size according to the number of bytes comprises:
performing preset processing on the number of bytes, and determining the width of the pixel value matrix;
and determining the length of the pixel value matrix according to the ratio of the number of bytes to the width of the pixel value matrix.
5. The method according to claim 1, wherein the inputting the malicious code image into a malicious code homology analysis model to obtain a malicious code homology analysis result comprises:
acquiring feature vectors respectively corresponding to feature maps, wherein the feature maps are obtained according to the convolution structure;
and inputting the feature vectors respectively corresponding to the feature maps into the output layer to obtain the malicious code homologous analysis result.
6. The method according to claim 5, wherein the obtaining the feature vectors corresponding to the feature maps respectively comprises:
inputting the feature map to the global average pooling layer;
and the global average pooling layer is used for averaging the pixel values in the feature map to obtain the average value of the pixel values in the feature map.
7. The method of claim 1, wherein the convolution structure consists of one convolution layer and one maximum pooling layer.
8. A malicious code homology analysis apparatus, comprising:
the malicious code image acquisition module is used for acquiring a malicious code image, the malicious code image is obtained according to a malicious code file, the malicious code image comprises at least two malicious code families, and the corresponding family marks of the at least two malicious code families are different;
a malicious code homologous analysis result obtaining module, configured to input the malicious code image into a malicious code homologous analysis model, and obtain a malicious code homologous analysis result, where the malicious code homologous analysis model includes: at least one convolution structure, a global average pooling layer, and an output layer.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor when executing the computer program implements the steps of the malicious code homology analysis method according to any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the malicious code homology analysis method according to any one of claims 1 to 7.
CN202110830366.1A 2021-07-22 2021-07-22 Malicious code homologous analysis method and device, computer equipment and storage medium Pending CN113360911A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110830366.1A CN113360911A (en) 2021-07-22 2021-07-22 Malicious code homologous analysis method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110830366.1A CN113360911A (en) 2021-07-22 2021-07-22 Malicious code homologous analysis method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113360911A true CN113360911A (en) 2021-09-07

Family

ID=77540093

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110830366.1A Pending CN113360911A (en) 2021-07-22 2021-07-22 Malicious code homologous analysis method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113360911A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105989288A (en) * 2015-12-31 2016-10-05 武汉安天信息技术有限责任公司 Deep learning-based malicious code sample classification method and system
CN107392019A (en) * 2017-07-05 2017-11-24 北京金睛云华科技有限公司 A kind of training of malicious code family and detection method and device
CN108804919A (en) * 2018-05-03 2018-11-13 上海交通大学 The homologous determination method of malicious code based on deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
罗晓曙等: "人工智能技术及应用", 西安电子科技大学出版社, pages: 29 - 32 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113806746A (en) * 2021-09-24 2021-12-17 沈阳理工大学 Malicious code detection method based on improved CNN network
CN113806746B (en) * 2021-09-24 2024-03-22 沈阳理工大学 Malicious code detection method based on improved CNN (CNN) network
CN114139153A (en) * 2021-11-02 2022-03-04 武汉大学 Graph representation learning-based malware interpretability classification method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210907