CN110135160A - Method, apparatus and system for software detection - Google Patents
- Publication number
- CN110135160A (application number CN201910353079.9A)
- Authority
- CN
- China
- Prior art keywords
- api
- sequence
- malware
- multithreading
- thread
- Legal status
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/52—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity ; Preventing unwanted data erasure; Buffer overflow
- G06F21/53—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity ; Preventing unwanted data erasure; Buffer overflow by executing in a restricted environment, e.g. sandbox or secure virtual machine
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/56—Computer malware detection or handling, e.g. anti-virus arrangements
- G06F21/562—Static detection
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Computer Hardware Design (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Virology (AREA)
- Stored Programmes (AREA)
Abstract
The present invention provides a method, apparatus and system for software detection. The method comprises: obtaining software data to be detected; filtering out system file threads from the software data; extracting the API sequence in the system file threads by means of a sandbox; truncating the API sequence and re-encoding the truncated API sequence by thread to obtain an API multithreading sequence; and detecting and analyzing the API multithreading sequence with a target detection model to obtain a software detection result. The invention achieves more efficient and accurate malware detection and yields detection results covering a wider variety of malware types.
Description
Technical field
The present invention relates to the field of information security technology, and in particular to a method, apparatus and system for software detection.
Background technique
With the rapid development of information technology and the Internet, network security problems are receiving more and more attention, and malicious software (malware) undoubtedly does the most harm. Malware refers to programs such as viruses, worms or Trojan horses that deliberately perform malicious tasks on a computer system, exercising control by subverting software processes. Traditional N-Gram detection methods are widely used for detecting malware sequences.
As malware countermeasure techniques continue to develop, malware detection techniques have also gradually shifted from static detection toward combined static and dynamic detection.
However, most existing software detection methods adopt static sequence extraction schemes. When features are extracted with N-Gram, dynamic sequences suffer from problems such as multithreading and extremely uneven sequence lengths, which make the detection results inaccurate, incur large computational overhead, and leave the receptive field too small.
Summary of the invention
The present invention provides a method, apparatus and system for software detection, so as to achieve more efficient and accurate malware detection and to obtain detection results covering a wider variety of malware.
In a first aspect, an embodiment of the present invention provides a software detection method, comprising:
Obtaining software data to be detected;
Filtering out system file threads from the software data;
Extracting the API sequence in the system file threads by means of a sandbox;
Truncating the API sequence and re-encoding the truncated API sequence by thread to obtain an API multithreading sequence;
Detecting and analyzing the API multithreading sequence with a target detection model to obtain a software detection result.
In one possible design, before the API multithreading sequence is detected and analyzed with the target detection model to obtain the software detection result, the method further comprises:
Obtaining malware data, the malware data including infectious viruses, Trojan horse programs, cryptocurrency-mining programs and ransomware;
Filtering out the system file threads of the malware from the malware data;
Extracting, by means of a sandbox, the API sequence in the system file threads of the malware to obtain the API sequence of the malware;
Truncating the API sequence of the malware and re-encoding the truncated API sequence by thread to obtain the API multithreading sequence of the malware;
Constructing an initial detection model, the initial detection model being a classification model based on dilated convolution and TextCNN;
Iteratively training the initial detection model with the API multithreading sequences of the malware, using multi-class logloss as the evaluation objective, to obtain the target detection model.
In one possible design, extracting the API sequence in the system file threads by means of a sandbox comprises:
Dynamically executing the system file threads in the sandbox to obtain, for each API called by the file, the API name, the API thread number, the API return value and the serial number of the API call within its thread.
In one possible design, truncating the API sequence and re-encoding the truncated API sequence by thread to obtain the API multithreading sequence comprises:
When the number of API calls in a thread exceeds a preset threshold, truncating the API sequence corresponding to that thread and keeping a preset number of API records, to obtain a truncated API sequence;
Re-encoding the truncated API sequence by thread to obtain the API multithreading sequence.
In one possible design, detecting and analyzing the API multithreading sequence with the target detection model to obtain the software detection result comprises:
Detecting the API multithreading sequence with the target detection model and judging whether the software data corresponding to the API multithreading sequence is malware data;
If it is malware data, outputting the type label of the malware data;
If it is not malware data, prompting that the software data is safe.
In a second aspect, an embodiment of the present invention provides a software detection apparatus, comprising:
An acquisition module, configured to obtain software data to be detected;
A screening module, configured to filter out system file threads from the software data;
An extraction module, configured to extract the API sequence in the system file threads by means of a sandbox;
An encoding module, configured to truncate the API sequence and re-encode the truncated API sequence by thread to obtain an API multithreading sequence;
A result module, configured to detect and analyze the API multithreading sequence with a target detection model to obtain a software detection result.
In one possible design, before the API multithreading sequence is detected and analyzed with the target detection model to obtain the software detection result, the following steps are further performed:
Obtaining malware data, the malware data including infectious viruses, Trojan horse programs, cryptocurrency-mining programs and ransomware;
Filtering out the system file threads of the malware from the malware data;
Extracting, by means of a sandbox, the API sequence in the system file threads of the malware to obtain the API sequence of the malware;
Truncating the API sequence of the malware and re-encoding the truncated API sequence by thread to obtain the API multithreading sequence of the malware;
Constructing an initial detection model, the initial detection model being a classification model based on dilated convolution and TextCNN;
Iteratively training the initial detection model with the API multithreading sequences of the malware, using multi-class logloss as the evaluation objective, to obtain the target detection model.
In one possible design, the extraction module is specifically configured to:
Dynamically execute the system file threads in the sandbox to obtain, for each API called by the file, the API name, the API thread number, the API return value and the serial number of the API call within its thread.
In one possible design, the encoding module is specifically configured to:
When the number of API calls in a thread exceeds a preset threshold, truncate the API sequence corresponding to that thread and keep a preset number of API records, to obtain a truncated API sequence;
Re-encode the truncated API sequence by thread to obtain the API multithreading sequence.
In one possible design, the result module is specifically configured to:
Detect the API multithreading sequence with the target detection model and judge whether the software data corresponding to the API multithreading sequence is malware data;
If it is malware data, output the type label of the malware data;
If it is not malware data, prompt that the software data is safe.
In a third aspect, an embodiment of the present invention provides a software detection system, comprising a memory and a processor, the memory storing instructions executable by the processor, wherein the processor is configured to perform the software detection method of any one of the first aspect by executing the executable instructions.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the software detection method of any one of the first aspect is implemented.
The present invention provides a method, apparatus and system for software detection. The method comprises: obtaining software data to be detected; filtering out system file threads from the software data; extracting the API sequence in the system file threads by means of a sandbox; truncating the API sequence and re-encoding the truncated API sequence by thread to obtain an API multithreading sequence; and detecting and analyzing the API multithreading sequence with a target detection model to obtain a software detection result. The invention achieves more efficient and accurate malware detection and yields detection results covering a wider variety of malware.
Detailed description of the invention
In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of an application scenario of the present invention;
Fig. 2 is a flowchart of the software detection method provided by Embodiment 1 of the present invention;
Fig. 3 is a schematic diagram of dilated convolution in the software detection method provided by Embodiment 1 of the present invention;
Fig. 4 is a schematic structural diagram of the software detection apparatus provided by Embodiment 2 of the present invention;
Fig. 5 is a schematic structural diagram of the software detection system provided by Embodiment 3 of the present invention.
Specific embodiment
In order to make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
The terms "first", "second", "third", "fourth", etc. (if any) in the description, the claims and the drawings of this specification are used to distinguish similar objects, not to describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described herein can be implemented in orders other than those illustrated or described. In addition, the terms "comprise" and "have" and any variants thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product or device that contains a series of steps or units is not necessarily limited to the steps or units explicitly listed, but may include other steps or units that are not explicitly listed or that are inherent to such a process, method, product or device.
The technical solutions of the present invention, and how they solve the above technical problems, are described in detail below with specific embodiments. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be repeated in some embodiments. The embodiments of the present invention are described below with reference to the drawings.
Fig. 1 is a schematic diagram of an application scenario of the present invention. As shown in Fig. 1, the software detection system 11 obtains software data 12 to be detected, filters out system file threads from this software data, extracts the API sequence in the system file threads by means of a sandbox, truncates the API sequence, and re-encodes the truncated API sequence by thread to obtain an API multithreading sequence; the API multithreading sequence is then detected and analyzed with the target detection model to obtain the software detection result 13. In an optional embodiment, the software detection system includes the target detection model. This detection method achieves more efficient and accurate malware detection and yields detection results covering a wider variety of malware.
Fig. 2 is a flowchart of the software detection method provided by Embodiment 1 of the present invention. As shown in Fig. 2, the software detection method may comprise:
S201, software data to be detected is obtained.
In this embodiment, the software detection system uses the acquisition module to obtain the software data to be detected. In an optional embodiment, the acquisition module captures program information from the program's runtime state while the program runs; this runtime state may include the program's CPU instruction execution sequence, system calls, application programming interface (API) calls, or higher-level system services. In an optional embodiment, the software detection system uses API Monitor (API call monitoring software), which can monitor and display the calls made by an application and can also trace any exported API, such as the Win32 API and other third-party APIs. API Monitor can display rich information, including function names, call sequences, input and output parameters and function return values, and has predefined prototypes for 82 DLLs (Dynamic Link Libraries) and about 4,000 APIs.
In this embodiment, the software detection system obtains dynamic information from the runtime state. Compared with static information, this dynamic information has lower redundancy and can be captured in real time. It is also unaffected by packing and encryption, and instruction-level obfuscation (mutation) techniques are ineffective against higher-abstraction system calls or APIs (Application Programming Interfaces), so the dynamic information is less affected by such mutation.
S202, system file threads are filtered out from the software data.
In an optional embodiment, system file threads are filtered out from the software data. The system file threads may include software file threads for the Windows platform or, after simple adaptation of the API sequence, software file threads for the Android and Linux platforms.
S203, the API sequence in the system file threads is extracted by means of a sandbox.
Specifically, the system file threads are dynamically executed in a sandbox to obtain, for each API called by the file, the API name, the API thread number, the API return value and the serial number of the API call within its thread.
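The four fields listed above can be held in a small record type. The following Python sketch is illustrative only: the `ApiCall` class, its field names and the `group_by_thread` helper are hypothetical and do not reflect any particular sandbox's report format. It groups the recorded calls by thread and orders each thread's calls by their in-thread serial number.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ApiCall:
    """One API call as recorded by the sandbox (hypothetical field names)."""
    api: str           # API name called by the file
    tid: int           # API thread number (TID) that issued the call
    return_value: int  # API return value
    index: int         # serial number of the call within its thread


def group_by_thread(calls: List[ApiCall]) -> Dict[int, List[ApiCall]]:
    """Group calls by TID and sort each thread's calls by in-thread index."""
    threads: Dict[int, List[ApiCall]] = defaultdict(list)
    for call in calls:
        threads[call.tid].append(call)
    return {tid: sorted(seq, key=lambda c: c.index) for tid, seq in threads.items()}


if __name__ == "__main__":
    log = [
        ApiCall("NtCreateFile", 2, 0, 1),
        ApiCall("LdrLoadDll", 1, 0, 0),
        ApiCall("NtOpenKey", 2, 0, 0),
    ]
    grouped = group_by_thread(log)
    print({tid: [c.api for c in seq] for tid, seq in grouped.items()})
```

Grouping by TID first keeps each thread's call order intact, which the later truncation and re-encoding steps rely on.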
In an optional embodiment, the software detection system may include different sandboxes. The system file threads are dynamically executed in a sandbox and the API sequence in them is extracted, mainly retaining the API name called by the file, the API thread number, the API return value and the serial number of the API call within the thread. In an optional embodiment, the software detection system uses an extraction module such as a sandbox, which automatically and dynamically executes untrusted software data in an isolated environment and extracts dynamic behaviors during execution, such as process behavior, network behavior and file behavior. This embodiment does not restrict the specific sandbox; it is only necessary to extract the API sequence in the system file threads after dynamic analysis in the sandbox. For example, Cuckoo is an open-source automated malware analysis system written in Python that can track and record all calls made by malware. It can record malware file behavior during execution, such as creating, modifying, deleting, reading or downloading files; obtain a memory image of the malware; record the malware's network traffic in PCAP (packet capture) format; and capture screenshots during malware execution. The software data can then be analyzed in depth based on the sandbox's dynamic execution results.
S204, the API sequence is truncated, and the truncated API sequence is re-encoded by thread to obtain the API multithreading sequence.
Specifically, when the number of API calls in a thread exceeds a preset threshold, the API sequence corresponding to that thread is truncated and a preset number of API records is kept, giving a truncated API sequence; the truncated API sequence is then re-encoded by thread to obtain the API multithreading sequence.
In this embodiment, because the sandbox's execution timing has limited precision, several API calls may be recorded at the same index, either within the same thread or across different threads. The order within the same TID (Thread Identifier) is guaranteed, but continuity is not. When a thread TID calls more than 5,000 APIs, the API sequence of that thread can be truncated, keeping the first 5,000 API records of each TID, to obtain a truncated API sequence. The truncated API sequence is then re-encoded by thread to obtain the API multithreading sequence. In an optional embodiment, there is no ordering relation between different thread TIDs, while ascending indices within the same TID represent the order of the calls. In an optional embodiment, the software detection system can be extended with different encoding schemes, or word-vector techniques can be introduced for the re-encoding.
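A minimal Python sketch of the per-thread truncation and re-encoding follows. It assumes that each record is a simplified `(api_name, tid, index)` tuple and that re-encoding means mapping API names to integer IDs, with one row per thread. The 5,000 limit follows the text. The function names, the tuple layout and the use of integer IDs (rather than, for example, word vectors) are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical simplified record layout: (api_name, tid, index_within_thread)
Record = Tuple[str, int, int]


def truncate_per_thread(records: List[Record], limit: int = 5000) -> Dict[int, List[str]]:
    """Keep only the first `limit` API calls (by in-thread index) of every TID."""
    per_tid: Dict[int, List[Tuple[int, str]]] = {}
    for api, tid, idx in records:
        per_tid.setdefault(tid, []).append((idx, api))
    return {tid: [api for _, api in sorted(calls)[:limit]] for tid, calls in per_tid.items()}


def encode_multithread(truncated: Dict[int, List[str]], vocab: Dict[str, int]) -> List[List[int]]:
    """Re-encode each thread's truncated sequence as integer IDs, one row per TID.

    Different TIDs have no ordering relation, so they stay in separate rows;
    within a row, position reflects call order. Unseen API names are added to
    `vocab`, and ID 0 is left free (e.g. for padding).
    """
    rows: List[List[int]] = []
    for tid in sorted(truncated):  # sorted only for determinism, not for meaning
        row = []
        for api in truncated[tid]:
            if api not in vocab:
                vocab[api] = len(vocab) + 1
            row.append(vocab[api])
        rows.append(row)
    return rows
```

Keeping threads as separate rows avoids inventing an order between TIDs that the sandbox does not guarantee.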
In this embodiment, step S204 avoids the excessive computational overhead caused by overly long API sequences. Re-encoding the truncated API sequence yields sequences with clear temporal order and strong correlation, which in turn allows detection results covering a wider variety of malware.
S205, the API multithreading sequence is detected and analyzed with the target detection model to obtain the software detection result.
Specifically, the API multithreading sequence is detected with the target detection model to judge whether the software data corresponding to the API multithreading sequence is malware data:
If it is malware data, the type label of the malware data is output;
If it is not malware data, a prompt is given that the software data is safe.
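The branching in S205 can be sketched as follows. The sketch assumes that the model outputs one probability per class and that class index 0 is a benign "safe" class. The label list and the `interpret` function are hypothetical; the text does not specify how the model's output is mapped to labels.

```python
from typing import Sequence

# Hypothetical label set; index 0 is assumed to be the benign ("safe") class.
LABELS = ["safe", "infectious_virus", "trojan", "miner", "ransomware"]


def interpret(probs: Sequence[float], labels: Sequence[str] = LABELS) -> str:
    """Turn per-class probabilities into either a safety prompt or a type label."""
    if len(probs) != len(labels):
        raise ValueError("one probability per label is required")
    best = max(range(len(probs)), key=lambda j: probs[j])  # most probable class
    if best == 0:
        return "Software data is safe"
    return f"Malware detected: {labels[best]}"
```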
In an optional embodiment, the malware taxonomy may include infectious viruses, Trojan horse programs, cryptocurrency-mining programs, DDoS (Distributed Denial of Service) Trojans, ransomware, etc., and the classification quantity may be as high as 600 million.
In this embodiment, reconstructing the API calls and data packets shows that the program can submit copies of itself, i.e., copies of the malware data, which is exactly the typical behavior of the Vflooder Trojan family. The software data corresponding to the API multithreading sequence is therefore judged to be malware data. Vflooder is a specific type of Flooder (worm) Trojan, and a Flooder Trojan sends a large amount of information to a target to interrupt the target's normal operation. The type label of the malware data is then output: worm Trojan. In an optional embodiment, when the target detection model detects the API multithreading sequence and the software detection system finds standard, safely running data, it judges that the data is not malware data and prompts that the software data is safe.
In an optional embodiment, before the API multithreading sequence is detected and analyzed with the target detection model to obtain the software detection result, the method further comprises:
Obtaining malware data, the malware data including infectious viruses, Trojan horse programs, cryptocurrency-mining programs and ransomware;
Filtering out the system file threads of the malware from the malware data;
Extracting, by means of a sandbox, the API sequence in the system file threads of the malware to obtain the API sequence of the malware;
Truncating the API sequence of the malware and re-encoding the truncated API sequence by thread to obtain the API multithreading sequence of the malware;
Constructing an initial detection model, the initial detection model being a classification model based on dilated convolution and TextCNN;
Iteratively training the initial detection model with the API multithreading sequences of the malware, using multi-class logloss as the evaluation objective, to obtain the target detection model.
In an optional embodiment, the software detection system obtains malware data, which may include infectious viruses, Trojan horse programs, cryptocurrency-mining programs, ransomware, etc. The software detection system filters out the system file threads of the malware from the malware data and extracts, by means of a sandbox, the API sequence in those threads to obtain the malware's API sequence, recording the API name called by the file, the API thread number, the API return value and the serial number of the API call within the thread. The malware's API sequence is truncated, and the truncated API sequence is re-encoded by thread to obtain the malware's API multithreading sequence. This embodiment does not limit the encoding scheme; those skilled in the art may choose one according to actual needs, for example by extending it with different encoding schemes or by introducing word-vector techniques for the re-encoding.
An initial detection model is then constructed, which is a classification model based on dilated convolution and TextCNN. The initial detection model is iteratively trained with the malware's API multithreading sequences, using multi-class logloss as the evaluation objective, to obtain the target detection model.
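The step of iterative training with multi-class logloss as the objective can be illustrated in isolation. The sketch below deliberately swaps in a plain linear softmax classifier trained by batch gradient descent; it is not the dilated-convolution TextCNN described here. Its input `X` stands for any fixed-length feature vectors (for example, pooled convolution features). The learning rate and epoch count are arbitrary assumptions.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def logloss(y: np.ndarray, p: np.ndarray, eps: float = 1e-15) -> float:
    """Multi-class logloss: mean negative log-probability of the true class."""
    return float(-np.log(np.clip(p[np.arange(len(y)), y], eps, 1.0)).mean())


def train_softmax(X: np.ndarray, y: np.ndarray, num_classes: int,
                  epochs: int = 200, lr: float = 0.1):
    """Fit W, b by gradient descent on logloss; returns weights and loss history."""
    n, f = X.shape
    W = np.zeros((f, num_classes))
    b = np.zeros(num_classes)
    Y = np.eye(num_classes)[y]      # one-hot targets y_ij
    history = []
    for _ in range(epochs):
        P = softmax(X @ W + b)      # predicted probabilities P_ij
        history.append(logloss(y, P))
        G = (P - Y) / n             # gradient of mean logloss w.r.t. the logits
        W -= lr * (X.T @ G)
        b -= lr * G.sum(axis=0)
    return W, b, history
```

With all weights at zero, the first recorded loss is log(M) for M classes, which is a handy sanity check that the loss is wired up correctly before training a real model.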
In an optional embodiment, the software detection system constructs an initial detection model that is a classification model based on dilated convolution and TextCNN. Dilated convolution, also called atrous convolution, introduces a new parameter called the "dilation rate" into the convolutional layer, which defines the spacing between the values the convolution kernel samples when processing data. Fig. 3 is a schematic diagram of dilated convolution in the software detection method provided by Embodiment 1 of the present invention. Referring to Fig. 3, for a 3x3 2-dilated convolution, the actual kernel size is still 3x3, but the hole size is 1, i.e., one skipped position between adjacent kernel taps: in the image patch of Fig. 3, only the 9 red points are convolved with the 3x3 kernel and the remaining points are skipped. Equivalently, the kernel can be regarded as a larger kernel in which only the weights at the 9 red points are nonzero and all others are 0. Taken alone, such a layer spans a 5x5 region (n = k + (k-1)(d-1) with k = 3 and d = 2), even though it has only 3x3 weights. If the layer preceding this 2-dilated convolution is a 1-dilated convolution, each red point is itself the output of a 3x3 1-dilated convolution, so the 1-dilated and 2-dilated layers together reach a 7x7 receptive field, which corresponds to the 7x7 patch shown in Fig. 3. When the dilation rate doubles from layer to layer, the receptive field of dilated convolution grows exponentially with depth.
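The receptive-field arithmetic above can be checked with a few lines of Python. This sketch assumes stride-1 layers and counts the receptive field along one axis; `effective_kernel` and `stacked_receptive_field` are illustrative helper names that are not part of the patent.

```python
from typing import Sequence, Tuple


def effective_kernel(k: int, d: int) -> int:
    """Span covered by a k-tap kernel with dilation rate d: n = k + (k-1)(d-1)."""
    return k + (k - 1) * (d - 1)


def stacked_receptive_field(layers: Sequence[Tuple[int, int]]) -> int:
    """Receptive field (one axis) of stacked stride-1 conv layers given as (k, d)."""
    rf = 1
    for k, d in layers:
        rf += (k - 1) * d  # each layer adds effective_kernel(k, d) - 1 positions
    return rf
```

For example, `stacked_receptive_field([(3, 1), (3, 2)])` gives 7, matching the 7x7 case described above, and appending a `(3, 4)` layer gives 15. Doubling the dilation at every layer therefore roughly doubles the receptive field per layer.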
In an optional embodiment, dilated convolution inserts holes between the kernel elements during convolution. This introduces a new hyperparameter d, where d - 1 is the number of holes inserted between adjacent elements. If the original kernel size is k, the kernel size n after inserting d - 1 holes is:

$$n = k + (k-1)(d-1)$$

Further, assuming the input to the dilated convolution has size i, the stride is s and the padding is p, the feature map size o after dilated convolution is:

$$o = \left\lfloor \frac{i + 2p - n}{s} \right\rfloor + 1 = \left\lfloor \frac{i + 2p - k - (k-1)(d-1)}{s} \right\rfloor + 1$$
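The two size formulas translate directly into code. This sketch assumes floor division for the stride and symmetric zero padding `p` on each side; the function name is hypothetical.

```python
def dilated_output_size(i: int, k: int, d: int, s: int = 1, p: int = 0) -> int:
    """Feature-map size after a dilated convolution.

    i: input size, k: kernel size, d: dilation rate, s: stride, p: padding per side.
    """
    n = k + (k - 1) * (d - 1)  # effective (dilated) kernel size
    if i + 2 * p < n:
        raise ValueError("padded input is smaller than the dilated kernel")
    return (i + 2 * p - n) // s + 1


print(dilated_output_size(7, 3, 2))  # → 3
```

A 7-wide input with a 3-tap, 2-dilated kernel (effective width 5), stride 1 and no padding yields 3 outputs.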
In an optional embodiment, TextCNN, a convolutional neural network for text classification, is used because of its simple structure and good performance. Hierarchical convolution kernels are combined with dilated convolution: two window settings, Dilated_size=[1, 2, 3, 4] and Kernel_size=[2, 3, 4, 5], are added to the convolution process, so that the TextCNN network in the present invention can model convolution windows of length up to 20.
In an optional embodiment, the initial detection model is iteratively trained with the malware's API multithreading sequences, using multi-class logloss (logarithmic loss) as the evaluation objective, to obtain the target detection model. The logloss is:

$$\text{logloss} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{M} y_{ij}\log\left(P_{ij}\right)$$

where M is the number of classes, N is the number of test-set samples, y_ij indicates whether the i-th sample belongs to class j (1 if yes, 0 if no), and P_ij is the predicted probability that the i-th sample belongs to class j. The final logloss is rounded to 6 decimal places.
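A NumPy sketch of the multi-class logloss above follows. Because y_ij is 1 only for the true class, the double sum reduces to the log-probability of each sample's true class. The sketch clips probabilities to [eps, 1 - eps] and renormalizes each row to avoid log(0); this is a common convention assumed here, not something stated in the text.

```python
import numpy as np


def multiclass_logloss(y_true, probs, eps: float = 1e-15) -> float:
    """y_true: (N,) integer class labels; probs: (N, M) predicted probabilities."""
    y_true = np.asarray(y_true, dtype=int)
    p = np.clip(np.asarray(probs, dtype=float), eps, 1 - eps)
    p = p / p.sum(axis=1, keepdims=True)  # renormalize rows after clipping
    n = p.shape[0]
    # y_ij is 1 only for j == y_true[i], so only the true-class term survives
    loss = -np.log(p[np.arange(n), y_true]).mean()
    return round(float(loss), 6)  # keep 6 decimal places, as in the text
```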
The software detection method of this embodiment achieves more efficient and accurate malware detection and yields detection results covering a wider variety of malware.
Fig. 4 is a schematic structural diagram of the software detection apparatus provided by Embodiment 2 of the present invention. As shown in Fig. 4, the apparatus may include:
An acquisition module 31, configured to obtain software data to be detected;
A screening module 32, configured to filter out system file threads from the software data;
An extraction module 33, configured to extract the API sequence in the system file threads by means of a sandbox;
An encoding module 34, configured to truncate the API sequence and re-encode the truncated API sequence by thread to obtain an API multithreading sequence;
A result module 35, configured to detect and analyze the API multithreading sequence with a target detection model to obtain a software detection result.
In an optional embodiment, before the API multithreading sequence is detected and analyzed with the target detection model to obtain the software detection result, the following steps are further performed:
Obtaining malware data, the malware data including infectious viruses, Trojan horse programs, cryptocurrency-mining programs and ransomware;
Filtering out the system file threads of the malware from the malware data;
Extracting, by means of a sandbox, the API sequence in the system file threads of the malware to obtain the API sequence of the malware;
Truncating the API sequence of the malware and re-encoding the truncated API sequence by thread to obtain the API multithreading sequence of the malware;
Constructing an initial detection model, the initial detection model being a classification model based on dilated convolution and TextCNN;
Iteratively training the initial detection model with the API multithreading sequences of the malware, using multi-class logloss as the evaluation objective, to obtain the target detection model.
In an optional embodiment, the extraction module 33 is specifically configured to:
Dynamically execute the system file threads in the sandbox to obtain, for each API called by the file, the API name, the API thread number, the API return value and the serial number of the API call within its thread.
In an optional embodiment, the encoding module 34 is specifically configured to:
When the number of API calls in a thread exceeds a preset threshold, truncate the API sequence corresponding to that thread and keep a preset number of API records, to obtain a truncated API sequence;
Re-encode the truncated API sequence by thread to obtain the API multithreading sequence.
In an alternative embodiment, the obtaining module 35 is specifically configured to:
Detect the API multithreading sequence with the target detection model and determine whether the software data corresponding to the API multithreading sequence is malware data;
If it is malware data, output the type label of the malware data;
If it is not malware data, indicate that the software data is safe.
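The decision rule above can be sketched as follows. This assumes the classifier outputs one probability per class and that index 0 is a hypothetical "benign" class added alongside the four malware families listed for training; the label list and the `report` function are illustrative, not taken from the patent.

```python
from typing import Sequence

# Hypothetical label order; index 0 is assumed to mean "not malware".
LABELS = ["benign", "file-infecting virus", "trojan", "cryptominer", "ransomware"]


def report(probs: Sequence[float]) -> str:
    """Pick the most probable class and phrase the detection result."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    if best == 0:
        return "software data is safe"
    return f"malware: {LABELS[best]}"
```

A production system might instead threshold the benign probability to trade false positives against missed detections; the argmax rule is simply the most direct reading of "judge whether it is malware, then output its type label".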
The software detection device of this embodiment can execute the technical solution of the method shown in Fig. 2. For its specific implementation process and technical principle, refer to the related description of the method shown in Fig. 2, which is not repeated here.
Fig. 5 is a schematic structural diagram of the software detection system provided by Embodiment 3 of the present invention. As shown in Fig. 5, the software detection system 40 of this embodiment may include a processor 41 and a memory 42.
The memory 42 is configured to store computer programs (such as application programs and functional modules that implement the above software detection method), computer instructions, and the like;
The above computer programs, computer instructions, and the like may be stored in partitions across one or more memories 42, and the above computer programs, computer instructions, data, and the like may be invoked by the processor 41.
The processor 41 is configured to execute the computer program stored in the memory 42 to implement each step of the method in the above embodiments.
For details, refer to the related description in the foregoing method embodiments.
The processor 41 and the memory 42 may be separate structures or may be integrated into a single structure. When they are separate structures, the memory 42 and the processor 41 may be coupled through a bus 43.
The server of this embodiment can execute the technical solution of the method shown in Fig. 2. For its specific implementation process and technical principle, refer to the related description of the method shown in Fig. 2, which is not repeated here.
In addition, an embodiment of the present application further provides a computer-readable storage medium storing computer-executable instructions. When at least one processor of a user equipment executes the computer-executable instructions, the user equipment performs the various possible methods described above.
Computer-readable media include computer storage media and communication media, where communication media include any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium accessible by a general-purpose or special-purpose computer. An exemplary storage medium is coupled to the processor so that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be an integral part of the processor. The processor and the storage medium may be located in an ASIC, and the ASIC may be located in the user equipment. Of course, the processor and the storage medium may also exist as discrete components in a communication device.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments may be completed by hardware instructed by a program. The program may be stored in a computer-readable storage medium; when executed, it performs the steps of the above method embodiments. The storage medium includes various media that can store program code, such as a ROM, a RAM, a magnetic disk or an optical disc.
Finally, it should be noted that the above embodiments are only intended to illustrate, rather than limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, or make equivalent replacements of some or all of the technical features therein, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.
Claims (12)
1. A software detection method, characterized by comprising:
Obtaining software data to be detected;
Selecting system file threads from the software data;
Extracting, by a sandbox, the API sequences in the system file threads;
Truncating the API sequences and re-encoding the truncated API sequences thread by thread to obtain an API multithreading sequence;
Testing and analyzing the API multithreading sequence with a target detection model to obtain a software detection result.
2. The method according to claim 1, characterized in that, before testing and analyzing the API multithreading sequence with the target detection model to obtain the software detection result, the method further comprises:
Obtaining malware data, the malware data including file-infecting viruses, trojan horse programs, cryptomining programs and ransomware;
Selecting the system file threads of the malware from the malware data;
Extracting, by a sandbox, the API sequences in the system file threads of the malware to obtain the API sequences of the malware;
Truncating the API sequences of the malware and re-encoding the truncated API sequences thread by thread to obtain the API multithreading sequences of the malware;
Constructing an initial detection model, the initial detection model being a classification model based on dilated convolution and TextCNN;
Iteratively training the initial detection model on the API multithreading sequences of the malware, with multi-class log loss as the evaluation objective, to obtain the target detection model.
3. The method according to claim 1, characterized in that extracting, by a sandbox, the API sequences in the system file threads comprises:
Dynamically executing the system file threads in the sandbox to obtain, for each API called by the file, the API name, the thread ID, the API return value, and the sequence number of the call within its thread.
4. The method according to claim 1, characterized in that truncating the API sequences and re-encoding the truncated API sequences thread by thread to obtain the API multithreading sequence comprises:
When the number of API calls made by the file in a thread exceeds a preset threshold, truncating the API sequence corresponding to that thread, retaining a preset number of API records, to obtain a truncated API sequence;
Re-encoding the truncated API sequences thread by thread to obtain the API multithreading sequence.
5. The method according to any one of claims 1-4, characterized in that testing and analyzing the API multithreading sequence with the target detection model to obtain the software detection result comprises:
Detecting the API multithreading sequence with the target detection model and determining whether the software data corresponding to the API multithreading sequence is malware data;
If it is malware data, outputting the type label of the malware data;
If it is not malware data, indicating that the software data is safe.
6. A software detection device, characterized by comprising:
An acquisition module, configured to obtain software data to be detected;
A screening module, configured to select system file threads from the software data;
An extraction module, configured to extract, by a sandbox, the API sequences in the system file threads;
A coding module, configured to truncate the API sequences and re-encode the truncated API sequences thread by thread to obtain an API multithreading sequence;
An obtaining module, configured to test and analyze the API multithreading sequence with a target detection model to obtain a software detection result.
7. The device according to claim 6, characterized in that, before the API multithreading sequence is tested and analyzed by the target detection model to obtain the software detection result, the following are further included:
Obtaining malware data, the malware data including file-infecting viruses, trojan horse programs, cryptomining programs and ransomware;
Selecting the system file threads of the malware from the malware data;
Extracting, by a sandbox, the API sequences in the system file threads of the malware to obtain the API sequences of the malware;
Truncating the API sequences of the malware and re-encoding the truncated API sequences thread by thread to obtain the API multithreading sequences of the malware;
Constructing an initial detection model, the initial detection model being a classification model based on dilated convolution and TextCNN;
Iteratively training the initial detection model on the API multithreading sequences of the malware, with multi-class log loss as the evaluation objective, to obtain the target detection model.
8. The device according to claim 6, characterized in that the extraction module is specifically configured to:
Dynamically execute the system file threads in the sandbox to obtain, for each API called by the file, the API name, the thread ID, the API return value, and the sequence number of the call within its thread.
9. The device according to claim 6, characterized in that the coding module is specifically configured to:
When the number of API calls made by the file in a thread exceeds a preset threshold, truncate the API sequence corresponding to that thread, retaining a preset number of API records, to obtain a truncated API sequence;
Re-encode the truncated API sequences thread by thread to obtain the API multithreading sequence.
10. The device according to any one of claims 6-9, characterized in that the obtaining module is specifically configured to:
Detect the API multithreading sequence with the target detection model and determine whether the software data corresponding to the API multithreading sequence is malware data;
If it is malware data, output the type label of the malware data;
If it is not malware data, indicate that the software data is safe.
11. A software detection system, characterized by comprising a memory and a processor, the memory storing instructions executable by the processor, wherein the processor is configured to perform the software detection method according to any one of claims 1-5 by executing the executable instructions.
12. A computer-readable storage medium having a computer program stored thereon, characterized in that the program, when executed by a processor, implements the software detection method according to any one of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910353079.9A CN110135160B (en) | 2019-04-29 | 2019-04-29 | Software detection method, device and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110135160A (en) | 2019-08-16 |
CN110135160B CN110135160B (en) | 2021-11-30 |
Family ID: 67575625
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910353079.9A Active CN110135160B (en) | 2019-04-29 | 2019-04-29 | Software detection method, device and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110135160B (en) |
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140115718A1 (en) * | 2012-10-18 | 2014-04-24 | Broadcom Corporation | Set Top Box Architecture With Application Based Security Definitions |
CN107609396A (en) * | 2017-09-22 | 2018-01-19 | 杭州安恒信息技术有限公司 | Escape detection method based on sandbox virtual machine |
US20190121977A1 (en) * | 2017-10-19 | 2019-04-25 | AO Kaspersky Lab | System and method of detecting a malicious file |
CN108133139A (en) * | 2017-11-28 | 2018-06-08 | 西安交通大学 | Android malicious application detection system based on comparison of behaviors across multiple running environments |
CN108874658A (en) * | 2017-12-25 | 2018-11-23 | 北京安天网络安全技术有限公司 | Sandbox analysis method, device, electronic equipment and storage medium |
CN108376220A (en) * | 2018-02-01 | 2018-08-07 | 东巽科技(北京)有限公司 | Malicious sample program classification method and system based on deep learning |
CN108734012A (en) * | 2018-05-21 | 2018-11-02 | 上海戎磐网络科技有限公司 | Malware recognition methods, device and electronic equipment |
CN108830077A (en) * | 2018-06-14 | 2018-11-16 | 腾讯科技(深圳)有限公司 | Script detection method, device and terminal |
CN109635523A (en) * | 2018-11-29 | 2019-04-16 | 北京奇虎科技有限公司 | Application program detection method, device and computer readable storage medium |
CN109657468A (en) * | 2018-11-29 | 2019-04-19 | 北京奇虎科技有限公司 | Virus behavior detection method, device and computer readable storage medium |
Non-Patent Citations (2)
Title |
---|
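OMAR INVERSO ET AL: "Lazy-CSeq: A Context-Bounded Model Checking Tool for Multi-threaded C-Programs", IEEE * |
LU XIAOFENG ET AL (芦效峰 等): "Malicious sample detection framework based on a combination of API sequence features and statistical features" (基于API序列特征和统计特征组合的恶意样本检测框架), Journal of Tsinghua University (Science and Technology) (清华大学学报(自然科学版)) * |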
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111475808A (en) * | 2020-04-08 | 2020-07-31 | 苏州浪潮智能科技有限公司 | Software security analysis method, system, equipment and computer storage medium |
CN111475808B (en) * | 2020-04-08 | 2022-07-08 | 苏州浪潮智能科技有限公司 | Software security analysis method, system, equipment and computer storage medium |
CN111797393A (en) * | 2020-06-23 | 2020-10-20 | 哈尔滨安天科技集团股份有限公司 | Detection method and device for malicious mining behavior based on GPU |
CN111797393B (en) * | 2020-06-23 | 2023-05-23 | 安天科技集团股份有限公司 | Method and device for detecting malicious mining behavior based on GPU |
CN112000954A (en) * | 2020-08-25 | 2020-11-27 | 莫毓昌 | Malicious software detection method based on feature sequence mining and simplification |
CN112000954B (en) * | 2020-08-25 | 2024-01-30 | 华侨大学 | Malicious software detection method based on feature sequence mining and simplification |
CN112507330A (en) * | 2020-11-04 | 2021-03-16 | 北京航空航天大学 | Malicious software detection system based on distributed sandbox |
CN112507330B (en) * | 2020-11-04 | 2022-06-28 | 北京航空航天大学 | Malicious software detection system based on distributed sandbox |
CN112528284A (en) * | 2020-12-18 | 2021-03-19 | 北京明略软件系统有限公司 | Malicious program detection method and device, storage medium and electronic equipment |
CN113139187A (en) * | 2021-04-22 | 2021-07-20 | 北京启明星辰信息安全技术有限公司 | Method and device for generating and detecting pre-training language model |
CN113139187B (en) * | 2021-04-22 | 2023-12-19 | 北京启明星辰信息安全技术有限公司 | Method and device for generating and detecting pre-training language model |
CN113571199A (en) * | 2021-09-26 | 2021-10-29 | 成都健康医联信息产业有限公司 | Medical data classification and grading method, computer equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN110135160B (en) | 2021-11-30 |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |