Summary of the invention
The purpose of the present invention is to overcome the inefficiency in the prior art that arises when too many features are retained during the extraction of mathematical features.
To achieve the above object, in one aspect, the present invention provides a method for detecting distributed denial of service (DDoS) attacks. The method includes: iteratively projecting training data to determine a first projection space; projecting received test data onto the first projection space to determine the projection of the test data; and determining the safety of the test data according to the distance between the projection of the test data and the training data in the first projection space. The method provided by the embodiments of the present invention consumes few system resources and can effectively ensure that a firewall identifies the data sources launching a DDoS attack at a faster rate.
In an optional implementation, the above step of "iteratively projecting training data to determine a first projection space" may include: projecting the training data using an iterative method of a projection function, according to the principle of relative entropy maximization, to obtain a new projection space. When the new projection space no longer shifts, the new projection space is the first projection space.
In another optional implementation, the above "iterative method of a projection function" may include a fixed-point iteration method.
In another optional implementation, the number of groups in each batch of test data is a fixed value; if the amount of test data received exceeds this fixed number of groups, the fixed number of groups of test data is selected at random.
In another optional implementation, the above step of "determining the safety of the test data according to the distance between the projection of the test data and the training data in the first projection space" may include: determining the safety of the test data according to the Euclidean distance measure.
In another optional implementation, before the above step of "iteratively projecting training data to determine a first projection space", the method may further include: screening the training data to determine the connection features of the training data; and centering the training data to determine the mathematical features of the centered training data.
In another optional implementation, the above "connection features of the training data" may include at least one of the following: Transmission Control Protocol (TCP) connection features and traffic statistics features. The traffic statistics features include host-based network traffic statistics features and time-based network traffic statistics features.
In another optional implementation, the above step of "centering the training data to determine the mathematical features of the centered training data" may include: amplifying the training data using a spherization operation.
In another aspect, the present invention provides a device for detecting distributed denial of service. The device may include: a computing module, configured to iteratively project training data to determine a first projection space; a projection module, configured to project received test data onto the first projection space to determine the projection of the test data; and a processing module, configured to determine the safety of the test data according to the distance between the projection of the test data and the training data in the first projection space.
In an optional implementation, the above "computing module" may specifically be configured to: project the training data using an iterative method of a projection function, according to the principle of relative entropy maximization, to obtain a new projection space. When the new projection space no longer shifts, the new projection space is the first projection space.
In another optional implementation, the above "iterative method of a projection function" may include a fixed-point iteration method.
In another optional implementation, the number of groups in each batch of test data is a fixed value; if the amount of test data received exceeds this fixed number of groups, the fixed number of groups of test data is selected at random.
In another optional implementation, the above "processing module" may specifically be configured to: determine the safety of the test data according to the Euclidean distance measure.
In another optional implementation, the above device may further include: a selecting module, configured to screen the training data to determine the connection features of the training data, and to center the training data to determine the mathematical features of the centered training data.
In another optional implementation, the above "connection features of the training data" may include at least one of the following: Transmission Control Protocol (TCP) connection features and traffic statistics features. The traffic statistics features include host-based network traffic statistics features and time-based network traffic statistics features.
In another optional implementation, the above "selecting module" may specifically be configured to: amplify the training data using a spherization operation.
Specific embodiment
Below by drawings and examples, technical scheme of the present invention will be described in further detail.
Fig. 1 is a flowchart of a method for detecting distributed denial of service provided by an embodiment of the present invention. The method requires attack traffic data of known types as training samples and uses a spherization operation. As shown in Fig. 1, the method includes S101-S107:
S101: input training data.
Specifically, the input of the detection function includes one or more of the following: an attack data set for training, a test data set, and a setter. The setter can control the number of projection axes and the form of the iteration function.
For example, when training data is input at the function entry, the amount of training data should be neither too large, which would occupy more memory and slow down processing, nor too small, which would make the judgment accuracy too low. In the method provided by this embodiment, 5 pieces of training data are used for each specific type of attack; in other embodiments, the number of pieces of training data assigned to each type of attack may also be another number between 5 and 10.
S102: Screen the training data and determine the connection features of the training data.
Specifically, the training data is screened to determine its connection features, the data near the mean is expanded to highlight the statistical features, and the training data is then added to the iteration queue.
The connection features of the training data include at least one of the following: Transmission Control Protocol (TCP) connection features and traffic statistics features. The traffic statistics features include host-based network traffic statistics features and time-based network traffic statistics features, for example the number of REJ packets received within a specified time.
For example, the screened input data fall into the following four classes, 15 groups in total:
First class: basic TCP connection features, which may include: the connection duration, a continuous value in seconds; and the number of data bytes exchanged between the originating host and the destination host, a discrete value in bytes.
Second class: TCP connection content features, which may include: the number of accesses to system-sensitive directories and files, a discrete value; the ratio of failed or successful logins, a continuous value; and the number of file creation operations, a discrete value.
Third class: time-related traffic statistics features, which may include: the number of connections in the last 2 seconds that have the same destination host as the current connection, a continuous value; the number of connections in the last 2 seconds that have the same service as the current connection, a continuous value; among the connections in the last 2 seconds that have the same target or service as the current connection, the percentage in which a SYN/REJ error occurs, a continuous value; among the connections in the last 2 seconds that have the same target as the current connection, the percentage that have the same or a different service compared with the current connection, a continuous value; and among the connections in the last 2 seconds that have the same service as the current connection, the percentage that have the same or a different target compared with the current connection, a continuous value.
Fourth class: traffic statistics features related to the destination host, which may include: among the last 1000 connections, the number of connections that have the same target as the current connection, a discrete value; among the last 1000 connections, the percentage of connections that have the same target as the current connection and the same or a different service, a continuous value; among the last 1000 connections, the percentage of connections that have the same target as the current connection and the same or a different source port, a continuous value; among the connections in the last 1000 that have the same target as the current connection, the percentage in which a SYN or REJ error occurs, a continuous value; and among the connections in the last 1000 that have the same target and the same service as the current connection, the percentage in which a SYN or REJ error occurs, a continuous value.
S103: Center the training data and determine the mathematical features of the centered training data.
Specifically, the training data is centered and then amplified non-uniformly to highlight the mathematical features near its center. The non-uniform amplification may specifically be performed using a spherization operation. For example, the spherization operation amplifies the data near the value 0, which prevents the features of the sparse, distant data from being over-collected and crowding out the features of the data near 0. The method provided by the embodiment of the present invention studies the data features near the value 0, so the spherization method is needed to expand the data close to the mean and let the data features there show through. The sphere radius of the spherization operation is a moderate value positively correlated with the data variance; in this embodiment the sphere radius is 1, and in other embodiments it may be adjusted according to the size of the variance. See Fig. 2 and Fig. 3: Fig. 2 shows the data before the transformation, and Fig. 3 is a schematic diagram of the data after the spherization operation.
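The centering and non-uniform amplification of S103 can be sketched as follows. This is a minimal illustration only: the text does not give the spherization formula, so the radial map x -> x / sqrt(1 + |x|^2 / radius^2) used here is an assumption chosen because it leaves points near 0 almost unchanged while pulling distant points inside a ball of the given radius. The function names and the sample values are hypothetical.

```python
import math

def center(rows):
    """Subtract the per-feature mean so that every column has mean 0."""
    n = len(rows)
    dim = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dim)]
    return [[r[j] - means[j] for j in range(dim)] for r in rows]

def spherize(rows, radius=1.0):
    """Assumed radial map: x -> x / sqrt(1 + |x|^2 / radius^2).

    Points near 0 keep almost their original spacing, while far-away points
    are compressed into a ball of the given radius, so the region near 0
    gains relative weight.
    """
    out = []
    for r in rows:
        norm_sq = sum(v * v for v in r)
        scale = 1.0 / math.sqrt(1.0 + norm_sq / (radius * radius))
        out.append([v * scale for v in r])
    return out

# Made-up sample: three points close together and one distant outlier.
data = [[0.1, 0.2], [0.3, -0.1], [-0.2, 0.1], [9.8, 9.8]]
centered = center(data)
sphered = spherize(centered, radius=1.0)
```

With radius 1, as in this embodiment, every mapped point lies strictly inside the unit ball; a larger radius would compress distant points less.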
S104: Iteratively project the training data to determine the first projection space.
Specifically, the training data is projected using the iterative method of the projection function, according to the principle of relative entropy maximization, to obtain a new projection space. Projection directions whose measure is too low are ignored, which reduces the dimension of the projection space, and it is then judged whether the projection space has stopped shifting. The direction of maximum relative entropy is the direction in which the association information between the data is smallest. In the method provided by this embodiment, classifying along this direction minimizes the correlation between the projected data; in other embodiments, the projection axes may take other directions according to different requirements on the correlation.
When the new projection space no longer shifts, the new projection space is the first projection space; when the new projection space shifts again, S104 is repeated until the projection space no longer shifts.
The above iterative method of the projection function may include a fixed-point iteration method. Specifically, fixed-point iteration is selected as the iterative method of the projection function. To make the fitting result better, the iteration function must not only be non-linear and convergent but should also resemble the distribution of the original data as closely as possible. The iteration function is a function close to the probability distribution characteristics of the data. In this embodiment, after comparing the differences between the results obtained with different functions, g(x) = x·e^(-x) was selected; in other embodiments, a better-performing function with heavy-tailed properties may also be selected.
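The text names fixed-point iteration and the iteration function but not the update rule itself. The sketch below therefore borrows the one-unit update familiar from FastICA-style algorithms, w <- E[x·g(w·x)] - E[g'(w·x)]·w followed by normalization, as an assumed stand-in, and reads the iteration function as g(x) = x·e^(-x). The function names, the tolerance, and the iteration cap are hypothetical; the input is assumed to be centered and scaled so that the projections stay small.

```python
import math
import random

def g(x):
    """Iteration function, read here as g(x) = x * e^(-x)."""
    return x * math.exp(-x)

def g_prime(x):
    """Derivative of g: (1 - x) * e^(-x)."""
    return (1.0 - x) * math.exp(-x)

def normalize(w):
    norm = math.sqrt(sum(v * v for v in w))
    return [v / norm for v in w]

def fixed_point_direction(rows, max_iter=200, tol=1e-8, seed=0):
    """Assumed update: w <- E[x g(w.x)] - E[g'(w.x)] w, then normalize."""
    rng = random.Random(seed)
    n = len(rows)
    dim = len(rows[0])
    w = normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])
    for _ in range(max_iter):
        proj = [sum(wj * xj for wj, xj in zip(w, x)) for x in rows]
        mean_gp = sum(g_prime(p) for p in proj) / n
        new_w = normalize([
            sum(x[j] * g(p) for x, p in zip(rows, proj)) / n - mean_gp * w[j]
            for j in range(dim)
        ])
        # The direction has stopped shifting once new_w is parallel to w.
        if abs(abs(sum(a * b for a, b in zip(new_w, w))) - 1.0) < tol:
            return new_w
        w = new_w
    return w  # best estimate if the cap is reached without convergence
```

Repeating this for several mutually decorrelated directions would yield a multi-dimensional projection space; that step is omitted here to keep the sketch short.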
The training data is projected using the iterative method of the projection function according to the principle of relative entropy maximization. Each round judges whether the first projection space can still be shifted and compressed further, and the compression is repeated until the parameters of the first projection space stabilize at suitable values, at which point the space still contains enough data features.
In addition, the first projection space is chosen so that the relative entropy between any two directions is as large as possible. This minimizes the interference between any two basis signals and allows the dimension of the projection space to be reduced as far as possible without losing information.
S105: Project the received test data onto the first projection space to determine the projection of the test data.
Specifically, the number of groups in each batch of test data is a fixed value; if the amount of test data received exceeds this fixed number of groups, the fixed number of groups is selected at random, which prevents an attacker from forging safe data at specific positions.
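The random selection of a fixed number of test groups can be sketched as follows. The function name and the fallback of returning everything when fewer groups arrive are illustrative assumptions; the text only specifies the case in which more groups are received than the fixed value.

```python
import random

def select_test_groups(received, fixed_count, rng=None):
    """Return at most fixed_count groups, chosen uniformly at random.

    Because the positions are drawn at random, an attacker cannot rely on
    planting harmless-looking data at known positions in the stream.
    """
    rng = rng or random.Random()
    if len(received) <= fixed_count:
        return list(received)  # assumed behaviour when nothing must be dropped
    return rng.sample(received, fixed_count)

# Example: 20 received groups, of which a fixed 5 are kept for testing.
groups = [[i, i + 1] for i in range(20)]
picked = select_test_groups(groups, 5, random.Random(0))
```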
S106: Determine the safety of the test data according to the distance between the projection of the test data and the training data in the first projection space.
Specifically, whether the test data is safe is determined from the distance between its projection and the training data (which may be called basis vectors) in the first projection space. If there is more than one group of training data, the test vector is considered safe only if it is far enough from the hypercube formed by all the training vectors.
In addition, the safety of the test data may be determined according to the Euclidean distance measure. The embodiment provided by the present invention judges the risk of the data by the Euclidean distance in the projection space rather than by the size of the angle. This slightly increases the amount of computation but avoids misjudging impulse traffic that lies in the same direction as safe. The Euclidean distance threshold is chosen to distinguish risky data as well as possible while reducing the misjudgment rate. In this embodiment the threshold is set to 1.0500: when the distance is less than the threshold, the test data is judged to be safe; when it is greater than the threshold, the test data is judged to be dangerous. In other embodiments, the threshold may also be adjusted according to the specific attack type and network environment.
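The preference for Euclidean distance over the angle can be illustrated with a small sketch. The two vectors are made-up values: the second points in the same direction as the first but has ten times the magnitude, as a burst of impulse traffic might. Only the threshold 1.05 comes from the text; the function names are hypothetical.

```python
import math

THRESHOLD = 1.05  # threshold value given in this embodiment

def euclidean(a, b):
    """Euclidean distance between two projected vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def angle(a, b):
    """Angle in radians between two vectors (for comparison only)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    cos = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cos)

base = [0.3, 0.4]   # hypothetical projected training vector
burst = [3.0, 4.0]  # same direction, ten times the magnitude

burst_angle = angle(base, burst)         # essentially zero
burst_distance = euclidean(base, burst)  # about 4.5
exceeds_threshold = burst_distance > THRESHOLD
```

An angle-based measure sees the two vectors as nearly identical, whereas the Euclidean distance separates them clearly and lies well above the threshold.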
S107: Determine whether there is a new test request.
Specifically, if there is a new test request, return to S105 and repeat until there are no new test requests.
If there is no new test request, the process ends.
Fig. 4 is a schematic structural diagram of a device for detecting distributed denial of service provided by an embodiment of the present invention. As shown in Fig. 4, the device may include: a computing module 401, configured to iteratively project training data to determine a first projection space; a projection module 402, configured to project received test data onto the first projection space to determine the projection of the test data; and a processing module 403, configured to determine the safety of the test data according to the distance between the projection of the test data and the training data in the first projection space.
The computing module 401 may specifically be configured to: project the training data using the iterative method of the projection function, according to the principle of relative entropy maximization, to obtain a new projection space. When the new projection space no longer shifts, the new projection space is the first projection space.
The iterative method of the projection function may include a fixed-point iteration method. The number of groups in each batch of test data is a fixed value; if the amount of test data received exceeds this fixed number of groups, the fixed number of groups of test data is selected at random.
The processing module 403 may specifically be configured to: determine the safety of the test data according to the Euclidean distance measure.
The device may further include: a selecting module 404, configured to screen the training data to determine the connection features of the training data, and to center the training data to determine the mathematical features of the centered training data.
The connection features of the training data may include at least one of the following: Transmission Control Protocol (TCP) connection features and traffic statistics features. The traffic statistics features include host-based network traffic statistics features and time-based network traffic statistics features.
The selecting module 404 may specifically be configured to: amplify the training data using a spherization operation.
The method provided by the embodiment of the present invention consumes few system resources and can effectively ensure that a firewall identifies the data sources launching a DDoS attack at a faster rate. The innovations of the method include: spherization processing of the data near the mean point, feature acquisition by orthogonal projection, and iterative convergence. The method requires a certain amount of attack data as a training set. The training set is first centered; then, because the data features are concentrated around zero, the data is spherized to expand the features of the low-traffic part. Next, according to the principle of relative entropy maximization, more features are obtained by orthogonal projection, and an iteration function with similar characteristics is selected to iterate over the data. Once the iteration stabilizes, a new low-dimensional space for the training data is obtained, and whether the test data is risky is determined from its position within this space. The method needs few iterations and runs fast; it achieves a correct judgment rate of more than 90% on the KDD99 attack data set, a clear improvement over ordinary PCA dimensionality reduction.
Those of ordinary skill in the art should further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above in general terms of their functions. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Those of ordinary skill in the art may use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of the present application.
The steps of the method or algorithm described in connection with the embodiments disclosed herein may be implemented in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, a register, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.
The specific embodiments described above further explain the purpose, technical solutions, and beneficial effects of the present invention in detail. It should be understood that the foregoing is only a specific embodiment of the present invention and is not intended to limit the protection scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.