Disclosure of Invention
Embodiments of the present application provide an abnormal access detection method and apparatus for detecting URLs that are subject to abnormal access.
An embodiment of the present application provides an abnormal access detection method, including the following steps:
determining a bypass rate of a first Uniform Resource Locator (URL) according to the out-degree and the in-degree of the first URL, wherein the out-degree of the first URL is the number of visits from the first URL to a downstream URL of the first URL, the in-degree of the first URL is the number of visits from an upstream URL of the first URL to the first URL, and the bypass rate of the first URL reflects the extent to which the downstream URL of the first URL is accessed directly without passing through the first URL;
if the bypass rate of the first URL is greater than a set bypass rate threshold, determining whether the downstream URL of the first URL is a URL accessed by normal polling; if it is, determining that the downstream URL has no abnormal access, and if it is not, determining that the downstream URL has abnormal access.
Optionally, determining whether the downstream URL of the first URL is a URL accessed by normal polling includes:
determining an average access time interval of the downstream URL according to the access time intervals respectively corresponding to a plurality of Internet Protocol (IP) addresses that access the downstream URL;
if the average access time interval is less than a set duration, determining that the downstream URL is not a URL accessed by normal polling;
if the average access time interval is greater than or equal to the set duration, determining the standard deviation of the access time intervals respectively corresponding to the plurality of IP addresses; if the standard deviation is greater than a set standard deviation threshold, determining that the downstream URL is not a URL accessed by normal polling; and if the standard deviation is less than or equal to the set standard deviation threshold, determining that the downstream URL is a URL accessed by normal polling.
Optionally, the access time interval corresponding to each of the plurality of IP addresses is determined as follows:
for each of the plurality of IP addresses, determining the mode of the plurality of access time intervals corresponding to that IP address as the access time interval corresponding to that IP address.
Optionally, the plurality of IP addresses that access the downstream URL are selected as follows:
selecting, from the recorded IP addresses that access the downstream URL, the IP addresses whose number of accesses is greater than a first threshold and less than a second threshold.
Optionally, the bypass rate μ of the first URL is determined according to the following formula:
μ=(λ1-λ2)/λ2
wherein λ1 is the out-degree of the first URL, and λ2 is the in-degree of the first URL.
An embodiment of the present application provides an abnormal access detection apparatus, including:
a bypass rate determining module, configured to determine a bypass rate of a first Uniform Resource Locator (URL) according to the out-degree and the in-degree of the first URL, wherein the out-degree of the first URL is the number of visits from the first URL to a downstream URL of the first URL, the in-degree of the first URL is the number of visits from an upstream URL of the first URL to the first URL, and the bypass rate of the first URL reflects the extent to which the downstream URL of the first URL is accessed directly without passing through the first URL;
an abnormal access determining module, configured to: if the bypass rate of the first URL is greater than a set bypass rate threshold, determine whether the downstream URL of the first URL is a URL accessed by normal polling; if it is, determine that the downstream URL has no abnormal access, and if it is not, determine that the downstream URL has abnormal access.
In the embodiments of the present application, the bypass rate of a first URL is determined according to its out-degree and in-degree; if the bypass rate is greater than a set bypass rate threshold, it is determined whether the downstream URL of the first URL is a URL accessed by normal polling; if it is, the downstream URL is determined to have no abnormal access, and otherwise it is determined to have abnormal access. In other words, a bypass rate above the threshold, combined with a downstream URL that is not accessed by normal polling, indicates that the downstream URL has abnormal access. With this method and apparatus, URLs that malicious users reach by bypassing normal business logic can be effectively detected, whether the users access them slowly through a plurality of IP addresses or access them rapidly hundreds of times before switching to another IP address.
Detailed Description
Assume that a service link contains three URLs, A, B, and C, called in the order A → B → C. Because an access along the service link may fail to complete due to errors, the user actively exiting, and the like, the number of calls to A is in general greater than or equal to that of B and greater than that of C. If instead the call counts satisfy B ≥ A ≥ C, with B called considerably more often than A, then the heavily visited URL B may be reached by bypassing the normal business logic, under which B can only be entered through A. Based on this, for a service link, the embodiments of the present application first use the bypass rate of a URL to determine whether its downstream URL may be subject to abnormal access: if the bypass rate of a URL is greater than the set bypass rate threshold, its downstream URL may have abnormal access; it is then further determined whether the downstream URL is a URL accessed by normal polling, and if not, the downstream URL is determined to have abnormal access. In this way, URLs that malicious users reach by bypassing normal service logic can be effectively detected, whether they are accessed slowly through a plurality of IP addresses or accessed rapidly hundreds of times before the IP address is switched.
The embodiments of the present application will be described in further detail with reference to the drawings attached hereto.
Fig. 1 is a flowchart of an abnormal access detection method provided in an embodiment of the present application. The method includes the following steps:
S101: Determine the bypass rate of the first URL according to the out-degree and the in-degree of the first URL.
In a specific implementation, for a URL in a service link, the number of visits from the URL to its downstream URL is defined as the out-degree of the URL, the number of visits from its upstream URL to the URL is defined as the in-degree of the URL, and the bypass rate of the URL reflects the extent to which the downstream URL is visited directly without passing through the URL.
As shown in Fig. 2, in a service link, a URL that has a downstream URL but no upstream URL is a start URL (e.g., URL1 in Fig. 2), a URL that has both an upstream URL and a downstream URL is an intermediate URL (e.g., URL2 in Fig. 2), and a URL that has an upstream URL but no downstream URL is a leaf URL (e.g., URL3 in Fig. 2). The out-degree of a URL's upstream URL is the in-degree of that URL, and the in-degree of a URL's downstream URL is the out-degree of that URL. A start URL has an out-degree greater than 0 and an in-degree of 0; for example, URL1 in Fig. 2 has an out-degree of 1000 and an in-degree of 0. A leaf URL has an in-degree greater than 0 and an out-degree of 0; for example, URL3 in Fig. 2 has an in-degree of 100000. An intermediate URL has both an in-degree and an out-degree greater than 0; for example, URL2 in Fig. 2 has an in-degree of 1000 and an out-degree of 100000.
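To make the degree definitions concrete, the following is a minimal Python sketch, assuming access records have already been aggregated into counts per link edge (upstream URL, downstream URL). That input format and the names `compute_degrees` and `classify` are hypothetical illustrations, not part of the application. The edge counts reproduce the Fig. 2 example.

```python
from collections import Counter
from typing import Dict, Tuple

# Hypothetical input format: access counts per link edge (upstream URL, downstream URL).
Edges = Dict[Tuple[str, str], int]


def compute_degrees(edges: Edges) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Return (out_degree, in_degree) for every URL that appears on an edge."""
    out_degree: Counter = Counter()
    in_degree: Counter = Counter()
    for (upstream, downstream), count in edges.items():
        out_degree[upstream] += count   # visits leaving the upstream URL
        in_degree[downstream] += count  # visits arriving at the downstream URL
    urls = set(out_degree) | set(in_degree)
    # Counter returns 0 for missing keys, so URLs without in/out edges get 0.
    return ({u: out_degree[u] for u in urls}, {u: in_degree[u] for u in urls})


def classify(url: str, out_degree: Dict[str, int], in_degree: Dict[str, int]) -> str:
    """Label a URL as start, intermediate, or leaf from its degrees."""
    has_out = out_degree.get(url, 0) > 0
    has_in = in_degree.get(url, 0) > 0
    if has_out and not has_in:
        return "start"
    if has_out and has_in:
        return "intermediate"
    if has_in:
        return "leaf"
    return "isolated"


# Edge counts mirroring Fig. 2.
fig2 = {("URL1", "URL2"): 1000, ("URL2", "URL3"): 100000}
out_deg, in_deg = compute_degrees(fig2)
print(out_deg["URL2"], in_deg["URL2"])  # 100000 1000
print([classify(u, out_deg, in_deg) for u in ("URL1", "URL2", "URL3")])
# ['start', 'intermediate', 'leaf']
```

Counting each edge once on both ends is what makes the out-degree of an upstream URL equal the in-degree of its downstream URL, as stated above.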
In a specific implementation, the bypass rate of a URL is determined based on its out-degree and in-degree. Because the bypass rate reflects the extent to which the downstream URL is accessed directly without passing through the URL, it can be derived from how much the out-degree exceeds the in-degree. For example, the bypass rate may be expressed as the ratio of the out-degree to the in-degree, in which case the bypass rate of a start URL is infinite and that of a leaf URL is 0. As another example, the bypass rate may be expressed as the ratio of the difference between the out-degree and the in-degree to the in-degree, that is, the bypass rate μ of the URL satisfies μ = (λ1 - λ2)/λ2, where λ1 is the out-degree of the URL and λ2 is its in-degree; in this case the bypass rate of a start URL is infinite and that of a leaf URL is -1.
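Both bypass-rate variants can be sketched in a few lines of Python. The function names are hypothetical, and raising an error when both degrees are 0 is an assumption, since the application does not say how a URL without traffic is handled.

```python
import math


def bypass_rate_ratio(out_degree: int, in_degree: int) -> float:
    """Variant 1: lambda1 / lambda2 (start URL -> inf, leaf URL -> 0)."""
    if in_degree == 0:
        if out_degree == 0:
            raise ValueError("URL has no traffic; bypass rate is undefined")
        return math.inf  # start URL
    return out_degree / in_degree


def bypass_rate_relative(out_degree: int, in_degree: int) -> float:
    """Variant 2: mu = (lambda1 - lambda2) / lambda2 (start -> inf, leaf -> -1)."""
    if in_degree == 0:
        if out_degree == 0:
            raise ValueError("URL has no traffic; bypass rate is undefined")
        return math.inf  # start URL
    return (out_degree - in_degree) / in_degree


# Fig. 2 values: URL2 is intermediate, URL3 is a leaf, URL1 is the start URL.
print(bypass_rate_relative(100000, 1000))  # 99.0
print(bypass_rate_ratio(100000, 1000))     # 100.0
print(bypass_rate_relative(0, 100000))     # -1.0
print(bypass_rate_relative(1000, 0))       # inf
```

Variant 2 is simply variant 1 minus 1, so both rank URLs identically; only the numeric value of the bypass rate threshold differs by 1 between them.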
In this embodiment, the first URL is an intermediate URL, and its downstream URL may be another intermediate URL or a leaf URL.
S102: Compare the bypass rate of the first URL with the set bypass rate threshold. If the bypass rate of the first URL is less than or equal to the threshold, proceed to S104, i.e., determine that the downstream URL of the first URL has no abnormal access.
In a specific implementation, when a URL has a low bypass rate, its downstream URL may be assumed to have no abnormal access.
S103: If the bypass rate of the first URL is greater than the set bypass rate threshold, determine whether the downstream URL of the first URL is a URL accessed by normal polling. If it is, proceed to S104, i.e., determine that the downstream URL has no abnormal access; if it is not, proceed to S105, i.e., determine that the downstream URL has abnormal access.
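The branching of S102 to S105 can be sketched as a small Python function. The name `downstream_has_abnormal_access` and the threshold value of 10 in the usage lines are hypothetical (the application does not fix a threshold). The polling check is passed in as a callable so that it runs only when the bypass rate is high.

```python
from typing import Callable


def downstream_has_abnormal_access(
    bypass_rate: float,
    bypass_rate_threshold: float,
    is_normal_polling: Callable[[], bool],
) -> bool:
    """Return True when the downstream URL is judged to have abnormal access."""
    if bypass_rate <= bypass_rate_threshold:
        return False  # S102 -> S104: low bypass rate, no abnormal access
    # S103: high bypass rate; normal polling is the only benign explanation.
    return not is_normal_polling()  # S105 if not polling, otherwise S104


# Hypothetical threshold of 10 applied to URL2's bypass rate from Fig. 2.
print(downstream_has_abnormal_access(99.0, 10.0, lambda: False))  # True
print(downstream_has_abnormal_access(99.0, 10.0, lambda: True))   # False
```

Deferring the polling check matters in practice because it needs per-IP access logs, which are more expensive to process than the degree counts.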
In a specific implementation, if the bypass rate of the first URL is greater than the set bypass rate threshold, there are two possible cases:
1. The downstream URL of the first URL is heavily visited by bypassing normal business logic, i.e., there is abnormal access.
2. The downstream URL of the first URL is a URL that is accessed by normal polling.
For a first URL with a high bypass rate, if its downstream URL is a URL accessed by normal polling, direct access to the downstream URL that bypasses the first URL is also normal access. For example, while a user stays on a page displaying unread messages (e.g., the inbox page of a mailbox), the system refreshes the unread messages every 5 s; such automatic polling access is clearly normal access.
Because the access interval of a URL accessed by normal polling is typically set by the system, such a URL usually has one distinct characteristic: the access time intervals of different IP addresses are relatively fixed. For example, a system generally reads unread messages at a regular time interval. Therefore, whether a URL is accessed by normal polling can be determined from the time intervals at which different IP addresses access it.
In one embodiment, determining whether the downstream URL of the first URL is a URL accessed by normal polling specifically includes the following steps:
S103a: Determine the average access time interval of the downstream URL according to the access time intervals respectively corresponding to the plurality of IP addresses that access the downstream URL of the first URL.
In practice, multiple users within an enterprise may share the same IP address, in which case the number of times a URL is accessed from that IP address is usually very large (typically more than 1000); in addition, IP addresses with only a few accesses contribute little to recognizing polling access. Therefore, to further improve the recognition rate of normal polling access, the embodiments of the present application filter the IP addresses that access the downstream URL: from the recorded IP addresses that access the downstream URL, a plurality of IP addresses whose number of accesses is greater than a first threshold (for example, 10) and less than a second threshold (for example, 1000) are selected, and the users of these IP addresses are provisionally treated as normal individual users. Whether the downstream URL is a URL accessed by normal polling is then determined based on the selected IP addresses.
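The IP filter can be sketched as follows, assuming the access log for the downstream URL is available as one IP string per recorded access. The function name and the log format are hypothetical; the default thresholds are the example values given above (10 and 1000, both exclusive).

```python
from collections import Counter
from typing import Iterable, List


def select_candidate_ips(
    access_ips: Iterable[str], first_threshold: int = 10, second_threshold: int = 1000
) -> List[str]:
    """Keep IPs whose access count c satisfies first_threshold < c < second_threshold."""
    counts = Counter(access_ips)  # one entry per recorded access to the downstream URL
    return sorted(
        ip for ip, c in counts.items() if first_threshold < c < second_threshold
    )


# Documentation-range IPs: too few accesses, a plausible individual user,
# and a likely shared enterprise address.
log = ["192.0.2.1"] * 5 + ["192.0.2.2"] * 50 + ["192.0.2.3"] * 2000
print(select_candidate_ips(log))  # ['192.0.2.2']
```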
In a specific implementation, for each selected IP address, the mode of the plurality of access time intervals corresponding to that IP address may be used as its access time interval. Here, the mode is the access time interval that occurs most frequently among the recorded access time intervals of the IP address; if several access time intervals share the highest frequency, any one of them may be selected as the access time interval of the IP address.
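A minimal sketch of the per-IP mode follows. It assumes that an IP's access time intervals are the gaps between its consecutive access timestamps to the downstream URL, and it breaks ties by taking the smallest gap; both are illustrative choices, since the text only requires selecting one of the tied intervals. The function name is hypothetical.

```python
from collections import Counter
from typing import Sequence


def ip_access_interval(timestamps: Sequence[int]) -> int:
    """Mode of the gaps between consecutive accesses from one IP address.

    Assumption: timestamps are this IP's access times to the downstream URL,
    and ties for the most frequent gap are broken by taking the smallest.
    """
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if not gaps:
        raise ValueError("at least two accesses are needed to form an interval")
    counts = Counter(gaps)
    top = max(counts.values())
    return min(gap for gap, c in counts.items() if c == top)


print(ip_access_interval([0, 5, 10, 15, 22]))  # 5  (gaps 5, 5, 5, 7)
print(ip_access_interval([0, 3, 7]))           # 3  (gaps 3 and 4 tie)
```

Using the mode rather than the mean keeps one long pause (for example, the user leaving the page for a while) from distorting an otherwise regular polling interval.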
S103b: Compare the average access time interval with the set duration. If the average access time interval is less than the set duration, proceed to S103e, i.e., determine that the downstream URL is not a URL accessed by normal polling.
In a specific implementation, a malicious user may access the downstream URL hundreds of times within a short time (for example, 1 s) from the same or different IP addresses (after such a burst, the user is intercepted or switches to another IP address to continue accessing). If access times are recorded with a granularity of only 1 s, the recorded access time intervals of such an IP address are 0. In this case, the recorded access time intervals of the IP addresses accessing the downstream URL are relatively fixed (all 0), yet the access is clearly not polling access. Therefore, in the embodiments of the present application, a URL whose average access time interval over the corresponding IP addresses is less than the set duration (for example, 1 s) is directly classified as a URL with abnormal access.
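The burst scenario and the S103b check can be sketched as follows. The logger that truncates access times to whole seconds is a hypothetical stand-in for the 1 s recording granularity described above, and the default set duration of 1.0 is the example value from the text.

```python
import math
from statistics import mean
from typing import List, Sequence


def record_at_second_granularity(raw_times: Sequence[float]) -> List[int]:
    """Hypothetical logger that keeps access times only to whole seconds."""
    return [math.floor(t) for t in raw_times]


def below_set_duration(per_ip_intervals: Sequence[float], set_duration: float = 1.0) -> bool:
    """S103b: True means the downstream URL is not accessed by normal polling."""
    return mean(per_ip_intervals) < set_duration


# 300 requests packed into about 0.6 s, as in the burst scenario described above.
burst = record_at_second_granularity([100 + k / 500 for k in range(300)])
gaps = [b - a for a, b in zip(burst, burst[1:])]
print(set(gaps))                       # {0}
print(below_set_duration([0, 0, 0]))   # True
print(below_set_duration([5, 5, 5]))   # False
```

The zero gaps have a standard deviation of 0, so without the S103b check they would pass the later regularity test; this is why the average interval is checked first.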
S103c: If the average access time interval is greater than or equal to the set duration, determine the standard deviation of the access time intervals respectively corresponding to the plurality of IP addresses.
Here, the standard deviation σ of the access time intervals corresponding to the selected IP addresses satisfies the following equation:
$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$$
wherein x_i is the access time interval of the i-th IP address, μ is the average access time interval of the selected IP addresses, and N is the number of selected IP addresses.
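A direct Python transcription of the S103c standard deviation is shown below. It assumes the population form, which divides by N (the number of selected IP addresses), consistent with the definition of N above. The function name is hypothetical; the standard library's `statistics.pstdev` computes the same quantity.

```python
import math
from statistics import pstdev
from typing import Sequence


def interval_std(intervals: Sequence[float]) -> float:
    """sigma = sqrt(sum((x_i - mu)^2) / N) over the N selected IP addresses."""
    n = len(intervals)
    if n == 0:
        raise ValueError("no IP addresses selected")
    mu = sum(intervals) / n  # average access time interval
    return math.sqrt(sum((x - mu) ** 2 for x in intervals) / n)


print(interval_std([2, 4, 4, 4, 5, 5, 7, 9]))  # 2.0
print(interval_std([5, 5, 5]))                 # 0.0
# Same population form as the standard library's pstdev.
assert math.isclose(interval_std([3, 8, 13]), pstdev([3, 8, 13]))
```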
S103d: Compare the standard deviation with the set standard deviation threshold. If the standard deviation is greater than the threshold (for example, 1000), proceed to S103e, i.e., determine that the downstream URL is not a URL accessed by normal polling. If the standard deviation is less than or equal to the threshold, proceed to S103f, i.e., determine that the downstream URL is a URL accessed by normal polling.
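Steps S103a to S103f can be combined into one predicate. The following is a minimal sketch under these assumptions: per-IP access timestamps for the downstream URL are available as integers in one common unit (for example, whole seconds), the default thresholds (1, 1000, 10, 1000) are the example values from the text, ties in the per-IP mode are broken by the smallest gap, and raising an error when no IP passes the filter is an illustrative choice that the application does not specify. The name `is_normal_polling` is hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def is_normal_polling(
    per_ip_timestamps: Dict[str, Sequence[int]],
    set_duration: float = 1.0,
    std_threshold: float = 1000.0,
    first_threshold: int = 10,
    second_threshold: int = 1000,
) -> bool:
    """S103a-S103f: True if the downstream URL looks like normal polling."""
    per_ip_interval: List[int] = []
    for timestamps in per_ip_timestamps.values():
        # IP filter; first_threshold >= 1 guarantees at least two timestamps.
        if not first_threshold < len(timestamps) < second_threshold:
            continue
        ordered = sorted(timestamps)
        gaps = Counter(b - a for a, b in zip(ordered, ordered[1:]))
        top = max(gaps.values())
        # Mode of the gaps; ties are broken by the smallest gap (an assumption).
        per_ip_interval.append(min(g for g, c in gaps.items() if c == top))
    if not per_ip_interval:
        raise ValueError("no IP address passed the access-count filter")
    # S103a/S103b: average interval below the set duration -> not polling (S103e).
    if mean(per_ip_interval) < set_duration:
        return False
    # S103c/S103d: population standard deviation against the threshold.
    return pstdev(per_ip_interval) <= std_threshold  # True -> S103f, False -> S103e


polling = {f"198.51.100.{i}": list(range(i, i + 100, 5)) for i in range(1, 4)}
print(is_normal_polling(polling))                        # True
print(is_normal_polling({"203.0.113.7": [100] * 300}))  # False
```

In the first call every selected IP polls every 5 units, so the average is 5 and the standard deviation is 0; in the second, a 300-request burst recorded at one-second granularity yields an average interval of 0 and fails S103b.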
Based on the same inventive concept, an embodiment of the present application further provides an abnormal access detection apparatus corresponding to the abnormal access detection method. Because the apparatus solves the problem on a principle similar to that of the method, its implementation may refer to the implementation of the method, and repeated details are omitted.
Fig. 3 is a schematic structural diagram of an abnormal access detection apparatus provided in an embodiment of the present application. The apparatus includes:
a bypass rate determining module 31, configured to determine a bypass rate of a first uniform resource locator (URL) according to the out-degree and the in-degree of the first URL, wherein the out-degree of the first URL is the number of visits from the first URL to a downstream URL of the first URL, the in-degree of the first URL is the number of visits from an upstream URL of the first URL to the first URL, and the bypass rate of the first URL reflects the extent to which the downstream URL of the first URL is accessed directly without passing through the first URL.
An abnormal access determining module 32, configured to: if the bypass rate of the first URL is greater than a set bypass rate threshold, determine whether the downstream URL of the first URL is a URL accessed by normal polling; if it is, determine that the downstream URL has no abnormal access, and if it is not, determine that the downstream URL has abnormal access.
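One way to map the two modules of Fig. 3 onto code is sketched below. The class names, the injected polling callable, and the example URLs are hypothetical illustrations of the module split, not a structure prescribed by the application. Module 31 uses the relative bypass rate μ = (λ1 - λ2)/λ2.

```python
import math
from dataclasses import dataclass
from typing import Callable


class BypassRateModule:
    """Sketch of module 31: mu = (lambda1 - lambda2) / lambda2."""

    def rate(self, out_degree: int, in_degree: int) -> float:
        if in_degree == 0:
            return math.inf  # start URL; the first URL is normally intermediate
        return (out_degree - in_degree) / in_degree


@dataclass
class AbnormalAccessModule:
    """Sketch of module 32; the polling check is injected as a callable."""

    bypass_rate_threshold: float
    is_normal_polling: Callable[[str], bool]

    def has_abnormal_access(self, downstream_url: str, bypass_rate: float) -> bool:
        if bypass_rate <= self.bypass_rate_threshold:
            return False
        return not self.is_normal_polling(downstream_url)


@dataclass
class AbnormalAccessDetector:
    """Hypothetical wiring of module 31 and module 32."""

    bypass_module: BypassRateModule
    abnormal_module: AbnormalAccessModule

    def check(self, out_degree: int, in_degree: int, downstream_url: str) -> bool:
        mu = self.bypass_module.rate(out_degree, in_degree)
        return self.abnormal_module.has_abnormal_access(downstream_url, mu)


detector = AbnormalAccessDetector(
    BypassRateModule(),
    AbnormalAccessModule(10.0, lambda url: url == "/inbox/unread"),
)
print(detector.check(100000, 1000, "/inbox/unread"))  # False
print(detector.check(100000, 1000, "/api/order"))     # True
```

Injecting the polling check keeps module 32 independent of how access logs are stored, so it could be backed by a predicate such as the S103a to S103f procedure.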
Optionally, the abnormal access determining module 32 is specifically configured to:
determine an average access time interval of the downstream URL according to the access time intervals respectively corresponding to a plurality of Internet Protocol (IP) addresses that access the downstream URL; if the average access time interval is less than a set duration, determine that the downstream URL is not a URL accessed by normal polling; if the average access time interval is greater than or equal to the set duration, determine the standard deviation of the access time intervals respectively corresponding to the plurality of IP addresses; if the standard deviation is greater than a set standard deviation threshold, determine that the downstream URL is not a URL accessed by normal polling; and if the standard deviation is less than or equal to the set standard deviation threshold, determine that the downstream URL is a URL accessed by normal polling.
Optionally, the abnormal access determining module 32 is specifically configured to determine the access time interval corresponding to each of the plurality of IP addresses as follows:
for each of the plurality of IP addresses, determine the mode of the plurality of access time intervals corresponding to that IP address as the access time interval corresponding to that IP address.
Optionally, the abnormal access determining module 32 is specifically configured to select the plurality of IP addresses that access the downstream URL as follows:
select, from the recorded IP addresses that access the downstream URL, the IP addresses whose number of accesses is greater than a first threshold and less than a second threshold.
Optionally, the bypass rate determining module 31 is specifically configured to determine the bypass rate μ of the first URL according to the following formula:
μ=(λ1-λ2)/λ2
wherein λ1 is the out-degree of the first URL, and λ2 is the in-degree of the first URL.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.