WO2020258670A1 - Network access abnormality determination method and apparatus, server, and storage medium

Info

Publication number
WO2020258670A1
WO2020258670A1 PCT/CN2019/118378 CN2019118378W WO2020258670A1 WO 2020258670 A1 WO2020258670 A1 WO 2020258670A1 CN 2019118378 W CN2019118378 W CN 2019118378W WO 2020258670 A1 WO2020258670 A1 WO 2020258670A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature
chi
value
terminal device
characteristic
Prior art date
Application number
PCT/CN2019/118378
Other languages
French (fr)
Chinese (zh)
Inventor
黎立桂
Original Assignee
平安科技(深圳)有限公司
Application filed by 平安科技(深圳)有限公司
Publication of WO2020258670A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00: Network architectures or network communication protocols for network security
    • H04L63/14: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425: Traffic logging, e.g. anomaly detection

Definitions

  • This application relates to the technical field of security detection. Specifically, this application relates to a method, device, server, and storage medium for determining abnormality of network access.
  • The current method is to collect data such as click times and mouse drag trajectories during the user verification process and to identify the user type from this behavioral data.
  • This type of method has a high error rate and readily identifies a real user as an abnormal user, so its accuracy is low.
  • This application provides a method for determining abnormality of network access, which includes the following steps: in response to a network access request sent by a terminal device, using a script to obtain multiple non-linear feature values of the terminal device and forming a non-linear combination feature set; calculating, for the feature values of the non-linear combination feature set, the average value of all corresponding feature values within a set time period, and calculating the chi-square statistic of the feature values of the terminal device; when the chi-square statistic of a feature value of the terminal device is greater than the corresponding judgment threshold, determining that the feature value is an outlier; and determining, from the outlier feature values obtained, that the network access is abnormal access.
  • The non-linear combination feature set includes the feature values of the original-category feature information and the feature values of the effective derived feature information. The step of using the script to obtain multiple non-linear feature values of the terminal device and form the non-linear combination feature set includes: obtaining the feature values of the original-category feature information with the script, and performing a metric data distribution calculation on them to obtain the feature values of the effective derived feature information for identifying outliers.
  • This application also provides an abnormality determination device for network access, which includes an acquisition module, a calculation module, and a judging module that carry out the steps above.
  • This application also provides a server, which includes one or more processors, a memory, and one or more computer-readable instructions. The instructions are stored in the memory and configured to be executed by the one or more processors, and they implement the steps above when executed.
  • This application also provides a storage medium storing computer-readable instructions which, when executed by a processor, implement the steps above.
  • The technical solution provided in this application uses a detection algorithm based on chi-square statistics and compares the value used as the basis for judgment with a critical value to obtain the judgment result. It does not require labeling the feature information data generated when the terminal device initiates network access, which saves later statistical and analytical work. Moreover, the analysis process is simple and the result is intuitive, so a more accurate judgment result can be obtained easily, which ultimately improves abnormality determination for the network access of the terminal device.
  • FIG. 1 is an application environment diagram for executing the network access abnormality determination scheme in an embodiment of the present application;
  • FIG. 2 is a flowchart of a method for determining abnormality of network access according to an embodiment of the present application;
  • FIG. 3 is a schematic diagram of an abnormality determination device for network access according to an embodiment of the present application;
  • FIG. 4 is a schematic structural diagram of a server according to an embodiment of the present application.
  • The "terminal" and "terminal equipment" used herein include both devices that have only a wireless signal receiver without transmitting capability and devices that have both receiving and transmitting hardware.
  • Such equipment may include: cellular or other communication equipment, with a single-line display, a multi-line display, or no multi-line display; PCS (Personal Communications Service) equipment, which can combine voice, data processing, fax, and/or data communication capabilities; a PDA (Personal Digital Assistant), which can include a radio frequency receiver, pager, Internet/intranet access, web browser, notepad, calendar, and/or GPS (Global Positioning System) receiver; and a conventional laptop and/or palmtop computer or other device that has and/or includes a radio frequency receiver.
  • The "terminal" and "terminal equipment" used here may be portable, transportable, installed in a vehicle (air, sea, and/or land), or suitable and/or configured to operate locally and/or in a distributed form at any location on the earth and/or in space.
  • The "terminal" and "terminal device" used here can also be a communication terminal, an internet terminal, or a music/video playback terminal, for example a PDA, a MID (Mobile Internet Device), and/or a mobile phone with music/video playback functions, or a device such as a smart TV or a set-top box.
  • The remote network device used here includes but is not limited to a computer, a network host, a single network server, a set of multiple network servers, or a cloud composed of multiple servers.
  • The cloud is composed of a large number of computers or network servers based on cloud computing.
  • Cloud computing is a type of distributed computing: a super virtual computer composed of a group of loosely coupled computers.
  • The remote network equipment, terminal equipment, and WNS server can communicate through any communication method, including but not limited to mobile communication based on 3GPP, LTE, or WiMAX; computer network communication based on the TCP/IP or UDP protocols; and short-range wireless transmission based on Bluetooth or infrared transmission standards.
  • FIG. 1 is an application environment diagram of an abnormality determination solution for network access in an embodiment of the present application; in this embodiment, the technical solution of the present application can be implemented on a server.
  • The terminal devices 110 and 120 can access the server 130 through the internet.
  • The terminal device 110 and/or 120 sends a network request to the server 130, and the server 130 performs data interaction according to the network request.
  • The server 130 obtains the access data and attribute data of the terminal device 110 and/or 120 according to the request information of the terminal device 110 and/or 120, and determines from this data whether the access of the terminal device is abnormal.
  • FIG. 2 is a flowchart of a method for determining abnormality of network access according to an embodiment. The method includes the following steps:
  • S210: In response to a network access request sent by a terminal device, use a script to obtain multiple non-linear feature values of the terminal device, and form a non-linear combination feature set.
  • When the server interacts with the terminal device, it obtains the relevant parameters of the terminal device according to the network request sent by the terminal device.
  • When the user sends registration and verification requests, the front end uses JavaScript scripts to obtain the relevant feature values of the terminal device, including device type (iPhone, Mac, Android), system information (OS type, version, resolution), and IP.
  • The relationships among these feature values are non-linear.
  • Together, the multiple feature values form a non-linear combination feature set for the network access request of the terminal device.
  • The feature values may specifically include the user_agent, resolution, pixel ratio, and touch screen events (the maximum number of touch screen events supported, and whether touch is supported) of the device, acquired through the front end.
  • By parsing the string in user_agent, the device type, brand, model, and operating system version number are obtained. The brand and model obtained in this way for the terminal device currently issuing the network access request are then associated with the resolution, pixel ratio, and touch screen event information recorded in the basic library for the same device brand and model.
  • The basic library contains the real resolution, pixel ratio, and touch screen event information of all device models, obtained from authoritative websites.
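  • The association with the basic library described above can be sketched as a simple lookup and comparison. This is a minimal illustration only: the `DeviceSpec` type, the `BASE_LIBRARY` contents, the brand and model names, and the function `mismatched_fields` are all hypothetical, and the brand and model are assumed to have already been parsed from user_agent.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DeviceSpec:
    """Hypothetical record of the values the text lists for one device model."""
    resolution: Tuple[int, int]
    pixel_ratio: float
    max_touch_points: int


# Hypothetical basic library keyed by (brand, model); the entry is made up.
BASE_LIBRARY = {
    ("ExampleBrand", "Model-X"): DeviceSpec((1080, 2340), 3.0, 5),
}


def mismatched_fields(brand: str, model: str, reported: DeviceSpec) -> List[str]:
    """Return the names of reported values that differ from the basic library."""
    expected = BASE_LIBRARY.get((brand, model))
    if expected is None:
        return []  # unknown model: nothing to compare against
    fields = ("resolution", "pixel_ratio", "max_touch_points")
    return [f for f in fields if getattr(reported, f) != getattr(expected, f)]
```

  • A non-empty result would suggest that the values reported by the front end do not match the real device, which is the kind of signal the later derived features (for example, whether the resolution has been tampered with) rely on.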
  • The feature values in each feature set are standardized.
  • The feature set of each access record may contain, for example, one variable on a 100-point scale and another on a 5-point scale. Only after all the data are standardized can they be compared on the same scale.
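  • The text does not say which standardization is used. As one common choice, and purely as an assumption, a z-score (subtract the mean, divide by the standard deviation) puts a 100-point variable and a 5-point variable on the same scale. The function name `standardize` is hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def standardize(values: Sequence[float]) -> List[float]:
    """Z-score standardization (an assumed choice, not stated in the source)."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # All values identical: every value sits exactly at the mean.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# A 100-point variable and a 5-point variable with the same relative spread
# map to the same standardized values.
scores_100 = standardize([60, 80, 100])
scores_5 = standardize([3, 4, 5])
```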
  • S220: For the feature values of the non-linear combination feature set, calculate the average value of all corresponding feature values within a set time period, and calculate the chi-square statistic of the feature values of the terminal device.
  • In this step, the required feature values are extracted from the non-linear combination feature set obtained in step S210, and the chi-square statistic of each feature value is calculated from that value and its associated values, giving the distribution of the feature values formed when the terminal device initiates a network access request.
  • Further, the average value of all corresponding feature values within a set time period may be calculated, and from it the chi-square statistic of the feature value of the terminal device.
  • When a terminal device initiates network access, it normally sends multiple network access requests within a continuous period of time. The feature values formed each time the terminal device initiates a request make up one non-linear combination feature set. So that abnormal access can be judged from several particular feature values, the corresponding feature values are extracted from the non-linear combination feature set as needed.
  • The average value of all corresponding feature values formed when the terminal device initiates network access requests within the set time period is calculated, and the chi-square statistic of the feature value of the terminal device is then calculated.
  • Here, O_i is the value of the i-th feature formed when the terminal device initiates a network access request at a given detection time point; E_i is the average value of all the i-th features formed by the network access requests initiated by the terminal device within the set time period; and n is the total number of all the i-th features formed by the network access requests initiated by the terminal device within the set time period.
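  • The formula itself did not survive in this text. Given the definitions of O_i, E_i, and n, a natural reading is the standard Pearson form, chi2 = sum over i of (O_i - E_i)^2 / E_i, and the sketch below assumes exactly that. The function name and the choice to reject zero averages are assumptions for illustration.

```python
from typing import Sequence


def chi_square_statistic(observed: Sequence[float], expected: Sequence[float]) -> float:
    """Pearson-style chi-square: sum((O_i - E_i)^2 / E_i).

    observed: feature values at the detection time point (O_i)
    expected: averages of the same features over the set time period (E_i)
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    total = 0.0
    for o, e in zip(observed, expected):
        if e == 0:
            raise ValueError("expected (average) value must be non-zero")
        total += (o - e) ** 2 / e
    return total
```

  • For example, observed values of 12 and 8 against averages of 10 and 10 give 4/10 + 4/10 = 0.8; the further a request's feature values lie from the period averages, the larger the statistic.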
  • The chi-square statistic calculated from a feature value formed when the terminal device initiates a network access request reflects how that feature value is distributed at the corresponding detection time point. From the distribution of each feature value, the outliers among the feature values are obtained, and the network access that produced an outlier is determined to be an abnormal access.
  • The method for determining abnormality of network access obtains multiple non-linear feature values from the network access request initiated by the terminal device, obtains the corresponding chi-square statistics from those feature values, and, when the chi-square statistic shows the corresponding feature value to be an outlier, determines that the current network access of the terminal device is abnormal access.
  • The technical solution of the present application bases the judgment on feature values of the network access request itself that are identified as outliers. This avoids the prior-art problem that using records generated while the user operates the terminal device, such as click and drag trajectory data during user authentication, as the basis for abnormality judgment easily identifies real users as abnormal users. It reflects more accurately the current state of the network access requests that the terminal device sends to the server, and it reaches the judgment through simpler and more intuitive data comparison, which helps improve the efficiency of abnormality determination for terminal device network access.
  • The step of calculating the average value of all corresponding feature values formed when the terminal device initiates network access requests within the set time period, and obtaining the chi-square statistic of the feature value of the terminal device, may further include:
  • A221. For each network access request within a set time period, take a specific necessary feature value from the non-linear combination feature set and calculate a first average value of that necessary feature value;
  • Here, one of the feature values is used to determine whether the network access of the terminal device is abnormal.
  • A specific necessary feature value is extracted from each non-linear combination feature set formed within the set time period, one per network access request, and the first average of the values of that same necessary feature is calculated.
  • A necessary feature value is one that is necessarily generated when the terminal device initiates a network access request and for which the corresponding real information can be found in the basic library for later reference.
  • Examples are information that can be measured numerically, such as resolution, pixel ratio, and the total number of logical processors that the system makes available to the user agent.
  • From the necessary feature value and the first average value, the first chi-square statistic of the necessary feature value is calculated.
  • Before step S230, that is, before determining whether the network access is abnormal, the method further includes:
  • From the first chi-square statistics obtained within the set time period, the upper limit of the interquartile range of the first chi-square statistic is calculated as the first judgment threshold for the feature value.
  • The first judgment threshold may also be set according to a critical value derived from historical data for the feature value.
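  • The text does not define "upper limit of the interquartile range" further. One conventional reading, assumed here, is the upper fence Q3 + k * (Q3 - Q1) with k = 1.5. Both the multiplier and the function name `iqr_upper_limit` are assumptions for illustration.

```python
from statistics import quantiles
from typing import Sequence


def iqr_upper_limit(chi2_values: Sequence[float], k: float = 1.5) -> float:
    """Upper fence Q3 + k * IQR over the chi-square statistics of a time period.

    k = 1.5 is the conventional box-plot multiplier, assumed here.
    """
    q1, _median, q3 = quantiles(sorted(chi2_values), n=4, method="inclusive")
    return q3 + k * (q3 - q1)


# Statistics 1..9: Q1 = 3, Q3 = 7, IQR = 4, so the threshold is 7 + 1.5 * 4 = 13.
first_threshold = iqr_upper_limit([1, 2, 3, 4, 5, 6, 7, 8, 9])
```

  • Because the threshold is derived from the statistics observed in the period itself, no labeled examples of abnormal access are needed, which matches the unsupervised character the text claims for the scheme.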
  • Step S220 may also take another form, which further includes:
  • A second chi-square statistic of the feature value of the non-linear combination feature set of the terminal device is calculated.
  • Here, several feature values, or even all feature values, of the non-linear combination feature set are used as a whole to determine whether the network access of the terminal device is abnormal.
  • Several feature values are extracted from the non-linear combination feature set formed by each network access request initiated by the terminal device. For example, to confirm whether the IP address of the terminal device belongs to an IDC (Internet Data Center) or a machine-room server, the feature values of the user agent, resolution, pixel ratio, and touch screen events can each be treated as the value of one dimension. The dimension vector formed by these feature values is obtained for evaluation and is used as the feature value of the non-linear combination feature set.
  • From the feature values of all the non-linear combination feature sets, the corresponding second average value is obtained.
  • Using formula (1) above, the second chi-square statistic of the non-linear combination feature set of the terminal device is calculated, giving the data distribution of the feature values relating to the IP address.
  • Before step S230, that is, before determining whether the network access is abnormal, the method further includes:
  • A curve is formed from the second chi-square statistics of all network access requests within a set time period, and the second chi-square statistic at the point of maximum slope on the curve is taken as the second judgment threshold.
  • The second judgment threshold may also be set according to a critical value derived from historical data for the feature values of the non-linear combination feature set.
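  • How the curve is built is not specified. The sketch below assumes the statistics are sorted in ascending order, takes the slope as the difference between neighbouring points, and returns the value at the start of the steepest segment. The ordering, the slope definition, and the function name are all assumptions.

```python
from typing import Sequence


def max_slope_threshold(chi2_values: Sequence[float]) -> float:
    """Return the statistic at the start of the steepest rise of the sorted curve."""
    curve = sorted(chi2_values)
    if len(curve) < 2:
        raise ValueError("need at least two statistics to measure a slope")
    # Slope between neighbouring points (unit spacing along the x-axis).
    slopes = [curve[i + 1] - curve[i] for i in range(len(curve) - 1)]
    steepest = max(range(len(slopes)), key=slopes.__getitem__)
    return curve[steepest]


# Most requests cluster near 1; one is far away, so the steepest rise starts at 1.3.
second_threshold = max_slope_threshold([1.0, 1.2, 1.1, 1.3, 9.0])
```

  • With this reading, any request whose second chi-square statistic lies beyond the steepest jump (here 9.0) exceeds the threshold and would be treated as an outlier.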
  • Step S230 further includes:
  • The first or second chi-square statistic is compared with the corresponding first or second judgment threshold. The result of the comparison determines whether the corresponding feature value of the non-linear combination feature set is an outlier, and therefore whether the corresponding network access is an abnormal access. If the first or second chi-square statistic is greater than the corresponding first or second judgment threshold, the corresponding feature value is an outlier, and the network access that generated it is determined to be an abnormal access.
  • If the network request currently initiated by the terminal device is determined to be an abnormal access, the server directly rejects the request or requires the terminal device to perform access verification again; if it is determined to be a normal access request, the server responds to the request directly.
  • The judgment threshold mentioned in the above embodiment is a critical point for the relevant feature values formed when the terminal device initiates a normal network access request.
  • The chi-square statistic is compared with the judgment threshold. If it exceeds the judgment threshold, the feature value corresponding to the chi-square statistic is determined to have been formed during abnormal access, which yields the determination of whether the network access initiated by the terminal device at the corresponding detection time point is an abnormal access.
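  • The comparison and the server's response described above reduce to a single threshold test. The sketch below is illustrative only; the function name and the two returned labels are hypothetical stand-ins for whatever the server actually does.

```python
def handle_request(chi2: float, threshold: float) -> str:
    """Decide the server's response from one chi-square statistic.

    Returns "reject_or_reverify" for abnormal access (statistic above the
    judgment threshold) and "respond" for normal access.
    """
    is_outlier = chi2 > threshold
    return "reject_or_reverify" if is_outlier else "respond"
```

  • A statistic equal to the threshold is treated as normal here, since the text only speaks of the statistic being greater than the threshold.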
  • The non-linear combination feature set mentioned above can also include original-category feature information and effective derived feature information.
  • The original-category feature information is first obtained with the script, and a metric data distribution calculation is then performed on it to obtain the effective derived feature information for identifying outliers.
  • The non-linear combination feature set may include original-category feature information such as browser language, pixel ratio, color depth, whether an audio stack fingerprint is provided, the parameter information of the audio stack fingerprint, the total number of logical processors that the system makes available to the user agent, whether the browser manufacturer is "other", whether the operating system manufacturer is "other", and whether the browser type is "robot".
  • From this, effective derived features for identifying outliers can be obtained, including whether AdBlock is installed, whether the user has tampered with the language, whether the user has tampered with the screen resolution, whether the user has tampered with the operating system, and the browser manufacturer, operating system manufacturer, access device type, and operating system family.
  • The metric data distribution calculation includes calculating the range, the quartiles, the interquartile range, and the five-number summary of the feature information data. The five-number summary consists, in order, of the minimum, the lower quartile, the median, the upper quartile, and the maximum.
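  • The distribution measures listed above can be computed with the Python standard library. This is a minimal sketch: the function names are hypothetical, and the "inclusive" quartile method is one of several conventions, chosen here as an assumption.

```python
from statistics import quantiles
from typing import Dict, Sequence, Tuple


def five_number_summary(values: Sequence[float]) -> Tuple[float, float, float, float, float]:
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]


def spread(values: Sequence[float]) -> Dict[str, float]:
    """Return the range and the interquartile range of the values."""
    lo, q1, _median, q3, hi = five_number_summary(values)
    return {"range": hi - lo, "iqr": q3 - q1}


summary = five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9])
```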
  • An embodiment of the present application also provides a device for determining abnormality of network access, as shown in FIG. 3, including:
  • The acquisition module 310 is configured to respond to a network access request sent by a terminal device, obtain multiple non-linear feature values of the terminal device using a script, and form a non-linear combination feature set;
  • The calculation module 320 is configured to calculate, for the feature values of the non-linear combination feature set, the average value of all corresponding feature values within a set time period, and to calculate the chi-square statistic of the feature values of the terminal device;
  • The judging module 330 is configured to determine that the corresponding feature value is an outlier when the chi-square statistic of the feature value of the terminal device is greater than the corresponding judgment threshold, and to determine from the outlier feature values obtained that the network access is abnormal access;
  • The non-linear combination feature set includes the feature values of the original-category feature information and the feature values of the effective derived feature information. The acquisition module is also used to obtain the feature values of the original-category feature information with a script and to perform a metric data distribution calculation on them, obtaining the feature values of the effective derived feature information for identifying outliers.
  • The chi-square statistic includes a first chi-square statistic. The calculation module is further configured to obtain a first average value of a specific necessary feature value of the non-linear combination feature set over each network access request within a set time period, and to calculate, from the necessary feature value and the first average value, the first chi-square statistic corresponding to the necessary feature value of the terminal device.
  • The judgment threshold includes a first judgment threshold. The calculation module is further configured to calculate, from the first chi-square statistics obtained within the set time period, the upper limit of the interquartile range of the first chi-square statistic as the first judgment threshold for the feature value.
  • The chi-square statistic includes a second chi-square statistic. The calculation module is further configured to: obtain, from multiple feature values of the non-linear combination feature set for each network access request within a set time period, the dimension vector of those feature values as the feature value of the non-linear combination feature set; obtain the corresponding second average value from the feature values of all the non-linear combination feature sets; and calculate, from the feature value of the non-linear combination feature set and the second average value, a second chi-square statistic of the feature value of the non-linear combination feature set of the terminal device.
  • The judgment threshold includes a second judgment threshold. The calculation module is further configured to form a curve from the second chi-square statistics of all network access requests within a set time period and to take the second chi-square statistic at the point of maximum slope on the curve as the second judgment threshold.
  • FIG. 4 is a schematic diagram of the internal structure of the server in an embodiment.
  • The server includes a processor 410, a storage medium 420, a memory 430, and a network interface 440 connected through a system bus.
  • The storage medium 420 of the server stores an operating system, a database, and computer-readable instructions.
  • The database may store control information sequences.
  • The processor 410 can implement the functions of the acquisition module 310, the calculation module 320, and the judging module 330 of the network access abnormality determination device in the embodiment shown in FIG. 3.
  • The processor 410 of the server provides computing and control capabilities to support the operation of the entire server.
  • The memory 430 of the server may store computer-readable instructions. When the computer-readable instructions are executed by the processor 410, the processor 410 can execute a method for determining abnormality of network access.
  • The network interface 440 of the server is used to connect and communicate with the terminal.
  • This application also proposes a computer-readable storage medium storing computer-readable instructions.
  • The computer-readable storage medium may be a non-volatile readable storage medium. When the computer-readable instructions are executed by one or more processors, the one or more processors perform the following steps: in response to a network access request sent by a terminal device, use a script to obtain multiple non-linear feature values of the terminal device and form a non-linear combination feature set; calculate, for the feature values of the non-linear combination feature set, the average value of all corresponding feature values within a set time period, and calculate the chi-square statistic of the feature value of the terminal device; and, when the chi-square statistic shows the corresponding feature value to be an outlier, determine that the corresponding network access is an abnormal access.
  • The method, device, server, and storage medium for determining abnormality of network access acquire multiple feature values from the network access request initiated by the terminal device and form a non-linear combination feature set. As the abnormality determination requires, the corresponding feature values are extracted from the non-linear combination feature set to obtain the corresponding chi-square statistics, the outliers among the feature values are obtained from the chi-square statistics, and the corresponding network access is thereby determined to be an abnormal access.
  • A feature value is an outlier when the chi-square statistic of the terminal device is greater than the corresponding judgment threshold; comparing the chi-square statistic with the judgment threshold gives the judgment result for abnormal network access.
  • The technical solution provided in this application uses a detection algorithm based on chi-square statistics and compares the value used as the basis for judgment with a critical value to obtain the judgment result. It does not require labeling the feature information data generated when the terminal device initiates network access, which saves later statistical and analytical work. The analysis process is simple and the result is intuitive, so a more accurate judgment result can be obtained easily, which ultimately improves abnormality determination for network access.
  • The method, device, server, and storage medium of this application analyze the feature information data generated by terminal device network access directly, using an unsupervised-clustering outlier detection algorithm, and determine whether the access is abnormal. This solves the prior-art problem that judging by the user's usage trace data when logging in to the network through a terminal device easily identifies a real user as an abnormal user, and it improves the ability to determine abnormal access by a terminal device.
  • The computer program can be stored in a computer-readable storage medium. When executed, it may include the processes of the above-mentioned method embodiments.
  • The aforementioned storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The present application relates to the field of security detection technology. The present application provides a network access abnormality determination method and apparatus, a server and a storage medium. Said method comprises: in response to a network access request sent by a terminal device, acquiring, using a script, a plurality of nonlinear feature values of the terminal device, and forming a nonlinear combined feature set; calculating, for the feature values of the nonlinear combined feature set, an average value of all the corresponding feature values within a set time period, and calculating a Chi-square statistic of the feature values of the terminal device; when the Chi-square statistic of the feature values of the terminal device is greater than a corresponding determination threshold, determining the corresponding feature values as outliers; and performing determination, according to the feature values of the obtained outliers, to obtain that the network access is an abnormal access. The present application is beneficial to improving the effect of network access abnormality determination.

Description

Method, device, server and storage medium for determining abnormality of network access
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on June 28, 2019, with application number 201910580047.2 and the invention title "Method and Apparatus for Determining Abnormality of Terminal Equipment Network Access", the entire contents of which are incorporated herein by reference.
Technical field
This application relates to the technical field of security detection. Specifically, this application relates to a method, device, server, and storage medium for determining abnormality of network access.
Background
In current network access, one of the main threats to website security is access by web crawlers, which prevents the website from making correct judgments and easily leads to incorrect responses. The current approach to this problem is to collect data such as click times and mouse drag trajectories during user verification and to identify the type of user from this behavior data. Such methods have a high error rate: they easily identify a real user as an abnormal user, so their accuracy is low.
Summary of the invention
To overcome the above technical problems, in particular the problem in the prior art that a real user is easily identified as an abnormal user on the basis of usage-trace data when logging in to the network through a terminal device, the following technical solutions are proposed:
In a first aspect, this application provides a method for determining abnormality of network access, which includes the following steps:
in response to a network access request sent by a terminal device, using a script to obtain multiple non-linear feature values of the terminal device, and forming a non-linear combination feature set;
for the feature values of the non-linear combination feature set, calculating the average value of all corresponding feature values within a set time period, and calculating the chi-square statistic of the feature values of the terminal device;
when the chi-square statistic of a feature value of the terminal device is greater than the corresponding judgment threshold, determining that the corresponding feature value is an outlier;
determining, according to the feature values identified as outliers, that the network access is an abnormal access;
wherein the non-linear combination feature set includes feature values of original-category feature information and feature values of effective derived feature information, and the step of using a script to obtain multiple non-linear feature values of the terminal device and forming a non-linear combination feature set includes:
using a script to obtain the feature values of the original-category feature information;
performing a data dispersion measurement on the feature values of the original-category feature information to obtain the feature values of the effective derived feature information used to identify outliers.
In a second aspect, this application also provides a device for determining abnormality of network access, which includes:
in response to a network access request sent by a terminal device, using a script to obtain multiple non-linear feature values of the terminal device, and forming a non-linear combination feature set;
for the feature values of the non-linear combination feature set, calculating the average value of all corresponding feature values within a set time period, and calculating the chi-square statistic of the feature values of the terminal device;
when the chi-square statistic of a feature value of the terminal device is greater than the corresponding judgment threshold, determining that the corresponding feature value is an outlier;
determining, according to the feature values identified as outliers, that the network access is an abnormal access;
wherein the non-linear combination feature set includes feature values of original-category feature information and feature values of effective derived feature information, and the step of using a script to obtain multiple non-linear feature values of the terminal device and forming a non-linear combination feature set includes:
using a script to obtain the feature values of the original-category feature information;
performing a data dispersion measurement on the feature values of the original-category feature information to obtain the feature values of the effective derived feature information used to identify outliers.
In a third aspect, this application also provides a server, which includes:
one or more processors;
a memory;
one or more computer-readable instructions,
wherein the one or more computer-readable instructions are stored in the memory and configured to be executed by the one or more processors, and the one or more computer-readable instructions, when executed, implement the following steps:
in response to a network access request sent by a terminal device, using a script to obtain multiple non-linear feature values of the terminal device, and forming a non-linear combination feature set;
for the feature values of the non-linear combination feature set, calculating the average value of all corresponding feature values within a set time period, and calculating the chi-square statistic of the feature values of the terminal device;
when the chi-square statistic of a feature value of the terminal device is greater than the corresponding judgment threshold, determining that the corresponding feature value is an outlier;
determining, according to the feature values identified as outliers, that the network access is an abnormal access;
wherein the non-linear combination feature set includes feature values of original-category feature information and feature values of effective derived feature information, and the step of using a script to obtain multiple non-linear feature values of the terminal device and forming a non-linear combination feature set includes:
using a script to obtain the feature values of the original-category feature information;
performing a data dispersion measurement on the feature values of the original-category feature information to obtain the feature values of the effective derived feature information used to identify outliers.
In a fourth aspect, this application also provides a computer-readable storage medium storing computer-readable instructions which, when executed by a processor, implement the following steps:
in response to a network access request sent by a terminal device, using a script to obtain multiple non-linear feature values of the terminal device, and forming a non-linear combination feature set;
for the feature values of the non-linear combination feature set, calculating the average value of all corresponding feature values within a set time period, and calculating the chi-square statistic of the feature values of the terminal device;
when the chi-square statistic of a feature value of the terminal device is greater than the corresponding judgment threshold, determining that the corresponding feature value is an outlier;
determining, according to the feature values identified as outliers, that the network access is an abnormal access;
wherein the non-linear combination feature set includes feature values of original-category feature information and feature values of effective derived feature information, and the step of using a script to obtain multiple non-linear feature values of the terminal device and forming a non-linear combination feature set includes:
using a script to obtain the feature values of the original-category feature information;
performing a data dispersion measurement on the feature values of the original-category feature information to obtain the feature values of the effective derived feature information used to identify outliers.
The technical solution provided by this application uses a detection algorithm based on the chi-square statistic: the value serving as the basis for judgment is compared with a critical value to obtain the determination result. The feature information generated when the terminal device initiates network access does not need to be labeled, which saves later statistical and analytical work. The analysis is simple and the result is intuitive, so a determination result with relatively high accuracy can be obtained easily, which ultimately improves abnormality determination for the terminal device's network access.
Additional aspects and advantages of this application will be given in part in the following description; they will become apparent from the description or be learned through practice of this application.
Description of the drawings
The above and/or additional aspects and advantages of this application will become apparent and easy to understand from the following description of the embodiments in conjunction with the accompanying drawings, in which:
FIG. 1 is an application environment diagram of the network access abnormality determination scheme in an embodiment of this application;
FIG. 2 is a flowchart of a method for determining abnormality of network access according to an embodiment of this application;
FIG. 3 is a schematic diagram of a device for determining abnormality of network access according to an embodiment of this application;
FIG. 4 is a schematic structural diagram of a server according to an embodiment of this application.
Detailed description
The embodiments of this application are described in detail below. Examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals denote, throughout, the same or similar elements or elements with the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they serve only to explain this application and are not to be construed as limiting it.
Those skilled in the art will understand that, unless specifically stated otherwise, the singular forms "a", "an", "said", and "the" used herein may also include the plural. It should further be understood that the term "comprising" used in the specification of this application refers to the presence of the stated features, integers, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It should be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may be present. In addition, "connected" or "coupled" as used herein may include a wireless connection or wireless coupling. The term "and/or" as used herein includes all or any unit and all combinations of one or more of the associated listed items.
Those skilled in the art will understand that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by those of ordinary skill in the art to which this application belongs. It should also be understood that terms such as those defined in general dictionaries are to be understood as having a meaning consistent with their meaning in the context of the prior art and, unless specifically defined as they are here, are not to be interpreted in an idealized or overly formal sense.
Those skilled in the art will understand that "terminal" and "terminal device" as used herein include both devices that have only a wireless signal receiver without transmitting capability and devices that have receiving and transmitting hardware capable of two-way communication over a two-way communication link. Such devices may include: cellular or other communication devices, with a single-line display, a multi-line display, or no multi-line display; PCS (Personal Communications Service) devices, which can combine voice, data processing, fax, and/or data communication capabilities; PDAs (Personal Digital Assistants), which may include a radio frequency receiver, pager, Internet/intranet access, web browser, notepad, calendar, and/or GPS (Global Positioning System) receiver; and conventional laptop and/or palmtop computers or other devices that have and/or include a radio frequency receiver. A "terminal" or "terminal device" as used herein may be portable, transportable, installed in a vehicle (air, sea, and/or land), or suitable and/or configured to operate locally and/or in distributed form at any other location on Earth and/or in space. A "terminal" or "terminal device" as used herein may also be a communication terminal, an Internet terminal, or a music/video playback terminal, for example a PDA, a MID (Mobile Internet Device), and/or a mobile phone with music/video playback functions, or a device such as a smart TV or set-top box.
Those skilled in the art will understand that a remote network device as used herein includes, but is not limited to, a computer, a network host, a single network server, a set of multiple network servers, or a cloud composed of multiple servers. Here, the cloud is composed of a large number of computers or network servers based on cloud computing, where cloud computing is a type of distributed computing: a super virtual computer composed of a group of loosely coupled computers. In the embodiments of this application, the remote network device, the terminal device, and the WNS server may communicate by any means, including but not limited to mobile communication based on 3GPP, LTE, or WIMAX, computer network communication based on the TCP/IP or UDP protocols, and short-range wireless transmission based on Bluetooth or infrared transmission standards.
Referring to FIG. 1, FIG. 1 is an application environment diagram of the network access abnormality determination scheme in an embodiment of this application. In this embodiment, the technical solution of this application can be implemented on a server. As shown in FIG. 1, terminal devices 110 and 120 can access the server 130 through the Internet; terminal device 110 and/or 120 sends a network request to the server 130, and the server 130 performs data interaction according to the request. During data interaction, the server 130 obtains the access data and attribute data of terminal device 110 and/or 120 according to the request information, and determines from that data whether the terminal device is abnormal.
To solve the problem that current abnormality determination easily identifies a real user as an abnormal user, this application provides a method for determining abnormality of network access. Refer to FIG. 2, which is a flowchart of a method for determining abnormality of network access according to an embodiment. The method includes the following steps:
S210: In response to a network access request sent by a terminal device, use a script to obtain multiple non-linear feature values of the terminal device, and form a non-linear combination feature set.
When the server exchanges data with the terminal device, it obtains the relevant parameters of the terminal device according to the network request the device sends. In this step, the user sends a registration or verification request, and the front end uses a JavaScript script to obtain the relevant feature values of the terminal device, including the device type (iPhone, Mac, Android), system information (OS type, version, resolution), the IP address, and so on. These feature values are non-linearly related to one another. Together they form the non-linear combination feature set of the terminal device for the network access request.
In this embodiment, the feature values may specifically include the user_agent, resolution, pixel ratio, and touch-screen events (the maximum number of touch points supported and whether touch is supported) obtained from the device through the front end. The device type, brand, model, and operating system version number are obtained by parsing the user_agent string, and the parsed brand and model of the terminal device currently issuing the network access request are used to look up the resolution, pixel ratio, and touch-screen event information recorded for the same brand and model in the base library. The base library holds the real resolution, pixel ratio, and touch-screen event information of all device models, obtained from authoritative websites.
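The base-library association described above can be sketched as a dictionary lookup keyed by brand and model. The text does not say how the associated values are then used, so the field-by-field comparison below is an assumption, as are the names `BASE_LIBRARY` and `mismatched_fields` and the sample entry, all of which are hypothetical.

```python
# Hypothetical base library keyed by (brand, model); the entry is illustrative only.
BASE_LIBRARY = {
    ("ExampleBrand", "X1"): {
        "resolution": (1080, 2340),
        "pixel_ratio": 3.0,
        "max_touch_points": 5,
    },
}


def mismatched_fields(brand, model, reported):
    """Return the names of fields whose reported value differs from the base library.

    Returns None when the (brand, model) pair is not in the base library.
    """
    reference = BASE_LIBRARY.get((brand, model))
    if reference is None:
        return None
    return sorted(
        name for name, expected in reference.items()
        if reported.get(name) != expected
    )


# Values a front-end script might report for a request claiming to be an "X1".
reported = {"resolution": (1080, 2340), "pixel_ratio": 1.0, "max_touch_points": 0}
diffs = mismatched_fields("ExampleBrand", "X1", reported)
```

Returning `None` for an unknown model keeps "no reference data" distinct from "no mismatch", which a caller would probably want to treat differently.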
Further, to remove the effect of differing scales among variables and make the data comparable, the feature information values in each feature set are standardized before the feature values are labeled. For example, the feature set of an access record may contain one variable on a 100-point scale and another on a 5-point scale; only after all the data are standardized can they be compared on the same scale.
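The text does not name a standardization method. As a minimal sketch, assuming z-score standardization (subtract the mean, divide by the population standard deviation), a 100-point variable and a 5-point variable can be brought onto the same scale as follows; the function name and sample values are hypothetical.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return the z-scores of `values`; a constant column maps to all zeros."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical example: one feature on a 100-point scale, another on a 5-point scale.
percent_scores = [55.0, 70.0, 85.0]
five_point_scores = [2.0, 3.0, 4.0]

z_percent = standardize(percent_scores)
z_five = standardize(five_point_scores)
```

After standardization the two columns carry the same relative spread, so neither dominates a later distance or chi-square computation merely because of its units.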
S220: For the feature values of the non-linear combination feature set, calculate the average value of all corresponding feature values within the set time period, and calculate the chi-square statistic of the feature values of the terminal device.
In this step, the corresponding feature values are extracted from the non-linear combination feature set obtained in step S210, and the chi-square statistic corresponding to those feature values is calculated from them and from other associated values derived from them, giving the distribution of feature values formed when the terminal device initiates network access requests.
For the chi-square statistic, the average value of all corresponding feature values within the set time period may be calculated, and the chi-square statistic of the feature values of the terminal device is then calculated from it.
Specifically, under normal circumstances a terminal device issues multiple network access requests within a given continuous period. Each time the terminal device initiates a network access request, the resulting group of feature values forms one non-linear combination feature set. So that abnormal access can be determined from several of these feature values, the corresponding feature values are extracted from the non-linear combination feature set as needed.
Further, the average value of all corresponding feature values formed when the terminal device initiates network access requests within the set time period is calculated, and the chi-square statistic of the feature values of the terminal device is calculated from it.
The chi-square statistic is calculated as follows:
$$\chi^2 = \sum_{i=1}^{n} \frac{(o_i - E_i)^2}{E_i} \tag{1}$$
where $o_i$ is the value of the i-th feature formed when the terminal device initiates a network access request at a given detection time point, $E_i$ is the average value of all the i-th features formed by the network access requests initiated by the terminal device within the set time period, and $n$ is the total number of all the i-th features formed by the network access requests initiated by the terminal device within the set time period.
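As a minimal sketch, assuming the statistic takes the standard chi-square form, the sum over features of (o_i - E_i)^2 / E_i with E_i the per-feature mean over the time window, the computation for one request could look like this. The function names and the sample window are hypothetical.

```python
def feature_means(history):
    """Column-wise mean E_i over all requests in the time window."""
    n_requests = len(history)
    return [sum(column) / n_requests for column in zip(*history)]


def chi_square_statistic(observed, expected):
    """Sum of (o_i - E_i)^2 / E_i over the features of one request."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    total = 0.0
    for o_i, e_i in zip(observed, expected):
        if e_i == 0:
            raise ValueError("expected values must be non-zero")
        total += (o_i - e_i) ** 2 / e_i
    return total


# Hypothetical window: each row is one request (screen width, pixel ratio, touch points).
history = [
    [1080.0, 2.0, 5.0],
    [1080.0, 2.0, 5.0],
    [1080.0, 3.0, 5.0],
    [720.0, 1.0, 1.0],
]
expected = feature_means(history)
scores = [chi_square_statistic(row, expected) for row in history]
```

In this window the last request deviates most from the per-feature means and therefore receives the largest statistic. The sketch uses raw values for readability; standardized values that can be zero or negative would need a different treatment of the denominator.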
S230: When the chi-square statistic shows that the corresponding feature value is an outlier, determine that the corresponding network access is an abnormal access.
The chi-square statistic calculated in step S220 from the feature values formed when the terminal device initiates network access requests reflects the distribution of the feature values at the detection time point. From that distribution, the outliers among the feature values are identified, and from the position of an outlier the corresponding network access is determined to be an abnormal access.
The method for determining abnormality of network access provided by this application obtains multiple non-linear feature values from the network access request initiated by the terminal device and calculates the corresponding chi-square statistic from these feature values. When the chi-square statistic shows that the corresponding feature value is an outlier, the current network access of the terminal device is determined to be an abnormal access.
The technical solution of this application identifies, from the network access request initiated by the terminal device, the feature values that are outliers. It thereby avoids the prior-art problem that relying only on usage records generated while the user operates the terminal device, such as click and drag trajectory data during user verification, easily identifies a real user as an abnormal user. It reflects more accurately the state of the network access request currently initiated by the terminal device to the server, and obtains the abnormal-access determination through a simpler and more intuitive data comparison, which helps improve the efficiency of abnormality determination for terminal device network access.
The above step of calculating the average value of all corresponding feature values formed when the terminal device initiates network access requests within the set time period, and calculating the chi-square statistic of the feature values of the terminal device, may further include:
A221: According to a specific necessary feature value of the non-linear combination feature set for each network access request within the set time period, calculate a first average value of the necessary feature value;
A222: According to the necessary feature value and the first average value, calculate a first chi-square statistic of the terminal device for the necessary feature value.
In the embodiment of steps A221-A222, abnormality of the terminal device's network access is determined from a single feature value. For the matter under determination, a specific necessary feature value is extracted from the non-linear combination feature sets formed within the set time period, and the first average value is calculated over the values of that same necessary feature produced by every network access request within the period.
The necessary feature value is one that is necessarily produced when the terminal device initiates a network access request and for which the corresponding real information can be found in the base library, for later reference. Examples are information that can be measured numerically, such as the resolution, the pixel ratio, and the total number of logical processors the system makes available to the user agent.
Specifically, for a set time period of 7 PM to 9 PM, for example, a specific necessary feature value such as the resolution is obtained from the non-linear combination feature set produced by each network access request during that period, and the average of the resolutions produced by all network access requests in the period is taken as the first average value of the resolution. The first chi-square statistic of the necessary feature value is then calculated according to formula (1) above.
Further, all the first chi-square statistics of the feature value within the set time period may be calculated and averaged, and the average used as the basis for further determination, so that a data anomaly caused by an occasional abnormal situation does not affect the overall determination result.
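Steps A221-A222 and the averaging just described can be sketched for a single numeric feature. The sketch assumes the single-feature statistic is the one-term form (o - E)^2 / E, with E the first average over the window; the function name and the sample widths are hypothetical.

```python
def first_chi_square_series(values):
    """Per-request statistic (o - E)^2 / E for one feature, E being the window mean."""
    first_average = sum(values) / len(values)
    if first_average == 0:
        raise ValueError("the window mean must be non-zero")
    return [(v - first_average) ** 2 / first_average for v in values]


# Hypothetical screen widths reported by requests between 7 PM and 9 PM.
widths = [1080.0, 1080.0, 1080.0, 720.0]
series = first_chi_square_series(widths)

# Averaging the statistics damps the effect of one occasional abnormal request.
mean_statistic = sum(series) / len(series)
```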
在此基础上,在步骤S230之前,即在对所述网络访问的异常访问的判定之前,还包括:On this basis, before step S230, that is, before determining the abnormal access of the network access, the method further includes:
根据所述设定时间段内所得到的第一卡方统计量,计算得到所述第一卡方统计量的四分位距的上限值作为所述特征值对应的第一判断阈值。According to the first chi-square statistic obtained within the set time period, the upper limit value of the interquartile range of the first chi-square statistic is calculated as the first judgment threshold corresponding to the characteristic value.
在该步骤,针对上述特征值在所述设定时间段内所有得到的对应的第一卡方统计量,并此作为统计的对应形成一数组,并按照所形成的时间进行排列。根据排列好的数组,分别计算其下四分位数Q1、中位数Q2和上四分位数Q3,得到对应的四分位距IQR=Q3-Q1,最后计算得到对应的四分位距的上限值,并以该四分位距的上限值作为所述特征值的第一判定阈值。In this step, all corresponding first chi-square statistics obtained in the set time period for the above-mentioned feature values are formed as a statistical correspondence and an array is formed and arranged according to the formed time. According to the arranged array, calculate the lower quartile Q1, median Q2 and upper quartile Q3 respectively, and get the corresponding interquartile range IQR=Q3-Q1, and finally calculate the corresponding interquartile range The upper limit value of the interquartile range is used as the first judgment threshold of the characteristic value.
Alternatively, the first judgment threshold may be set on the basis of a critical value derived from historical data collected for the feature value.
Step S220 above may also take the form of another embodiment, which further includes:
B221. According to multiple feature values of the nonlinear combination feature set of each network access request within the set time period, obtaining a dimension vector of those feature values as the feature value of the nonlinear combination feature set;
B222. Obtaining a corresponding second average value from the feature values of all the nonlinear combination feature sets;
B223. Calculating, from the feature value of the nonlinear combination feature set and the second average value, a second chi-square statistic of the feature value of the nonlinear combination feature set of the terminal device.
In the embodiment of steps B221 to B223, multiple feature values of the nonlinear combination feature set, or even all of them, are treated as a whole when determining whether the network access of the terminal device is abnormal.
Specifically, according to the needs of the evaluation, multiple feature values are extracted from the nonlinear combination feature set formed by each network access request initiated by the terminal device within the set time period. For example, to confirm whether the IP address of the terminal device belongs to an IDC (Internet Data Center) or a machine-room server, the feature values user agent, resolution, pixel ratio, and touch-screen event can each be taken as the value of one dimension, and the dimension vector corresponding to these feature values is obtained for evaluation. This dimension vector is used as the feature value of the nonlinear combination feature set.
The corresponding second average value is obtained from the feature values of the nonlinear combination feature set for every network access request within the set time period.
From the feature value of the nonlinear combination feature set and the corresponding second average value, the second chi-square statistic of the nonlinear combination feature set of the terminal device is calculated using formula (1) above, giving the data distribution of the feature values related to the IP address.
Further, all second chi-square statistics of the corresponding feature value within the set time period are calculated and averaged, and this average is used as the basis for further determination, so that data anomalies caused by occasional abnormal events do not distort the overall determination result.
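Steps B221 to B223 can be illustrated with the sketch below. It rests on two assumptions that the text does not confirm: each dimension (user agent, resolution, pixel ratio, touch-screen event) has already been encoded as a number, and formula (1) sums Pearson terms (x_d - E_d)^2 / E_d over the dimensions, with E_d the per-dimension second average value. The name `second_chi_square` and the sample vectors are hypothetical.

```python
def second_chi_square(vectors):
    """Return one chi-square statistic per request for multi-feature vectors.

    Assumptions: every vector has the same length, every dimension is
    numeric, and the statistic is the sum of (x_d - E_d)^2 / E_d over the
    dimensions d, where E_d is the mean of dimension d in the window.
    """
    n = len(vectors)
    dims = len(vectors[0])
    means = [sum(v[d] for v in vectors) / n for d in range(dims)]

    stats = []
    for v in vectors:
        total = 0.0
        for x, m in zip(v, means):
            if m != 0:  # a zero mean would make the term undefined; skip it
                total += (x - m) ** 2 / m
        stats.append(total)
    return stats


# Hypothetical encoded vectors: (user-agent id, width, pixel ratio, touch flag).
window = [
    (3, 1920, 1.0, 0),
    (3, 1920, 1.0, 0),
    (7, 800, 3.0, 1),
]
per_request = second_chi_square(window)
```

The request whose vector lies furthest from the per-dimension averages receives the largest statistic, which is what the later threshold comparison relies on.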
On the basis of this embodiment, before step S230, that is, before determining whether the network access is an abnormal access, the method further includes:
forming a curve from the second chi-square statistics corresponding to all network access requests within the set time period, and taking the second chi-square statistic corresponding to the maximum slope of the curve as the second judgment threshold.
In this step, all second chi-square statistics obtained within the set time period for the feature value of the nonlinear combination feature set are plotted on coordinate axes to form a curve, and the second chi-square statistic at the point of greatest slope on the curve is taken as the second judgment threshold.
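One possible reading of the maximum-slope rule is sketched below. The text does not specify how the points are ordered along the horizontal axis or which end of the steepest segment supplies the threshold. The sketch assumes the statistics are sorted in ascending order, that the slope is the difference between neighbouring values, and that the threshold is the value just before the steepest rise. All of these choices, and the name `max_slope_threshold`, are assumptions.

```python
def max_slope_threshold(chi_stats):
    """Return the statistic located just before the steepest rise.

    Assumptions: the curve is the ascending-sorted statistics plotted against
    their rank, and the slope between neighbours is their difference.
    """
    ordered = sorted(chi_stats)
    if len(ordered) < 2:
        raise ValueError("need at least two statistics to form a curve")
    slopes = [b - a for a, b in zip(ordered, ordered[1:])]
    steepest = max(range(len(slopes)), key=slopes.__getitem__)
    return ordered[steepest]


# Hypothetical second chi-square statistics; 5.0 stands apart from the rest.
threshold = max_slope_threshold([0.2, 5.0, 0.1, 0.3])
```

Under these assumptions the threshold is 0.3, so only the statistic 5.0 exceeds it.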
Alternatively, the second judgment threshold may be set on the basis of a critical value derived from historical data collected for the feature value of the nonlinear combination feature set.
Step S230 further includes:
S231. When the first or second chi-square statistic of the terminal device is greater than the corresponding judgment threshold, determining that the corresponding feature value is an outlier;
S232. Determining, according to the feature value identified as an outlier, that the network access is an abnormal access.
The first or second chi-square statistic is compared with the corresponding first or second judgment threshold. Based on the comparison, it is determined whether the corresponding feature value of the nonlinear combination feature set is an outlier, and hence whether the corresponding network access is an abnormal access. If the first or second chi-square statistic is greater than the corresponding first or second judgment threshold, the corresponding feature value is an outlier, and the network access that produced that feature value is determined to be an abnormal access.
If the network request currently initiated by the terminal device is determined to be an abnormal access request, the server either rejects the request directly or requires the terminal device to perform access verification again; if the request is determined to be a normal access request, the server responds to it directly.
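The comparison in steps S231 and S232 and the server's reaction can be summarized in a short sketch. The function name `handle_request`, the returned strings, and the `allow_reverification` switch are hypothetical. The text leaves open when the server rejects and when it asks for renewed verification, so the switch merely stands in for that policy choice.

```python
def handle_request(chi_stat, threshold, allow_reverification=True):
    """Decide how the server reacts to one network access request.

    A statistic strictly greater than its threshold marks the feature value
    as an outlier, and the request is then treated as an abnormal access.
    """
    is_outlier = chi_stat > threshold
    if is_outlier:
        # Abnormal access: reject outright or demand access verification again.
        return "reverify" if allow_reverification else "reject"
    # Normal access: respond to the request directly.
    return "respond"


decision = handle_request(chi_stat=9.4, threshold=7.0)
```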
The judgment threshold mentioned in the above embodiments represents the critical point of the relevant feature value when the terminal device initiates normal network access requests. The chi-square statistic is compared with the judgment threshold; if the chi-square statistic exceeds the judgment threshold, the feature value corresponding to that chi-square statistic was produced by an abnormal access. This yields the determination of whether the network access initiated by the terminal device at the corresponding detection time is an abnormal access.
The nonlinear combination feature set mentioned above may also include original-category feature information and effective derived feature information. The original-category feature information is first obtained using a script, and the effective derived feature information for identifying outliers is then obtained by performing a data dispersion measurement calculation on the original-category feature information.
Specifically, the nonlinear combination feature set may include original-category feature information such as the browser language, pixel ratio, color depth, whether an audio stack fingerprint is provided, the parameter information of the audio stack fingerprint, the total number of logical processors the system makes available to the user agent, whether the browser manufacturer is "other", whether the operating system manufacturer is "other", and whether the browser type is "robot".
From the data dispersion measurement calculation, effective derived features for identifying outliers can be obtained, including whether AdBlock is installed, whether the user has tampered with the language, whether the user has tampered with the screen resolution, whether the user has tampered with the operating system, the browser manufacturer, the operating system manufacturer, the access device type, and the operating system family.
The data dispersion measurement calculation includes calculating, for the corresponding feature information data, the range, the quartiles, the interquartile range, and the five-number summary. The five-number summary consists, in order, of the minimum, the lower quartile, the median, the upper quartile, and the maximum.
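The dispersion measures named here can be computed with the Python standard library, as in the sketch below. The function name `dispersion_summary` and the sample data are hypothetical. The five-number summary is returned in the conventional order of minimum, lower quartile, median, upper quartile, and maximum. How these measures are turned into derived features such as "language tampered" is not described in the text and is not attempted.

```python
from statistics import quantiles


def dispersion_summary(values):
    """Return the range, quartiles, interquartile range, and five-number summary."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    lowest, highest = min(values), max(values)
    return {
        "range": highest - lowest,
        "quartiles": (q1, median, q3),
        "iqr": q3 - q1,
        "five_number": (lowest, q1, median, q3, highest),
    }


# Hypothetical color-depth readings from one set time period.
summary = dispersion_summary([1, 2, 3, 4, 5])
```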
Based on the same inventive concept as the above method for determining abnormality of network access, an embodiment of the present application further provides an apparatus for determining abnormality of network access which, as shown in FIG. 3, includes:
an acquisition module 310, configured to respond to a network access request sent by a terminal device, obtain multiple nonlinear feature values of the terminal device using a script, and form a nonlinear combination feature set;
a calculation module 320, configured to obtain, for the feature values of the nonlinear combination feature set, the average of all corresponding feature values within a set time period, and to calculate the chi-square statistic of the feature value of the terminal device;
a determination module 330, configured to determine that the corresponding feature value is an outlier when the chi-square statistic of the feature value of the terminal device is greater than the corresponding judgment threshold, and to determine, according to the feature value identified as an outlier, that the network access is an abnormal access;
wherein the nonlinear combination feature set includes feature values of original-category feature information and feature values of effective derived feature information, and the acquisition module is further configured to obtain the feature values of the original-category feature information using a script and to obtain the feature values of the effective derived feature information for identifying outliers by performing a data dispersion measurement calculation on the feature values of the original-category feature information.
Further, the chi-square statistic includes a first chi-square statistic, and the calculation module is further configured to obtain a first average value of a specific necessary feature value of the nonlinear combination feature set of each network access request within the set time period, and to calculate, from the necessary feature value and the first average value, the first chi-square statistic of the corresponding necessary feature value of the terminal device.
Further, the judgment threshold includes a first judgment threshold, and the calculation module is further configured to calculate, from the first chi-square statistics obtained within the set time period, the upper limit of the interquartile range of the first chi-square statistics as the first judgment threshold corresponding to the feature value.
Further, the chi-square statistic includes a second chi-square statistic, and the calculation module is further configured to: obtain, according to multiple feature values of the nonlinear combination feature set of each network access request within the set time period, a dimension vector of those feature values as the feature value of the nonlinear combination feature set; obtain a corresponding second average value from the feature values of all the nonlinear combination feature sets; and calculate, from the feature value of the nonlinear combination feature set and the second average value, the second chi-square statistic of the feature value of the nonlinear combination feature set of the terminal device.
Further, the judgment threshold includes a second judgment threshold, and the calculation module is further configured to form a curve from the second chi-square statistics corresponding to all network access requests within the set time period and to take the second chi-square statistic corresponding to the maximum slope of the curve as the second judgment threshold.
Please refer to FIG. 4, which is a schematic diagram of the internal structure of a server in one embodiment. As shown in FIG. 4, the server includes a processor 410, a storage medium 420, a memory 430, and a network interface 440 connected through a system bus. The storage medium 420 of the server stores an operating system, a database, and computer-readable instructions, and the database may store control information sequences. When executed by the processor 410, the computer-readable instructions cause the processor 410 to implement a method for determining abnormality of network access, and the processor 410 can implement the functions of the acquisition module 310, the calculation module 320, and the determination module 330 of the apparatus for determining abnormality of network access in the embodiment shown in FIG. 3. The processor 410 provides computing and control capabilities and supports the operation of the entire server. The memory 430 of the server may store computer-readable instructions that, when executed by the processor 410, cause the processor 410 to execute a method for determining abnormality of network access. The network interface 440 of the server is used to connect to and communicate with the terminal. Those skilled in the art will understand that the structure shown in FIG. 4 is only a block diagram of the part of the structure related to the solution of the present application and does not limit the server to which the solution is applied; a specific server may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
In one embodiment, the present application further provides a computer-readable storage medium storing computer-readable instructions. The computer-readable storage medium may be a non-volatile readable storage medium. When executed by one or more processors, the computer-readable instructions cause the one or more processors to perform the following steps: in response to a network access request sent by a terminal device, obtaining multiple nonlinear feature values of the terminal device using a script and forming a nonlinear combination feature set; for the feature values of the nonlinear combination feature set, obtaining the average of all corresponding feature values within a set time period and calculating the chi-square statistic of the feature value of the terminal device; and, when the chi-square statistic shows the corresponding feature value to be an outlier, determining that the corresponding network access is an abnormal access.
From the above embodiments, the main beneficial effects of the present application are as follows:
The method, apparatus, server, and storage medium for determining abnormality of network access provided by the present application obtain multiple feature values from the network access request initiated by the terminal device and form a nonlinear combination feature set. According to the needs of the abnormality determination, the corresponding feature value is extracted from the nonlinear combination feature set and its chi-square statistic is obtained; the chi-square statistic identifies the outlier corresponding to that feature value, which yields the determination that the corresponding network access is an abnormal access.
On this basis, a feature value is an outlier when the chi-square statistic of the terminal device is greater than the corresponding judgment threshold; comparing the chi-square statistic with the judgment threshold therefore yields the determination of abnormal network access.
The technical solution provided by the present application uses a detection algorithm based on chi-square statistics: the value on which the determination is based is compared with a critical value to obtain the determination result. The feature information data of the network access initiated by the terminal device does not need to be labeled, which reduces the workload of later statistics and analysis. The analysis process is also simple and the result intuitive, so a determination result with high accuracy is readily obtained, which ultimately improves the effectiveness of the abnormality determination for network access.
In summary, the method, apparatus, server, and storage medium of the present application analyze the feature information data generated by the network access of the terminal device directly, using an unsupervised clustering outlier detection algorithm, and obtain a determination of whether the access is abnormal. This solves the prior-art problem that determinations based on a user's usage-trace data when logging in to the network through a terminal device easily identify real users as abnormal users, and it improves the ability to determine abnormal access by terminal devices.
A person of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application. Their description is relatively specific and detailed, but it should not be understood as limiting the patent scope of the present application. It should be noted that a person of ordinary skill in the art can make several modifications and improvements without departing from the concept of the present application, and these all fall within its scope of protection. The scope of protection of this patent application is therefore defined by the appended claims.

Claims (20)

  1. A method for determining abnormality of network access, characterized by comprising the following steps:
    in response to a network access request sent by a terminal device, obtaining multiple nonlinear feature values of the terminal device using a script, and forming a nonlinear combination feature set;
    for the feature values of the nonlinear combination feature set, obtaining the average of all corresponding feature values within a set time period, and calculating a chi-square statistic of the feature value of the terminal device;
    when the chi-square statistic of the feature value of the terminal device is greater than a corresponding judgment threshold, determining that the corresponding feature value is an outlier;
    determining, according to the feature value identified as an outlier, that the network access is an abnormal access;
    wherein the nonlinear combination feature set includes feature values of original-category feature information and feature values of effective derived feature information, and the step of obtaining multiple nonlinear feature values of the terminal device using a script and forming a nonlinear combination feature set includes:
    obtaining the feature values of the original-category feature information using a script;
    obtaining the feature values of the effective derived feature information for identifying outliers by performing a data dispersion measurement calculation on the feature values of the original-category feature information.
  2. The method according to claim 1, characterized in that the chi-square statistic includes a first chi-square statistic,
    and the step of obtaining, for the feature values of the nonlinear combination feature set, the average of all corresponding feature values within a set time period and calculating the chi-square statistic of the feature value of the terminal device includes:
    obtaining a first average value of a specific necessary feature value of the nonlinear combination feature set of each network access request within the set time period;
    calculating, from the necessary feature value and the first average value, the first chi-square statistic of the corresponding necessary feature value of the terminal device.
  3. The method according to claim 2, characterized in that the judgment threshold includes a first judgment threshold,
    and before the step of determining that the corresponding feature value is an outlier when the chi-square statistic of the feature value of the terminal device is greater than the corresponding judgment threshold, the method further includes:
    calculating, from the first chi-square statistics obtained within the set time period, the upper limit of the interquartile range of the first chi-square statistics as the first judgment threshold corresponding to the feature value.
  4. The method according to claim 1, characterized in that the chi-square statistic includes a second chi-square statistic,
    and the step of obtaining, for the feature values of the nonlinear combination feature set, the average of all corresponding feature values within a set time period and calculating the chi-square statistic of the feature value of the terminal device includes:
    obtaining, according to multiple feature values of the nonlinear combination feature set of each network access request within the set time period, a dimension vector of those feature values as the feature value of the nonlinear combination feature set;
    obtaining a corresponding second average value from the feature values of all the nonlinear combination feature sets;
    calculating, from the feature value of the nonlinear combination feature set and the second average value, the second chi-square statistic of the feature value of the nonlinear combination feature set of the terminal device.
  5. The method according to claim 4, characterized in that the judgment threshold includes a second judgment threshold,
    and before the step of determining that the corresponding feature value is an outlier when the chi-square statistic of the feature value of the terminal device is greater than the corresponding judgment threshold, the method further includes:
    forming a curve from the second chi-square statistics corresponding to all network access requests within the set time period, and taking the second chi-square statistic corresponding to the maximum slope of the curve as the second judgment threshold.
  6. An apparatus for determining abnormality of network access, characterized by comprising:
    an acquisition module, configured to respond to a network access request sent by a terminal device, obtain multiple nonlinear feature values of the terminal device using a script, and form a nonlinear combination feature set;
    a calculation module, configured to obtain, for the feature values of the nonlinear combination feature set, the average of all corresponding feature values within a set time period, and to calculate a chi-square statistic of the feature value of the terminal device;
    a determination module, configured to determine that the corresponding feature value is an outlier when the chi-square statistic of the feature value of the terminal device is greater than a corresponding judgment threshold, and to determine, according to the feature value identified as an outlier, that the network access is an abnormal access;
    wherein the nonlinear combination feature set includes feature values of original-category feature information and feature values of effective derived feature information, and the acquisition module is further configured to obtain the feature values of the original-category feature information using a script and to obtain the feature values of the effective derived feature information for identifying outliers by performing a data dispersion measurement calculation on the feature values of the original-category feature information.
  7. 如权利要求6所述的装置,其特征在于,所述卡方统计量包括第一卡方统计量,所述计算模块还用于:7. The apparatus of claim 6, wherein the chi-square statistic comprises a first chi-square statistic, and the calculation module is further configured to:
    根据在设定时间段内每一次网络访问请求的所述非线性组合特征集的一特定的必要特征值,求取所述必要特征值的第一平均值;Obtaining a first average value of the necessary characteristic value according to a specific necessary characteristic value of the nonlinear combination characteristic set of each network access request within a set time period;
    根据所述必要特征值和所述第一平均值,计算得到所述终端设备的对应必要特征值的第一卡方统计量。According to the necessary characteristic value and the first average value, a first chi-square statistic corresponding to the necessary characteristic value of the terminal device is calculated.
  8. 如权利要求7所述的装置,其特征在于,所述判断阈值包括第一判断阈值,所述计算模块还用于:8. The device of claim 7, wherein the judgment threshold comprises a first judgment threshold, and the calculation module is further configured to:
    根据所述设定时间段内所得到的第一卡方统计量,计算得到所述第一卡方统计量的四分位距的上限值作为所述特征值对应的第一判断阈值According to the first chi-square statistic obtained in the set time period, the upper limit of the interquartile range of the first chi-square statistic is calculated as the first judgment threshold corresponding to the characteristic value
  9. The apparatus of claim 6, wherein the chi-square statistic comprises a second chi-square statistic, and the calculation module is further configured to:
    obtain, from the multiple feature values of the nonlinear combined feature set of each network access request within the set time period, a multi-dimensional vector of those feature values as the feature value of the nonlinear combined feature set;
    obtain a corresponding second average value from the feature values of all the nonlinear combined feature sets;
    calculate, from the feature value of the nonlinear combined feature set and the second average value, the second chi-square statistic of the feature value of the nonlinear combined feature set of the terminal device.
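
For the vector case of claim 9, a rough sketch is given here. It assumes that the second average value is the per-dimension mean over all requests and that the statistic sums the Pearson-style terms across dimensions; neither detail is stated in the claim, and `second_chi_square` is a hypothetical name.

```python
def second_chi_square(vectors):
    """Return one multivariate chi-square-style score per feature vector."""
    n = len(vectors)
    dims = len(vectors[0])
    # Per-dimension mean over all requests: the assumed "second average value".
    means = [sum(v[j] for v in vectors) / n for j in range(dims)]
    if any(m == 0 for m in means):
        raise ValueError("a dimension has zero mean; score undefined")
    # Sum the squared, mean-normalised deviations over every dimension.
    return [
        sum((v[j] - means[j]) ** 2 / means[j] for j in range(dims))
        for v in vectors
    ]


# Three hypothetical requests, each described by two feature values.
vector_scores = second_chi_square([[2.0, 4.0], [4.0, 4.0], [6.0, 16.0]])
```

Collapsing each vector to a single score lets one threshold cover the whole feature set, at the cost of hiding which dimension caused the deviation.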
  10. The apparatus of claim 9, wherein the judgment threshold comprises a second judgment threshold, and the calculation module is further configured to:
    form a curve from the second chi-square statistics corresponding to all network access requests within the set time period, and take the second chi-square statistic corresponding to the maximum slope of the curve as the second judgment threshold.
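
The maximum-slope threshold of claim 10 might be approximated as shown here. The claim does not say how the curve is built, so this sketch assumes the scores are sorted in ascending order and that the threshold is the score just before the steepest rise between neighbouring points; `max_slope_threshold` is a hypothetical name.

```python
def max_slope_threshold(scores):
    """Return the score located just before the steepest rise of the sorted curve."""
    ordered = sorted(scores)
    if len(ordered) < 2:
        raise ValueError("need at least two scores to measure a slope")
    # With unit spacing on the x-axis, the slope is the difference of neighbours.
    steepest = max(
        range(len(ordered) - 1),
        key=lambda i: ordered[i + 1] - ordered[i],
    )
    return ordered[steepest]


# Hypothetical second chi-square statistics for one time period.
second_scores = [3.0, 2.0, 9.0, 2.5, 40.0]
second_threshold = max_slope_threshold(second_scores)
second_outliers = [s for s in second_scores if s > second_threshold]
```

Choosing the lower end of the steepest segment means every score beyond the sharp rise exceeds the threshold, which matches the "greater than the judgment threshold" test in the independent claims.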
  11. A server, comprising:
    one or more processors;
    a memory;
    one or more computer-readable instructions, wherein the one or more computer-readable instructions are stored in the memory and configured to be executed by the one or more processors, and, when executed, implement the following steps:
    in response to a network access request sent by a terminal device, using a script to obtain multiple nonlinear feature values of the terminal device, and forming a nonlinear combined feature set;
    for the feature values of the nonlinear combined feature set, obtaining the average of all corresponding feature values within a set time period, and calculating a chi-square statistic of the feature values of the terminal device;
    when the chi-square statistic of a feature value of the terminal device is greater than the corresponding judgment threshold, determining that the corresponding feature value is an outlier;
    determining, according to the feature values identified as outliers, that the network access is abnormal access;
    wherein the nonlinear combined feature set includes feature values of original-category feature information and feature values of effective derived feature information, and the step of using a script to obtain multiple nonlinear feature values of the terminal device and forming a nonlinear combined feature set comprises:
    using the script to obtain the feature values of the original-category feature information;
    performing a data-dispersion measurement calculation on the feature values of the original-category feature information to obtain the feature values of the effective derived feature information used for identifying outliers.
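
The feature-set construction in the last two steps of claim 11 could look roughly like this. The claim only says that derived features come from a measure of data dispersion; the choice of range and population standard deviation, the feature name `click_interval_ms`, and the function `build_feature_set` are all illustrative assumptions.

```python
from statistics import pstdev


def build_feature_set(raw):
    """Combine raw feature values with dispersion-based derived features.

    raw maps a feature name to the numeric samples gathered by the script.
    """
    features = {}
    for name, samples in raw.items():
        features[name] = samples[-1]  # most recent raw value
        # Derived features: two common dispersion measures (assumed choices).
        features[name + "_range"] = max(samples) - min(samples)
        features[name + "_stdev"] = pstdev(samples)
    return features


# Hypothetical raw samples reported by a client-side script.
feature_set = build_feature_set({"click_interval_ms": [1.0, 3.0]})
```

A dispersion feature of this kind can expose automated traffic whose raw values look normal but vary far less, or far more, than values produced by a human user.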
  12. The server of claim 11, wherein the chi-square statistic comprises a first chi-square statistic,
    and the step of obtaining, for the feature values of the nonlinear combined feature set, the average of all corresponding feature values within a set time period and calculating the chi-square statistic of the feature values of the terminal device comprises:
    obtaining a first average value of a specific necessary feature value of the nonlinear combined feature set, taken over every network access request within the set time period;
    calculating, from the necessary feature value and the first average value, the first chi-square statistic of the terminal device for that necessary feature value.
  13. The server of claim 12, wherein the judgment threshold comprises a first judgment threshold,
    and before the step of determining that the corresponding feature value is an outlier when the chi-square statistic of the feature value of the terminal device is greater than the corresponding judgment threshold, the steps further comprise:
    calculating, from the first chi-square statistics obtained within the set time period, the upper limit of the interquartile range of the first chi-square statistics as the first judgment threshold corresponding to the feature value.
  14. The server of claim 11, wherein the chi-square statistic comprises a second chi-square statistic,
    and the step of obtaining, for the feature values of the nonlinear combined feature set, the average of all corresponding feature values within a set time period and calculating the chi-square statistic of the feature values of the terminal device comprises:
    obtaining, from the multiple feature values of the nonlinear combined feature set of each network access request within the set time period, a multi-dimensional vector of those feature values as the feature value of the nonlinear combined feature set;
    obtaining a corresponding second average value from the feature values of all the nonlinear combined feature sets;
    calculating, from the feature value of the nonlinear combined feature set and the second average value, the second chi-square statistic of the feature value of the nonlinear combined feature set of the terminal device.
  15. The server of claim 14, wherein the judgment threshold comprises a second judgment threshold,
    and before the step of determining that the corresponding feature value is an outlier when the chi-square statistic of the feature value of the terminal device is greater than the corresponding judgment threshold, the steps further comprise:
    forming a curve from the second chi-square statistics corresponding to all network access requests within the set time period, and taking the second chi-square statistic corresponding to the maximum slope of the curve as the second judgment threshold.
  16. A storage medium, wherein computer-readable instructions are stored on the storage medium, and the computer-readable instructions, when executed by a processor, implement the following steps:
    in response to a network access request sent by a terminal device, using a script to obtain multiple nonlinear feature values of the terminal device, and forming a nonlinear combined feature set;
    for the feature values of the nonlinear combined feature set, obtaining the average of all corresponding feature values within a set time period, and calculating a chi-square statistic of the feature values of the terminal device;
    when the chi-square statistic of a feature value of the terminal device is greater than the corresponding judgment threshold, determining that the corresponding feature value is an outlier;
    determining, according to the feature values identified as outliers, that the network access is abnormal access;
    wherein the nonlinear combined feature set includes feature values of original-category feature information and feature values of effective derived feature information, and the step of using a script to obtain multiple nonlinear feature values of the terminal device and forming a nonlinear combined feature set comprises:
    using the script to obtain the feature values of the original-category feature information;
    performing a data-dispersion measurement calculation on the feature values of the original-category feature information to obtain the feature values of the effective derived feature information used for identifying outliers.
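
The final comparison and decision steps of claim 16 can be illustrated with a short sketch. The claim does not say how many outlier feature values are needed before an access counts as abnormal, so the "any outlier" rule used here is an assumption, as are the function name `judge_access` and the feature names.

```python
def judge_access(chi_square_by_feature, threshold_by_feature):
    """Return (is_abnormal, outlier_names) for one network access request."""
    # A feature value is an outlier when its statistic exceeds its own threshold.
    outliers = [
        name
        for name, score in chi_square_by_feature.items()
        if score > threshold_by_feature[name]
    ]
    # Assumed rule: a single outlier feature value marks the access as abnormal.
    return bool(outliers), outliers


# Hypothetical statistics and thresholds for two features of one request.
abnormal, names = judge_access(
    {"request_rate": 72.2, "click_interval_ms": 0.4},
    {"request_rate": 13.0, "click_interval_ms": 5.0},
)
```

A stricter deployment could require several outlier features before flagging a request; the rule is kept separate from the scoring so that it can be changed independently.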
  17. The storage medium of claim 16, wherein the chi-square statistic comprises a first chi-square statistic,
    and the step of obtaining, for the feature values of the nonlinear combined feature set, the average of all corresponding feature values within a set time period and calculating the chi-square statistic of the feature values of the terminal device comprises:
    obtaining a first average value of a specific necessary feature value of the nonlinear combined feature set, taken over every network access request within the set time period;
    calculating, from the necessary feature value and the first average value, the first chi-square statistic of the terminal device for that necessary feature value.
  18. The storage medium of claim 17, wherein the judgment threshold comprises a first judgment threshold,
    and before the step of determining that the corresponding feature value is an outlier when the chi-square statistic of the feature value of the terminal device is greater than the corresponding judgment threshold, the steps further comprise:
    calculating, from the first chi-square statistics obtained within the set time period, the upper limit of the interquartile range of the first chi-square statistics as the first judgment threshold corresponding to the feature value.
  19. The storage medium of claim 16, wherein the chi-square statistic comprises a second chi-square statistic,
    and the step of obtaining, for the feature values of the nonlinear combined feature set, the average of all corresponding feature values within a set time period and calculating the chi-square statistic of the feature values of the terminal device comprises:
    obtaining, from the multiple feature values of the nonlinear combined feature set of each network access request within the set time period, a multi-dimensional vector of those feature values as the feature value of the nonlinear combined feature set;
    obtaining a corresponding second average value from the feature values of all the nonlinear combined feature sets;
    calculating, from the feature value of the nonlinear combined feature set and the second average value, the second chi-square statistic of the feature value of the nonlinear combined feature set of the terminal device.
  20. The storage medium of claim 19, wherein the judgment threshold comprises a second judgment threshold,
    and before the step of determining that the corresponding feature value is an outlier when the chi-square statistic of the feature value of the terminal device is greater than the corresponding judgment threshold, the steps further comprise:
    forming a curve from the second chi-square statistics corresponding to all network access requests within the set time period, and taking the second chi-square statistic corresponding to the maximum slope of the curve as the second judgment threshold.
PCT/CN2019/118378 2019-06-28 2019-11-14 Network access abnormality determination method and apparatus, server, and storage medium WO2020258670A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910580047.2 2019-06-28
CN201910580047.2A CN110311909B (en) 2019-06-28 2019-06-28 Method and device for judging abnormity of network access of terminal equipment

Publications (1)

Publication Number Publication Date
WO2020258670A1 true WO2020258670A1 (en) 2020-12-30

Family

ID=68079530

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/118378 WO2020258670A1 (en) 2019-06-28 2019-11-14 Network access abnormality determination method and apparatus, server, and storage medium

Country Status (2)

Country Link
CN (1) CN110311909B (en)
WO (1) WO2020258670A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110311909B (en) * 2019-06-28 2021-12-24 平安科技(深圳)有限公司 Method and device for judging abnormity of network access of terminal equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2003014974A2 (en) * 2001-07-09 2003-02-20 Polyvista, Inc A method for generating multidimensional summary reports from multidimensional data
US20070088534A1 (en) * 2005-10-18 2007-04-19 Honeywell International Inc. System, method, and computer program for early event detection
CN105915555A (en) * 2016-06-29 2016-08-31 北京奇虎科技有限公司 Method and system for detecting network anomalous behavior
CN109582741A (en) * 2018-11-15 2019-04-05 阿里巴巴集团控股有限公司 Characteristic treating method and apparatus
CN109905362A (en) * 2019-01-08 2019-06-18 平安科技(深圳)有限公司 User request detection method and device, computer equipment and storage medium
CN110311909A (en) * 2019-06-28 2019-10-08 平安科技(深圳)有限公司 The abnormality determination method and device of terminal device network access

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8934352B2 (en) * 2011-08-30 2015-01-13 At&T Intellectual Property I, L.P. Hierarchical anomaly localization and prioritization
US10867036B2 (en) * 2017-10-12 2020-12-15 Cisco Technology, Inc. Multiple pairwise feature histograms for representing network traffic
CN108806695A (en) * 2018-04-17 2018-11-13 平安科技(深圳)有限公司 Anti-fraud method, apparatus, computer equipment and the storage medium of self refresh
CN108881275B (en) * 2018-07-06 2021-07-23 武汉思普崚技术有限公司 Method and system for analyzing access compliance of user

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CAO, QIANG ET AL.: "An Overview of Outlier Recognition Methods", SOFTWARE GUIDE, vol. 18, no. 6, 4 January 2019 (2019-01-04), DOI: 20200313090104Y *

Also Published As

Publication number Publication date
CN110311909B (en) 2021-12-24
CN110311909A (en) 2019-10-08

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19934950

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19934950

Country of ref document: EP

Kind code of ref document: A1