WO2023093017A1 - Method and device for identifying a Web service device - Google Patents

Method and device for identifying a Web service device

Info

Publication number
WO2023093017A1
Authority
WO
WIPO (PCT)
Prior art keywords
device type
web service
access request
service device
value
Prior art date
Application number
PCT/CN2022/100025
Other languages
English (en)
French (fr)
Inventor
徐奎
Original Assignee
深圳前海微众银行股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳前海微众银行股份有限公司
Publication of WO2023093017A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/957 Browsing optimisation, e.g. caching or content distillation
    • G06F16/9577 Optimising the visualization of content, e.g. distillation of HTML documents
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/04 Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange

Definitions

  • Embodiments of the present invention relate to the field of financial technology (Fintech), and in particular, to a method and device for identifying a Web service device.
  • Fintech: financial technology
  • the first identification method is to have the Web client initiate a normal access request to the Web server and then perform feature matching on the device type related information in the response returned by the Web server, so as to determine the device type of the Web server (some Web servers mark their device type related information in the Server field of the response header, and some include it in the returned response page).
  • the second identification method is to construct, on the Web client, an access request that causes the Web server to report an error (for example, by constructing a non-existent access path or an erroneous parameter in the access request) and to send the constructed request to the Web server.
  • the device type of the Web server is then determined through feature matching on the device type related information in the error response page returned by the Web server.
  • however, both identification methods depend on the device type related information that the Web server exposes, so when feature matching is used to determine the device type of the Web server, the false negative rate and the false positive rate are both high.
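  • The banner-based feature matching described above can be sketched as follows. This is a minimal illustration, not the method of any particular scanner: the signature table and the function name `match_server_banner` are assumptions introduced here, and a real fingerprint library would be much larger.

```python
import re
from typing import Mapping, Optional

# Hypothetical signature table (an assumption for illustration only).
BANNER_SIGNATURES = {
    "Apache": re.compile(r"^Apache(/|$)", re.IGNORECASE),
    "IIS": re.compile(r"Microsoft-IIS", re.IGNORECASE),
    "Tomcat": re.compile(r"Apache-Coyote|Tomcat", re.IGNORECASE),
}


def match_server_banner(headers: Mapping[str, str]) -> Optional[str]:
    """Guess the device type from the Server response header, if present."""
    banner = ""
    for name, value in headers.items():
        if name.lower() == "server":  # header names are case-insensitive
            banner = value.strip()
    if not banner:
        return None  # banner erased: feature matching has nothing to work with
    for device_type, pattern in BANNER_SIGNATURES.items():
        if pattern.search(banner):
            return device_type
    return None  # unknown or customized banner
```

  • Because the result depends entirely on a string the operator controls, a forged banner is reported as if it were genuine, and an erased banner yields no result at all; these are the failure modes the description attributes to feature matching.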
  • Embodiments of the present invention provide a method and device for identifying a Web service device, which are used to effectively improve the accuracy of identifying the device type of the Web service device.
  • an embodiment of the present invention provides a method for identifying a Web service device, including:
  • the Web client constructs a first access request and at least one second access request for accessing the Web service device; the first access request represents a request constructed by polluting a pollutable parameter in the access request for the Web service device with multiple parameter values; each second access request represents a request constructed without polluting the pollutable parameter in the access request for the Web service device;
  • for the second response page of each second access request, the Web client performs a similarity calculation on the first response page of the first access request and the second response page of the second access request to determine a first similarity value;
  • if the Web client determines that the first similarity value is greater than the similarity threshold, it determines, from the device types of at least one Web service device that match the second response page, a first device type that conforms to the programming language used by the Web service device;
  • the Web client determines the target device type of the Web service device based on the first device type.
  • the prior art solution obtains the device type related information of the Web server from the response page returned by the Web server and performs feature matching on that information to determine the device type of the Web server. Therefore, if the operation and maintenance personnel of the Web server forge or erase the device type related information, or customize the returned server information fields, the feature matching identification method cannot accurately identify the device type of the Web server.
  • the technical solution in the present invention identifies the device type of the Web service device by additionally introducing parameter pollution, which makes it possible to accurately detect the real response of the Web service device to a parameter-polluted access request. Based on the real response of the Web service device to the parameter-polluted access request and its real response to the access request without parameter pollution, the response behavior characteristics of the Web service device can be accurately identified, so that the device type of the Web service device can be accurately identified. The recognition accuracy of the device type is thereby effectively improved, which effectively reduces the false negative rate and false positive rate produced by feature matching identification.
  • the Web client constructs a first access request and at least one second access request for accessing the Web service device; the first access request represents a request constructed by polluting a pollutable parameter in the access request for the Web service device with multiple parameter values; each second access request represents a request constructed without polluting the pollutable parameter in the access request for the Web service device.
  • for the second response page of each second access request, a similarity calculation is performed on the first response page of the first access request and the second response page of the second access request to determine the first similarity value; comparing the first similarity value with the similarity threshold shows whether the response behavior of the Web service device to the first access request is the same as its response behavior to the second access request, so that the device types to which the Web service device may belong can be preliminarily determined.
  • the first device type can then be further determined, according to the programming language used by the Web service device, from the device types of at least one Web service device that match the second response page; based on the first device type, the target device type of the Web service device can be accurately determined, which effectively improves the recognition accuracy of the device type of the Web service device and thereby effectively reduces the false negative rate and false positive rate produced by feature matching identification.
  • the method also includes:
  • the Web client constructs a third access request and a fourth access request for accessing the Web service device; the third access request represents a request for normal access to the Web service device; the fourth access request represents a request for abnormal access to the Web service device;
  • the Web client determines the second device type of the Web service device by performing feature matching on the third response page of the third access request, and determines the third device type of the Web service device by performing feature matching on the fourth response page of the fourth access request;
  • the Web client determines the target device type of the Web service device based on the first device type, including:
  • the Web client determines the target device type of the Web service device according to the first device type, the second device type, and the third device type.
  • the technical solution in the present invention introduces the construction of a normal access request and an abnormal access request: the second device type of the Web service device is determined according to its response behavior to the normal access request, and the third device type of the Web service device is determined according to its response behavior to the abnormal access request. Combining the first device type, the second device type and the third device type then helps to identify more accurately the actual device type to which the Web service device belongs.
  • if the Web client determines that the first similarity value is greater than the similarity threshold, determining, from the device types of at least one Web service device that match the second response page, the first device type that conforms to the programming language used by the Web service device includes:
  • if the Web client determines that the first parameter value of the pollutable parameter in the first response page is the same as the second parameter value of the pollutable parameter in the second response page, it determines at least one device type matching the second parameter value from the device type library;
  • the Web client determines the first device type of the Web service device from at least one device type matching the second parameter value according to the programming language used by the Web service device.
  • the parameter value of the pollutable parameter selected by the Web service device for the first access request is the same as the parameter value selected for the second access request, and the parameter value selected for the second access request is fixed and single. It can therefore be determined, based on the historical behavior data of Web service devices of each device type for pollutable parameters with multiple parameter values, which device types select a single parameter value; for example, for a given pollutable parameter with multiple parameter values, the historical data show which device types select the First parameter value of the pollutable parameter and which device types select the Last parameter value.
  • by judging whether the first similarity value is greater than the similarity threshold, this solution can accurately determine whether the first response page is roughly the same as the second response page, so that the step of judging whether the first parameter value is the same as the second parameter value need not be performed directly, which can also reduce potential false positives (for example, the parameter value does not exist, but the response pages are actually roughly the same).
  • the method also includes:
  • if the Web client determines that the first similarity values of the second response pages are all less than or equal to the similarity threshold, or that the first parameter value is different from the second parameter value, then, upon determining that the first parameter value is a combination of the second parameter values, it constructs a fifth access request for accessing the Web service device according to the second parameter values;
  • for the fifth response page of the fifth access request, the Web client determines a second similarity value between the fifth response page and the first response page;
  • if the Web client determines that the second similarity value is greater than the similarity threshold, it determines at least one device type that matches the first parameter value from the device type library;
  • the Web client determines the first device type of the Web service device from at least one device type matching the first parameter value according to the programming language used by the Web service device.
  • for example, assuming that there are two second response pages: if the first similarity value between the first of the second response pages and the first response page is greater than the similarity threshold but the first parameter value is different from the second parameter value in that page, and the first similarity value between the second of the second response pages and the first response page is less than or equal to the similarity threshold, or is greater than the similarity threshold but the first parameter value is different from the second parameter value in that page, then it is necessary to judge whether the first parameter value in the first response page is a combination of the second parameter values, so as to determine the device type range of the Web service device and thereby provide support for subsequently identifying the actual device type of the Web service device accurately.
  • the method also includes:
  • if the Web client determines that the second similarity value is less than or equal to the similarity threshold, or that the first parameter value is not a combination of the second parameter values, it determines the maximum similarity value among the first similarity values and the second similarity value;
  • if the Web client determines that the maximum similarity value is less than the similarity threshold, it determines the first device type of the Web service device from the device type library according to the programming language used by the Web service device.
  • that is, if it is determined that the second similarity value is less than or equal to the similarity threshold, or that the first parameter value is not a combination of the second parameter values, the first similarity values and the second similarity value are compared to determine the maximum similarity value, and the maximum similarity value is compared with the similarity threshold; if the maximum similarity value is smaller than the similarity threshold, the first device type of the Web service device is preliminarily determined from the device type library according to the programming language used by the Web service device, thereby providing support for subsequently identifying the actual device type of the Web service device accurately.
  • the method also includes:
  • if the Web client determines that the maximum similarity value is greater than the similarity threshold, then, upon determining that the maximum similarity value is a first similarity value, it determines the first device type of the Web service device from at least one device type that matches the second parameter value according to the programming language used by the Web service device; or, upon determining that the maximum similarity value is the second similarity value, it determines the first device type of the Web service device from at least one device type matching the first parameter value according to the programming language used by the Web service device.
  • if the maximum similarity value is greater than the similarity threshold, the maximum similarity value can be compared with each first similarity value and with the second similarity value; if the maximum similarity value equals one of them, the device type range corresponding to that similarity value can be used as the device type range of the Web service device. The range can then be narrowed further according to the programming language used by the Web service device, which improves the efficiency of the subsequent device type identification and provides support for identifying the device type of the Web service device more accurately.
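  • The branching described above can be summarised in a short sketch. The names `Comparison` and `pick_lookup_value` are hypothetical, and the handling of a maximum similarity value exactly equal to the threshold is an assumption, since the description only covers the "less than" and "greater than" cases. The function returns the parameter value to look up in the device type library, or `None` when only the programming language can be used to narrow the library.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Comparison:
    similarity: float    # similarity to the first (polluted) response page
    value_matches: bool  # does the first parameter value equal this page's value?
    lookup_value: str    # parameter value to look up in the device type library


def pick_lookup_value(seconds: List[Comparison],
                      fifth: Optional[Comparison],
                      threshold: float) -> Optional[str]:
    # Case 1: a second response page is similar and shows the same parameter value.
    for c in seconds:
        if c.similarity > threshold and c.value_matches:
            return c.lookup_value
    # Case 2: the fifth request exists only when the first parameter value is a
    # combination of the second parameter values.
    if fifth is not None and fifth.similarity > threshold:
        return fifth.lookup_value
    # Case 3: fall back to the maximum of all similarity values.
    candidates = seconds + ([fifth] if fifth is not None else [])
    best = max(candidates, key=lambda c: c.similarity, default=None)
    if best is None or best.similarity <= threshold:  # equality: an assumption
        return None  # caller filters the whole library by programming language
    return best.lookup_value
```

  • In this sketch `fifth.lookup_value` stands for the first parameter value (the combination), while each entry of `seconds` carries its own second parameter value.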
  • the web client performs a similarity calculation on the first response page of the first access request and the second response page of the second access request to determine a first similarity value, including:
  • the Web client obtains the page source code of the first response page and the page source code of the second response page, and performs word segmentation on each of them to determine the first word segment set corresponding to the first response page and the second word segment set corresponding to the second response page;
  • the Web client merges and deduplicates each first word segment in the first word segment set and each second word segment in the second word segment set to obtain a third word segment set;
  • the Web client uses each third word segment in the third word segment set as a key, and sets a corresponding numerical value for each key to obtain a key-value data set;
  • the Web client converts each first word segment in the first word segment set into its corresponding numerical value to obtain a first numerical value set, and converts each second word segment in the second word segment set into its corresponding numerical value to obtain a second numerical value set;
  • the Web client performs encoding processing on the first numerical set to obtain a first vector set, and performs encoding processing on the second numerical set to obtain a second vector set;
  • the Web client determines the first similarity value by using the first vector set and the second vector set through a set similarity algorithm.
  • because the page source code of a response page truly reflects the actual behavior characteristics of the response page, calculating the similarity value between the first response page and the second response page from their page source code makes it possible to accurately determine whether the response behavior of the first response page is approximately the same as that of the second response page, and thereby whether the two pages are approximately the same.
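  • The similarity steps listed above can be sketched as follows. The description does not name the word segmentation rule or the set similarity algorithm, so the regular-expression tokenizer and the Jaccard coefficient used here are assumptions, as are all function names.

```python
import re
from typing import Dict, FrozenSet, List, Tuple


def tokenize(page_source: str) -> List[str]:
    """Split page source into word segments (assumed rule: words and punctuation)."""
    return re.findall(r"\w+|[^\w\s]", page_source)


def build_key_values(first: List[str], second: List[str]) -> Dict[str, int]:
    """Merge and deduplicate both segment sets, then assign a value to each key."""
    merged = sorted(set(first) | set(second))
    return {segment: index for index, segment in enumerate(merged)}


def one_hot(value: int, size: int) -> Tuple[int, ...]:
    """Encode one numerical value as a vector with a single 1."""
    return tuple(1 if i == value else 0 for i in range(size))


def to_vector_set(segments: List[str],
                  key_values: Dict[str, int]) -> FrozenSet[Tuple[int, ...]]:
    size = len(key_values)
    return frozenset(one_hot(key_values[s], size) for s in segments)


def page_similarity(first_source: str, second_source: str) -> float:
    first_segments = tokenize(first_source)
    second_segments = tokenize(second_source)
    key_values = build_key_values(first_segments, second_segments)
    a = to_vector_set(first_segments, key_values)
    b = to_vector_set(second_segments, key_values)
    if not a and not b:
        return 1.0  # two empty pages are treated as identical
    return len(a & b) / len(a | b)  # Jaccard coefficient (assumed algorithm)
```

  • For example, `page_similarity("a b c", "a b d")` yields 0.5, because two of the four distinct segments are shared. One-hot vectors grow with the vocabulary, so for large pages comparing the numerical value sets directly would give the same Jaccard result with less memory.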
  • the Web client determines the target device type of the Web service device according to the first device type, the second device type, and the third device type, including:
  • if the Web client determines that there is only one sub-device type in the first device type, it determines that sub-device type as the target device type of the Web service device;
  • if the Web client determines that there are at least two sub-device types in the first device type, then, when it determines that the third device type exists in the first device type, it determines the third device type as the target device type of the Web service device; or, when it determines that the third device type is a null value and that the second device type exists in the first device type, it determines the second device type as the target device type of the Web service device.
  • the real response behavior of the Web service device can reflect the specific behavior characteristics of the Web service device for parameter pollution access requests, and the specific behavioral characteristics of a Web service device cannot be easily changed.
  • compared with the second device type and the third device type, the first device type is therefore more accurate and better reflects the actual device type to which the Web service device belongs. Accordingly, the technical solution in the present invention determines the target device type of the Web service device by first judging whether there are multiple sub-device types in the first device type.
  • the target device type of the Web service device can be determined more accurately, that is, the real device type of the Web service device can be determined.
  • the method also includes:
  • any subtype of the first device type is determined as the target device type of the Web service device.
  • any sub-device type in the first device type can be directly used as the target device type of the Web service device .
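  • The selection rules above can be sketched as a single function. The name `target_device_type` is hypothetical; the description does not state the exact condition for the final "any sub-device type" case, so treating it as the fallback when no earlier rule applies is an assumption, as is returning `None` for an empty first device type.

```python
from typing import Optional, Sequence


def target_device_type(first: Sequence[str],
                       second: Optional[str],
                       third: Optional[str]) -> Optional[str]:
    """Combine the three device type results into one target device type."""
    if not first:
        return None  # case not covered by the description (assumption)
    if len(first) == 1:
        return first[0]  # only one sub-device type: use it directly
    if third is not None and third in first:
        return third  # error-page result confirmed by parameter pollution
    if third is None and second is not None and second in first:
        return second  # normal-page result confirmed by parameter pollution
    return first[0]  # fallback: any sub-device type (assumed condition)
```

  • The first device type is passed as a sequence because it may contain several sub-device types after the parameter-pollution step.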
  • the embodiment of the present invention also provides an identification device for a Web service device, including:
  • a construction unit, configured to construct a first access request and at least one second access request for accessing a Web service device; the first access request represents a request constructed by polluting a pollutable parameter in the access request for the Web service device with multiple parameter values; each second access request represents a request constructed without polluting the pollutable parameter in the access request for the Web service device;
  • a processing unit, configured to: for the second response page of each second access request, perform a similarity calculation on the first response page of the first access request and the second response page of the second access request to determine a first similarity value; if it is determined that the first similarity value is greater than the similarity threshold, determine, from the device types of at least one Web service device that match the second response page, a first device type that conforms to the programming language used by the Web service device; and determine a target device type of the Web service device based on the first device type.
  • an embodiment of the present invention provides a computing device, including at least one processor and at least one memory, wherein the memory stores a computer program, and when the program is executed by the processor, the processor executes the method for identifying a Web service device described in any implementation of the first aspect above.
  • an embodiment of the present invention provides a computer-readable storage medium, which stores a computer program executable by a computing device, and when the program runs on the computing device, the computing device executes the method for identifying a Web service device described in any implementation of the first aspect above.
  • FIG. 1 is a schematic flowchart of a method for identifying a Web service device provided by an embodiment of the present invention
  • FIG. 2 is a schematic flow diagram of determining a device type C of a Web server provided by an embodiment of the present invention
  • FIG. 3 is a schematic flow diagram of determining a target device type of a Web service device provided by an embodiment of the present invention
  • FIG. 4 is a schematic structural diagram of an identification device for a Web service device provided by an embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of a computing device provided by an embodiment of the present invention.
  • Web (World Wide Web) server: also known as a WWW server, generally refers to a website server, which mainly provides online information browsing and download services, and can handle requests from Web clients such as browsers and return corresponding responses.
  • common web server device types include Apache, IIS, Tomcat, and the like.
  • HTTP: HyperText Transfer Protocol.
  • HTTP Parameter Pollution: a common attack method for bypassing security devices. The problem arises mainly because the current HTTP standard does not define which value a Web server should select as the final value of a parameter when the parameter has multiple values; as a result, different Web servers handle a multi-value parameter differently, so the device type of the back-end Web server can be inferred through HTTP parameter pollution.
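  • A polluted request simply repeats the same parameter name with different values. The sketch below builds such a URL together with the unpolluted URLs for each single value, using Python's standard `urllib.parse`; the helper names, the example host and the parameter names are assumptions for illustration.

```python
from typing import List, Tuple
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_param(url: str, param: str, values: List[str]) -> str:
    """Return url with every occurrence of param replaced by the given values."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != param]
    query += [(param, v) for v in values]  # repeated name = parameter pollution
    return urlunsplit(parts._replace(query=urlencode(query)))


def build_access_requests(url: str, param: str,
                          values: List[str]) -> Tuple[str, List[str]]:
    """Build the polluted URL and one unpolluted URL per parameter value."""
    polluted = with_param(url, param, values)
    unpolluted = [with_param(url, param, [v]) for v in values]
    return polluted, unpolluted
```

  • For `http://example.com/list?page=1&sort=asc` with `page` polluted by the values `1` and `2`, the polluted URL ends in `sort=asc&page=1&page=2`, and which value the server actually uses is the behavior being probed.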
  • One-Hot encoding: also known as one-bit effective encoding; it uses an N-bit status register to encode N states, where each state has its own independent register bit and only one bit is valid at any time.
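  • As a small illustration of the definition above, the following sketch assigns each of N states its own bit position. The function name `one_hot_table` and the sample state names are assumptions.

```python
from typing import Dict, List, Sequence


def one_hot_table(states: Sequence[str]) -> Dict[str, List[int]]:
    """Map each of N states to an N-bit vector in which exactly one bit is set."""
    size = len(states)
    return {
        state: [1 if position == index else 0 for position in range(size)]
        for index, state in enumerate(states)
    }
```

  • With the three states `First`, `Last` and `All`, the state `Last` is encoded as `[0, 1, 0]`.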
  • FIG. 1 exemplarily shows the flow of a method for identifying a Web service device provided by an embodiment of the present invention, and the flow can be executed by an identification device for a Web service device.
  • the process specifically includes:
  • Step 101 a Web client constructs a first access request and at least one second access request for accessing a Web service device.
  • the Web client constructs the first access request by polluting a pollutable parameter in a Web page of the Web service device, and at the same time constructs at least one second access request by accessing at least one Web page of the Web service device respectively.
  • the first access request represents a request constructed by polluting the pollutable parameter in the access request for the Web service device; each second access request represents a request constructed without polluting the pollutable parameter in the access request for the Web service device.
  • the technical solution in the present invention also introduces the construction of a normal access request and an abnormal access request, that is, it constructs a third access request and a fourth access request for accessing the Web service device.
  • the third access request is used to represent a request for normal access to the Web service device;
  • the fourth access request is used to represent a request for abnormal access to the Web service device.
  • after the Web service device receives the third access request, it processes the third access request and returns a third response page, which is displayed on the Web client; after the Web service device receives the fourth access request, it processes the fourth access request and returns a fourth response page, which is displayed on the Web client.
  • the Web client determines the second device type of the Web service device by performing feature matching on the third response page of the third access request, and determines the third device type of the Web service device by performing feature matching on the fourth response page of the fourth access request.
  • a certain number of access requests are first constructed on the Web client corresponding to the Web server. The first is an access request A for accessing the default page of the Web server, that is, a normal access to the Web server through its default URL (Uniform Resource Locator).
  • the second is an access request B that causes the Web server to report an error, that is, an abnormal access request B, constructed for example by adding a non-existent access path to a URL of the Web server, adding illegal parameters, or constructing illegal protocol fields.
  • the third is an access request formed by polluting a parameter in a Web page of the Web server, together with at least one access request for normally accessing at least one Web page of the Web server.
  • a Web page crawling operation is performed on the Web server to obtain the relevant parameter information of each Web page of the Web server; at the same time, the URL of each Web page of the Web server can be obtained. Parameters that can be exploited by HTTP parameter pollution, such as a page parameter or an ID (identification) parameter, are identified from the relevant parameter information of each Web page, and parameter pollution is carried out on them.
  • the page parameter is a parameter that can be utilized by HTTP parameter pollution.
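  • The parameter identification step can be sketched as a scan over crawled URLs. The candidate list below contains only the two names the description mentions (`page` and `id`); the function name, the case-insensitive match and the example URLs are assumptions.

```python
from typing import Dict, Iterable, List
from urllib.parse import parse_qsl, urlsplit

# Parameter names the description gives as examples of pollutable parameters.
CANDIDATE_NAMES = ("page", "id")


def find_pollutable(urls: Iterable[str]) -> Dict[str, List[str]]:
    """Return, for each crawled URL, the query parameters worth polluting."""
    found: Dict[str, List[str]] = {}
    for url in urls:
        names = [k for k, _ in parse_qsl(urlsplit(url).query,
                                         keep_blank_values=True)]
        hits = [n for n in names if n.lower() in CANDIDATE_NAMES]
        if hits:
            found[url] = hits
    return found
```

  • URLs without a query string, or with only unrelated parameters, are skipped, so the result lists only pages where a polluted request can be constructed.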
  • Step 102 for the second response page of each second access request, the Web client performs a similarity calculation on the first response page of the first access request and the second response page of the second access request, A first similarity value is determined.
  • after the Web service device receives the first access request, it processes the first access request and returns a first response page, which is displayed on the Web client; after the Web service device receives a second access request, it processes the second access request and returns a second response page, which is displayed on the Web client.
  • the Web client obtains the page source code of the first response page and of the second response page, performs word segmentation on the page source code of the first response page to obtain the first word segment set corresponding to the first response page, and performs word segmentation on the page source code of the second response page to obtain the second word segment set corresponding to the second response page.
  • each first word segment in the first word segment set and each second word segment in the second word segment set are merged and deduplicated to obtain a third word segment set; each third word segment in the third word segment set is used as a key, and a corresponding value is set for each key to obtain the key-value data set.
  • each first word segment in the first word segment set is converted into its corresponding value to obtain the first value set, and each second word segment in the second word segment set is converted into its corresponding value to obtain the second value set; the first value set is then encoded to obtain a first vector set, and the second value set is encoded to obtain a second vector set.
  • the first similarity value between the first response page and the second response page can be calculated by applying the set similarity algorithm to the first vector set and the second vector set.
  • because this solution calculates the similarity value between the first response page and the second response page from their page source code, it can accurately determine whether the response behavior of the first response page is approximately the same as that of the second response page, and thereby whether the first response page is approximately the same as the second response page.
  • Step 103: if the Web client determines that the first similarity value is greater than the similarity threshold, it determines, from the device types of at least one Web service device matching the second response page, the first device type that conforms to the programming language used by the Web service device.
  • When it is determined that the first parameter value of the pollutable parameter in the first response page is the same as the second parameter value of the pollutable parameter in the second response page, then, because the parameter value of the pollutable parameter selected by the Web service device for the second access request is fixed and single, at least one device type matching the second parameter value can be determined from the device type library.
  • The device type library stores each parameter value (such as the First parameter value and/or the Last parameter value), the device type corresponding to that parameter value, and the programming language used.
  • For the Web service devices of each device type, it is determined which device types select the First parameter value of the pollutable parameter, which device types select the Last parameter value of the pollutable parameter, and which device types select the All parameter value of the pollutable parameter (comprising the First parameter value and the Last parameter value).
  • the similarity threshold may be set according to experience of persons skilled in the art or according to a specific application scenario of the technical solution, which is not limited in this embodiment of the present invention.
  • Suppose the first similarity values of all second response pages are less than or equal to the similarity threshold. Or suppose there are two second response pages, where the first similarity value between the first of them and the first response page is greater than the similarity threshold but the first parameter value differs from the second parameter value in that page, and the first similarity value between the second of them and the first response page is either less than or equal to the similarity threshold, or greater than the similarity threshold while the first parameter value differs from the second parameter value in that page. In either case, it is necessary to judge whether the first parameter value in the first response page is a combination of the second parameter values, in order to determine the device type range of the Web service device.
  • a fifth access request for accessing the Web service device may be constructed according to the second parameter values, and the fifth access request may be initiated to the Web service device.
  • After receiving the fifth access request, the Web service device processes the fifth access request and returns a fifth response page, which is displayed on the Web client.
  • the web client determines the second similarity value between the fifth response page and the first response page through a set similarity algorithm. If it is determined that the second similarity value is greater than the similarity threshold, at least one device type matching the first parameter value (a combination of multiple second parameter values) is determined from the device type library.
  • If the Web service device selects the All parameter value of the pollutable parameter (comprising the First parameter value and the Last parameter value) as the parameter value for final processing, then, according to the All parameter value and the historical behavior data of the Web service devices of each device type when handling a pollutable parameter with multiple parameter values, it can be determined which device types of Web service devices select the All parameter value of the pollutable parameter.
  • The device type range of the Web service device can be further narrowed, that is, the first device type of the Web service device can be determined from at least one device type that matches the first parameter value, so as to provide support for subsequently identifying the actual device type of the Web service device accurately.
  • If the second similarity value is less than or equal to the similarity threshold, or the first parameter value is not a combination of the second parameter values, the first similarity values and the second similarity value are compared to determine the maximum similarity value, and it is judged whether the maximum similarity value is greater than the similarity threshold. If the maximum similarity value is smaller than the similarity threshold, the first device type of the Web service device is preliminarily determined from the device type library according to the programming language used by the Web service device.
  • The maximum similarity value can be compared with each first similarity value and with the second similarity value. If the maximum similarity value is the same as a certain similarity value (such as any one of the first similarity values, or the second similarity value), the device type range corresponding to that similarity value can be used as the device type range of the Web service device. Then, according to the programming language used by the Web service device, the device type range is further narrowed to obtain a smaller range of device types.
  • That is, if the maximum similarity value is any one of the first similarity values, the first device type of the Web service device can be determined, according to the programming language used by the Web service device, from at least one device type matching the second parameter value; or, if the maximum similarity value is the second similarity value, the first device type of the Web service device is determined, according to the programming language used by the Web service device, from at least one device type matching the first parameter value. In this way, the solution helps improve the efficiency of subsequently identifying the device type of the Web service device and supports identifying it more accurately.
  • the page parameter is a parameter that can be polluted.
  • A user can enter the URL for accessing the Web server on the URL input interface provided by the Web client.
  • Entering the default URL initiates an access request A to the Web server.
  • The Web server processes the access request A and returns a response page for the access request A, which is displayed on the Web client.
  • The device type A of the Web server can be obtained by performing feature matching on the returned response page. For example, the value of the Server field can be read from the header of the response message returned by the Web server; this value is device type A of the Web server.
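  • As a minimal sketch of the header-based identification described above, the following function reads the Server field from an already-received set of response headers. The function name `device_type_from_headers` and the rule of keeping only the product name before the first "/" are assumptions made for illustration; the source only states that the value of the Server field is used.

```python
from typing import Mapping, Optional


def device_type_from_headers(headers: Mapping[str, str]) -> Optional[str]:
    """Return the product name found in the Server header, or None.

    Hypothetical helper: header names are matched case-insensitively,
    and a value such as "Apache/2.4.41 (Ubuntu)" is reduced to "Apache".
    """
    for name, value in headers.items():
        if name.lower() != "server":
            continue
        # Keep the part before the version separator, then the first word.
        tokens = value.strip().split("/", 1)[0].split()
        return tokens[0] if tokens else None
    return None


if __name__ == "__main__":
    print(device_type_from_headers({"Server": "Apache/2.4.41 (Ubuntu)"}))
```

  • A missing or empty Server field yields None, which corresponds to the case where the operator has erased the field and feature matching alone gives no answer.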
  • the device type B of the web server can be obtained by performing feature matching processing on the returned error response page.
  • The URLs of individual Web pages of the Web server (for example, two Web pages), such as URL1 and URL2, are entered respectively, that is, an access request D and an access request E are initiated to the Web server.
  • After receiving the access request D, the Web server processes it and returns a response page 1 for the access request D, which is displayed on the Web client; after receiving the access request E, the Web server processes it and returns a response page 2 for the access request E, which is displayed on the Web client. The Page parameter value 1 of response page 1 can then be obtained from response page 1 for the access request D, and the Page parameter value 2 of response page 2 can be obtained from response page 2 for the access request E.
  • A new URL constructed after parameter pollution, such as URL3, is also entered on the URL input interface provided by the Web client, that is, an access request C is initiated to the Web server; after receiving the access request C, the Web server processes it and returns a response page 3 for the access request C.
  • The Page parameter value 3 of response page 3 can be obtained from response page 3 for the access request C. Page parameter value 3 may be a single value or a combination of multiple Page parameter values, for example a combination of Page parameter value 1 and Page parameter value 2. Then, by performing a similarity calculation on response page 1, response page 2, and response page 3, and combining the result with the programming language used by the Web server, the device type C of the Web server can be obtained.
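  • The three URLs in this example can be sketched as follows. The host `example.test`, the path `/list`, and the helper name `build_url` are hypothetical; the source does not give concrete URLs. URL1 and URL2 each carry a single value of the page parameter, while URL3 repeats the parameter with both values, which is the parameter pollution step.

```python
from typing import List
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_url(base: str, param: str, values: List[str]) -> str:
    """Return `base` with `param` set once per entry in `values`.

    Any existing occurrence of `param` in the query string is dropped,
    other query parameters are kept in their original order.
    """
    parts = urlsplit(base)
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != param]
    query += [(param, v) for v in values]
    return urlunsplit(parts._replace(query=urlencode(query)))


BASE = "http://example.test/list"            # hypothetical target page
url1 = build_url(BASE, "page", ["1"])        # unpolluted request D
url2 = build_url(BASE, "page", ["2"])        # unpolluted request E
url3 = build_url(BASE, "page", ["1", "2"])   # polluted request C

if __name__ == "__main__":
    for url in (url1, url2, url3):
        print(url)
```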
  • Fig. 2 is a schematic flowchart of determining the device type C of the Web server provided by the embodiment of the present invention.
  • Step 201: acquire response page 1 for URL1, response page 2 for URL2, and response page 3 for URL3.
  • The page parameter value 1 of response page 1 is determined from the relevant information in response page 1, the page parameter value 2 of response page 2 is determined from the relevant information in response page 2, and the page parameter value 3 of response page 3 is determined from the relevant information in response page 3.
  • The page parameter value 1 of response page 1 is a single value, the page parameter value 2 of response page 2 is a single value, and the page parameter value 3 of response page 3 may be a single value or a combination of multiple page parameter values.
  • The similarity of Web response pages can be calculated with a similarity algorithm (such as the Euclidean distance algorithm, the Pearson correlation coefficient algorithm, or the cosine similarity algorithm). For example, the similarity value between response page 1 and response page 3, or the similarity value between response page 2 and response page 3, can be calculated.
  • Step 202 determine whether the similarity value a1 of the response page 1 and the response page 3 is greater than a similarity threshold. If yes, go to step 203; if not, go to step 204.
  • the similarity value a1 of the response page 1 and the response page 3 can be calculated by performing a similarity calculation on the response page 1 and the response page 3 through the cosine similarity algorithm. Then, it is judged whether the similarity value a1 is greater than the similarity threshold. If the similarity value a1 is greater than the similarity threshold, perform step 203; if the similarity value a1 is less than or equal to the similarity threshold, perform step 204.
  • the similarity threshold can range from 0 to 1. The closer the similarity value is to 1, the higher the similarity is, and the closer the similarity value is to 0, the lower the similarity.
  • For example, the similarity threshold may be set to 0.7 or 0.75. A specific similarity threshold, such as 0.75 or 0.8, may also be set according to the experience of those skilled in the art or according to the actual application scenario of the embodiment of the present invention, which is not limited in the embodiment of the present invention.
  • Step 203 determine whether the page parameter value 1 of the response page 1 is the same as the page parameter value 3 of the response page 3. If yes, go to step 204; if not, go to step 205.
  • Step 204 determine the device type set 1 of the Web server matching the page parameter value 1 from the Web server device type library.
  • The Web server has selected a single parameter value of the page parameter as the final processing parameter value, so the device type of at least one Web server corresponding to that parameter value can be determined from the Web server device type library shown in Table 1. For example, for a page parameter with multiple parameter values, if the Web server selects the First parameter value of the page parameter as the final processing parameter value, or selects the Last parameter value of the page parameter as the final processing parameter value, the device type set 1 of the Web server can be obtained from the Web server device type library.
  • Suppose the page parameter value 1 is the First parameter value of the page parameter, that is, the Web server selects the First parameter value of the page parameter as the final processing parameter value; the device type set 1 then includes {JSP/Tomcat, Perl(CGI)/Apache}.
  • Table 1 is only a simple example for the convenience of describing the technical solution in the embodiment of the present invention, and does not constitute a limitation to the technical solution in the embodiment of the present invention.
  • Table 1
      Web server   | Parameter getter function      | Parameter value obtained
      ASP/IIS      | Request.QueryString("par")     | All (comma-delimited string)
      PHP/Apache   | $_GET["par"]                   | Last
      JSP/Tomcat   | Request.getParameter("par")    | First
  • If the Apache server in Table 1 uses PHP as the website programming language, the Last parameter value of a repeated parameter is used as the final value of that parameter; if Perl is used as the website programming language, the First parameter value of the repeated parameter is used as the final value. For example, taking URL3 in the above example, assuming that the device type of the Web server is IIS and the website programming language of the Web server is ASP, the parameter value of the page parameter obtained by the back end of the Web server is 1, 2 (a combination of multiple page parameter values).
  • Assuming that the device type of the Web server is Apache and the website programming language of the Web server is PHP, the back end of the Web server takes the Last parameter value of the page parameter as the final value for processing. By exploiting this difference, the device type range of a given Web server can be obtained without relying on feature matching for identification.
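  • The lookup against the device type library can be sketched as below. This is an illustration under stated assumptions, not the patent's implementation: `DEVICE_TYPE_LIBRARY` contains only the three rows that survive in Table 1, and the names `classify_behavior` and `candidate_device_types` are hypothetical. The "All" case follows the table's description of a comma-delimited string.

```python
from typing import List, Optional, Sequence, Set, Tuple

# (programming language, Web server, which value of a repeated parameter is used)
DEVICE_TYPE_LIBRARY: List[Tuple[str, str, str]] = [
    ("ASP", "IIS", "All"),
    ("PHP", "Apache", "Last"),
    ("JSP", "Tomcat", "First"),
]


def classify_behavior(sent: Sequence[str], observed: str) -> Optional[str]:
    """Tell whether the server used the First, Last, or All sent values."""
    if len(sent) < 2 or sent[0] == sent[-1]:
        raise ValueError("need at least two values with distinct first and last")
    if observed == sent[0]:
        return "First"
    if observed == sent[-1]:
        return "Last"
    if observed == ",".join(sent):
        return "All"
    return None  # behavior not covered by the library


def candidate_device_types(behavior: str) -> Set[str]:
    """Return the 'language/server' entries that show the given behavior."""
    return {f"{lang}/{server}"
            for lang, server, b in DEVICE_TYPE_LIBRARY if b == behavior}


if __name__ == "__main__":
    behavior = classify_behavior(["1", "2"], "2")
    print(behavior, candidate_device_types(behavior))
```

  • A fuller library would hold more rows per behavior, which is why the later steps narrow the resulting set by programming language.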
  • Step 205 determine whether the similarity value a2 between the response page 2 and the response page 3 is greater than the similarity threshold. If yes, go to step 206; if not, go to step 207.
  • the similarity value a2 of the response page 2 and the response page 3 can be calculated by performing a similarity calculation on the response page 2 and the response page 3 through the cosine similarity algorithm. Then, it is judged whether the similarity value a2 is greater than the similarity threshold. If the similarity value a2 is greater than the similarity threshold, perform step 206; if the similarity value a2 is less than or equal to the similarity threshold, perform step 207.
  • Step 206 determine whether the page parameter value 2 of the response page 2 is the same as the page parameter value 3 of the response page 3. If yes, go to step 207; if not, go to step 208.
  • Step 207 determine the device type set 2 of the Web server matching the page parameter value 2 from the Web server device type library.
  • The Web server has selected a single parameter value of the page parameter as the final processing parameter value, so the device type of at least one Web server corresponding to that parameter value can be determined from the Web server device type library shown in Table 1. For example, for a page parameter with multiple parameter values, if the Web server selects the Last parameter value of the page parameter as the final processing parameter value, or selects the First parameter value of the page parameter as the final processing parameter value, the device type set 2 of the Web server can be obtained from the Web server device type library. For example, suppose the page parameter value 2 is the Last parameter value of the page parameter, that is, the Web server selects the Last parameter value of the page parameter as the final processing parameter value; the device type set 2 then includes {PHP/Apache}.
  • Step 208 determine whether the page parameter value 3 of the response page 3 is a combination of page parameter value 1 and page parameter value 2. If yes, go to step 209; if not, go to step 212.
  • step 212 is executed.
  • Step 209 according to the page parameter value 1 and the page parameter value 2, construct URL4 for accessing the web server, and obtain the response page 4 for the URL4.
  • Step 210, determine whether the similarity value a3 of the response page 3 and the response page 4 is greater than the similarity threshold. If yes, go to step 211; if not, go to step 212.
  • the similarity value a3 between the response page 3 and the response page 4 can be calculated by performing a similarity calculation on the response page 3 and the response page 4 through the cosine similarity algorithm. Then, it is judged whether the similarity value a3 is greater than the similarity threshold. If the similarity value a3 is greater than the similarity threshold, perform step 211; if the similarity value a3 is less than or equal to the similarity threshold, perform step 212.
  • Step 211 determine the device type set 3 of the Web server matching the page parameter value 3 from the Web server device type library.
  • The Web server has selected the multi-parameter value of the page parameter as the final processing parameter value, so the device type set 3 of the Web server corresponding to the multi-parameter value can be determined from the Web server device type library shown in Table 1; the device type set 3 includes {ASP/IIS, Python/Apache}.
  • Step 212 compare the similarity value a1, the similarity value a2 and the similarity value a3, and determine the largest similarity value a0.
  • Step 213 determine whether the maximum similarity value a0 is greater than the similarity threshold. If yes, go to step 215; if not, go to step 214.
  • Step 214 Determine the device type C of the Web server from the Web server device type library according to the programming language used by the Web server.
  • Step 215: if it is determined that the maximum similarity value a0 is equal to the similarity value a1, determine that the device type set of the Web server is device type set 1; if it is determined that the maximum similarity value a0 is equal to the similarity value a2, determine that the device type set of the Web server is device type set 2; and if it is determined that the maximum similarity value a0 is equal to the similarity value a3, determine that the device type set of the Web server is device type set 3.
  • Step 216 according to the programming language used by the Web server, determine the device type C of the Web server from the device type set 1, the device type set 2 or the device type set 3 respectively.
  • the device type C of the Web server can be determined from the device type set 1 according to the programming language used by the Web server. For example, assuming that the programming language used by the web server is JSP, it can be determined from the device type set 1 that the device type C of the web server is Tomcat.
  • the device type C of the Web server can be determined from the device type set 2 according to the programming language used by the Web server. For example, assuming that the programming language used by the Web server is PHP, it can be determined from the device type set 2 that the device type C of the Web server is Apache.
  • the device type C of the Web server can be determined from the device type set 3 according to the programming language used by the Web server. For example, assuming that the programming language used by the web server is ASP, it can be determined from the device type set 3 that the device type C of the web server is IIS.
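  • Steps 212 to 216 can be sketched as follows, assuming device types are written as "language/server" strings as in the device type sets above. The function names, the dictionary keys "a1", "a2", "a3", and the default threshold of 0.75 are illustrative assumptions. The example sets are those named in the text.

```python
from typing import Mapping, Set


def filter_by_language(device_types: Set[str], language: str) -> Set[str]:
    """Keep entries whose language part (before '/') matches `language`."""
    return {d for d in device_types
            if d.split("/", 1)[0].lower() == language.lower()}


def device_type_c(similarities: Mapping[str, float],
                  type_sets: Mapping[str, Set[str]],
                  language: str,
                  library: Set[str],
                  threshold: float = 0.75) -> Set[str]:
    """Pick the candidate set for the largest similarity, then narrow it."""
    best = max(similarities, key=similarities.get)      # step 212
    if similarities[best] > threshold:                  # step 213
        candidates = type_sets[best]                    # step 215
    else:
        candidates = library                            # step 214
    return filter_by_language(candidates, language)     # step 216


TYPE_SETS = {
    "a1": {"JSP/Tomcat", "Perl(CGI)/Apache"},   # device type set 1
    "a2": {"PHP/Apache"},                       # device type set 2
    "a3": {"ASP/IIS", "Python/Apache"},         # device type set 3
}
LIBRARY = set().union(*TYPE_SETS.values())

if __name__ == "__main__":
    sims = {"a1": 0.8, "a2": 0.3, "a3": 0.1}
    print(device_type_c(sims, TYPE_SETS, "JSP", LIBRARY))
```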
  • the following describes the specific implementation process of determining the similarity value between the response pages by taking the calculation of the similarity value between the response page 1 and the response page 3 as an example.
  • Step a obtain the page source code 1 of the response page 1 and the page source code 3 of the response page 3.
  • page source 3 is:
  • Step b Segment the page source code 1 to obtain the word segmentation set 1 corresponding to the response page 1, and perform word segmentation processing on the page source code 3 to obtain the word segmentation set 3 corresponding to the response page 3.
  • word segmentation processing for the page source code is mainly performed based on html tags and intermediate entities of the page source code.
  • word segmentation set 1 = [<html>, <head>, <title>, test1, </title>, </head>, <body>, Just a test1, </body>, </html>]
  • word segmentation set 3 = [<html>, <head>, <title>, test2, </title>, </head>, <body>, Just a test2, </body>, </html>].
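  • Word segmentation by HTML tags and the text between them can be sketched with a regular expression. The page source string below is an assumption reconstructed from the word segmentation set listed above, since the original page source is not reproduced in this text; the function name `segment` is likewise hypothetical.

```python
import re
from typing import List

# A token is either one HTML tag or the run of text up to the next tag.
TOKEN_RE = re.compile(r"<[^>]+>|[^<]+")


def segment(page_source: str) -> List[str]:
    """Split page source into tags and text, dropping whitespace-only text."""
    tokens = []
    for match in TOKEN_RE.finditer(page_source):
        token = match.group().strip()
        if token:
            tokens.append(token)
    return tokens


# Assumed page source for response page 1.
PAGE_SOURCE_1 = (
    "<html><head><title>test1</title></head>"
    "<body>Just a test1</body></html>"
)

if __name__ == "__main__":
    print(segment(PAGE_SOURCE_1))
```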
  • step c the word segmentation set 1 and the word segmentation set 3 are merged and deduplicated to obtain a merged word segmentation set list.
  • the word segmentation set list2 = [test1, test2, Just a test1, Just a test2, <html>, <head>, <title>, </title>, </head>, <body>, </body>, </html>].
  • The sorted word segmentation set list2 is converted into a word segmentation set dict in key-value pair format, that is, each word segment in the word segmentation set list2 is used as a key, and the position of that word segment in the word segmentation set list2 is used as the corresponding value.
  • the word segmentation set obtained by merging and deduplication may be directly converted into a key-value pair format to obtain a word segmentation set dict.
  • Step d: according to the word segmentation set dict in key-value pair format obtained after the conversion, each word segment in word segmentation set 1 and word segmentation set 3 is converted into its corresponding value, giving a new word segmentation set 1 and a new word segmentation set 3.
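  • Steps c and d can be sketched as below. Two choices here are assumptions, because the source does not fix them: merged word segments keep their order of first appearance (the source mentions sorting without giving the order), and the value assigned to each key is its index in the merged list. The names `build_dict` and `to_values` are hypothetical.

```python
from typing import Dict, List, Sequence


def build_dict(set_a: Sequence[str], set_b: Sequence[str]) -> Dict[str, int]:
    """Merge two token lists, drop duplicates, and map each token to an index."""
    merged = list(dict.fromkeys(list(set_a) + list(set_b)))  # order-preserving dedupe
    return {token: index for index, token in enumerate(merged)}


def to_values(tokens: Sequence[str], mapping: Dict[str, int]) -> List[int]:
    """Replace every token with the value stored for it in the mapping."""
    return [mapping[token] for token in tokens]


SET_1 = ["<html>", "<head>", "<title>", "test1", "</title>",
         "</head>", "<body>", "Just a test1", "</body>", "</html>"]
SET_3 = ["<html>", "<head>", "<title>", "test2", "</title>",
         "</head>", "<body>", "Just a test2", "</body>", "</html>"]

if __name__ == "__main__":
    mapping = build_dict(SET_1, SET_3)
    print(to_values(SET_1, mapping))
    print(to_values(SET_3, mapping))
```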
  • step e the new word segmentation set 1 is encoded to obtain the vector set vector1
  • the new word segmentation set 3 is encoded to obtain the vector set vector3.
  • step f the similarity calculation is performed on the vector set vector1 corresponding to the response page 1 and the vector set vector3 corresponding to the response page 3 to obtain the similarity value between the response page 1 and the response page 3 .
  • The cosine similarity algorithm is used to perform the similarity calculation on the vector set vector1 and the vector set vector3, giving the similarity value cos(θ) = (vector1 · vector3) / (‖vector1‖ ‖vector3‖).
  • Suppose the similarity value obtained between response page 1 and response page 3 is 0.8 and the similarity threshold is 0.75. Since 0.8 is greater than 0.75, it can be determined that the similarity value between response page 1 and response page 3 is greater than the similarity threshold.
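  • Steps e and f can be sketched as below. The encoding is an assumption, since the source does not specify it: each value set is turned into a count vector with one position per key of the merged dictionary. The value lists are those obtained when word segments are numbered in order of first appearance across word segmentation sets 1 and 3, which is also an assumption.

```python
import math
from collections import Counter
from typing import List, Sequence


def encode(values: Sequence[int], size: int) -> List[int]:
    """Count how often each dictionary value 0..size-1 occurs."""
    counts = Counter(values)
    return [counts.get(i, 0) for i in range(size)]


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Return cos(theta) between u and v, or 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


VALUES_1 = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]      # new word segmentation set 1
VALUES_3 = [0, 1, 2, 10, 4, 5, 6, 11, 8, 9]    # new word segmentation set 3
vector1 = encode(VALUES_1, 12)
vector3 = encode(VALUES_3, 12)

if __name__ == "__main__":
    similarity = cosine_similarity(vector1, vector3)
    print(round(similarity, 4), similarity > 0.75)
```

  • Under this assumed encoding the two pages share 8 of their 10 word segments, so the dot product is 8 and both norms are √10, which agrees with the value 0.8 used in the example.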
  • Step 104 the Web client determines the target device type of the Web service device based on the first device type.
  • The technical solution in the present invention not only constructs the first access request by performing parameter pollution on a pollutable parameter in a Web page of the Web service device, but also constructs at least one second access request formed by separately accessing at least one Web page of the Web service device, as well as a request for normal access to the default Web page of the Web service device and a request for abnormal access to the Web service device. By synthesizing the feature information of the response pages returned for these access requests, the target device type of the Web service device can be determined more accurately.
  • the actual device type to which the Web service device belongs can be more accurately identified.
  • The first device type of the Web service device is determined from the real response behavior of the Web service device. This real response behavior reflects the specific behavioral characteristics of the Web service device when handling parameter-polluted access requests, and those characteristics cannot easily be changed, just as the behavioral characteristics of an object cannot easily be changed, so they are non-repudiable.
  • Compared with the second device type and the third device type, the determined first device type of the Web service device is more accurate and better reflects the actual device type to which the Web service device belongs.
  • The technical solution in the present invention further determines the target device type of the Web service device accurately by first judging whether there are multiple sub-device types in the first device type. If there is only one sub-device type in the first device type, the Web client can directly use that sub-device type as the target device type of the Web service device. If there are multiple sub-device types in the first device type, then, when it is determined that the third device type exists in the first device type, the third device type is determined as the target device type of the Web service device; or, when it is determined that the third device type is a null value and that the second device type exists in the first device type, the second device type is determined as the target device type of the Web service device.
  • any sub-device type in the first device type can be directly used as the target device type of the Web service device.
  • FIG. 3 is a schematic flowchart of determining a target device type of a Web service device provided by an embodiment of the present invention.
  • Step 301 determine whether there is only one sub-device type in the device type C. If yes, go to step 302; if not, go to step 303.
  • Step 302: determine the only sub-device type in the device type C as the target device type of the Web service device.
  • the device type of the Web service device obtained through HTTP parameter pollution is unique, so the unique device type can be used as the target device type of the Web service device.
  • Step 303 determine whether the device type B is not empty. If yes, go to step 304; if not, go to step 306.
  • It is judged whether the device type B is a null value; if it is not null, step 304 is executed, and if it is null, step 306 is executed.
  • Step 304 determine whether device type B exists in device type C. If yes, go to step 305; if not, go to step 306.
  • When it is determined that the device type B is not a null value, it is judged whether the device type B exists in the device type C.
  • Step 305 determining the device type B as the target device type of the Web service device.
  • Step 306 determine whether the device type A is not empty. If yes, go to step 307; if not, go to step 309.
  • Step 307 determine whether device type A exists in device type C. If yes, go to step 308; if not, go to step 309.
  • Step 308 determining device type A as the target device type of the Web service device.
  • Step 309 determining any sub-device type in the device type C as the target device type of the Web service device.
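  • The flow of steps 301 to 309 can be sketched as a single function. Representing device type C as a list of sub-device types, representing a null device type as None, and returning the first sub-device type where the text allows "any" are assumptions made for the illustration; the function name is hypothetical.

```python
from typing import Optional, Sequence


def target_device_type(type_c: Sequence[str],
                       type_b: Optional[str],
                       type_a: Optional[str]) -> str:
    """Choose the target device type from C, using B and then A as tie-breakers."""
    if not type_c:
        raise ValueError("device type C must contain at least one sub-device type")
    if len(type_c) == 1:                    # steps 301-302
        return type_c[0]
    if type_b and type_b in type_c:         # steps 303-305
        return type_b
    if type_a and type_a in type_c:         # steps 306-308
        return type_a
    return type_c[0]                        # step 309: any sub-device type


if __name__ == "__main__":
    print(target_device_type(["Apache", "Tomcat"], "Tomcat", "Apache"))
```

  • Checking device type B before device type A follows the order of the steps above: the type inferred from the error response page is preferred over the type read from the normal response page.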
  • As the above embodiment shows, the existing technical solution obtains device-type-related information of the Web server from the response page returned by the Web server and performs feature matching on that information to determine the device type of the Web server. Therefore, if the operation and maintenance personnel of the Web server forge or erase the device-type-related information, or customize the returned server information fields, identification by feature matching cannot accurately determine the device type of the Web server.
  • The technical solution in the present invention introduces parameter pollution to jointly identify the device type of the Web service device. It can accurately detect the real response of the Web service device to a parameter-polluted access request. From that response, together with the real response of the Web service device to access requests without parameter pollution, the response behavior characteristics of the Web service device can be accurately identified, and with them its device type. The recognition accuracy of the device type of the Web service device is thus effectively improved, and the false negative rate and false positive rate produced by feature-matching identification are effectively reduced.
  • The Web client constructs a first access request and at least one second access request for accessing the Web service device. The first access request represents a request constructed by polluting, with multiple parameter values, a pollutable parameter in an access request to the Web service device; each second access request represents a request constructed without polluting the pollutable parameter in the access request to the Web service device.
  • For the second response page of each second access request, a similarity calculation is performed on the first response page of the first access request and that second response page to determine the first similarity value. Comparing the first similarity value with the similarity threshold shows whether the response behavior of the Web service device to the first access request is the same as its response behavior to the second access request, so that the device types to which the Web service device may belong can be preliminarily determined.
  • The first device type can then be determined, according to the programming language used by the Web service device, from the device types of at least one Web service device matching the second response page. Based on the first device type, the target device type of the Web service device can be accurately determined, which effectively improves the recognition accuracy of the device type of the Web service device and reduces the false negative rate and false positive rate produced by feature-matching identification.
  • FIG. 4 exemplarily shows an apparatus for identifying a Web service device provided by an embodiment of the present invention, and the apparatus can execute a flow of a method for identifying a Web service device.
  • the device includes:
  • a construction unit 401, configured to construct a first access request and at least one second access request for accessing a Web service device; the first access request represents a request constructed by polluting, with multiple parameter values, a pollutable parameter in an access request to the Web service device; each second access request represents a request constructed without polluting the pollutable parameter in the access request to the Web service device;
  • a processing unit 402, configured to: for the second response page of each second access request, perform a similarity calculation on the first response page of the first access request and the second response page of that second access request to determine the first similarity value; if it is determined that the first similarity value is greater than the similarity threshold, determine, from the device types of at least one Web service device matching the second response page, the first device type that conforms to the programming language used by the Web service device; and, based on the first device type, determine the target device type of the Web service device.
  • processing unit 402 is further configured to:
  • the third access request represents a normal access request to the Web service device;
  • the fourth access request represents an abnormal access request to the Web service device;
  • the processing unit 402 is specifically configured to:
  • determine the target device type of the Web service device according to the first device type, the second device type, and the third device type.
  • processing unit 402 is specifically configured to:
  • for a second response page whose first similarity value is greater than the similarity threshold, when the first parameter value of the pollutable parameter in the first response page is determined to be the same as the second parameter value of the pollutable parameter in the second response page, determine at least one device type matching the second parameter value from the device type library;
  • the first device type of the Web service device is determined from at least one device type matching the second parameter value according to the programming language used by the Web service device.
  • processing unit 402 is further configured to:
  • at least one device type that matches the first parameter value is determined from the device type library;
  • the first device type of the Web service device is determined from at least one device type matching the first parameter value according to the programming language used by the Web service device.
  • processing unit 402 is further configured to:
  • the first device type of the Web service device is determined from the device type library according to the programming language used by the Web service device.
  • processing unit 402 is further configured to:
  • if the maximum similarity value is determined to be greater than or equal to the similarity threshold, then, when the maximum similarity value is any one of the first similarity values, determine the first device type of the Web service device, according to the programming language used by the Web service device, from the at least one device type matching the second parameter value; or, when the maximum similarity value is the second similarity value, determine the first device type of the Web service device, according to the programming language used by the Web service device, from the at least one device type matching the first parameter value.
  • processing unit 402 is specifically configured to:
  • merge and deduplicate the first word segments in the first word segment set and the second word segments in the second word segment set to obtain a third word segment set;
  • use each third word segment in the third word segment set as a key and set a corresponding value for each key to obtain a key-value data set;
  • convert each first word segment in the first word segment set into its corresponding value to obtain a first value set,
  • and convert each second word segment in the second word segment set into its corresponding value to obtain a second value set;
  • determine the first similarity value from the first vector set and the second vector set through a preset similarity algorithm.
  • processing unit 402 is specifically configured to:
  • when the third device type is determined to exist in the first device type, determine the third device type as the target device type of the Web service device; or, when the third device type is determined to be a null value and the second device type is determined to exist in the first device type, determine the second device type as the target device type of the Web service device.
  • processing unit 402 is further configured to:
  • any subtype of the first device type is determined as the target device type of the Web service device.
  • an embodiment of the present invention also provides a computing device, as shown in FIG. 5 , including at least one processor 501 and a memory 502 connected to the at least one processor.
  • the embodiments of the present invention do not limit the specific connection medium between the processor 501 and the memory 502; in FIG. 5, the processor 501 and the memory 502 are connected through a bus as an example.
  • the bus can be divided into address bus, data bus, control bus and so on.
  • the memory 502 stores instructions that can be executed by at least one processor 501, and at least one processor 501 can execute the steps included in the aforementioned method for identifying a Web service device by executing the instructions stored in the memory 502 .
  • the processor 501 is the control center of the computing device. It can use various interfaces and lines to connect the parts of the computing device, and it implements data processing by running or executing the instructions stored in the memory 502 and calling the data stored in the memory 502.
  • the processor 501 may include one or more processing units, and the processor 501 may integrate an application processor and a modem processor.
  • the modem processor mainly handles issuing instructions. It can be understood that the foregoing modem processor may not be integrated into the processor 501.
  • the processor 501 and the memory 502 can be implemented on the same chip, and in some embodiments, they can also be implemented on independent chips.
  • the processor 501 can be a general-purpose processor, such as a central processing unit (CPU), a digital signal processor, an application-specific integrated circuit (ASIC), a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of the present invention.
  • a general purpose processor may be a microprocessor or any conventional processor or the like. The steps of the method disclosed in the embodiment of the method for identifying a Web service device can be directly implemented by a hardware processor, or implemented by a combination of hardware and software modules in the processor.
  • the memory 502 as a non-volatile computer-readable storage medium, can be used to store non-volatile software programs, non-volatile computer-executable programs and modules.
  • Memory 502 may include at least one type of storage medium, for example flash memory, a hard disk, a multimedia card, card memory, random access memory (RAM), static random access memory (SRAM), programmable read-only memory (PROM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), magnetic memory, a magnetic disk, an optical disc, etc.
  • Memory 502 is, but is not limited to, any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
  • the memory 502 in the embodiment of the present invention may also be a circuit or any other device capable of implementing a storage function, and is used for storing program instructions and/or data.
  • an embodiment of the present invention also provides a computer-readable storage medium that stores a computer program executable by a computing device; when the program runs on the computing device, it causes the computing device to execute the steps of the above method for identifying a Web service device.
  • the embodiments of the present invention may be provided as methods, systems, or computer program products. Accordingly, the present invention can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

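Embodiments of the present invention provide a method and an apparatus for identifying a Web service device. In the method, a Web client constructs a first access request and at least one second access request for accessing the Web service device. For the second response page of each second access request, a similarity calculation is performed on the first response page of the first access request and the second response page of the second access request to determine a first similarity value. If the first similarity value is determined to be greater than a similarity threshold, a first device type that conforms to the programming language used by the Web service device is determined from the at least one device type of the Web service device that matches the second response page. Based on the first device type, the target device type of the Web service device can be determined accurately, which effectively improves the accuracy of identifying the device type of the Web service device and thereby effectively reduces the false negative rate and false positive rate that arise in feature-matching identification.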

Description

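Method and Apparatus for Identifying a Web Service Device
Cross-Reference to Related Applications
This application claims priority to Chinese Patent Application No. 202111396969.1, filed with the China Patent Office on November 23, 2021 and entitled "Method and Apparatus for Identifying a Web Service Device", which is incorporated herein by reference in its entirety.
Technical Field
Embodiments of the present invention relate to the field of financial technology (Fintech), and in particular to a method and an apparatus for identifying a Web service device.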
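Background
With the development of computer technology, more and more technologies are being applied in the financial field, and the traditional financial industry is gradually shifting toward financial technology. Because of the security and real-time requirements of the financial industry, higher demands are also placed on these technologies. In financial information security, it is often necessary to remotely identify which type of Web server a website to be accessed actually uses, so that asset profiling and vulnerability detection can then be performed on the Web server.
At present, two identification approaches are commonly used to identify the device type of a Web server. In the first approach, a Web client sends a normal access request to the Web server, and feature matching is performed on the device-type-related information in the response page returned by the Web server to determine its device type (some Web servers indicate device-type-related information in the server field of the response header, and some include it in the returned response page). In the second approach, the Web client constructs an access request that makes the Web server report an error (for example, a request with a nonexistent access path or with erroneous parameters) and sends it to the Web server; feature matching is then performed on the device-type-related information in the error response page returned by the Web server to determine its device type. However, the operations staff of some Web servers deliberately forge or erase the device-type-related information, so both approaches suffer from a high false negative rate and a high false positive rate when they determine the device type by feature matching on that information.
In summary, a method for identifying a Web service device is urgently needed to effectively improve the accuracy of identifying the device type of the Web service device.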
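Summary
Embodiments of the present invention provide a method and an apparatus for identifying a Web service device, to effectively improve the accuracy of identifying the device type of the Web service device.
In a first aspect, an embodiment of the present invention provides a method for identifying a Web service device, including:
constructing, by a Web client, a first access request and at least one second access request for accessing the Web service device, where the first access request represents a request constructed by polluting a pollutable parameter in an access request to the Web service device with multiple parameter values, and each second access request represents a request constructed without polluting the pollutable parameter in the access request to the Web service device;
for the second response page of each second access request, performing, by the Web client, a similarity calculation on the first response page of the first access request and the second response page of the second access request to determine a first similarity value;
if the Web client determines that the first similarity value is greater than a similarity threshold, determining, from the at least one device type of the Web service device that matches the second response page, a first device type that conforms to the programming language used by the Web service device; and
determining, by the Web client, the target device type of the Web service device based on the first device type.
In the above technical solution, existing solutions determine the device type of a Web server by obtaining device-type-related information from the response page returned by the Web server and performing feature matching on it. Therefore, if the operations staff forge or erase that information, or customize the returned server information field, feature-matching identification cannot accurately identify the device type of the Web server. For this reason, the technical solution of the present invention introduces parameter pollution to jointly identify the device type of the Web service device. It can precisely detect the real response state of the Web service device to a parameter-polluted access request and, from that state together with the real response state to access requests without parameter pollution, accurately identify the response behavior characteristics of the Web service device. The device type of the Web service device can therefore be identified accurately, which effectively improves identification accuracy and reduces the false negative rate and false positive rate that arise in feature-matching identification. Specifically, the Web client constructs a first access request and at least one second access request for accessing the Web service device; the first access request represents a request constructed by polluting a pollutable parameter in an access request to the Web service device with multiple parameter values; each second access request represents a request constructed without polluting the pollutable parameter. Then, for the second response page of each second access request, a similarity calculation is performed on the first response page of the first access request and that second response page to determine a first similarity value, and the first similarity value is compared with the similarity threshold to determine whether the response behavior of the Web service device to the first access request is the same as its response behavior to the second access request, which supports a preliminary determination of the device type of the Web service device. Next, when the first similarity value is determined to be greater than the similarity threshold, the first device type can be further determined, according to the programming language used by the Web service device, from the at least one device type of the Web service device that matches the second response page. Based on the first device type, the target device type of the Web service device can be determined accurately, which effectively improves identification accuracy and reduces the false negative rate and false positive rate that arise in feature-matching identification.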
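Optionally, the method further includes:
constructing, by the Web client, a third access request and a fourth access request for accessing the Web service device, where the third access request represents a normal access request to the Web service device and the fourth access request represents an abnormal access request to the Web service device; and
determining, by the Web client, a second device type of the Web service device by performing feature matching on the third response page of the third access request, and a third device type of the Web service device by performing feature matching on the fourth response page of the fourth access request;
where determining, by the Web client, the target device type of the Web service device based on the first device type includes:
determining, by the Web client, the target device type of the Web service device according to the first device type, the second device type, and the third device type.
In the above technical solution, to identify the device type of the Web service device more accurately, a normal access request and an abnormal access request are also constructed. The second device type is determined from the response behavior of the Web service device to the normal access request, and the third device type from its response behavior to the abnormal access request. Combining the first, second, and third device types then helps identify the actual device type of the Web service device more accurately.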
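Optionally, if the Web client determines that the first similarity value is greater than the similarity threshold, determining, from the at least one device type of the Web service device that matches the second response page, the first device type that conforms to the programming language used by the Web service device includes:
for a second response page whose first similarity value is greater than the similarity threshold, when the Web client determines that the first parameter value of the pollutable parameter in the first response page is the same as the second parameter value of the pollutable parameter in the second response page, determining, from a device type library, at least one device type matching the second parameter value; and
determining, by the Web client, according to the programming language used by the Web service device, the first device type of the Web service device from the at least one device type matching the second parameter value.
In the above technical solution, after the first similarity value is determined to be greater than the similarity threshold, the device type range of the Web service device can be further determined by judging whether the first parameter value of the pollutable parameter in the first response page is the same as the second parameter value of the pollutable parameter in the second response page. If the parameter value that the Web service device selects for the pollutable parameter in the first access request is the same as the one it selects in the second access request, then, because the value selected for the second access request is fixed and single, the historical behavior data of Web service devices of each device type for pollutable parameters with multiple parameter values can be used to determine which device types select a single parameter value. For example, for a pollutable parameter with multiple parameter values, that data shows which device types select the First parameter value of the parameter and which select the Last parameter value. The programming language used by the Web service device then narrows the device type range further, supporting the subsequent accurate identification of the actual device type. In addition, judging whether the first similarity value is greater than the similarity threshold accurately establishes whether the first response page is roughly the same as the second response page. This avoids comparing the first and second parameter values when the response pages differ, and it also reduces potential false negatives (for example, when the parameter value is absent but the response pages are in fact roughly the same).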
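Optionally, the method further includes:
if the Web client determines that the first similarity values of all second response pages are less than or equal to the similarity threshold, or that the first parameter value differs from the second parameter value, then, when the first parameter value is determined to be a combination of the second parameter values, constructing, from the second parameter values, a fifth access request for accessing the Web service device;
determining, by the Web client, for the fifth response page of the fifth access request, a second similarity value between the fifth response page and the first response page;
if the Web client determines that the second similarity value is greater than the similarity threshold, determining, from the device type library, at least one device type matching the first parameter value; and
determining, by the Web client, according to the programming language used by the Web service device, the first device type of the Web service device from the at least one device type matching the first parameter value.
In the above technical solution, the following cases require judging whether the first parameter value in the first response page is a combination of the second parameter values, so as to determine the device type range of the Web service device and support the subsequent accurate identification of its actual device type: the first similarity values of all second response pages are less than or equal to the similarity threshold; or, assuming there are two second response pages, the first similarity value between the first of them and the first response page is greater than the similarity threshold but the first parameter value differs from the second parameter value in that page, and the second of them either has a first similarity value less than or equal to the similarity threshold, or has a first similarity value greater than the similarity threshold but a second parameter value that differs from the first parameter value.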
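Optionally, the method further includes:
if the Web client determines that the second similarity value is less than or equal to the similarity threshold, or that the first parameter value is not a combination of the second parameter values, determining the maximum similarity value among the first similarity values and the second similarity value; and
if the Web client determines that the maximum similarity value is less than the similarity threshold, determining, according to the programming language used by the Web service device, the first device type of the Web service device from the device type library.
In the above technical solution, if the second similarity value is determined to be less than or equal to the similarity threshold, or the first parameter value is not a combination of the second parameter values, the first similarity values and the second similarity value are compared to determine the maximum similarity value, and it is judged whether this maximum is greater than the similarity threshold. If the maximum similarity value is less than the similarity threshold, the first device type of the Web service device can be preliminarily determined from the device type library according to the programming language used by the Web service device, supporting the subsequent accurate identification of its actual device type.
Optionally, the method further includes:
if the Web client determines that the maximum similarity value is greater than or equal to the similarity threshold, then, when the maximum similarity value is determined to be any one of the first similarity values, determining, according to the programming language used by the Web service device, the first device type of the Web service device from the at least one device type matching the second parameter value; or, when the maximum similarity value is determined to be the second similarity value, determining, according to the programming language used by the Web service device, the first device type of the Web service device from the at least one device type matching the first parameter value.
In the above technical solution, if the maximum similarity value reaches the similarity threshold, it can be compared with each first similarity value and the second similarity value. If it equals one of them, the device type range corresponding to that similarity value is taken as the device type range of the Web service device. The programming language used by the Web service device then narrows this range further, giving a smaller device type range. This helps improve the efficiency of the subsequent identification of the device type and supports a more accurate identification.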
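Optionally, performing, by the Web client, a similarity calculation on the first response page of the first access request and the second response page of the second access request to determine the first similarity value includes:
obtaining, by the Web client, the page source code of the first response page and the page source code of the second response page, and performing word segmentation on each of them to determine a first word segment set corresponding to the first response page and a second word segment set corresponding to the second response page;
merging and deduplicating, by the Web client, the first word segments in the first word segment set and the second word segments in the second word segment set to obtain a third word segment set;
using, by the Web client, each third word segment in the third word segment set as a key and setting a corresponding value for each key to obtain a key-value data set;
converting, by the Web client, according to the key-value data set, each first word segment in the first word segment set into its corresponding value to obtain a first value set, and each second word segment in the second word segment set into its corresponding value to obtain a second value set;
encoding, by the Web client, the first value set to obtain a first vector set, and the second value set to obtain a second vector set; and
determining, by the Web client, the first similarity value from the first vector set and the second vector set through a preset similarity algorithm.
In the above technical solution, the page source code of a response page truly reflects the actual behavior characteristics of that page. Calculating the similarity value between the first response page and the second response page from their page source code therefore makes it possible to judge accurately whether the response behavior of the first response page is roughly the same as that of the second response page, and thus whether the two response pages are roughly the same.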
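Optionally, determining, by the Web client, the target device type of the Web service device according to the first device type, the second device type, and the third device type includes:
if the Web client determines that only one sub-device type exists in the first device type, determining that sub-device type as the target device type of the Web service device; and
if the Web client determines that at least two sub-device types exist in the first device type, then, when the third device type is determined to exist in the first device type, determining the third device type as the target device type of the Web service device; or, when the third device type is determined to be a null value and the second device type is determined to exist in the first device type, determining the second device type as the target device type of the Web service device.
In the above technical solution, the first device type of the Web service device is determined from its real response behavior, which reflects the specific behavior characteristics of the Web service device toward parameter-polluted access requests. These behavior characteristics cannot be changed easily, just as the behavior characteristics of an object cannot easily be changed, and are therefore non-repudiable. The first device type is thus more accurate than the second and third device types and better reflects the actual device type of the Web service device. The technical solution of the present invention therefore first judges whether the first device type contains multiple sub-device types. If only one sub-device type exists in the first device type, it can be used directly as the target device type of the Web service device. If multiple sub-device types exist, the second device type and/or the third device type are used for a further judgment, so that the target device type, that is, the real device type of the Web service device, can be determined more accurately.
Optionally, the method further includes:
when the Web client determines that at least two subtypes exist in the first device type, if the second device type and the third device type are both determined to be null values, or neither the second device type nor the third device type exists in the first device type, determining any subtype of the first device type as the target device type of the Web service device.
In the above technical solution, provided that at least two subtypes exist in the first device type, if the second device type and the third device type are both null values, or neither of them exists in the first device type, then, because the first device type reflects the actual device type of the Web service device more accurately, any sub-device type of the first device type can be used directly as the target device type of the Web service device.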
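In a second aspect, an embodiment of the present invention further provides an apparatus for identifying a Web service device, including:
a construction unit, configured to construct a first access request and at least one second access request for accessing a Web service device, where the first access request represents a request constructed by polluting a pollutable parameter in an access request to the Web service device with multiple parameter values, and each second access request represents a request constructed without polluting the pollutable parameter in the access request to the Web service device; and
a processing unit, configured to: for the second response page of each second access request, perform a similarity calculation on the first response page of the first access request and the second response page of the second access request to determine a first similarity value; if the first similarity value is determined to be greater than a similarity threshold, determine, from the at least one device type of the Web service device that matches the second response page, a first device type that conforms to the programming language used by the Web service device; and determine the target device type of the Web service device based on the first device type.
In a third aspect, an embodiment of the present invention provides a computing device including at least one processor and at least one memory, where the memory stores a computer program that, when executed by the processor, causes the processor to execute the method for identifying a Web service device according to any implementation of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium that stores a computer program executable by a computing device; when the program runs on the computing device, it causes the computing device to execute the method for identifying a Web service device according to any implementation of the first aspect.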
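Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a method for identifying a Web service device according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of determining device type C of a Web server according to an embodiment of the present invention;
FIG. 3 is a schematic flowchart of determining the target device type of a Web service device according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an apparatus for identifying a Web service device according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a computing device according to an embodiment of the present invention.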
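Detailed Description
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Some terms used in the embodiments of the present invention are explained first to help a person skilled in the art understand them.
(1) Web (World Wide Web) server: also called a WWW server, it generally refers to a website server. It mainly provides online information browsing and download services, and can process requests from Web clients such as browsers and return corresponding responses. Common Web server device types include Apache, IIS, and Tomcat.
(2) HTTP (HyperText Transfer Protocol): a simple request-response protocol that specifies what messages a client may send to a server and what responses it receives.
(3) HTTP parameter pollution (HTTP Parameter Pollution): a common attack technique for bypassing security devices. It arises mainly because the current HTTP standard does not define which value a Web server should use as the final value of a parameter that has multiple values. Different Web servers therefore handle the multi-value case differently, so HTTP parameter pollution can be used to infer the device type of the back-end Web server.
(4) One-Hot encoding: also called one-bit-effective encoding, it uses an N-bit status register to encode N states; each state has its own register bit, and only one bit is active at any time.
The above introduces some of the terms used in the embodiments of the present invention; the technical features of the embodiments are introduced below.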
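FIG. 1 shows, by way of example, the flow of a method for identifying a Web service device provided by an embodiment of the present invention; the flow can be executed by an apparatus for identifying a Web service device.
As shown in FIG. 1, the flow specifically includes:
Step 101: the Web client constructs a first access request and at least one second access request for accessing the Web service device.
In this embodiment of the present invention, the Web client constructs a first access request formed by polluting a pollutable parameter in a Web page of the Web service device, and also constructs at least one second access request formed by separately accessing at least one Web page of the Web service device. The first access request represents a request constructed by polluting a pollutable parameter in an access request to the Web service device with multiple parameter values; each second access request represents a request constructed without polluting the pollutable parameter. In addition, to identify the device type of the Web service device more accurately, the technical solution of the present invention also constructs a normal access request and an abnormal access request, that is, a third access request and a fourth access request for accessing the Web service device. The third access request represents a normal access request to the Web service device; the fourth access request represents an abnormal access request to the Web service device. After constructing the access requests, the Web client sends each of them to the Web service device, which processes each request it receives and returns a corresponding response page. After receiving the third access request, the Web service device processes it and returns a third response page, which is displayed on the Web client; after receiving the fourth access request, it processes it and returns a fourth response page, which is displayed on the Web client. The Web client determines the second device type of the Web service device by performing feature matching on the third response page of the third access request, and determines the third device type by performing feature matching on the fourth response page of the fourth access request.
For example, when the device type of a Web server needs to be identified, a number of access requests are first constructed on the Web client corresponding to the Web server. The first is access request A, which accesses the default page of the Web server, that is, a normal access through the default address of the Web server, such as its URL (Uniform Resource Locator). The second is access request B, an abnormal access request constructed to make the Web server report an error. For example, constructing a nonexistent access path in a URL of the Web server, adding an illegal parameter, or constructing an illegal protocol field can cause an error response page when the Web server is accessed; alternatively, a URL carrying false information can be constructed in advance to access the Web server, which also produces an error response page. The third is an access request formed by polluting a parameter in a Web page of the Web server, together with at least one request for normally accessing at least one Web page of the Web server. Specifically, the Web pages of the Web server are crawled to obtain the parameter information of each Web page and the URL of each Web page, and a parameter that can be exploited by HTTP parameter pollution, such as the page parameter or an ID (Identification) parameter, is identified from that parameter information. After the parameter is polluted, a new URL is constructed to send an access request, such as access request C, to the Web server, and the new URL is entered on the Web client corresponding to the Web server to access it. Access requests, such as access request D and access request E, are also sent to the Web server through the URLs of its individual Web pages. For example, take the page parameter as the polluted parameter and assume the Web server has two Web pages: URL1 of the first Web page is https://example.com/i.php?page=1 and URL2 of the second Web page is https://example.com/i.php?page=2, where page is a parameter that can be exploited by HTTP parameter pollution. Polluting the page parameter yields the new page parameter page=1&page=2, from which a new URL, URL3, can be constructed: https://example.com/i.php?page=1&page=2. The Web server is then accessed through URL3.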
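The construction of URL3 from URL1 and URL2 can be sketched as follows. This is only an illustrative sketch using Python's standard library: the helper name build_polluted_url is hypothetical and not part of the described method, and it assumes both page URLs share the same path and differ only in the value of the pollutable parameter.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_polluted_url(url_a: str, url_b: str, param: str) -> str:
    """Return url_a with the values of `param` taken from url_b appended,
    so that `param` appears more than once in the query string."""
    parts_a = urlsplit(url_a)
    parts_b = urlsplit(url_b)
    # Keep every query parameter of url_a in its original order.
    query_a = parse_qsl(parts_a.query, keep_blank_values=True)
    # Collect only the values that url_b carries for the pollutable parameter.
    extra = [(k, v) for k, v in parse_qsl(parts_b.query, keep_blank_values=True)
             if k == param]
    polluted_query = urlencode(query_a + extra)
    return urlunsplit(parts_a._replace(query=polluted_query))


url1 = "https://example.com/i.php?page=1"
url2 = "https://example.com/i.php?page=2"
url3 = build_polluted_url(url1, url2, "page")
print(url3)
```

With the two example URLs from the text, the helper yields the polluted URL with page=1 first and page=2 last, which matters later because servers differ in whether they keep the first value, the last value, or all values.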
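Step 102: for the second response page of each second access request, the Web client performs a similarity calculation on the first response page of the first access request and the second response page of the second access request to determine a first similarity value.
In this embodiment of the present invention, after receiving the first access request, the Web service device processes it and returns a first response page, which is displayed on the Web client; after receiving a second access request, it processes it and returns a second response page, which is displayed on the Web client. For the second response page of each second access request, the Web client obtains the page source code of the first response page and of that second response page, performs word segmentation on the page source code of the first response page to obtain a first word segment set, and performs word segmentation on the page source code of the second response page to obtain a second word segment set. The first word segments in the first word segment set and the second word segments in the second word segment set are then merged and deduplicated to obtain a third word segment set; each third word segment is used as a key and a corresponding value is set for each key, giving a key-value data set. Then, according to the key-value data set, each first word segment is converted into its corresponding value to obtain a first value set, and each second word segment is converted into its corresponding value to obtain a second value set; the first value set is encoded to obtain a first vector set, and the second value set is encoded to obtain a second vector set. Finally, a preset similarity algorithm is applied to the first vector set and the second vector set to calculate the first similarity value between the first response page and that second response page. Because the page source code of a response page truly reflects the actual behavior characteristics of that page, calculating the similarity value from the page source code makes it possible to judge accurately whether the response behavior of the first response page is roughly the same as that of the second response page, and thus whether the two response pages are roughly the same.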
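Step 103: if the Web client determines that the first similarity value is greater than the similarity threshold, it determines, from the at least one device type of the Web service device that matches the second response page, a first device type that conforms to the programming language used by the Web service device.
In this embodiment of the present invention, for a second response page whose first similarity value is greater than the similarity threshold, when the first parameter value of the pollutable parameter in the first response page is determined to be the same as the second parameter value of the pollutable parameter in that second response page, and because the parameter value that the Web service device selects for the pollutable parameter in the second access request is fixed and single, at least one device type matching the second parameter value can be determined from the device type library. Then, according to the programming language used by the Web service device, the first device type of the Web service device is determined from the at least one device type matching the second parameter value, which further narrows the device type range of the Web service device and supports the subsequent accurate identification of its actual device type. The device type library stores parameter values (such as the First parameter value and/or the Last parameter value) together with the device types and programming languages corresponding to them. For example, for a pollutable parameter with multiple parameter values, the historical behavior data of Web service devices of each device type for such parameters can be used to determine which device types select the First parameter value of the parameter, which select the Last parameter value, and which select the All parameter value (containing the First parameter value and the Last parameter value). The similarity threshold may be set according to the experience of a person skilled in the art or the specific application scenario of the technical solution, which is not limited in the embodiments of the present invention.
If the first similarity values of all second response pages are determined to be less than or equal to the similarity threshold, or, assuming there are two second response pages, the first similarity value between the first of them and the first response page is greater than the similarity threshold but the first parameter value differs from the second parameter value in that page, and the second of them either has a first similarity value less than or equal to the similarity threshold or has a first similarity value greater than the similarity threshold but a second parameter value that differs from the first parameter value, then it is necessary to judge whether the first parameter value in the first response page is a combination of the second parameter values, so as to determine the device type range of the Web service device. If the first parameter value is a combination of the second parameter values, a fifth access request for accessing the Web service device can be constructed from the second parameter values and sent to the Web service device. After receiving the fifth access request, the Web service device processes it and returns a fifth response page, which is displayed on the Web client. For the fifth response page of the fifth access request, the Web client determines, through the preset similarity algorithm, a second similarity value between the fifth response page and the first response page. If the second similarity value is determined to be greater than the similarity threshold, at least one device type matching the first parameter value (the combination of multiple second parameter values) is determined from the device type library. For example, if the Web service device selects the All parameter value of the pollutable parameter (containing the First parameter value and the Last parameter value) as the parameter value it finally processes, the device types that select the All parameter value can be determined from the historical behavior data of Web service devices of each device type for pollutable parameters with multiple parameter values. Then, according to the programming language used by the Web service device, the device type range is narrowed further, that is, the first device type of the Web service device is determined from the at least one device type matching the first parameter value, supporting the subsequent accurate identification of its actual device type. In addition, if the second similarity value is determined to be less than or equal to the similarity threshold, or the first parameter value is not a combination of the second parameter values, the first similarity values and the second similarity value are compared to determine the maximum similarity value, and it is judged whether this maximum is greater than the similarity threshold. If the maximum similarity value is less than the similarity threshold, the first device type of the Web service device can be preliminarily determined from the device type library according to the programming language used by the Web service device. If the maximum similarity value is greater than or equal to the similarity threshold, it is compared with each first similarity value and the second similarity value; if it equals one of them (any of the first similarity values, or the second similarity value), the device type range corresponding to that similarity value is taken as the device type range of the Web service device. The programming language used by the Web service device then narrows this range further, giving a smaller device type range. For example, if the maximum similarity value is any one of the first similarity values, the first device type of the Web service device can be determined, according to the programming language used, from the at least one device type matching the second parameter value; or, if the maximum similarity value is the second similarity value, the first device type is determined, according to the programming language used, from the at least one device type matching the first parameter value. In this way, the solution helps improve the efficiency of the subsequent identification of the device type of the Web service device and supports a more accurate identification.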
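For example, continuing with the Web server that has two Web pages and with the page parameter as the pollutable parameter, a user can enter the default URL of the Web server in the URL input interface provided by the Web client, that is, send access request A to the Web server. After receiving access request A, the Web server processes it and returns a response page, which is displayed on the Web client. Feature matching on the returned response page then yields device type A of the Web server; for example, the value of the server field is obtained from the header of the response message returned by the Web server, and that value is device type A. Likewise, a URL that makes the Web server report an error is entered in the URL input interface provided by the Web client, that is, access request B is sent to the Web server. After receiving access request B, the Web server processes it and returns an error response page, which is displayed on the Web client; feature matching on the returned error response page yields device type B of the Web server. In addition, the URLs of the Web pages of the Web server (for example, two Web pages), such as URL1 and URL2, are entered in the URL input interface, that is, access request D and access request E are sent to the Web server. After receiving access request D, the Web server processes it and returns response page 1, which is displayed on the Web client; after receiving access request E, it processes it and returns response page 2, which is displayed on the Web client. Page parameter value 1 can then be obtained from response page 1, and page parameter value 2 from response page 2. Furthermore, the new URL constructed after parameter pollution, such as URL3, is entered in the URL input interface, that is, access request C is sent to the Web server. After receiving access request C, the Web server processes it and returns response page 3, which is displayed on the Web client. Page parameter value 3 can then be obtained from response page 3; it may be a single value or a combination of multiple page parameter values, for example a combination of page parameter value 1 and page parameter value 2. Finally, similarity calculations on response page 1, response page 2, and response page 3, combined with the programming language used by the Web server, yield device type C of the Web server.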
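The feature matching that produces device type A from the server field of the response header might look like the following sketch. The signature list and the function name match_server_header are illustrative assumptions only; the text does not specify the fingerprint rules, and a real fingerprint library would be far larger.

```python
import re
from typing import Mapping, Optional

# Hypothetical signature list, checked in order; the Tomcat connector banner
# is tested before the generic Apache pattern because it also contains "Apache".
SIGNATURES = [
    (re.compile(r"apache-coyote|tomcat", re.IGNORECASE), "Tomcat"),
    (re.compile(r"microsoft-iis", re.IGNORECASE), "IIS"),
    (re.compile(r"apache", re.IGNORECASE), "Apache"),
]


def match_server_header(headers: Mapping[str, str]) -> Optional[str]:
    """Return a device type guessed from the Server header, or None when the
    header is missing or matches no known signature (the 'null value' case)."""
    server = next((v for k, v in headers.items() if k.lower() == "server"), "")
    for pattern, device_type in SIGNATURES:
        if pattern.search(server):
            return device_type
    return None


print(match_server_header({"Server": "Apache/2.4.41 (Ubuntu)"}))
print(match_server_header({"Content-Type": "text/html"}))
```

Because an operator can rewrite or remove this header, the result is treated only as a hint (device type A or B) and is later checked against the behavior-based device type C.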
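With reference to FIG. 2, the following describes in detail how device type C of the Web server is determined in this embodiment by performing similarity calculations on response page 1, response page 2, and response page 3. FIG. 2 is a schematic flowchart of determining device type C of a Web server according to an embodiment of the present invention.
Step 201: obtain response page 1 for URL1, response page 2 for URL2, and response page 3 for URL3.
After URL1, URL2, and URL3 are entered on the Web client to obtain response page 1, response page 2, and response page 3, page parameter value 1 of response page 1 is determined from the relevant information in response page 1, page parameter value 2 of response page 2 from the relevant information in response page 2, and page parameter value 3 of response page 3 from the relevant information in response page 3. Page parameter value 1 is a single value, page parameter value 2 is a single value, and page parameter value 3 may be a single value or a combination of multiple page parameter values. A similarity algorithm (such as the Euclidean distance algorithm, the Pearson correlation coefficient algorithm, or the cosine similarity algorithm) can be used to calculate the similarity of Web response pages, for example the similarity value between response page 1 and response page 3 or between response page 2 and response page 3.
Step 202: determine whether similarity value a1 between response page 1 and response page 3 is greater than the similarity threshold. If yes, go to step 203; if no, go to step 204.
As an example, taking the cosine similarity algorithm, a similarity calculation on response page 1 and response page 3 gives similarity value a1 between the two pages. It is then judged whether a1 is greater than the similarity threshold. If a1 is greater than the similarity threshold, step 203 is performed; if a1 is less than or equal to the similarity threshold, step 204 is performed. The similarity threshold may range from 0 to 1; the closer a similarity value is to 1, the higher the similarity, and the closer it is to 0, the lower the similarity. For example, the similarity threshold may be set to 0.7 or 0.75; alternatively, a specific similarity threshold, such as 0.75 or 0.8, may be set according to the experience of a person skilled in the art or the actual application scenario of the embodiments of the present invention, which is not limited here.
Step 203: determine whether page parameter value 1 of response page 1 is the same as page parameter value 3 of response page 3. If yes, go to step 204; if no, go to step 205.
Step 204: determine, from the Web server device type library, device type set 1 of Web servers matching page parameter value 1.
If page parameter value 1 of response page 1 is determined to be the same as page parameter value 3 of response page 3, the Web server has selected one parameter value of the page parameter as the final value to process. In this case, the at least one Web server device type corresponding to a single parameter value can be determined from the Web server device type library shown in Table 1. For example, for a page parameter with multiple parameter values, the Web server selects the First parameter value of the page parameter as the final value to process, or selects the Last parameter value as the final value to process; device type set 1 of the Web server can then be obtained from the Web server device type library. For example, assuming page parameter value 1 is the First parameter value of the page parameter, that is, the Web server selects the First parameter value as the final value to process, device type set 1 includes {JSP/Tomcat, Perl(CGI)/Apache}. Note that Table 1 is only a simple example intended to facilitate description of the technical solution in the embodiments of the present invention and does not limit it.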
Table 1
Web server          Parameter retrieval function     Value obtained
ASP/IIS             Request.QueryString("par")       All (comma-delimited string)
PHP/Apache          $_GET("par")                     Last
JSP/Tomcat          Request.getParameter("par")      First
Perl(CGI)/Apache    Param("par")                     First
Python/Apache       getvalue("par")                  All (List)
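As Table 1 shows, a given Web server exhibits different HTTP parameter pollution behavior when combined with different website programming languages. For example, the Apache server in Table 1 takes the Last value of a repeated parameter as its final value when PHP is the website programming language, and takes the First value when Perl is the website programming language. Taking URL3 above as an example, if the device type of the Web server is IIS and its website programming language is ASP, the value of the page parameter obtained by the back end of the Web server is 1,2 (a combination of multiple page parameter values). If the device type of the Web server is Apache and its website programming language is PHP, the back end takes the last value of the page parameter as the final value to process. Using these differences, the device type range of a Web server can be obtained without relying on feature matching.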
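A minimal sketch of how Table 1 could be held as a device type library and queried is shown below. The dictionary layout, the behavior labels "first", "last", and "all", and both function names are assumptions made for illustration; the text only says that the library stores parameter values with their corresponding device types and programming languages.

```python
from typing import Set, Tuple

# Hypothetical in-memory form of Table 1:
# (website programming language, Web server) -> which value of a repeated
# parameter the back end ends up processing.
DEVICE_TYPE_LIBRARY = {
    ("ASP", "IIS"): "all",
    ("PHP", "Apache"): "last",
    ("JSP", "Tomcat"): "first",
    ("Perl(CGI)", "Apache"): "first",
    ("Python", "Apache"): "all",
}


def candidates_for(behavior: str) -> Set[Tuple[str, str]]:
    """Return every (language, server) pair that shows the given behavior."""
    return {pair for pair, kept in DEVICE_TYPE_LIBRARY.items() if kept == behavior}


def narrow_by_language(candidates: Set[Tuple[str, str]], language: str) -> Set[str]:
    """Keep only the servers whose entry uses the observed programming language."""
    return {server for lang, server in candidates if lang == language}


device_type_set_1 = candidates_for("first")
print(sorted(device_type_set_1))
print(narrow_by_language(device_type_set_1, "JSP"))
```

Looking up "first" reproduces device type set 1 from the text, {JSP/Tomcat, Perl(CGI)/Apache}, and filtering it by the JSP language leaves Tomcat.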
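Step 205: determine whether similarity value a2 between response page 2 and response page 3 is greater than the similarity threshold. If yes, go to step 206; if no, go to step 207.
As an example, taking the cosine similarity algorithm, a similarity calculation on response page 2 and response page 3 gives similarity value a2 between the two pages. It is then judged whether a2 is greater than the similarity threshold. If a2 is greater than the similarity threshold, step 206 is performed; if a2 is less than or equal to the similarity threshold, step 207 is performed.
Step 206: determine whether page parameter value 2 of response page 2 is the same as page parameter value 3 of response page 3. If yes, go to step 207; if no, go to step 208.
Step 207: determine, from the Web server device type library, device type set 2 of Web servers matching page parameter value 2.
If page parameter value 2 of response page 2 is determined to be the same as page parameter value 3 of response page 3, the Web server has selected one parameter value of the page parameter as the final value to process. In this case, the at least one Web server device type corresponding to a single parameter value can be determined from the Web server device type library shown in Table 1. For example, for a page parameter with multiple parameter values, the Web server selects the Last parameter value of the page parameter as the final value to process, or selects the First parameter value as the final value to process; device type set 2 of the Web server can then be obtained from the Web server device type library. For example, assuming page parameter value 2 is the Last parameter value of the page parameter, that is, the Web server selects the Last parameter value as the final value to process, device type set 2 includes {PHP/Apache}.
Step 208: determine whether page parameter value 3 of response page 3 is a combination of page parameter value 1 and page parameter value 2. If yes, go to step 209; if no, go to step 212.
If page parameter value 3 of response page 3 contains multiple sub-values, and these are determined to be a combination of page parameter value 1 and page parameter value 2, URL4 for accessing the Web server is constructed from page parameter value 1 and page parameter value 2, and response page 4 for URL4 is obtained after URL4 is entered on the Web client. If page parameter value 3 is a single value, or contains multiple sub-values that are not a combination of page parameter value 1 and page parameter value 2, step 212 is performed.
Step 209: construct URL4 for accessing the Web server from page parameter value 1 and page parameter value 2, and obtain response page 4 for URL4.
Step 210: determine whether similarity value a3 between response page 3 and response page 4 is greater than the similarity threshold. If yes, go to step 211; if no, go to step 212.
As an example, taking the cosine similarity algorithm, a similarity calculation on response page 3 and response page 4 gives similarity value a3 between the two pages. It is then judged whether a3 is greater than the similarity threshold. If a3 is greater than the similarity threshold, step 211 is performed; if a3 is less than or equal to the similarity threshold, step 212 is performed.
Step 211: determine, from the Web server device type library, device type set 3 of Web servers matching page parameter value 3.
If similarity value a3 is greater than the similarity threshold, the Web server has selected multiple parameter values of the page parameter as the final value to process. In this case, the at least one Web server device type corresponding to multiple parameter values can be determined from the Web server device type library shown in Table 1. For example, for a page parameter with multiple parameter values, the Web server selects the All parameter value of the page parameter as the final value to process; device type set 3 of the Web server can then be obtained from the Web server device type library, and it includes {ASP/IIS, Python/Apache}.
Step 212: compare similarity value a1, similarity value a2, and similarity value a3 to determine the maximum similarity value a0.
Step 213: determine whether the maximum similarity value a0 is greater than the similarity threshold. If yes, go to step 215; if no, go to step 214.
Step 214: determine device type C of the Web server from the Web server device type library according to the programming language used by the Web server.
Step 215: if the maximum similarity value a0 equals similarity value a1, the device type set of the Web server is device type set 1; if a0 equals similarity value a2, the device type set of the Web server is device type set 2; if a0 equals similarity value a3, the device type set of the Web server is device type set 3.
Step 216: determine device type C of the Web server from device type set 1, device type set 2, or device type set 3, as applicable, according to the programming language used by the Web server.
After device type set 1 is determined, device type C of the Web server can be determined from device type set 1 according to the programming language used by the Web server. For example, if the programming language used by the Web server is JSP, device type C determined from device type set 1 is Tomcat. Alternatively, after device type set 2 is determined, device type C can be determined from device type set 2 according to the programming language used. For example, if the programming language is PHP, device type C determined from device type set 2 is Apache. Alternatively, after device type set 3 is determined, device type C can be determined from device type set 3 according to the programming language used. For example, if the programming language is ASP, device type C determined from device type set 3 is IIS.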
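The branching described for FIG. 2 can be condensed into a small decision function. The sketch below is a simplified reading of the flow, not the flowchart itself: the function name classify_behavior, the labels "first", "last", and "all", and the representation of page parameter value 3 as a tuple are all assumptions, and it presumes that page parameter value 1 is the first value and page parameter value 2 is the last value in the polluted URL. The returned label would then be looked up in a library such as Table 1 and narrowed by programming language.

```python
from typing import Optional, Tuple


def classify_behavior(
    sim_first: float,                   # a1: response page 1 vs response page 3
    sim_last: float,                    # a2: response page 2 vs response page 3
    value_first: str,                   # page parameter value 1
    value_last: str,                    # page parameter value 2
    value_polluted: Tuple[str, ...],    # page parameter value 3 (one or more values)
    sim_combined: Optional[float] = None,  # a3, if response page 4 was fetched
    threshold: float = 0.75,
) -> Optional[str]:
    """Return 'first', 'last', 'all', or None when no branch clears the threshold."""
    # The polluted page looks like page 1 and echoes the same single value.
    if sim_first > threshold and value_polluted == (value_first,):
        return "first"
    # The polluted page looks like page 2 and echoes the same single value.
    if sim_last > threshold and value_polluted == (value_last,):
        return "last"
    # The polluted page carries a combination of both values.
    if (set(value_polluted) == {value_first, value_last}
            and sim_combined is not None and sim_combined > threshold):
        return "all"
    # Fallback: take the branch with the largest similarity value, provided
    # it reaches the threshold; otherwise only the language can be used.
    scores = {"first": sim_first, "last": sim_last}
    if sim_combined is not None:
        scores["all"] = sim_combined
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


print(classify_behavior(0.9, 0.3, "1", "2", ("1",)))
print(classify_behavior(0.4, 0.4, "1", "2", ("1", "2"), sim_combined=0.95))
```

A result of None corresponds to the case in which the maximum similarity value stays below the threshold, so the device type has to be chosen from the whole library by programming language alone.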
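As an example, the following takes the calculation of the similarity value between response page 1 and response page 3 to describe how the similarity value between response pages is determined.
Step a: obtain page source code 1 of response page 1 and page source code 3 of response page 3.
For example, assume the obtained page source code 1 is: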
<html>
<head>
<title>test1</title>
</head>
<body>
Just a test1
</body>
</html>
and assume page source code 3 is:
<html>
<head>
<title>test2</title>
</head>
<body>
Just a test2
</body>
</html>
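Step b: perform word segmentation on page source code 1 to obtain word segment set 1 corresponding to response page 1, and perform word segmentation on page source code 3 to obtain word segment set 3 corresponding to response page 3.
Word segmentation of page source code is mainly based on the HTML tags and the entities between them. Segmenting page source code 1 yields word segment set 1=[<html>,<head>,<title>,test1,</title>,</head>,<body>,Just a test1,</body>,</html>], and segmenting page source code 3 yields word segment set 3=[<html>,<head>,<title>,test2,</title>,</head>,<body>,Just a test2,</body>,</html>].
Step c: merge and deduplicate word segment set 1 and word segment set 3 to obtain the merged word segment set list.
For example, merging and deduplicating word segment set 1 and word segment set 3 above yields the word segment set list1=[<html>,<head>,<title>,test1,test2,</title>,</head>,<body>,Just a test1,Just a test2,</body>,</html>]. Word segment set list1 is then randomly ordered to obtain the ordered word segment set list2, for example list2=[test1,test2,Just a test1,Just a test2,<html>,<head>,<title>,</title>,</head>,<body>,</body>,</html>]. The ordered word segment set list2 is then converted into the key-value word segment set dict: each word segment in list2 is used as a key, and its position in list2 is used as the value, which gives dict=[test1:0,test2:1,Just a test1:2,Just a test2:3,<html>:4,<head>:5,<title>:6,</title>:7,</head>:8,<body>:9,</body>:10,</html>:11]. Alternatively, list1 need not be randomly ordered; the merged and deduplicated word segment set can be converted directly into key-value format to obtain a word segment set dict.
Step d: according to the key-value word segment set dict obtained by the conversion, convert each word segment in word segment set 1 and in word segment set 3 into its corresponding value.
For example, taking the word segment set dict=[test1:0,test2:1,Just a test1:2,Just a test2:3,<html>:4,<head>:5,<title>:6,</title>:7,</head>:8,<body>:9,</body>:10,</html>:11] obtained above, each word segment in word segment set 1 is converted into its corresponding value according to dict, giving the new word segment set 1=[4,5,6,0,7,8,9,2,10,11], and each word segment in word segment set 3 is converted into its corresponding value, giving the new word segment set 3=[4,5,6,1,7,8,9,3,10,11].
Step e: encode the new word segment set 1 to obtain vector set vector1, and encode the new word segment set 3 to obtain vector set vector3.
For example, taking one-hot encoding as the encoding algorithm, one-hot encoding of the new word segment set 1 gives vector set vector1=[1,0,1,0,1,1,1,1,1,1,1,1], and one-hot encoding of the new word segment set 3 gives vector set vector3=[0,1,0,1,1,1,1,1,1,1,1,1].
Step f: perform a similarity calculation on vector set vector1 corresponding to response page 1 and vector set vector3 corresponding to response page 3 to obtain the similarity value between response page 1 and response page 3.
For example, taking the cosine similarity algorithm as the similarity algorithm, applying it to vector set vector1 and vector set vector3 above gives the similarity value cos(θ), that is: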
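\[
\cos(\theta)=\frac{\mathrm{vector1}\cdot\mathrm{vector3}}{\lVert\mathrm{vector1}\rVert\,\lVert\mathrm{vector3}\rVert}=\frac{8}{\sqrt{10}\times\sqrt{10}}=0.8
\]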
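Thus the similarity value between response page 1 and response page 3 is 0.8. Assuming a similarity threshold of 0.75, 0.8 is greater than 0.75, so the similarity value between response page 1 and response page 3 is determined to be greater than the similarity threshold.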
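Steps a through f can be reproduced with the short sketch below. It is an illustrative implementation under stated assumptions: the regular expression used for word segmentation and the function names tokenize and page_similarity are choices made here, the vocabulary indices follow the order of first appearance rather than a random order, and only the cosine similarity variant is shown.

```python
import math
import re
from typing import Dict, List


def tokenize(source: str) -> List[str]:
    """Split page source into HTML tags and the text found between tags."""
    pieces = re.findall(r"<[^>]+>|[^<>]+", source)
    return [p.strip() for p in pieces if p.strip()]


def page_similarity(source_a: str, source_b: str) -> float:
    """Cosine similarity of the binary token vectors of two page sources."""
    tokens_a, tokens_b = tokenize(source_a), tokenize(source_b)
    # Merge and deduplicate; each token maps to the index of its first appearance.
    vocab: Dict[str, int] = {}
    for token in tokens_a + tokens_b:
        vocab.setdefault(token, len(vocab))

    def encode(tokens: List[str]) -> List[int]:
        vector = [0] * len(vocab)
        for token in tokens:
            vector[vocab[token]] = 1
        return vector

    vec_a, vec_b = encode(tokens_a), encode(tokens_b)
    dot = sum(x * y for x, y in zip(vec_a, vec_b))
    # For 0/1 vectors the squared norm equals the number of ones.
    norm = math.sqrt(sum(vec_a) * sum(vec_b))
    return dot / norm if norm else 0.0


page1 = "<html>\n<head>\n<title>test1</title>\n</head>\n<body>\nJust a test1\n</body>\n</html>"
page3 = "<html>\n<head>\n<title>test2</title>\n</head>\n<body>\nJust a test2\n</body>\n</html>"
print(page_similarity(page1, page3))
```

For the two sample pages, each vector has ten ones and eight of them overlap, so the result agrees with the value 0.8 worked out in the text. The order of the vocabulary does not affect the cosine value, which is why the random ordering step can be skipped.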
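Step 104: the Web client determines the target device type of the Web service device based on the first device type.
In this embodiment of the present invention, to identify the device type of the Web service device more accurately, the technical solution of the present invention constructs not only the first access request, formed by polluting a pollutable parameter in a Web page of the Web service device, and the at least one second access request, formed by separately accessing at least one Web page of the Web service device, but also a normal access request to the default Web page of the Web service device and an abnormal access request to the Web service device. Combining the feature information of the response pages for all of these access requests allows the target device type of the Web service device to be determined more accurately: the first device type, the second device type, and the third device type are considered together to identify the actual device type of the Web service device. Specifically, the first device type of the Web service device is determined from its real response behavior, which reflects the specific behavior characteristics of the Web service device toward parameter-polluted access requests. These behavior characteristics cannot be changed easily, just as the behavior characteristics of an object cannot easily be changed, and are therefore non-repudiable. The first device type is thus more accurate than the second and third device types and better reflects the actual device type of the Web service device. The technical solution of the present invention therefore first judges whether the first device type contains multiple sub-device types. If the Web client determines that only one sub-device type exists in the first device type, that sub-device type can be used directly as the target device type of the Web service device. If multiple sub-device types exist in the first device type, then, when the third device type is determined to exist in the first device type, the third device type is determined as the target device type of the Web service device; or, when the third device type is determined to be a null value and the second device type is determined to exist in the first device type, the second device type is determined as the target device type. In addition, provided that at least two subtypes exist in the first device type, if the second device type and the third device type are both null values, or neither of them exists in the first device type, then, because the first device type reflects the actual device type of the Web service device more accurately, any sub-device type of the first device type can be used directly as the target device type of the Web service device.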
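With reference to FIG. 3, and taking device type A, device type B, and device type C determined above as an example, the following describes how the target device type of the Web service device is determined in this embodiment. FIG. 3 is a schematic flowchart of determining the target device type of a Web service device according to an embodiment of the present invention.
Step 301: determine whether only one sub-device type exists in device type C. If yes, go to step 302; if no, go to step 303.
Step 302: determine the only sub-device type in device type C as the target device type of the Web service device.
If only one sub-device type exists in device type C, the device type of the Web service device obtained through HTTP parameter pollution is unique, so this unique device type can be used as the target device type of the Web service device.
Step 303: determine whether device type B is not null. If yes, go to step 304; if no, go to step 306.
Judge whether device type B is a null value. If it is not a null value, step 304 is performed; if it is a null value, step 306 is performed.
Step 304: determine whether device type B exists in device type C. If yes, go to step 305; if no, go to step 306.
If multiple sub-device types exist in device type C, then, when device type B is determined not to be a null value, it is judged whether device type B exists in device type C.
Step 305: determine device type B as the target device type of the Web service device.
Step 306: determine whether device type A is not null. If yes, go to step 307; if no, go to step 309.
Judge whether device type A is a null value. If it is not a null value, step 307 is performed; if it is a null value, step 309 is performed.
Step 307: determine whether device type A exists in device type C. If yes, go to step 308; if no, go to step 309.
If multiple sub-device types exist in device type C, then, when device type B is a null value and device type A is not a null value, or when device type B is not a null value but does not exist in device type C, it is judged whether device type A exists in device type C.
Step 308: determine device type A as the target device type of the Web service device.
Step 309: determine any sub-device type in device type C as the target device type of the Web service device.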
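Steps 301 to 309 amount to a short priority rule, sketched below. The function name pick_target_type and the use of None for a null value are assumptions for illustration. Where the text allows any sub-device type to be chosen (step 309), this sketch picks the alphabetically first one only to keep the result deterministic, and the empty-set case, which the text does not discuss, simply returns None.

```python
from typing import Optional, Set


def pick_target_type(
    type_a: Optional[str],   # from feature matching on the normal response
    type_b: Optional[str],   # from feature matching on the error response
    type_c: Set[str],        # sub-device types obtained via parameter pollution
) -> Optional[str]:
    """Choose the target device type following the order of FIG. 3."""
    if not type_c:
        return None                    # case not covered by the text
    if len(type_c) == 1:               # steps 301-302
        return next(iter(type_c))
    if type_b is not None and type_b in type_c:   # steps 303-305
        return type_b
    if type_a is not None and type_a in type_c:   # steps 306-308
        return type_a
    return sorted(type_c)[0]           # step 309: any sub-device type


print(pick_target_type("nginx", "Apache", {"IIS", "Apache"}))
print(pick_target_type(None, None, {"Tomcat"}))
```

The behavior-based set always has the final say: a header-based guess is accepted only when it is one of the candidates the parameter pollution test left open.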
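The above embodiments show the following. Existing solutions determine the device type of a Web server by obtaining device-type-related information from the response page returned by the Web server and performing feature matching on it. Therefore, if the operations staff forge or erase that information, or customize the returned server information field, feature-matching identification cannot accurately identify the device type of the Web server. For this reason, the technical solution of the present invention introduces parameter pollution to jointly identify the device type of the Web service device. It can precisely detect the real response state of the Web service device to a parameter-polluted access request and, from that state together with the real response state to access requests without parameter pollution, accurately identify the response behavior characteristics of the Web service device. The device type of the Web service device can therefore be identified accurately, which effectively improves identification accuracy and reduces the false negative rate and false positive rate that arise in feature-matching identification. Specifically, the Web client constructs a first access request and at least one second access request for accessing the Web service device; the first access request represents a request constructed by polluting a pollutable parameter in an access request to the Web service device with multiple parameter values; each second access request represents a request constructed without polluting the pollutable parameter. Then, for the second response page of each second access request, a similarity calculation is performed on the first response page of the first access request and that second response page to determine a first similarity value, and the first similarity value is compared with the similarity threshold to determine whether the response behavior of the Web service device to the first access request is the same as its response behavior to the second access request, which supports a preliminary determination of the device type of the Web service device. Next, when the first similarity value is determined to be greater than the similarity threshold, the first device type can be further determined, according to the programming language used by the Web service device, from the at least one device type of the Web service device that matches the second response page. Based on the first device type, the target device type of the Web service device can be determined accurately, which effectively improves identification accuracy and reduces the false negative rate and false positive rate that arise in feature-matching identification.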
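Based on the same technical concept, FIG. 4 shows, by way of example, an apparatus for identifying a Web service device provided by an embodiment of the present invention; the apparatus can execute the flow of the method for identifying a Web service device.
As shown in FIG. 4, the apparatus includes:
a construction unit 401, configured to construct a first access request and at least one second access request for accessing a Web service device, where the first access request represents a request constructed by polluting a pollutable parameter in an access request to the Web service device with multiple parameter values, and each second access request represents a request constructed without polluting the pollutable parameter in the access request to the Web service device; and
a processing unit 402, configured to: for the second response page of each second access request, perform a similarity calculation on the first response page of the first access request and the second response page of the second access request to determine a first similarity value; if the first similarity value is determined to be greater than a similarity threshold, determine, from the at least one device type of the Web service device that matches the second response page, a first device type that conforms to the programming language used by the Web service device; and determine the target device type of the Web service device based on the first device type.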
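Optionally, the processing unit 402 is further configured to:
construct a third access request and a fourth access request for accessing the Web service device; the third access request represents a request for normal access to the Web service device; the fourth access request represents a request for abnormal access to the Web service device;
determine a second device type of the Web service device by performing feature matching on a third response page of the third access request, and determine a third device type of the Web service device by performing feature matching on a fourth response page of the fourth access request;
the processing unit 402 is specifically configured to:
determine the target device type of the Web service device according to the first device type, the second device type, and the third device type.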
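Optionally, the processing unit 402 is specifically configured to:
for a second response page whose first similarity value is greater than the similarity threshold, when it is determined that a first parameter value of the pollutable parameter in the first response page is the same as a second parameter value of the pollutable parameter in the second response page, determine from a device type library at least one device type that matches the second parameter value;
determine the first device type of the Web service device, according to the programming language used by the Web service device, from the at least one device type that matches the second parameter value.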
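Optionally, the processing unit 402 is further configured to:
if it is determined that the first similarity values of all second response pages are less than or equal to the similarity threshold, or that the first parameter value differs from the second parameter value, then, when it is determined that the first parameter value is a combination of the second parameter values, construct from the second parameter values a fifth access request for accessing the Web service device;
for a fifth response page of the fifth access request, determine a second similarity value between the fifth response page and the first response page;
if it is determined that the second similarity value is greater than the similarity threshold, determine from the device type library at least one device type that matches the first parameter value;
determine the first device type of the Web service device, according to the programming language used by the Web service device, from the at least one device type that matches the first parameter value.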
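Optionally, the processing unit 402 is further configured to:
if it is determined that the second similarity value is less than or equal to the similarity threshold, or that the first parameter value is not a combination of the second parameter values, determine the largest similarity value among the first similarity values and the second similarity value;
if it is determined that the largest similarity value is less than the similarity threshold, determine the first device type of the Web service device from the device type library according to the programming language used by the Web service device.
Optionally, the processing unit 402 is further configured to:
if it is determined that the largest similarity value is greater than or equal to the similarity threshold, then, when the largest similarity value is any one of the first similarity values, determine the first device type of the Web service device, according to the programming language used by the Web service device, from the at least one device type that matches the second parameter value; or, when the largest similarity value is the second similarity value, determine the first device type of the Web service device, according to the programming language used by the Web service device, from the at least one device type that matches the first parameter value.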
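The fallback just described, which picks the largest similarity value and then decides what the device type library lookup is based on, might be sketched as follows. This is only an illustration: the function name `fallback_match_basis` and its three return labels are hypothetical, the second similarity value is modeled as optional because the fifth access request may never have been constructed, and the tie-breaking rule is an assumption that the source does not state.

```python
from typing import Optional, Sequence


def fallback_match_basis(first_sims: Sequence[float],
                         second_sim: Optional[float],
                         threshold: float) -> str:
    """Decide the basis for the device type library lookup.

    Hypothetical return labels:
      "language_only"          - largest similarity is below the threshold
      "second_parameter_value" - largest similarity is a first similarity value
      "first_parameter_value"  - largest similarity is the second similarity value
    """
    candidates = list(first_sims)
    if second_sim is not None:  # the fifth request may not have been sent
        candidates.append(second_sim)
    if not candidates:
        raise ValueError("at least one similarity value is required")
    largest = max(candidates)
    if largest < threshold:
        return "language_only"
    # Assumption: a tie is resolved in favour of the first similarity values.
    if largest in first_sims:
        return "second_parameter_value"
    return "first_parameter_value"


print(fallback_match_basis([0.2, 0.3], 0.1, 0.8))
print(fallback_match_basis([0.2, 0.3], 0.85, 0.8))
```

In every branch the candidates found this way would still be narrowed by the programming language used by the Web service device, which this sketch leaves out.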
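Optionally, the processing unit 402 is specifically configured to:
obtain the page source code of the first response page and the page source code of the second response page, and tokenize each of them to determine a first token set corresponding to the first response page and a second token set corresponding to the second response page;
merge and deduplicate the first tokens in the first token set and the second tokens in the second token set to obtain a third token set;
use each third token in the third token set as a key and assign a corresponding numeric value to each key to obtain a key-value data set;
according to the key-value data set, convert each first token in the first token set into its corresponding numeric value to obtain a first numeric set, and convert each second token in the second token set into its corresponding numeric value to obtain a second numeric set;
encode the first numeric set to obtain a first vector set, and encode the second numeric set to obtain a second vector set;
determine the first similarity value from the first vector set and the second vector set by means of a set similarity algorithm.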
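The similarity operation above can be sketched in a few lines of Python. The text here fixes neither the tokenizer, the encoding, nor the similarity algorithm, so this sketch assumes a regex word tokenizer, a count-vector encoding, and cosine similarity; the names `tokenize` and `page_similarity` are hypothetical.

```python
import math
import re
from typing import List


def tokenize(page_source: str) -> List[str]:
    """Split page source into lowercase word tokens (assumed tokenizer)."""
    return re.findall(r"\w+", page_source.lower())


def page_similarity(first_source: str, second_source: str) -> float:
    """Cosine similarity between two pages, in [0.0, 1.0]."""
    first_tokens = tokenize(first_source)
    second_tokens = tokenize(second_source)
    # Merge and deduplicate, then give every token a numeric value (its index).
    key_values = {tok: i for i, tok in enumerate(sorted(set(first_tokens) | set(second_tokens)))}
    # Convert each page's tokens into their numeric values.
    first_ids = [key_values[tok] for tok in first_tokens]
    second_ids = [key_values[tok] for tok in second_tokens]
    # Encode the numeric sets as count vectors over the shared vocabulary.
    first_vec = [0] * len(key_values)
    second_vec = [0] * len(key_values)
    for i in first_ids:
        first_vec[i] += 1
    for i in second_ids:
        second_vec[i] += 1
    # Cosine similarity; a page without tokens is treated as dissimilar.
    dot = sum(a * b for a, b in zip(first_vec, second_vec))
    norm = math.sqrt(sum(a * a for a in first_vec)) * math.sqrt(sum(b * b for b in second_vec))
    return dot / norm if norm else 0.0


print(page_similarity("<p>id = 1</p>", "<p>id = 1</p>"))
print(page_similarity("<p>id = 1</p>", "<p>error</p>"))
```

Building one shared key-value mapping before encoding keeps the two vectors the same length and aligned on the same tokens, which is what makes the final comparison meaningful.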
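Optionally, the processing unit 402 is specifically configured to:
if it is determined that the first device type contains only one sub-device type, determine that sub-device type as the target device type of the Web service device;
if it is determined that the first device type contains at least two sub-device types, then, when the third device type is present in the first device type, determine the third device type as the target device type of the Web service device; or, when the third device type is null and the second device type is present in the first device type, determine the second device type as the target device type of the Web service device.
Optionally, the processing unit 402 is further configured to:
when it is determined that the first device type contains at least two sub-types, if the second device type and the third device type are both null, or neither the second device type nor the third device type is present in the first device type, determine any one of the sub-types in the first device type as the target device type of the Web service device.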
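Based on the same technical concept, an embodiment of the present invention further provides a computing device. As shown in FIG. 5, it includes at least one processor 501 and a memory 502 connected to the at least one processor. The embodiments of the present invention do not limit the specific connection medium between the processor 501 and the memory 502; in FIG. 5 the processor 501 and the memory 502 are connected by a bus as an example. Buses can be divided into address buses, data buses, control buses, and so on.
In the embodiments of the present invention, the memory 502 stores instructions executable by the at least one processor 501; by executing the instructions stored in the memory 502, the at least one processor 501 can perform the steps included in the aforementioned method for identifying a Web service device.
The processor 501 is the control center of the computing device. It can connect the various parts of the computing device through various interfaces and lines, and implements data processing by running or executing the instructions stored in the memory 502 and calling the data stored in the memory 502. Optionally, the processor 501 may include one or more processing units and may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs, and the like, and the modem processor mainly handles issued instructions. It can be understood that the modem processor need not be integrated into the processor 501. In some embodiments the processor 501 and the memory 502 may be implemented on the same chip; in other embodiments they may be implemented separately on independent chips.
The processor 501 may be a general-purpose processor, such as a central processing unit (CPU), a digital signal processor, an application-specific integrated circuit (ASIC), a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present invention. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in connection with the embodiments of the method for identifying a Web service device may be performed directly by a hardware processor, or by a combination of hardware and software modules in the processor.
As a non-volatile computer-readable storage medium, the memory 502 can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory 502 may include at least one type of storage medium, for example flash memory, a hard disk, a multimedia card, card-type memory, random access memory (RAM), static random access memory (SRAM), programmable read-only memory (PROM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), magnetic memory, a magnetic disk, an optical disc, and so on. The memory 502 may be, but is not limited to, any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory 502 in the embodiments of the present invention may also be a circuit or any other apparatus capable of implementing a storage function, for storing program instructions and/or data.
Based on the same technical concept, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program executable by a computing device; when the program runs on the computing device, it causes the computing device to perform the steps of the above method for identifying a Web service device.
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
Obviously, those skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of this application and their technical equivalents, the present invention is also intended to include them.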

Claims (12)

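  1. A method for identifying a Web service device, characterized by comprising:
    constructing, by a Web client, a first access request and at least one second access request for accessing a Web service device; the first access request represents a request constructed by applying multi-parameter-value pollution to a pollutable parameter in an access request for the Web service device; each second access request represents a request constructed without polluting the pollutable parameter in an access request for the Web service device;
    for the second response page of each second access request, performing, by the Web client, a similarity operation on the first response page of the first access request and the second response page of the second access request to determine a first similarity value;
    if the Web client determines that the first similarity value is greater than a similarity threshold, determining, from at least one device type of the Web service device that matches the second response page, a first device type that conforms to the programming language used by the Web service device;
    determining, by the Web client, a target device type of the Web service device based on the first device type.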
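  2. The method according to claim 1, characterized in that the method further comprises:
    constructing, by the Web client, a third access request and a fourth access request for accessing the Web service device; the third access request represents a request for normal access to the Web service device; the fourth access request represents a request for abnormal access to the Web service device;
    determining, by the Web client, a second device type of the Web service device by performing feature matching on a third response page of the third access request, and determining a third device type of the Web service device by performing feature matching on a fourth response page of the fourth access request;
    wherein the determining, by the Web client, of the target device type of the Web service device based on the first device type comprises:
    determining, by the Web client, the target device type of the Web service device according to the first device type, the second device type, and the third device type.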
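  3. The method according to claim 1, characterized in that, if the Web client determines that the first similarity value is greater than the similarity threshold, the determining, from at least one device type of the Web service device that matches the second response page, of a first device type that conforms to the programming language used by the Web service device comprises:
    for a second response page whose first similarity value is greater than the similarity threshold, when the Web client determines that a first parameter value of the pollutable parameter in the first response page is the same as a second parameter value of the pollutable parameter in the second response page, determining from a device type library at least one device type that matches the second parameter value;
    determining, by the Web client, the first device type of the Web service device, according to the programming language used by the Web service device, from the at least one device type that matches the second parameter value.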
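  4. The method according to claim 3, characterized in that the method further comprises:
    if the Web client determines that the first similarity values of all second response pages are less than or equal to the similarity threshold, or that the first parameter value differs from the second parameter value, then, when it is determined that the first parameter value is a combination of the second parameter values, constructing from the second parameter values a fifth access request for accessing the Web service device;
    for a fifth response page of the fifth access request, determining, by the Web client, a second similarity value between the fifth response page and the first response page;
    if the Web client determines that the second similarity value is greater than the similarity threshold, determining from the device type library at least one device type that matches the first parameter value;
    determining, by the Web client, the first device type of the Web service device, according to the programming language used by the Web service device, from the at least one device type that matches the first parameter value.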
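  5. The method according to claim 4, characterized in that the method further comprises:
    if the Web client determines that the second similarity value is less than or equal to the similarity threshold, or that the first parameter value is not a combination of the second parameter values, determining the largest similarity value among the first similarity values and the second similarity value;
    if the Web client determines that the largest similarity value is less than the similarity threshold, determining the first device type of the Web service device from the device type library according to the programming language used by the Web service device.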
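  6. The method according to claim 5, characterized in that the method further comprises:
    if the Web client determines that the largest similarity value is greater than or equal to the similarity threshold, then, when the largest similarity value is any one of the first similarity values, determining the first device type of the Web service device, according to the programming language used by the Web service device, from the at least one device type that matches the second parameter value; or, when the largest similarity value is the second similarity value, determining the first device type of the Web service device, according to the programming language used by the Web service device, from the at least one device type that matches the first parameter value.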
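  7. The method according to claim 1, characterized in that the performing, by the Web client, of a similarity operation on the first response page of the first access request and the second response page of the second access request to determine a first similarity value comprises:
    obtaining, by the Web client, the page source code of the first response page and the page source code of the second response page, and tokenizing each of them to determine a first token set corresponding to the first response page and a second token set corresponding to the second response page;
    merging and deduplicating, by the Web client, the first tokens in the first token set and the second tokens in the second token set to obtain a third token set;
    using, by the Web client, each third token in the third token set as a key and assigning a corresponding numeric value to each key to obtain a key-value data set;
    converting, by the Web client according to the key-value data set, each first token in the first token set into its corresponding numeric value to obtain a first numeric set, and each second token in the second token set into its corresponding numeric value to obtain a second numeric set;
    encoding, by the Web client, the first numeric set to obtain a first vector set, and encoding the second numeric set to obtain a second vector set;
    determining, by the Web client, the first similarity value from the first vector set and the second vector set by means of a set similarity algorithm.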
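  8. The method according to claim 2, characterized in that the determining, by the Web client, of the target device type of the Web service device according to the first device type, the second device type, and the third device type comprises:
    if the Web client determines that the first device type contains only one sub-device type, determining that sub-device type as the target device type of the Web service device;
    if the Web client determines that the first device type contains at least two sub-device types, then, when the third device type is present in the first device type, determining the third device type as the target device type of the Web service device; or, when the third device type is null and the second device type is present in the first device type, determining the second device type as the target device type of the Web service device.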
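  9. The method according to claim 8, characterized in that the method further comprises:
    when the Web client determines that the first device type contains at least two sub-types, if the second device type and the third device type are both null, or neither the second device type nor the third device type is present in the first device type, determining any one of the sub-types in the first device type as the target device type of the Web service device.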
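  10. An apparatus for identifying a Web service device, characterized by comprising:
    a construction unit, configured to construct a first access request and at least one second access request for accessing a Web service device; the first access request represents a request constructed by applying multi-parameter-value pollution to a pollutable parameter in an access request for the Web service device; each second access request represents a request constructed without polluting the pollutable parameter in an access request for the Web service device;
    a processing unit, configured to: for the second response page of each second access request, perform a similarity operation on the first response page of the first access request and the second response page of the second access request to determine a first similarity value; if it is determined that the first similarity value is greater than a similarity threshold, determine, from at least one device type of the Web service device that matches the second response page, a first device type that conforms to the programming language used by the Web service device; and determine a target device type of the Web service device based on the first device type.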
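  11. A computing device, characterized by comprising at least one processor and at least one memory, wherein the memory stores a computer program which, when executed by the processor, causes the processor to perform the method according to any one of claims 1 to 9.
  12. A computer-readable storage medium, characterized in that it stores a computer program executable by a computing device, and when the program runs on the computing device, it causes the computing device to perform the method according to any one of claims 1 to 9.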
PCT/CN2022/100025 2021-11-23 2022-06-21 Method and apparatus for identifying a Web service device WO2023093017A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111396969.1 2021-11-23
CN202111396969.1A CN114238822A (zh) 2021-11-23 2021-11-23 Method and apparatus for identifying a Web service device

Publications (1)

Publication Number Publication Date
WO2023093017A1 true WO2023093017A1 (zh) 2023-06-01

Family

ID=80750632

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/100025 WO2023093017A1 (zh) 2021-11-23 2022-06-21 Method and apparatus for identifying a Web service device

Country Status (2)

Country Link
CN (1) CN114238822A (zh)
WO (1) WO2023093017A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114238822A (zh) * 2021-11-23 2022-03-25 深圳前海微众银行股份有限公司 Method and apparatus for identifying a Web service device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
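CN109409090A (zh) * 2018-11-12 2019-03-01 北京知道创宇信息技术有限公司 Website backend detection method, apparatus, and server
CN110166522A (zh) * 2019-04-01 2019-08-23 腾讯科技(深圳)有限公司 Server identification method and apparatus, readable storage medium, and computer device
CN111125748A (zh) * 2019-11-04 2020-05-08 广发银行股份有限公司 Method and apparatus for determining unauthorized queries, computer device, and storage medium
CN111404937A (zh) * 2020-03-16 2020-07-10 腾讯科技(深圳)有限公司 Method and apparatus for detecting server vulnerabilities
CN114238822A (zh) * 2021-11-23 2022-03-25 深圳前海微众银行股份有限公司 Method and apparatus for identifying a Web service device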

Also Published As

Publication number Publication date
CN114238822A (zh) 2022-03-25

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22897113

Country of ref document: EP

Kind code of ref document: A1