CN110674442B - Page monitoring method, device, equipment and computer readable storage medium

Info

Publication number: CN110674442B
Application number: CN201910877369.3A
Authority: CN
Inventors: 彭中华; 华石榴; 钟彬; 裘愉锋
Original assignee: China Unionpay Co Ltd
Current assignee: China Unionpay Co Ltd
Priority date: 2019-09-17
Filing date: 2019-09-17
Publication date: 2023-08-18
Anticipated expiration: 2039-09-17
Also published as: CN110674442A

Abstract

The invention discloses a page monitoring method, a page monitoring device, page monitoring equipment and a computer readable storage medium. The page monitoring method comprises the following steps: acquiring message data to be detected corresponding to a page to be detected; determining a first element word segmentation set corresponding to the message data to be detected; converting the first element word segmentation set into a first word segmentation vector corresponding to the page to be detected; identifying, by using a preset vector classification model, a target vector classification corresponding to the first word segmentation vector among the preset vector classifications, wherein the preset vector classification model is generated according to historical message data corresponding to historical pages; and, in a case where the target vector classification is identified, determining that the page to be detected is in a normal state. According to the embodiments of the invention, whether a page to be detected is in a normal state can be judged accurately and efficiently.

Description

Page monitoring method, device, equipment and computer readable storage medium

Technical Field

The present invention belongs to the field of communication technologies, and in particular, to a method, an apparatus, a device, and a computer readable storage medium for page monitoring.

Background

When a user accesses a website, the website must provide correct web page content in order to give the user a consistent experience. Therefore, page monitoring is required to ensure the correctness of the web page content and thus provide a better page service for users.

When detecting whether a web page is in a normal state, existing page monitoring methods suffer from a complex implementation process, high performance requirements on the monitoring equipment, poor accuracy in identifying page errors, and low monitoring efficiency, all of which limit their application.

Disclosure of Invention

The embodiment of the invention provides a page monitoring method, device, equipment and a computer readable storage medium, which can accurately and efficiently judge whether a page to be detected of a webpage is in a normal state.

In a first aspect, an embodiment of the present invention provides a page monitoring method, including:

acquiring message data to be detected corresponding to a page to be detected;

determining a first element word segmentation set corresponding to message data to be detected;

converting the first element word segmentation set into a first word segmentation vector corresponding to the page to be detected;

identifying, by using a preset vector classification model, a target vector classification corresponding to the first word segmentation vector among the preset vector classifications, wherein the preset vector classification model is generated according to historical message data corresponding to historical pages;

and under the condition that the target vector classification is identified, determining that the page to be detected is in a normal state.

In a second aspect, an embodiment of the present invention provides a page monitoring apparatus, including:

The message data acquisition module is configured to acquire message data to be detected corresponding to a page to be detected;

the first set acquisition module is configured to determine a first element word segmentation set corresponding to the message data to be detected;

the first vector acquisition module is configured to convert the first element word segmentation set into a first word segmentation vector corresponding to the page to be detected;

the vector classification recognition module is configured to identify, by using a preset vector classification model, a target vector classification corresponding to the first word segmentation vector among the preset vector classifications, wherein the preset vector classification model is generated according to historical message data corresponding to historical pages;

and the page state determining module is configured to determine that the page to be detected is in a normal state under the condition that the target vector classification is identified.

In a third aspect, an embodiment of the present invention provides page monitoring equipment, including: a processor and a memory storing computer program instructions;

the processor executes the computer program instructions to implement the page monitoring method according to the first aspect of the embodiment of the present invention.

In a fourth aspect, an embodiment of the present invention provides a computer readable storage medium, where computer program instructions are stored, where the computer program instructions, when executed by a processor, implement the page monitoring method according to the first aspect of the embodiment of the present invention.

With the page monitoring method, device, equipment and computer readable storage medium of the embodiments of the invention, the first element word segmentation set corresponding to the page to be detected can be determined from the acquired message data to be detected, and the first word segmentation vector can be generated from the first element word segmentation set. The first word segmentation vector is then input into the preset vector classification model, which is generated according to historical message data corresponding to historical pages of the same page type as the page to be detected, so that the model identifies the target vector classification corresponding to the first word segmentation vector among the preset vector classifications. The page to be detected is determined to be in a normal state only when a target vector classification is identified.

Drawings

In order to more clearly illustrate the technical solution of the embodiments of the present invention, the drawings that are needed to be used in the embodiments of the present invention will be briefly described, and it is possible for a person skilled in the art to obtain other drawings according to these drawings without inventive effort.

FIG. 1 is a flow chart of a method for page monitoring according to an embodiment of the present invention;

FIG. 2 is a flow chart of a page monitoring method according to another embodiment of the present invention;

FIG. 3 is a flowchart illustrating a page monitoring method according to another embodiment of the present invention;

FIG. 4 is a flow chart of a page monitoring method according to still another embodiment of the present invention;

FIG. 5 is a flow chart of a page monitoring method according to still another embodiment of the present invention;

FIG. 6 is a flow chart of a web page monitoring process according to an embodiment of the present invention;

FIG. 7 is a schematic diagram of a page monitoring apparatus according to an embodiment of the present invention;

FIG. 8 is a schematic diagram of a hardware structure of a page monitoring apparatus according to an embodiment of the present invention.

Detailed Description

Features and exemplary embodiments of various aspects of the present invention will be described in detail below. In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are merely intended to illustrate the invention and not to limit it. It will be apparent to one skilled in the art that the present invention may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the invention by showing examples of the invention.

It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other like elements in a process, method, article or apparatus that comprises the element.

In order to solve the problems in the prior art, the embodiment of the invention provides a page monitoring method, a device, equipment and a computer readable storage medium. The page monitoring method provided by the embodiment of the invention is firstly explained below.

Fig. 1 is a schematic flow chart of a page monitoring method according to an embodiment of the present invention. As shown in fig. 1, the page monitoring method may include:

S110, obtaining message data to be detected corresponding to a page to be detected;

S120, determining a first element word segmentation set corresponding to the message data to be detected;

S130, converting the first element word segmentation set into a first word segmentation vector corresponding to the page to be detected;

S140, identifying, by using a preset vector classification model, a target vector classification corresponding to the first word segmentation vector among the preset vector classifications, wherein the preset vector classification model is generated according to historical message data corresponding to historical pages;

and S150, under the condition that the target vector classification is identified, determining that the page to be detected is in a normal state.

In the embodiment of the invention, the first element word segmentation set corresponding to the page to be detected can be determined from the acquired message data to be detected, and the first word segmentation vector can be generated from the first element word segmentation set. The first word segmentation vector is then input into the preset vector classification model, which is generated according to historical message data corresponding to historical pages of the same page type as the page to be detected, so that the model identifies the target vector classification corresponding to the first word segmentation vector among the preset vector classifications. The page to be detected is determined to be in a normal state only when a target vector classification is identified.

Therefore, the page monitoring method provided by the embodiment of the invention adopts a non-invasive design and requires no modification such as embedding instrumentation code in pages, so that development changes to the website system and any impact on its performance can be avoided.

The page monitoring method of the embodiment of the invention can be applied to a background server of a website system, and the background server can be a high-performance computer for storing and processing data.

In step S110 of the embodiment of the present invention, after the background server generates a web page in response to a user operation, the generated web page may be used as a to-be-detected page, and to-be-detected message data corresponding to the to-be-detected page may be obtained, so as to determine whether the to-be-detected page is in a normal state according to the to-be-detected message data.

In the embodiment of the invention, the to-be-detected webpage data field of the to-be-detected webpage can be acquired in real time or near real time from the background server serving as a data source, and the to-be-detected message data is extracted from the to-be-detected webpage data field. The data fields of the web pages to be detected are shown in table 1.

TABLE 1 Data fields of the web page to be detected

Field	Description
REQUEST_ID	Request ID
REQUEST_BEGIN_TIME	Request start time
REQUEST_URI	Request URI
REQUEST_METHOD	Request method
REQUEST_HEADER	Request header
REQUEST_PAYLOAD	Request data
RESPONSE_CODE	Response status code
RESPONSE_HEADER	Response header
RESPONSE_PAYLOAD	Response data

In step S120 of some embodiments of the present invention, a specific method for determining a first element word segmentation set corresponding to-be-detected message data may include:

deleting frame structure codes in the message data to be detected to obtain corpus data to be detected;

and performing word segmentation processing on the corpus data to be detected to obtain a first element word segmentation set corresponding to the webpage to be detected.

In some embodiments of the present invention, if the message is in HyperText Markup Language (HTML) format, the Python BeautifulSoup library may be used to parse the HTML, remove frame structure code such as JavaScript (JS) code, Cascading Style Sheets (CSS) code and HTML tags from the message data to be detected, and keep the text content as the corpus data to be detected. In other embodiments of the present invention, if the message is a static file such as a JS or CSS file, the static file may be regarded as containing no frame structure code and may be left unprocessed. In still other embodiments of the present invention, if the message is in JavaScript Object Notation (JSON) format, eXtensible Markup Language (XML) format or the like, the annotation information serving as frame structure code is removed to obtain the corpus data to be detected. The corpus data to be detected may comprise at least one of short text and long text.
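The HTML case can be illustrated with a minimal sketch. It uses the standard-library `html.parser` module as a stand-in for BeautifulSoup so that it has no third-party dependency; the class name `_TextExtractor` and the function name `extract_corpus` are hypothetical and chosen only for illustration.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text content while skipping <script> and <style> bodies."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Tags themselves never reach this method; only text does.
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self.chunks.append(text)


def extract_corpus(html_payload):
    """Return the text fragments of an HTML message as corpus data."""
    parser = _TextExtractor()
    parser.feed(html_payload)
    parser.close()
    return parser.chunks


sample = (
    "<html><head><style>p{color:red}</style>"
    "<script>var a = 1;</script></head>"
    "<body><h1>Order list</h1><p>Total: 3</p></body></html>"
)
corpus = extract_corpus(sample)
```

In this sketch each text fragment is kept as a separate entry, which makes it straightforward to record sentence positions in the later word segmentation step.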

In the embodiment of the invention, word segmentation processing is carried out on corpus data to be detected to obtain a first element word segmentation set corresponding to a webpage to be detected, and information such as words, sentences, word and sentence positions and the like of each element word segmentation in the first element word segmentation set is recorded.
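As a rough illustration of recording each word segment together with its sentence and position, the following sketch uses a simple regular-expression tokenizer. This tokenizer is an assumption made to keep the example dependency-free; a real deployment would more likely rely on a dedicated word segmentation library. The function name `segment` and the record layout are hypothetical.

```python
import re

# Assumed tokenizer: runs of ASCII letters, digits or underscores form one
# segment, and each CJK ideograph forms its own segment.
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9_]+|[\u4e00-\u9fff]")


def segment(corpus):
    """Return (word, sentence_index, position_in_sentence) records."""
    records = []
    for sentence_index, sentence in enumerate(corpus):
        matches = TOKEN_PATTERN.finditer(sentence)
        for position, match in enumerate(matches):
            records.append((match.group().lower(), sentence_index, position))
    return records


records = segment(["Order list", "Total: 3"])
```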

Continuing with the message data to be detected obtained from the web page data fields shown in Table 1 as an example, the Python jieba word segmentation library is used to extract information such as words and sentences from the corpus data to be detected corresponding to the message data to be detected, yielding the first element word segmentation set shown in Table 2.

TABLE 2 first element word segmentation set table

In step S120 of other embodiments of the present invention, a specific method for determining a first element word segmentation set corresponding to the message data to be detected may further be: and directly performing word segmentation processing on the message data to be detected to obtain a first element word segmentation set corresponding to the message data to be detected.

In step S130 of some embodiments of the present invention, a specific method for converting a first element word segmentation set into a first word segmentation vector corresponding to a to-be-detected page may include:

generating a first word segmentation vector corresponding to the page to be detected according to whether each element word segment of the first element word segmentation set exists in a preset element word segmentation set; the preset element word segmentation set is an element word segmentation set corresponding to the page type of the page to be detected.

In some embodiments, the page type of the page to be detected includes a uniform resource identifier type and/or a method type.

Specifically, the page type of the page to be detected may be determined according to the REQUEST_URI field (the uniform resource identifier type) among the data fields of the web page to be detected shown in Table 1, according to the REQUEST_METHOD field (the method type) among those data fields, or according to both the REQUEST_URI field and the REQUEST_METHOD field.

Taking the example of determining the page type of the page to be detected according to the REQUEST_URI field and the REQUEST_METHOD field, only when the REQUEST_URI field and the REQUEST_METHOD field in the two webpage data fields are identical, the page types of the two pages corresponding to the two webpage data fields are confirmed to be identical.

In other embodiments, the page type of the page to be detected may also be determined according to the REQUEST_PAYLOAD field, the REQUEST_URI field and the REQUEST_METHOD field among the data fields of the web page to be detected shown in Table 1.

The more fields are used to determine the page type, the finer the page type classification, and hence the more accurate the monitoring result.
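The page type comparison described above can be sketched as building a key from the chosen data fields and grouping records by that key. The helper names `page_type_key` and `group_by_page_type` and the sample records are hypothetical; only the field names come from Table 1.

```python
from collections import defaultdict


def page_type_key(record, fields=("REQUEST_URI", "REQUEST_METHOD")):
    """Build a hashable page-type key from the chosen data fields."""
    return tuple(record[name] for name in fields)


def group_by_page_type(records, fields=("REQUEST_URI", "REQUEST_METHOD")):
    """Group web page records so that each group shares one page type."""
    groups = defaultdict(list)
    for record in records:
        groups[page_type_key(record, fields)].append(record)
    return dict(groups)


records = [
    {"REQUEST_URI": "/orders", "REQUEST_METHOD": "GET", "REQUEST_PAYLOAD": ""},
    {"REQUEST_URI": "/orders", "REQUEST_METHOD": "POST", "REQUEST_PAYLOAD": "id=1"},
    {"REQUEST_URI": "/orders", "REQUEST_METHOD": "GET", "REQUEST_PAYLOAD": ""},
]
groups = group_by_page_type(records)
```

Passing `fields=("REQUEST_URI", "REQUEST_METHOD", "REQUEST_PAYLOAD")` would give the finer three-field classification, at the cost of more page types to model.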

In the embodiment of the invention, the preset element word segmentation set is the element word segmentation set corresponding to the page type of the page to be detected, and it is obtained as follows: collecting all element word segments from the second element word segmentation sets of all history pages of the same page type as the page to be detected, combining the collected element word segments into one word segmentation set, and de-duplicating that set to obtain the preset element word segmentation set.

The specific method for obtaining the second element word segmentation set of all the history pages corresponding to the page types of the page to be detected will be described in detail below.
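A minimal sketch of forming the preset element word segmentation set follows. Keeping the segments in first-seen order is an assumption made here so that every segment has a stable position for the later vector conversion; the function name `build_preset_set` is hypothetical.

```python
def build_preset_set(history_sets):
    """Merge the element segments of all history pages and de-duplicate them.

    The result is an ordered list, so each segment has a fixed index.
    """
    seen = set()
    ordered = []
    for segments in history_sets:
        for word in segments:
            if word not in seen:
                seen.add(word)
                ordered.append(word)
    return ordered


preset_set = build_preset_set([["order", "list"], ["order", "total"]])
```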

Specifically, based on the preset element word segmentation set, the method for converting the first element word segmentation set into the first word segmentation vector corresponding to the page to be detected comprises the following steps:

for each word segment position in the preset element word segmentation set, setting the position to 1 if the element word segment at that position exists in the first element word segmentation set, and setting the position to 0 if it does not, thereby generating a first word segmentation vector whose dimension equals the number of word segments in the preset element word segmentation set.
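The conversion amounts to a binary presence vector over the preset element word segmentation set. A minimal sketch, with the hypothetical function name `to_vector` and made-up sample segments:

```python
def to_vector(element_segments, preset_set):
    """Return one 0/1 entry per segment of the preset set.

    An entry is 1 when the preset segment occurs in the page's element
    segments and 0 otherwise; segments outside the preset set are ignored.
    """
    present = set(element_segments)
    return [1 if word in present else 0 for word in preset_set]


preset_set = ["order", "list", "total"]
vector = to_vector(["order", "total", "refund"], preset_set)
```

Here `"refund"` does not appear in the preset set, so it contributes nothing, and the vector length always equals the size of the preset set.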

In step S130 of other embodiments of the present invention, a specific method for converting a first element word segmentation set into a first word segmentation vector corresponding to a to-be-detected page may include:

deleting target word segmentation in the first element word segmentation set to obtain a target element word segmentation set; the target word segmentation comprises at least one of low-frequency words, error keywords and noise words, wherein the occurrence frequency of the low-frequency words in the second element word segmentation set is lower than a preset frequency threshold value, the error keywords exist in the error keyword set, and the noise words exist in the noise word dictionary;

generating a first word segmentation vector corresponding to the page to be detected according to whether each element word segment of the target element word segmentation set exists in the preset element word segmentation set; the preset element word segmentation set is an element word segmentation set corresponding to the page type of the page to be detected.

The error keywords in the error keyword set may be keywords corresponding to abnormal data for prompting page content errors, such as "page does not exist", and the noise words in the noise word dictionary may be language stop words, punctuation marks, specific words, specific sentences, and the like, for example, specific words such as "statement" and "record".
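The deletion of target word segments can be sketched as a single filter. The sketch assumes that "occurrence frequency" means the number of history pages whose second element word segmentation set contains the word; the text does not fix this definition. The function name `remove_target_segments` and all sample values are hypothetical.

```python
from collections import Counter


def remove_target_segments(segments, history_sets, min_frequency,
                           error_keywords, noise_words):
    """Drop low-frequency words, error keywords and noise words."""
    # Assumed frequency definition: number of history pages containing the word.
    counts = Counter(word for s in history_sets for word in set(s))
    return [
        word for word in segments
        if counts[word] >= min_frequency
        and word not in error_keywords
        and word not in noise_words
    ]


history_sets = [["order", "list", "total"], ["order", "total"], ["order", "promo"]]
target_set = remove_target_segments(
    ["order", "promo", "the", "page does not exist", "total"],
    history_sets,
    min_frequency=2,
    error_keywords={"page does not exist"},
    noise_words={"the"},
)
```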

Continuing to take the first element word segmentation set shown in table 2 as an example, deleting the target word segmentation in the first element word segmentation set to obtain the target element word segmentation set shown in table 3.

TABLE 3 target element word segmentation set table

The preset vector classification model in step S140 of the embodiment of the present invention may be generated according to the historical message data corresponding to the historical page before the page to be detected is monitored.

In the embodiment of the invention, in order to monitor the page to be detected rapidly and accurately, the page type of the page to be detected needs to be determined, and the first word segmentation vector is input into the preset vector classification model of the same page type.

Therefore, when the preset vector classification model is generated, the preset vector classification model of different page types needs to be generated by using the historical message data corresponding to the historical pages of different page types.

In the following, a specific description will be given by taking a method for generating a preset vector classification model with the same page type as that of a page to be detected as an example, and the specific method for generating the preset vector classification model is as follows:

acquiring historical message data corresponding to a historical page with the same page type as the page to be detected;

determining a second element word segmentation set corresponding to the historical message data;

converting the second element word segmentation set into a second word segmentation vector corresponding to the history page;

clustering the second word segmentation vectors to obtain the preset vector classifications;

and generating a preset vector classification model according to the center vectors of the preset vector classifications.

According to the method for determining the page type, the page type of each history page can be determined according to the history webpage data field corresponding to each history page, and the history message data corresponding to the history page with the same page type as the page to be detected in the preset time period is obtained.

In some embodiments, a specific method for determining a second element word segmentation set corresponding to the historical message data may include:

deleting the frame structure codes in the historical message data to obtain historical corpus data;

and performing word segmentation processing on the historical corpus data to obtain a second element word segmentation set corresponding to the historical page.

The specific method for deleting the frame structure code in the historical message data is the same as the specific method for deleting the frame structure code in the message data to be detected, and will not be described herein.

In some embodiments, a specific method for converting the second element word segmentation set into a second word segmentation vector corresponding to the history page may include:

deleting target word segmentation in the second element word segmentation set to obtain a third element word segmentation set; the target word segmentation comprises at least one of low-frequency words, error keywords and noise words, wherein the occurrence frequency of the low-frequency words in the second element word segmentation set is lower than a preset frequency threshold value, the error keywords exist in the error keyword set, and the noise words exist in the noise word dictionary;

generating a second word segmentation vector corresponding to the history page according to whether each element word segment of the third element word segmentation set exists in the preset element word segmentation set; the preset element word segmentation set is an element word segmentation set corresponding to the page type of the page to be detected.

The method for generating the second word-segmentation vector corresponding to the history page is the same as the method for generating the first word-segmentation vector corresponding to the page to be detected, and will not be described herein.

In the embodiment of the invention, the target word segments in the second element word segmentation set, in particular the error keywords in the error keyword set, are deleted, and the second word segmentation vector is then generated from the resulting third element word segmentation set to train the preset vector classification model. In this way, the page features of error pages are filtered out when the model is trained on historical data, so that error pages do not impair the accuracy of page monitoring.

In the embodiment of the present invention, a specific method for clustering the second word segmentation vectors to obtain the preset vector classifications may be:

generating a vector matrix from a plurality of second word segmentation vectors, wherein each row of the matrix is one second word segmentation vector;

performing DBSCAN clustering on the vector matrix based on a preset neighborhood radius eps, a preset minimum number of points MinPts within that radius, and a preset distance metric, to obtain at least one preset vector classification and the center vector of each preset vector classification.
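The clustering step can be sketched with a small pure-Python DBSCAN. This is an illustrative implementation, not the one used in the embodiment; in practice a library implementation would normally be preferred. Two points are assumptions: Euclidean distance is used as the default metric, and the center vector of a class is taken to be the mean of its member vectors, since DBSCAN itself does not define a center. The names `dbscan` and `center_vectors` are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dbscan(matrix, eps, min_pts, metric=euclidean):
    """Return one label per row: 0, 1, ... for clusters and -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(matrix)

    def neighbors(i):
        # The neighborhood includes the point itself.
        return [j for j, row in enumerate(matrix) if metric(matrix[i], row) <= eps]

    cluster = -1
    for i in range(len(matrix)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point joins the cluster
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:  # j is a core point: expand
                queue.extend(j_neighbors)
    return labels


def center_vectors(matrix, labels):
    """Mean vector of each cluster (assumed definition of the center)."""
    groups = {}
    for row, label in zip(matrix, labels):
        if label != -1:
            groups.setdefault(label, []).append(row)
    return {
        label: [sum(column) / len(rows) for column in zip(*rows)]
        for label, rows in groups.items()
    }


matrix = [
    [1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 1, 0],
    [0, 0, 1, 1], [0, 0, 1, 1],
    [1, 0, 0, 1],
]
labels = dbscan(matrix, eps=1.0, min_pts=2)
centers = center_vectors(matrix, labels)
```

In this toy matrix the first three rows form one class, the next two form another, and the last row is farther than `eps` from every other row, so it is treated as noise and contributes to no center.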

In the embodiment of the present invention, a specific method for generating a preset vector classification model according to a center vector of preset vector classification may include:

and storing the classification label of each preset vector classification and the center vector in an associated manner to obtain a preset vector classification model.
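One possible way to store the labels and center vectors in association is a plain serializable mapping, as in the sketch below. The JSON layout, the `page_type` string and the function name `build_model` are all assumptions made for illustration.

```python
import json


def build_model(page_type, centers):
    """Associate each classification label with its center vector."""
    return {
        "page_type": page_type,
        "classes": [
            {"label": label, "center": list(center)}
            for label, center in sorted(centers.items())
        ],
    }


model = build_model("GET /orders", {1: [0.0, 0.0, 1.0, 1.0], 0: [1.0, 1.0, 0.5, 0.0]})
serialized = json.dumps(model)     # could be written to a file or a database
restored = json.loads(serialized)  # reloaded before monitoring starts
```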

In summary, the preset vector classification model generated in the embodiment of the invention is simple, flexible and highly general; it can be applied to different website systems as needed and can be scaled out horizontally for large-scale use.

In other embodiments of the present invention, one part of the second word segmentation vectors may be used for training and another part for testing. The preset frequency threshold, the preset neighborhood radius eps, the preset minimum number of points MinPts within the radius, and the preset distance metric are then varied by using a gradient algorithm; for each parameter set, the steps of the generation method described above are executed again on the training vectors, and the testing vectors are used to obtain a test result for the resulting preset vector classification model. The test result includes at least one of coverage and accuracy.

Finally, the preset vector classification model generated by the parameter set with the best test result is selected for monitoring the page to be detected, so as to improve the accuracy of page monitoring.
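The selection of the best parameter set can be sketched as follows. Note that this sketch swaps in a plain exhaustive search over a small parameter grid rather than the gradient algorithm mentioned in the text, and it treats model generation and testing as caller-supplied functions. The names `select_best_parameters`, `build_model` and `evaluate`, and the toy scoring function, are hypothetical.

```python
from itertools import product


def select_best_parameters(grid, build_model, evaluate):
    """Try every parameter combination and keep the best-scoring one.

    grid        -- mapping of parameter name to candidate values
    build_model -- callable(**params) returning a model trained on the training vectors
    evaluate    -- callable(model) returning a score on the testing vectors
                   (for example coverage or accuracy); higher is better
    """
    best_score, best_params = None, None
    for values in product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        score = evaluate(build_model(**params))
        if best_score is None or score > best_score:
            best_score, best_params = score, params
    return best_params, best_score


# Toy stand-ins: the "model" is just its parameters, and the score peaks
# at eps=1.0, min_pts=3.
grid = {"eps": [0.5, 1.0, 1.5], "min_pts": [2, 3]}
best_params, best_score = select_best_parameters(
    grid,
    build_model=lambda eps, min_pts: (eps, min_pts),
    evaluate=lambda model: -abs(model[0] - 1.0) - abs(model[1] - 3),
)
```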

Thus, in step S140 of some embodiments of the present invention, a specific method for identifying a target vector classification corresponding to a first word segmentation vector in a preset vector classification by using a preset vector classification model may include:

calculating the vector distance between the center vector of the preset vector classification in the preset vector classification model and the first word segmentation vector;

determining a target center vector of which the vector distance from the first word segmentation vector is smaller than or equal to a preset distance threshold value in the center vectors;

and identifying the preset vector classification corresponding to the target center vector as the target vector classification corresponding to the first word segmentation vector.
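The three steps above can be sketched as a distance check against every stored center vector. The sketch assumes Euclidean distance and, when several centers fall within the threshold, returns the nearest one; the text does not specify either choice. The function name `identify_target_class` and the sample centers are hypothetical.

```python
import math


def identify_target_class(vector, centers, distance_threshold):
    """Return the label of the nearest center within the threshold, or None."""
    best_label, best_distance = None, None
    for label, center in centers.items():
        distance = math.dist(vector, center)  # Euclidean distance (assumed metric)
        within = distance <= distance_threshold
        if within and (best_distance is None or distance < best_distance):
            best_label, best_distance = label, distance
    return best_label


centers = {0: [1, 1, 0, 0], 1: [0, 0, 1, 1]}
matched = identify_target_class([1, 1, 1, 0], centers, distance_threshold=1.0)
unmatched = identify_target_class([1, 0, 0, 1], centers, distance_threshold=1.0)
```

A return value of `None` corresponds to the case where no target vector classification is identified.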

In step S150 of some embodiments of the present invention, in a case where the target vector classification is identified, the classification label of the target vector classification may be obtained. Because the page features of error pages are filtered out when the preset vector classification model is generated, the model can be regarded as generated from historical message data corresponding to correct pages, so every preset vector classification that the model can identify is a vector classification of correct pages. Therefore, if the classification label of a target vector classification is obtained, the page to be detected may be determined to be in a normal state; if no target vector classification is identified, the page to be detected may be determined to be in an abnormal state.

In summary, the embodiment of the invention can automatically identify whether the page to be detected is in a normal state by using the preset vector classification model, and avoid the problem caused by manually analyzing the page characteristics.

Fig. 2 is a schematic flow chart of a page monitoring method according to another embodiment of the present invention. As shown in fig. 2, the page monitoring method may include:

S210, acquiring a response status code of a page to be detected;

S220, in a case where the response status code is an abnormal status code, determining that the page to be detected is in an abnormal state;

S230, in a case where the response status code is a normal status code, obtaining message data to be detected corresponding to the page to be detected;

S240, determining a first element word segmentation set corresponding to the message data to be detected;

S250, converting the first element word segmentation set into a first word segmentation vector corresponding to the page to be detected;

S260, identifying, by using a preset vector classification model, a target vector classification corresponding to the first word segmentation vector among the preset vector classifications, wherein the preset vector classification model is generated according to historical message data corresponding to historical pages;

and S270, under the condition that the target vector classification is identified, determining that the page to be detected is in a normal state.

In the embodiment of the invention, the response status code of the page to be detected can be obtained from the data fields of the web page to be detected shown in Table 1. Because the page to be detected can be directly determined to be in an abnormal state when the response status code is an abnormal status code, the preset vector classification model is used to further check whether the page is in a normal state only when the response status code is a normal status code.

Steps S230 to S270 in this embodiment are the same as steps S110 to S150 in the embodiment shown in fig. 1, and are not described here.

Therefore, the embodiment of the invention can combine the response status code and the preset vector classification model into a decision tree to monitor whether the page to be detected is in a normal state. This effectively covers the case where the status code is normal but the page content is abnormal, and provides high identification coverage and high identification accuracy.
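The two-level decision can be sketched as below. Treating HTTP status codes of 400 and above as abnormal is an assumption, since the text does not list which codes count as abnormal, and the `classify` callable stands in for steps S230 to S260. The function name `monitor_page` and the sample records are hypothetical.

```python
def monitor_page(record, classify):
    """Return "normal" or "abnormal" for one web page record.

    classify -- callable taking the response payload and returning a
                classification label, or None when no class is identified
    """
    code = int(record["RESPONSE_CODE"])
    if code >= 400:  # assumed definition of an abnormal status code
        return "abnormal"
    # Status code is normal: fall through to the content check.
    label = classify(record["RESPONSE_PAYLOAD"])
    return "normal" if label is not None else "abnormal"


def toy_classify(payload):
    # Toy stand-in: recognizes only payloads that mention "Order".
    return 0 if "Order" in payload else None


server_error = {"RESPONSE_CODE": "500", "RESPONSE_PAYLOAD": "Order list"}
good_page = {"RESPONSE_CODE": "200", "RESPONSE_PAYLOAD": "Order list"}
soft_error = {"RESPONSE_CODE": "200", "RESPONSE_PAYLOAD": "page does not exist"}
```

The `soft_error` record shows the case that the status code alone would miss: the code is 200, but the content matches no known class.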

Fig. 3 is a schematic flow chart of a page monitoring method according to another embodiment of the present invention. As shown in fig. 3, the page monitoring method may include:

S310, acquiring a to-be-detected Message-Digest Algorithm 5 (MD5) value corresponding to the page to be detected;

S320, in the case that a target MD5 value identical to the MD5 value to be detected exists among the pre-stored MD5 values, determining that the page to be detected is in a normal state;

S330, in the case that no target MD5 value identical to the MD5 value to be detected exists among the pre-stored MD5 values, acquiring message data to be detected corresponding to the page to be detected;

S340, determining a first element word segmentation set corresponding to the message data to be detected;

S350, converting the first element word segmentation set into a first word segmentation vector corresponding to the page to be detected;

S360, identifying, by using a preset vector classification model, a target vector classification corresponding to the first word segmentation vector among the preset vector classifications, wherein the preset vector classification model is generated according to historical message data corresponding to a historical page;

S370, in the case that the target vector classification is identified, determining that the page to be detected is in a normal state.

In this embodiment of the invention, the pre-stored MD5 values may be the MD5 values corresponding to normal history pages. If a target MD5 value identical to the MD5 value to be detected exists among the pre-stored MD5 values, the message of the page to be detected can be determined to be identical in full text to the message of a normal history page, and the page to be detected can be directly determined to be in a normal state. The method therefore uses the preset vector classification model to further monitor whether the page is in a normal state only when no such target MD5 value exists among the pre-stored MD5 values.

The steps S330-S370 in this embodiment are the same as the steps S110-S150 in the embodiment described in fig. 1, and are not described here.

Therefore, this embodiment of the invention can combine the MD5 value and the preset vector classification model into a decision tree to monitor whether the page to be detected is in a normal state. It can effectively cover the case in which the MD5 value does not match any pre-stored value but the page content is normal, and offers high recognition coverage and high recognition accuracy.
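A minimal sketch of the MD5 pre-check in steps S310 to S330 is given below, using Python's standard `hashlib` module. The helper names `md5_of` and `md5_precheck`, the UTF-8 encoding of the message, and the use of an in-memory set for the pre-stored MD5 values are illustrative assumptions.

```python
import hashlib
from typing import Set


def md5_of(message: str) -> str:
    """Hex MD5 digest of the page message (assumed to be UTF-8 encoded)."""
    return hashlib.md5(message.encode("utf-8")).hexdigest()


def md5_precheck(message: str, known_good: Set[str]) -> bool:
    """S320: True when the message is identical in full text to a message of
    a normal history page, i.e. its MD5 value is among the pre-stored ones."""
    return md5_of(message) in known_good


# Pre-stored MD5 values of normal history pages (hypothetical sample data).
known = {md5_of("<html><body>Login OK</body></html>")}
identical = md5_precheck("<html><body>Login OK</body></html>", known)
changed = md5_precheck("<html><body>Login OK!</body></html>", known)
```

When `md5_precheck` returns False, the page is not declared abnormal; it is simply passed on to the vector classification of steps S330 to S370.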

Fig. 4 is a schematic flow chart of a page monitoring method according to still another embodiment of the present invention. As shown in fig. 4, the page monitoring method may include:

S410, acquiring message data to be detected corresponding to the page to be detected;

S420, determining a first element word segmentation set corresponding to the message data to be detected;

S430, acquiring a correct keyword set and an error keyword set;

S440, in the case that the first element word segmentation set includes a target keyword that exists in the correct keyword set or the error keyword set: if the target keyword is a correct keyword, determining that the page to be detected is in a normal state; if the target keyword is an error keyword, determining that the page to be detected is in an abnormal state;

S450, in the case that the first element word segmentation set does not include any target keyword existing in the correct keyword set or the error keyword set, converting the first element word segmentation set into a first word segmentation vector corresponding to the page to be detected;

S460, identifying, by using a preset vector classification model, a target vector classification corresponding to the first word segmentation vector among the preset vector classifications, wherein the preset vector classification model is generated according to historical message data corresponding to a historical page;

S470, in the case that the target vector classification is identified, determining that the page to be detected is in a normal state.

In the embodiment of the invention, before converting the first element word segmentation set into the first word segmentation vector corresponding to the page to be detected, the correct keyword set and the error keyword set can be used for monitoring whether the page to be detected is in a normal state, and only when the first element word segmentation set does not include the target keywords in the correct keyword set and the error keyword set, the first element word segmentation set is converted into the first word segmentation vector corresponding to the page to be detected, and a preset vector classification model is further used for monitoring whether the page to be detected is in a normal state.

The principle of steps S410-S420 and steps S450-S470 in this embodiment is the same as that of steps S110-S150 in the embodiment described in fig. 1, and will not be described here again.

Therefore, this embodiment of the invention can combine the correct keyword set, the error keyword set and the preset vector classification model into a decision tree to monitor whether the page to be detected is in a normal state. It can effectively cover the case in which neither a correct keyword nor an error keyword is detected, and offers high recognition coverage and high recognition accuracy.
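The keyword pre-check of steps S430 to S450 can be illustrated with the short sketch below. The function name `keyword_precheck` and the sample keywords are hypothetical. The original text does not say which verdict wins when a page contains both a correct keyword and an error keyword, so giving precedence to error keywords is an assumption of this sketch.

```python
from typing import Iterable, Optional, Set


def keyword_precheck(
    tokens: Iterable[str],
    correct_keywords: Set[str],
    error_keywords: Set[str],
) -> Optional[str]:
    """Return "normal", "abnormal", or None when no target keyword is found."""
    token_set = set(tokens)
    # Assumption: an error keyword takes precedence over a correct keyword.
    if token_set & error_keywords:
        return "abnormal"  # S440: the target keyword is an error keyword
    if token_set & correct_keywords:
        return "normal"    # S440: the target keyword is a correct keyword
    return None            # S450: fall through to vector classification


correct = {"success"}
error = {"exception", "timeout"}
verdict_ok = keyword_precheck(["payment", "success"], correct, error)
verdict_bad = keyword_precheck(["payment", "timeout"], correct, error)
verdict_none = keyword_precheck(["payment", "pending"], correct, error)
```

A `None` result corresponds to the case covered by the vector classification model in steps S450 to S470.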

Fig. 5 is a schematic flow chart of a page monitoring method according to still another embodiment of the present invention. As shown in fig. 5, the page monitoring method may include:

S510, acquiring message data to be detected corresponding to the page to be detected;

S520, determining a first element word segmentation set corresponding to the message data to be detected;

S530, converting the first element word segmentation set into a first word segmentation vector corresponding to the page to be detected;

S540, identifying, by using a preset vector classification model, a target vector classification corresponding to the first word segmentation vector among the preset vector classifications, wherein the preset vector classification model is generated according to historical message data corresponding to a historical page;

S550, in the case that the target vector classification is identified, determining that the page to be detected is in a normal state;

S560, in the case that the target vector classification is not identified, determining a first similarity between the first element word segmentation set and each element word segmentation set template corresponding to the page type of the page to be detected, wherein each element word segmentation set template is generated according to the template vector of a preset vector classification corresponding to the page type of the page to be detected and the preset element word segmentation set corresponding to that page type, and the template vector is obtained by performing a bitwise AND operation on all vectors in the preset vector classification;

S570, in the case that the maximum of the first similarities is greater than or equal to a first similarity threshold, determining that the page to be detected is in a normal state.

Specifically, in the generation method of the preset vector classification model, after the preset vector classifications are obtained, all vectors belonging to each preset vector classification can be determined according to the center vector and the preset neighborhood radius of that classification. A bitwise AND operation is performed on all of these vectors and the center vector to obtain a template vector, and the template vector is then used together with the preset element word segmentation set corresponding to the page type of the page to be detected to generate the element word segmentation set template shown in Table 4.

Table 4 element word segmentation set template

The principle of S510-S550 in this embodiment is the same as that of steps S110-S150 in the embodiment described in fig. 1, and will not be described here.

In the embodiment of the invention, the character string similarity comparison can be performed on the first element word segmentation set and the element word segmentation set templates corresponding to all preset vector classifications corresponding to the page types of the pages to be detected, so that the first similarity is obtained, and the pages to be detected are determined to be in a normal state under the condition that the maximum similarity in the first similarity is greater than or equal to the first similarity threshold value.

Therefore, the embodiment of the invention can avoid the interference caused by the error history page by verifying the element word segmentation set template.
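The template generation and the first-similarity check can be sketched as follows. The bitwise AND over the vectors of one preset vector classification follows the description above. The original text only speaks of a character string similarity comparison without naming a metric, so the use of `difflib.SequenceMatcher` here is an assumption, as are the function names and the sample vocabulary.

```python
from difflib import SequenceMatcher
from typing import List, Sequence


def template_vector(vectors: Sequence[Sequence[int]]) -> List[int]:
    """Bitwise AND across all 0/1 vectors of one preset vector classification."""
    result = list(vectors[0])
    for vec in vectors[1:]:
        result = [a & b for a, b in zip(result, vec)]
    return result


def template_words(template: Sequence[int], vocabulary: Sequence[str]) -> List[str]:
    """Element words of the preset set whose template bit is 1."""
    return [word for bit, word in zip(template, vocabulary) if bit]


def max_first_similarity(
    tokens: Sequence[str], templates: Sequence[Sequence[str]]
) -> float:
    """Largest string similarity between the page tokens and any template."""
    page = " ".join(tokens)
    return max(
        (SequenceMatcher(None, page, " ".join(t)).ratio() for t in templates),
        default=0.0,
    )


vocabulary = ["login", "user", "balance", "error"]       # hypothetical preset set
cluster = [[1, 1, 1, 0], [1, 1, 0, 0], [1, 1, 1, 0]]     # vectors of one class
template = template_words(template_vector(cluster), vocabulary)
```

Because the AND keeps only the words present in every vector of the class, a single erroneous history page cannot add words to the template, which is consistent with the stated aim of avoiding interference from error history pages.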

In some embodiments of the present invention, the page monitoring method may further include:

and under the condition that the maximum similarity in the first similarity is smaller than a first similarity threshold value, determining that the page to be detected is in an abnormal state.

In other embodiments of the present invention, the page monitoring method may further include:

determining a second similarity of the first element word segmentation set and a second element word segmentation set corresponding to the history page under the condition that the maximum similarity in the first similarity is smaller than a first similarity threshold;

under the condition that the maximum similarity in the second similarity is greater than or equal to a second similarity threshold value, determining that the page to be detected is in a normal state;

and under the condition that the maximum similarity in the second similarity is smaller than a second similarity threshold value, determining that the page to be detected is in an abnormal state.

The second element word segmentation set corresponding to the history page is the second element word segmentation set corresponding to the history message data determined according to the history message data corresponding to the history page with the same page type as the page to be detected.

In the embodiment of the invention, the character string similarity can be compared between the first element word segmentation set and the second element word segmentation set corresponding to the history page to obtain the second similarity, the page to be detected is determined to be in a normal state when the maximum similarity in the second similarity is greater than or equal to the second similarity threshold value, and the page to be detected is determined to be in an abnormal state when the maximum similarity in the second similarity is less than the second similarity threshold value.

Therefore, the embodiment of the invention can also compare the first element word segmentation set with the second element word segmentation set corresponding to the history accurate message, and avoid the interference caused by the error history page by verifying the second element word segmentation set.
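The two-stage similarity fallback described above can be summarized in a few lines. This is a sketch under stated assumptions: the function name `similarity_fallback` is hypothetical, the similarity scores are taken as already computed, and an empty score list is treated as a maximum similarity of 0.0.

```python
from typing import Sequence


def similarity_fallback(
    first_similarities: Sequence[float],
    second_similarities: Sequence[float],
    first_threshold: float,
    second_threshold: float,
) -> str:
    """Decide the page state after no target vector classification was found."""
    # First similarity: against the element word segmentation set templates.
    if max(first_similarities, default=0.0) >= first_threshold:
        return "normal"
    # Second similarity: against the second element word segmentation sets
    # of history pages with the same page type.
    if max(second_similarities, default=0.0) >= second_threshold:
        return "normal"
    return "abnormal"
```

For example, with both thresholds set to an illustrative value of 0.8, first similarities of `[0.4, 0.6]` and second similarities of `[0.85]` give "normal", while second similarities of `[0.5]` give "abnormal".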

It should be noted that in the embodiment of the present invention, the preset vector classification model may be further combined with at least two of the response status code, the MD5 value, the correct keyword set and the error keyword set, the element word segmentation set template corresponding to the page type of the page to be detected, and the second element word segmentation set corresponding to the history page, so as to perform page monitoring, thereby further improving the monitoring comprehensiveness.

FIG. 6 is a flow chart illustrating a web page monitoring process according to an embodiment of the present invention. As shown in fig. 6, specific steps of the web page monitoring process may include:

S601, acquiring the data field of the web page to be detected;

S602, judging whether the response status code in the data field of the web page to be detected is a normal status code; if so, executing step S603; if not, determining that the page to be detected is in an abnormal state and ending page monitoring;

S603, judging whether a target MD5 value identical to the MD5 value to be detected of the page to be detected exists among the pre-stored MD5 values; if not, executing step S604; if so, determining that the page to be detected is in a normal state and ending page monitoring;

S604, extracting the message data to be detected from the data field of the web page to be detected, and determining a first element word segmentation set corresponding to the message data to be detected;

S605, judging whether the first element word segmentation set includes a target keyword existing in the correct keyword set or the error keyword set; if not, executing step S606; if so, determining that the page to be detected is in a normal state when the target keyword is a correct keyword, or in an abnormal state when the target keyword is an error keyword, and ending page monitoring;

S606, converting the first element word segmentation set into a first word segmentation vector corresponding to the page to be detected;

S607, identifying, by using a preset vector classification model, a target vector classification corresponding to the first word segmentation vector among the preset vector classifications;

S608, judging whether the target vector classification is identified; if not, executing step S609; if so, determining that the page to be detected is in a normal state and ending page monitoring;

S609, determining a first similarity between the first element word segmentation set and each element word segmentation set template corresponding to the page type of the page to be detected;

S610, judging whether the maximum of the first similarities is greater than or equal to a first similarity threshold; if not, executing step S611; if so, determining that the page to be detected is in a normal state and ending page monitoring;

S611, determining a second similarity between the first element word segmentation set and each second element word segmentation set corresponding to the history page;

S612, judging whether the maximum of the second similarities is greater than or equal to a second similarity threshold; if so, determining that the page to be detected is in a normal state; if not, determining that the page to be detected is in an abnormal state; and ending page monitoring.
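The overall flow of steps S601 to S612 can be sketched as a single decision cascade. In this sketch every individual check is supplied as a callable, so the `Checks` container, its field names, and the `monitor_page` function are hypothetical and only illustrate the order of the decisions, not a definitive implementation.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Set


@dataclass
class Checks:
    """Hypothetical bundle of the individual checks used by the cascade."""
    status_ok: Callable[[int], bool]
    known_md5: Set[str]
    keyword_verdict: Callable[[Sequence[str]], Optional[str]]
    classify: Callable[[Sequence[str]], bool]
    max_template_similarity: Callable[[Sequence[str]], float]
    max_history_similarity: Callable[[Sequence[str]], float]
    first_threshold: float
    second_threshold: float


def monitor_page(
    status_code: int, md5_value: str, tokens: Sequence[str], checks: Checks
) -> str:
    """Return "normal" or "abnormal" following the order of S602 to S612."""
    if not checks.status_ok(status_code):          # S602
        return "abnormal"
    if md5_value in checks.known_md5:              # S603
        return "normal"
    verdict = checks.keyword_verdict(tokens)       # S605
    if verdict is not None:
        return verdict
    if checks.classify(tokens):                    # S606 to S608
        return "normal"
    if checks.max_template_similarity(tokens) >= checks.first_threshold:
        return "normal"                            # S609 to S610
    if checks.max_history_similarity(tokens) >= checks.second_threshold:
        return "normal"                            # S611 to S612
    return "abnormal"
```

Ordering the cheap checks (status code, MD5 value, keywords) before the vector classification and similarity comparisons means most pages are decided without computing a vector at all.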

In the embodiment of the present invention, according to the above-mentioned web page monitoring process, the monitoring result shown in table 5 may be obtained.

Table 5 table of monitoring results

The page monitoring method provided by the embodiment of the invention is applicable to website systems of different system versions. It can accurately identify the page content both for playback requests in a test environment and during actual operation of the website system, achieves a high coverage rate of page error identification (up to 100% coverage), and requires no test scripts to be designed or maintained, which improves page monitoring efficiency and saves labor costs.

Fig. 7 is a schematic structural diagram of a page monitoring apparatus according to an embodiment of the present invention. As shown in fig. 7, the page monitoring apparatus may include:

the message data acquisition module 710 is configured to acquire to-be-detected message data corresponding to the to-be-detected page;

the first set obtaining module 720 is configured to determine a first element word segmentation set corresponding to the message data to be detected;

the first vector obtaining module 730 is configured to convert the first element word segmentation set into a first word segmentation vector corresponding to the page to be detected;

the vector classification recognition module 740 is configured to recognize a target vector classification corresponding to the first word segmentation vector in the preset vector classification by using a preset vector classification model; the method comprises the steps that a preset vector classification model is generated according to historical message data corresponding to a historical page;

The page status determining module 750 is configured to determine that the page to be detected is in a normal status if the target vector classification is identified.

The page monitoring device provided by the embodiment of the invention can be applied to a background server of a website system, and the background server can be a high-performance computer for storing and processing data.

In some embodiments of the invention, the page type includes a uniform resource identification type and/or a method type.

In some embodiments of the present invention, the first set acquisition module 720 may be specifically configured to: deleting frame structure codes in the message data to be detected to obtain corpus data to be detected; and performing word segmentation processing on the corpus data to be detected to obtain a first element word segmentation set corresponding to the webpage to be detected.
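The behaviour of the first set acquisition module can be illustrated with the sketch below. It assumes that the "frame structure codes" are HTML markup, scripts and style sheets, and it uses a simple regular expression as a stand-in for word segmentation; Chinese page content would need a dedicated word segmenter instead. The class and function names are hypothetical.

```python
import re
from html.parser import HTMLParser
from typing import List


class _TextExtractor(HTMLParser):
    """Collects visible text and skips the content of script/style elements."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def element_tokens(message: str) -> List[str]:
    """Delete the markup, then split the remaining corpus into element words."""
    parser = _TextExtractor()
    parser.feed(message)
    parser.close()
    corpus = " ".join(parser.parts)
    # Assumption: lower-cased \w+ runs approximate word segmentation.
    return re.findall(r"\w+", corpus.lower())


tokens = element_tokens(
    "<html><body><h1>Login OK</h1><script>var x = 1;</script></body></html>"
)
```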

In some embodiments of the present invention, the first vector acquisition module 730 may be specifically configured to: generating the first word segmentation vector corresponding to the page to be detected according to whether each element word of a preset element word segmentation set exists in the first element word segmentation set; the preset element word segmentation set is the element word segmentation set corresponding to the page type of the page to be detected.
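The vector generation can be sketched as a presence/absence encoding over the preset element word segmentation set. The function name `to_vector` and the sample words are hypothetical, and encoding presence as 1 and absence as 0 is an assumption that is consistent with the bitwise AND used later to build template vectors.

```python
from typing import Iterable, List, Sequence


def to_vector(tokens: Iterable[str], vocabulary: Sequence[str]) -> List[int]:
    """One bit per element word of the preset set: 1 if it occurs on the page."""
    present = set(tokens)
    return [1 if word in present else 0 for word in vocabulary]


vocabulary = ["login", "error", "ok"]       # hypothetical preset element set
vector = to_vector(["login", "ok", "extra"], vocabulary)
```

Words on the page that are not in the preset set (here "extra") do not affect the vector, so all pages of one page type map to vectors of the same length.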

In some embodiments of the present invention, the page monitoring apparatus may further include a classification model generation module configured to: acquiring historical message data corresponding to a historical page with the same page type as the page to be detected; determining a second element word segmentation set corresponding to the historical message data; converting the second element word segmentation set into a second word segmentation vector corresponding to the history page; clustering the second word segmentation vectors to obtain the preset vector classifications; and generating the preset vector classification model according to the center vectors of the preset vector classifications.

In these embodiments, vector classification identification module 740 may be specifically configured to: calculating the vector distance between the center vector of the preset vector classification in the preset vector classification model and the first word segmentation vector; determining a target center vector of which the vector distance from the first word segmentation vector is smaller than or equal to a preset distance threshold value in the center vectors; and identifying the preset vector classification corresponding to the target center vector as the target vector classification corresponding to the first word segmentation vector.
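The identification step can be sketched as a nearest-center lookup with a distance cut-off. The original text speaks only of a "vector distance", so the Hamming distance used here is an assumption that suits 0/1 vectors; returning the closest qualifying center when several are within the threshold is likewise an assumption, and the names `hamming` and `identify_class` are hypothetical.

```python
from typing import Dict, Optional, Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length 0/1 vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def identify_class(
    vector: Sequence[int],
    centers: Dict[str, Sequence[int]],
    max_distance: int,
) -> Optional[str]:
    """Name of the preset classification whose center vector is within
    `max_distance` of `vector`, or None when no classification is identified."""
    best_name: Optional[str] = None
    best_distance: Optional[int] = None
    for name, center in centers.items():
        distance = hamming(center, vector)
        if distance <= max_distance and (
            best_distance is None or distance < best_distance
        ):
            best_name, best_distance = name, distance
    return best_name


centers = {"login_page": [1, 1, 0, 0], "balance_page": [0, 0, 1, 1]}
found = identify_class([1, 1, 1, 0], centers, max_distance=1)
missing = identify_class([1, 0, 1, 0], centers, max_distance=1)
```

A `None` result corresponds to the case in which the target vector classification is not identified and the similarity checks take over.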

In some embodiments of the present invention, the classification model generation module may be specifically configured to: deleting the frame structure codes in the historical message data to obtain historical corpus data; and performing word segmentation processing on the historical corpus data to obtain a second element word segmentation set corresponding to the historical page.

In other embodiments of the present invention, the classification model generation module may be further specifically configured to: deleting target words from the second element word segmentation set to obtain a third element word segmentation set, wherein the target words comprise at least one of low-frequency words, error keywords and noise words, the occurrence frequency of a low-frequency word in the second element word segmentation set is lower than a preset frequency threshold, an error keyword is a word existing in the error keyword set, and a noise word is a word existing in the noise word dictionary; and generating the second word segmentation vector corresponding to the history page according to whether each element word of the preset element word segmentation set exists in the third element word segmentation set; the preset element word segmentation set is the element word segmentation set corresponding to the page type of the page to be detected.
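The filtering that produces the third element word segmentation set can be sketched as follows. Counting word occurrences across all history pages of the same page type, and expressing the frequency threshold as a minimum count, are assumptions of this sketch; the function name `filter_tokens` and the sample data are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Set


def filter_tokens(
    history_tokens: Sequence[Sequence[str]],
    min_count: int,
    error_keywords: Set[str],
    noise_words: Set[str],
) -> List[List[str]]:
    """Drop low-frequency words, error keywords and noise words per page."""
    # Assumption: frequency is the total count over all history pages.
    counts = Counter(word for page in history_tokens for word in page)
    return [
        [
            word
            for word in page
            if counts[word] >= min_count
            and word not in error_keywords
            and word not in noise_words
        ]
        for page in history_tokens
    ]


history = [
    ["login", "ok", "ad"],
    ["login", "ok", "error"],
    ["login", "rare"],
]
cleaned = filter_tokens(history, min_count=2, error_keywords={"error"}, noise_words={"ad"})
```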

In some embodiments of the present invention, the page monitoring apparatus may further include a status code acquisition module configured to: acquiring a response status code of a page to be detected;

in these embodiments, the message data acquisition module 710 may be specifically configured to: and under the condition that the response state code is the normal state code, acquiring the message data to be detected corresponding to the page to be detected.

In some embodiments of the present invention, the page monitoring apparatus may further include an MD5 value acquisition module configured to: acquiring a to-be-detected Message-Digest Algorithm 5 (MD5) value corresponding to the page to be detected;

in these embodiments, the message data acquisition module 710 may be specifically configured to: and under the condition that the target MD5 value which is the same as the MD5 value to be detected does not exist in the prestored MD5 value, acquiring the message data to be detected corresponding to the page to be detected.

In some embodiments of the present invention, the page monitoring apparatus may further include a keyword set obtaining module configured to: acquiring a correct keyword set and an incorrect keyword set;

wherein the first vector acquisition module 730 may be specifically configured to: and under the condition that the first element word segmentation set does not comprise target keywords existing in the correct keyword set and the error keyword set, converting the first element word segmentation set into a first word segmentation vector corresponding to the page to be detected.

In some embodiments of the present invention, the page status determination module 750 may be further configured to:

in the case that the target vector classification is not identified, determining a first similarity between the first element word segmentation set and each element word segmentation set template corresponding to the page type of the page to be detected, wherein each element word segmentation set template is generated according to the template vector of a preset vector classification corresponding to the page type of the page to be detected and the preset element word segmentation set corresponding to that page type, and the template vector is obtained by performing a bitwise AND operation on all vectors in the preset vector classification;

and under the condition that the maximum similarity in the first similarity is greater than or equal to a first similarity threshold value, determining that the page to be detected is in a normal state.

It should be noted that, the functions implemented by the above modules are similar to the principle of each step in the method embodiment, and the obtained effects are the same, which is not described herein.

The page monitoring method and the page monitoring device can be realized by the page monitoring equipment. Fig. 8 shows a schematic hardware configuration of a page monitoring apparatus 800 according to an embodiment of the invention.

As shown in fig. 8, the page monitoring device 800 includes an input device 801, an input interface 802, a central processor 803, a memory 804, an output interface 805, and an output device 806. The input interface 802, the central processor 803, the memory 804, and the output interface 805 are connected to each other through a bus 810, and the input device 801 and the output device 806 are connected to the bus 810 through the input interface 802 and the output interface 805, respectively, and further connected to other components of the page monitor device 800.

Specifically, the input device 801 receives input information from the outside and transmits the input information to the central processor 803 through the input interface 802; the central processor 803 processes the input information based on computer executable instructions stored in the memory 804 to generate output information, temporarily or permanently stores the output information in the memory 804, and then transmits the output information to the output device 806 through the output interface 805; the output device 806 outputs the output information to the outside of the page monitoring device 800 for use by the user.

That is, the page monitoring apparatus shown in fig. 8 may also be implemented to include: a memory storing computer-executable instructions; and a processor that, when executing computer-executable instructions, implements the page monitoring methods and apparatus described in embodiments of the present invention.

Embodiments of the present invention also provide a computer readable storage medium having computer program instructions stored thereon; the computer program instructions, when executed by the processor, implement the page monitoring method provided by the embodiments of the present invention.

The functional blocks shown in the above block diagrams may be implemented in hardware, software, firmware, or a combination thereof. When implemented in hardware, a functional block may be, for example, an electronic circuit, an Application Specific Integrated Circuit (ASIC), suitable firmware, a plug-in, a function card, or the like. When implemented in software, the elements of the invention are the programs or code segments used to perform the required tasks. The programs or code segments may be stored in a machine-readable medium or transmitted over transmission media or communication links by a data signal carried in a carrier wave. A "machine-readable medium" may include any medium that can store or transfer information. Examples of machine-readable media include electronic circuitry, semiconductor memory devices, ROM, flash memory, erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, radio frequency (RF) links, and the like. The code segments may be downloaded via computer networks such as the internet, intranets, etc.

It should also be noted that the exemplary embodiments mentioned in this disclosure describe some methods or systems based on a series of steps or devices. However, the present invention is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, or may be performed in a different order from the order in the embodiments, or several steps may be performed simultaneously.

In the foregoing, only the specific embodiments of the present invention are described, and it will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the systems, modules and units described above may refer to the corresponding processes in the foregoing method embodiments, which are not repeated herein. It should be understood that the scope of the present invention is not limited thereto, and any equivalent modifications or substitutions can be easily made by those skilled in the art within the technical scope of the present invention, and they should be included in the scope of the present invention.

Claims

1. A method of page monitoring, comprising:

acquiring message data to be detected corresponding to a page to be detected;

determining a first element word segmentation set corresponding to the message data to be detected;

Identifying target vector classification corresponding to the first word segmentation vector in preset vector classification by using a preset vector classification model; the preset vector classification model is generated according to historical message data corresponding to the historical page;

under the condition that the target vector classification is identified, determining that the page to be detected is in a normal state;

wherein, still include:

acquiring the historical message data corresponding to the historical page with the same page type as the page to be detected;

clustering the second word vectors to obtain preset vector classification;

generating the preset vector classification model according to the center vector of the preset vector classification, wherein the preset vector classification model is the same as the page type of the page to be detected;

the obtaining the message data to be detected corresponding to the page to be detected includes:

collecting to-be-detected webpage data fields of the to-be-detected webpage, and extracting to-be-detected message data from the to-be-detected webpage data fields;

The determining the first element word segmentation set corresponding to the message data to be detected comprises the following steps:

deleting the frame structure codes in the message data to be detected to obtain corpus data to be detected;

and performing word segmentation processing on the corpus data to be detected to obtain the first element word segmentation set corresponding to the webpage to be detected.

2. The method of claim 1, wherein the identifying, using a predetermined vector classification model, a target vector classification of a predetermined vector classification corresponding to the first word segmentation vector comprises:

3. The method of claim 1, wherein the determining the second set of element words corresponding to the historical packet data comprises:

And performing word segmentation processing on the historical corpus data to obtain the second element word segmentation set corresponding to the historical page.

4. The method of claim 1, wherein the converting the second set of element words into the second word-segmentation vector corresponding to the history page comprises:

deleting target word segmentation in the second element word segmentation set to obtain a third element word segmentation set; the target word segmentation comprises at least one of low-frequency words, error keywords and noise words, wherein the occurrence frequency of the low-frequency words in the second element word segmentation set is lower than a preset frequency threshold value, the error keywords exist in the error keyword set, and the noise words exist in a noise word dictionary;

generating the second word segmentation vector corresponding to the history page according to the existence condition of each element segmentation in the third element segmentation set; the preset element word segmentation set is an element word segmentation set corresponding to the page type of the page to be detected.

5. The method of claim 1, wherein the page type comprises a uniform resource identification type and/or a method type.

6. The method of claim 1, wherein the converting the first element word segmentation set into a first word segmentation vector corresponding to the page to be detected comprises:

Generating a first word segmentation vector corresponding to the page to be detected according to the existence condition of each element word segmentation of a preset element word segmentation set in the first word segmentation vector; the preset element word segmentation set is an element word segmentation set corresponding to the page type of the page to be detected.

7. The method of claim 1, wherein before the obtaining the message data to be detected corresponding to the page to be detected, further comprises:

acquiring a response status code of the page to be detected;

and under the condition that the response state code is a normal state code, acquiring the message data to be detected corresponding to the page to be detected.

8. The method of claim 1, wherein before the obtaining the message data to be detected corresponding to the page to be detected, further comprises:

acquiring a to-be-detected message-digest algorithm 5 (MD5) value corresponding to the page to be detected;

and under the condition that no target MD5 value identical to the to-be-detected MD5 value exists among the prestored MD5 values, acquiring the message data to be detected corresponding to the page to be detected.
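The MD5 pre-check of claim 8 can be sketched as follows. The claim does not say which bytes are hashed or how the prestored values are kept, so hashing the raw page body and storing hex digests in a Python set are assumptions, as is the function name `needs_analysis`.

```python
import hashlib

def needs_analysis(page_body: bytes, prestored_md5: set) -> bool:
    """Return True when the page's MD5 value is not among the prestored values.

    Assumption: the digest is computed over the raw page body and the
    prestored values are hex strings.
    """
    digest = hashlib.md5(page_body).hexdigest()
    return digest not in prestored_md5

# Hypothetical prestored digest of a previously seen page body.
prestored = {hashlib.md5(b"<html>ok</html>").hexdigest()}

unchanged = needs_analysis(b"<html>ok</html>", prestored)
changed = needs_analysis(b"<html>changed</html>", prestored)
```

Only a page whose digest is absent from the prestored values goes on to message data acquisition, so byte-identical pages skip the more expensive word segmentation and classification steps.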

9. The method of claim 1, wherein, before the converting of the first element word segmentation set into the first word segmentation vector corresponding to the page to be detected, the method further comprises:

acquiring a correct keyword set and an error keyword set;

the converting the first element word segmentation set into a first word segmentation vector corresponding to the page to be detected includes:

and under the condition that the first element word segmentation set does not comprise target keywords existing in the correct keyword set and the error keyword set, converting the first element word segmentation set into a first word segmentation vector corresponding to the page to be detected.
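The keyword condition of claim 9 amounts to a gate placed in front of vectorization. The sketch below only reports which branch applies; what happens on a keyword hit is not recited in this claim, so the returned labels, the function name `keyword_gate`, and the order in which the two sets are checked are assumptions.

```python
def keyword_gate(element_tokens, correct_keywords, error_keywords):
    """Decide whether the token set should proceed to vectorization.

    Returns "vectorize" only when no token appears in either keyword set.
    Checking error keywords first is an arbitrary choice for this sketch.
    """
    tokens = set(element_tokens)
    if tokens & set(error_keywords):
        return "error_keyword_hit"
    if tokens & set(correct_keywords):
        return "correct_keyword_hit"
    return "vectorize"

# Hypothetical keyword sets for illustration only.
correct = {"payment_success"}
error = {"exception", "404"}

gate_a = keyword_gate({"html", "body", "div"}, correct, error)
gate_b = keyword_gate({"html", "exception"}, correct, error)
gate_c = keyword_gate({"html", "payment_success"}, correct, error)
```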

10. The method of claim 1, further comprising:

if the target vector classification is not recognized, determining a first similarity of the first element word segmentation set and an element word segmentation set template corresponding to the page type of the page to be detected; the element word segmentation set template is generated according to a template vector corresponding to a preset vector classification corresponding to the page type of the page to be detected and a preset element word segmentation set corresponding to the page type of the page to be detected, and the template vector is obtained by carrying out bitwise and calculation on all vectors in the preset vector classification;
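The template construction in claim 10 can be sketched as follows: the vectors of one preset vector classification are combined by bitwise AND, the resulting template vector is mapped back to words through the preset element word segmentation set, and the page's word set is compared against that template. The claim does not name a similarity measure, so Jaccard similarity is an assumption here, as are all function names.

```python
def template_vector(vectors):
    """Bitwise AND of all 0/1 vectors in one vector classification."""
    template = list(vectors[0])
    for vec in vectors[1:]:
        template = [a & b for a, b in zip(template, vec)]
    return template

def template_tokens(template, preset_tokens):
    """Words of the preset set whose position in the template vector is 1."""
    return {tok for bit, tok in zip(template, preset_tokens) if bit}

def jaccard(a, b):
    """Assumed similarity measure: |A ∩ B| / |A ∪ B| (1.0 for two empty sets)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

preset = ["html", "body", "div", "table"]
cluster = [[1, 1, 1, 0], [1, 1, 0, 0], [1, 1, 1, 1]]  # invented sample vectors

tmpl_vec = template_vector(cluster)
tmpl_set = template_tokens(tmpl_vec, preset)
first_similarity = jaccard({"html", "body", "span"}, tmpl_set)
```

The bitwise AND keeps only the words shared by every vector in the classification, so the template represents the common core of the pages in that classification.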

11. The method of claim 10, further comprising:

determining a second similarity of the first element word segmentation set and a second element word segmentation set corresponding to the history page under the condition that the maximum similarity in the first similarity is smaller than the first similarity threshold;

determining that the page to be detected is in a normal state under the condition that the maximum similarity in the second similarity is greater than or equal to a second similarity threshold value;

and under the condition that the maximum similarity in the second similarity is smaller than the second similarity threshold value, determining that the page to be detected is in an abnormal state.
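The fallback comparison of claim 11 can be sketched as a small decision function. It assumes the similarities have already been computed as numbers, that an empty list counts as a maximum of 0.0, and that the function returns `None` when the precondition of claim 11 (maximum first similarity below the first similarity threshold) does not hold; the function name `fallback_state` and these conventions are hypothetical.

```python
def fallback_state(first_sims, second_sims, first_threshold, second_threshold):
    """Apply the second-stage check against history-page word sets.

    first_sims:  similarities to the element word segmentation set templates
    second_sims: similarities to the history pages' word sets
    Returns None when the second stage is not reached.
    """
    if max(first_sims, default=0.0) >= first_threshold:
        return None  # claim 11 precondition not met; outcome decided elsewhere
    if max(second_sims, default=0.0) >= second_threshold:
        return "normal"
    return "abnormal"

# Invented similarity values and thresholds for illustration only.
state_a = fallback_state([0.40, 0.55], [0.70, 0.92], 0.8, 0.9)
state_b = fallback_state([0.40, 0.55], [0.30, 0.45], 0.8, 0.9)
state_c = fallback_state([0.85], [0.10], 0.8, 0.9)
```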

12. A page monitoring apparatus, the apparatus comprising:

the vector classification identification module is configured to identify target vector classification corresponding to the first word segmentation vector in preset vector classification by using a preset vector classification model; the preset vector classification model is generated according to historical message data corresponding to the historical page;

the page state determining module is configured to determine that the page to be detected is in a normal state under the condition that the target vector classification is identified;

wherein the apparatus further comprises:

the classification model generation module is configured to:

clustering the second word segmentation vectors to obtain the preset vector classification;

the message data acquisition module is specifically configured to:

the first set acquisition module is specifically configured to:

13. A page monitoring apparatus, the apparatus comprising: a processor and a memory storing computer program instructions;

wherein the computer program instructions, when executed by the processor, implement the page monitoring method of any of claims 1-11.

14. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon computer program instructions, which when executed by a processor, implement the page monitoring method according to any of claims 1-11.