CN109829092B - Method for directionally monitoring webpage - Google Patents

Info

Publication number
CN109829092B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811604429.6A
Other languages
Chinese (zh)
Other versions
CN109829092A (en)
Inventor
孙再连
吴谋荣
苏淮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Yitong Intelligent Technology Group Co ltd
Original Assignee
Xiamen Etom Software Technology Co ltd

Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a method for directionally monitoring a webpage. The user frame-selects content on the webpage; the method captures each item of content within the frame selection and gives related information for each item, such as its title, abstract, web address and webpage text. Because the related information is obtained directly from the frame selection, the operation is simple and fast. The method can also automatically obtain content on the webpage that is of the same kind as the frame-selected content but was not itself frame-selected, which spares the user from frame-selecting the webpage repeatedly and improves the user's working efficiency. In addition, the method records historical frame selection operations and judges whether different frame selections correspond to the same content; when they do, it provides the content crawled by the historical frame selection together with its related information, avoiding repeated crawling of the webpage and the resulting waste of resources. A manual supervision mechanism and a manual judgment mechanism are also added, improving the accuracy and reliability of the method.

Description

Method for directionally monitoring webpage
Technical Field
The invention relates to the technical field of webpage monitoring, in particular to a method for directionally monitoring a webpage.
Background
In the era of information explosion, quickly and accurately acquiring data from the mass of information on the internet has become a pressing need for individuals and enterprises.
Many crawler tools and data acquisition products are currently on the market. Some are simple and intuitive to use, but they still have problems with generality, maintainability and accuracy, specifically:
1. the data source is based on a strategy scheme customized by the product, so deep data customization cannot be completed;
2. the configuration process is very complex and demands a high level of skill, so it can only be completed by professionals;
3. a specific analyzer can only extract specific pages; to extract different columns from several different dynamic websites, several analyzers must be written, which increases the complexity of the system;
4. when some features of the target page change, for example when a page link or the page layout is modified, the corresponding analyzer must be modified accordingly, and the more target pages there are or the more they change, the harder the analyzers are to maintain.
Based on the above, the market urgently needs a webpage monitoring method that combines machine learning with user behavior trajectory monitoring, is general and maintainable, and uses natural language processing to make the extraction of text data simpler and more accurate.
Disclosure of Invention
To solve the above technical problems, the invention provides a method for directionally monitoring a webpage that is simple and fast to operate. The method comprises frame-selecting content on the webpage, capturing each item of content within the frame selection, and giving related information for each item, such as its title, abstract, web address and webpage text.
Optionally, the frame selection adopts a screen capture positioning mode.
Optionally, positioning information is obtained from the user's frame selection area, and the positions of all elements in the webpage are then compared with the position of the frame selection to preliminarily screen out the matching content, which is the content the user wants to know about.
Optionally, the frame selection area may be positioned as follows. During frame selection, the coordinates of the start point are recorded as (X1, Y1) and the coordinates of the end point as (X2, Y2); the start point and the end point enclose a rectangular frame selection area. The horizontal and vertical displacements caused by the user dragging the webpage scroll bars are added to these coordinates to obtain the absolute coordinates of the start point and end point of the frame selection area, namely (X1 + ScrollLeft, Y1 + ScrollTop) and (X2 + ScrollLeft, Y2 + ScrollTop), where ScrollLeft is the horizontal scroll offset of the webpage and ScrollTop is the vertical scroll offset.
The coordinates of each content in the webpage are then obtained. For any content A, the coordinates are recorded as (Xa, Ya), the width of content A is W, and the height of content A is H.
An exclusion method is used to judge whether content A on the webpage lies in the user's frame selection area. Content A is judged not to be in the frame selection area when any of the following holds: Xa + W < X1 + ScrollLeft (A lies entirely to the left of the area); X2 + ScrollLeft < Xa (entirely to the right); Ya + H < Y1 + ScrollTop (entirely above); Y2 + ScrollTop < Ya (entirely below); or Xa < X1 + ScrollLeft, Ya < Y1 + ScrollTop, Xa + W > X2 + ScrollLeft and Ya + H > Y2 + ScrollTop all hold together (A completely encloses the area). Otherwise content A is judged to be in the frame selection area. Repeating these steps for every content yields all the frame-selected content in the webpage.
Optionally, the webpage source code is classified and labeled through machine learning. For the webpage content the user is interested in, the trajectory of the user's frame selection is simulated through machine learning, user operation simulation, intelligent bid alignment and drill-down crawling, and the webpage is deeply mined, so that content in the webpage that the user did not frame-select but needs is crawled.
Optionally, all contents in the frame selection area are collected to form a set B, and the first element b1 and the last element bn are taken from B. The parent nodes of b1 and bn are analyzed; if the parent node hierarchies of b1 and bn differ, the two elements are considered not to be of the same type, bn is discarded, element bn-1 is taken instead and compared with b1 in the same way, and so on until an element bm that has a common parent node with b1 is found. The styles of b1 and bm are then compared; if they differ, bm is discarded, element bm-1 is taken and its style is compared with that of b1, and so on until an element bz that shares a style with b1 is found. All parent nodes of b1 and of bz are obtained as list1 and listz respectively, and the deepest node that appears in both lists is taken as node1, which is the nearest common parent node of b1 and bz. Finally, node1 is used to search for the elements that share a style with b1, giving a set Y {b1, ……, bz}, which is the content the user needs to obtain.
Optionally, the user's real-time frame selection area is compared with the user's historical frame selection areas, or with those of other users, to judge whether they are the same frame selection area. When they are the same, the real-time related information of the frame-selected content is obtained from the historical frame selection; when they are not, the content that the user did not frame-select but needs is crawled together with its related information. Because the frame selection area is judged first, identical frame selections do not have to be crawled again, which reduces the number of crawls and avoids the resource waste of repeated crawling.
Optionally, whether two frame selections are the same area is determined as follows: the coordinates of the start point and end point of the user's frame selection (X1, Y1, X2, Y2) are taken as input parameters to construct an SVM classifier, the frame selection area is classified according to where all the contents between the start point and the end point are located in the whole webpage, and the same area is determined from the classification result.
Optionally, a user supervision mechanism is added to the classifier: the user judges whether the classification result is the content the user is interested in, and the judgment is added to the training set for the next round of training. The training set is cleaned and retrained periodically, noise classes generated by user misoperation are merged so that only correct judgments remain in the training set, and the stored training result is loaded whenever the classifier is used, avoiding the resource waste of repeated training.
Optionally, machine learning, supervised learning and reinforcement learning are continuously applied to the user's directed frame selection behavior according to the user's identity characteristics, so that frame selections are automatically and intelligently recommended for the content the user is interested in. For automatic recommendation, a Bayesian classifier is trained on user behavior data samples. When the recommended frame selection meets the user's needs, the classifier automatically stores the user behavior and the recommendation result in the data sample library. When it does not, the program automatically jumps to the user's manual frame selection interface and learns from the user's behavior and frame selection result, which improves the accuracy of the frame selections the classifier recommends to the user.
As can be seen from the above description of the present invention, compared with the prior art, the present invention has the following advantages:
1. the webpage is frame-selected and the related information of the frame-selected content is obtained directly, so the operation is simple and fast;
2. content on the webpage that is of the same kind as the frame-selected content but was not frame-selected can be obtained automatically, which spares the user from frame-selecting the webpage repeatedly and improves the user's working efficiency;
3. historical frame selection operations can be recorded, and whether different frame selections correspond to the same content can be judged; when they do, the content obtained by the historical frame selection operation and its related information are provided;
4. a manual supervision mechanism and a manual judgment mechanism are added, improving the accuracy and reliability of the method.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention and not to limit the invention.
Wherein:
FIG. 1 is a schematic flow chart illustrating a first embodiment of a method for directionally monitoring a web page according to the present invention;
FIG. 2 is a schematic flowchart of a third embodiment of a method for directionally monitoring a web page according to the present invention;
FIG. 3 is a flowchart illustrating a fourth embodiment of a method for directionally monitoring a web page according to the present invention;
FIG. 4 is a schematic diagram of the steps of the fourth embodiment of a method for directionally monitoring a web page according to the present invention;
FIG. 5 is a schematic diagram of the steps of the user determination mechanism in the fifth embodiment of a method for directionally monitoring a web page according to the present invention.
Detailed Description
In order to make the technical problems to be solved, the technical solutions and the advantageous effects of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The first embodiment is as follows: referring to FIG. 1, a method for directionally monitoring a webpage frame-selects content on the webpage by screen capture positioning, captures each item of frame-selected content, and gives related information for each item, such as its title, abstract, web address and webpage text. The operation is simple and fast.
Before frame selection, the user navigates to the target website to be captured, and a full picture of the webpage is generated. The user frame-selects on this picture to take a screenshot covering the data and content to be captured: the picture recognizes three user events, namely left mouse button press, mouse move and mouse button release, and the frame selection area is obtained from them. The data and content the user wants are then captured, analyzed and displayed to the user.
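The three mouse events described above can be sketched as a small state machine. This is a minimal illustration only: `SelectionTracker` and its method names are hypothetical, and normalizing the corners with `min`/`max` is an added assumption so that a drag in any direction yields a well-formed rectangle (the text itself only speaks of a start point and an end point).

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Point = Tuple[int, int]
Rect = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in viewport coordinates


@dataclass
class SelectionTracker:
    """Hypothetical helper that turns press/move/release into a rectangle."""
    start: Optional[Point] = None
    end: Optional[Point] = None
    dragging: bool = False

    def on_mouse_down(self, x: int, y: int) -> None:
        # Left button pressed: remember the start point of the selection.
        self.start = (x, y)
        self.end = (x, y)
        self.dragging = True

    def on_mouse_move(self, x: int, y: int) -> None:
        # Only track movement while the button is held down.
        if self.dragging:
            self.end = (x, y)

    def on_mouse_up(self, x: int, y: int) -> Optional[Rect]:
        # Button released: fix the end point and return the selection.
        if not self.dragging:
            return None
        self.end = (x, y)
        self.dragging = False
        return self.rectangle()

    def rectangle(self) -> Optional[Rect]:
        if self.start is None or self.end is None:
            return None
        (x1, y1), (x2, y2) = self.start, self.end
        # Assumption: normalize so (x1, y1) is top-left, (x2, y2) bottom-right.
        return (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))
```

Mouse movement that arrives before any button press is ignored, so stray pointer motion does not create a selection.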
In the second embodiment, on the basis of the first embodiment, in order to ensure that the obtained content is the content the user needs, positioning information is obtained from the user's frame selection area, and the positions of all elements in the webpage are then compared with the position of the frame selection to preliminarily screen out the matching content, which is the content the user wants to know about.
In this embodiment, the frame selection area may be positioned as follows. First, during frame selection, the coordinates of the start point of the mouse selection are recorded as (X1, Y1) and the coordinates of the end point as (X2, Y2); the start point and the end point enclose a rectangular frame selection area. Because the user may have dragged the browser scroll bars before the screen capture, the horizontal and vertical displacements caused by scrolling are added to these coordinates to obtain the absolute coordinates of the start point and end point of the frame selection area, namely (X1 + ScrollLeft, Y1 + ScrollTop) and (X2 + ScrollLeft, Y2 + ScrollTop), where ScrollLeft is the horizontal scroll offset of the webpage and ScrollTop is the vertical scroll offset.
Then, the coordinates of each content in the webpage are obtained. For any content A, the coordinates are recorded as (Xa, Ya), the width of content A is W, and the height of content A is H.
Finally, an exclusion method is used to judge whether content A on the webpage lies in the user's frame selection area. Content A is judged not to be in the frame selection area when any of the following holds: Xa + W < X1 + ScrollLeft (A lies entirely to the left of the area); X2 + ScrollLeft < Xa (entirely to the right); Ya + H < Y1 + ScrollTop (entirely above); Y2 + ScrollTop < Ya (entirely below); or Xa < X1 + ScrollLeft, Ya < Y1 + ScrollTop, Xa + W > X2 + ScrollLeft and Ya + H > Y2 + ScrollTop all hold together (A completely encloses the area). Otherwise content A is judged to be in the frame selection area.
Repeating the above three steps for every content yields all the frame-selected content in the webpage.
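The three steps of this embodiment can be sketched as follows. The sketch treats W as the horizontal extent and H as the vertical extent of a content, as the inequalities imply, and reads the final group of four inequalities as one joint condition (the content completely encloses the selection). `Box`, `absolute_rect`, `is_selected` and `select_all` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # absolute (left, top, right, bottom)


@dataclass
class Box:
    """Position and size of one content on the page (hypothetical type)."""
    name: str
    x: float  # Xa
    y: float  # Ya
    w: float  # W, horizontal extent
    h: float  # H, vertical extent


def absolute_rect(x1: float, y1: float, x2: float, y2: float,
                  scroll_left: float, scroll_top: float) -> Rect:
    # Step 1: add the scroll offsets to the viewport coordinates.
    return (x1 + scroll_left, y1 + scroll_top,
            x2 + scroll_left, y2 + scroll_top)


def is_selected(box: Box, rect: Rect) -> bool:
    # Step 3: exclusion method. A content is rejected when it lies wholly
    # outside the selection, or when it completely encloses the selection.
    left, top, right, bottom = rect
    outside = (box.x + box.w < left or right < box.x or
               box.y + box.h < top or bottom < box.y)
    encloses = (box.x < left and box.y < top and
                box.x + box.w > right and box.y + box.h > bottom)
    return not (outside or encloses)


def select_all(boxes: List[Box], rect: Rect) -> List[str]:
    # Repeat the test for every content on the page.
    return [b.name for b in boxes if is_selected(b, rect)]
```

Under this reading, a content that only partly overlaps the selection still counts as selected, while a large container such as the page body is excluded because it encloses the whole selection.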
In the third embodiment, because the second embodiment only obtains the content the user frame-selected in the webpage and cannot obtain content that the user did not frame-select but also needs, the common features of the contents in the frame selection area must be found, and the webpage is then crawled to obtain all the content in the webpage that the user needs.
In this embodiment, the webpage source code is classified and labeled through machine learning.
Referring to FIG. 2, the user's operation behavior is analyzed using artificial intelligence. Specifically, for the webpage content the user is interested in, the trajectory of the user's frame selection is simulated through machine learning, user operation simulation, intelligent bid alignment and drill-down crawling, and the webpage is deeply mined, so that content in the webpage that the user did not frame-select but needs is obtained.
The method comprises the following specific steps:
first, all contents in the frame selection area are collected to form a set B, and the first element b1 and the last element bn are taken from B. The parent nodes of b1 and bn are analyzed; if the parent node hierarchies of b1 and bn differ, the two elements are considered not to be of the same type, bn is discarded, element bn-1 is taken instead and compared with b1 in the same way, and so on until an element bm that has a common parent node with b1 is found;
then, the styles of b1 and bm are compared; if they differ, bm is discarded, element bm-1 is taken and its style is compared with that of b1, and so on until an element bz that shares a style with b1 is found;
then, all parent nodes of b1 and of bz are obtained as list1 and listz respectively, and the deepest node that appears in both lists is taken as node1, which is the nearest common parent node of b1 and bz;
finally, node1 is used to search for the elements that share a style with b1, giving a set Y {b1, ……, bz}, which is the content the user needs to obtain.
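The four steps above can be sketched on a toy tree. Several points are assumptions made for illustration rather than details stated in the text: elements are modeled by a hypothetical `Node` class; "different parent node hierarchies" is read as "different depth in the tree"; two elements "share a style" when their tag and style strings are equal; and the selected set is assumed to be non-empty.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass(eq=False)
class Node:
    """Toy stand-in for a page element (hypothetical type)."""
    tag: str
    style: str = ""
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list, repr=False)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def ancestors(node: Node) -> List[Node]:
    # All parent nodes, ordered from the root down to the direct parent.
    chain = []
    current = node.parent
    while current is not None:
        chain.append(current)
        current = current.parent
    chain.reverse()
    return chain


def same_style(a: Node, b: Node) -> bool:
    return a.tag == b.tag and a.style == b.style


def nearest_common_parent(a: Node, b: Node) -> Optional[Node]:
    # Compare list1 and listz level by level; keep the deepest shared node.
    node1 = None
    for pa, pb in zip(ancestors(a), ancestors(b)):
        if pa is not pb:
            break
        node1 = pa
    return node1


def descendants(node: Node) -> Iterator[Node]:
    for child in node.children:
        yield child
        yield from descendants(child)


def expand_selection(selected: List[Node]) -> List[Node]:
    b1 = selected[0]
    # Step 1: walk back from bn until an element at b1's level is found (bm).
    m = len(selected) - 1
    while len(ancestors(selected[m])) != len(ancestors(b1)):
        m -= 1
    # Step 2: walk back from bm until an element with b1's style is found (bz).
    z = m
    while not same_style(selected[z], b1):
        z -= 1
    # Step 3: nearest common parent node of b1 and bz.
    node1 = nearest_common_parent(b1, selected[z])
    if node1 is None:
        return [b1]
    # Step 4: every element under node1 that shares b1's style forms set Y.
    return [n for n in descendants(node1) if same_style(n, b1)]
```

Both backward scans are guaranteed to stop, because b1 always matches itself. The result can contain elements that were never frame-selected, which is the point of this embodiment.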
In the fourth embodiment, when a webpage has a lot of content, crawling it takes a certain amount of time, and if the crawling of the third embodiment were performed for every frame selection, working efficiency would be low.
Different frame selection operations may select the same content and require the same content to be crawled. If the historically crawled content can be provided to the user again, together with the real-time related information of that content, the number of webpage crawls can be reduced and working efficiency improved.
In this embodiment, training is performed on the parameters of the user's frame selections and the content they correspond to. The parameters are the coordinates of the start point and end point of the frame selection (X1, Y1, X2, Y2); they are used as input, and the corresponding frame-selected content is used as the output class, to construct an SVM classifier, so that the same area can be determined from the classification result.
The user's real-time frame selection area is compared with the user's historical frame selection areas, or with those of other users, to judge whether they are the same frame selection area. When they are the same, the real-time related information of the frame-selected content is obtained from the historical frame selection; when they are not, the content that the user did not frame-select but needs is crawled together with its related information.
For example, yesterday user A frame-selected a microblog webpage, selecting three kinds of content: sports, food and military. The system crawled the whole webpage, obtained the content that user A did not frame-select but needed, and provided its related information, such as the titles and web addresses of yesterday's sports, food and military news. Today user B frame-selects the microblog webpage. Although the start point and end point of the selection are different, the selection is judged to contain sports, food and military content as well, that is, user B's frame selection area is the same as user A's. The webpage therefore does not need to be crawled again for content outside the selection: user A's crawling result is recommended directly to user B, together with the real-time related information of that result, such as the titles and web addresses of today's sports, food and military news. In other words, the related information for the same content in the same frame selection area is updated as the date changes.
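The reuse described in this example can be sketched as a small cache. The sketch deliberately replaces the "same frame selection area" judgment with a plain comparison of the selected content categories; in the embodiment that judgment comes from the classifier. `SelectionHistory`, `crawl` and `fetch_info` are hypothetical names, and the two callables stand in for the real crawler and the real lookup of up-to-date related information.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List


class SelectionHistory:
    """Reuses crawl results across users when the selected content matches."""

    def __init__(self,
                 crawl: Callable[[FrozenSet[str]], List[str]],
                 fetch_info: Callable[[str, str], str]) -> None:
        self._crawl = crawl            # expensive: crawls the whole page
        self._fetch_info = fetch_info  # cheap: fetches current related info
        self._results: Dict[FrozenSet[str], List[str]] = {}
        self.crawl_count = 0

    def query(self, selected: Iterable[str], date: str) -> Dict[str, str]:
        # Stand-in for the classifier: same set of contents, same area.
        key = frozenset(selected)
        if key not in self._results:
            # No matching historical selection: crawl once and remember it.
            self._results[key] = self._crawl(key)
            self.crawl_count += 1
        # The related information is always refreshed for the current date.
        return {item: self._fetch_info(item, date)
                for item in self._results[key]}
```

With this arrangement, user B's query on the second day triggers no new crawl, yet still returns that day's related information.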
In other embodiments, referring to FIG. 3, a user supervision mechanism is added to the classifier on the basis of the fourth embodiment: the user judges whether the classification result is the content the user is interested in, and the judgment is added to the training set for the next round of training. The training set is cleaned and retrained periodically, noise classes generated by user misoperation are merged so that only correct judgments remain in the training set, and the stored training result is loaded whenever the classifier is used, avoiding the resource waste of repeated training.
The specific steps refer to fig. 4:
1. The user makes a frame selection, and the input parameters (x1, y1, x2, y2) are obtained.
2. The parameters are input and classified by the SVM classifier constructed through machine learning; the classification result (a standard frame selection area) is obtained and displayed on the frame selection page.
3. The user judges whether the content of the classification result is correct and is the content the user needs, which provides interaction and improves the training set.
4. If the user judges that the content of the standard frame selection area is not the desired content, the acquired parameters are set as a new class and stored in the database, the webpage content is crawled, and the result is output.
5. If the user judges that the content of the standard frame selection area is the desired content, the content corresponding to the class is obtained and the previously crawled result is output, avoiding repeated crawling, and the parameters are stored in the database to improve the training set data.
6. The training set data is cleaned periodically, and noise classes generated by user misoperation (classes that actually point to the same content) are merged, improving classification accuracy.
7. The training set is trained periodically and the training result is stored; the stored result is loaded when the classifier is used, avoiding the resource waste of repeated training.
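The bookkeeping in steps 1 to 7 can be sketched as follows. To stay self-contained, the sketch swaps the SVM for a nearest-centroid rule over (x1, y1, x2, y2); this is a stand-in chosen for brevity, not the classifier the embodiment describes. `RegionStore` and its method names are hypothetical, and the in-memory dictionaries stand in for the database.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

Params = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


class RegionStore:
    """Training set plus a stand-in classifier (nearest centroid, not SVM)."""

    def __init__(self) -> None:
        self.samples: Dict[int, List[Params]] = defaultdict(list)
        self.content: Dict[int, Tuple[str, ...]] = {}
        self.centroids: Dict[int, Tuple[float, ...]] = {}  # stored result
        self._next_id = 0

    def train(self) -> None:
        # Step 7: retrain periodically and keep the result for later calls.
        self.centroids = {
            cid: tuple(sum(axis) / len(rows) for axis in zip(*rows))
            for cid, rows in self.samples.items() if rows
        }

    def classify(self, params: Params) -> Optional[int]:
        # Step 2: uses the stored result only; no training happens here.
        if not self.centroids:
            return None
        return min(self.centroids,
                   key=lambda cid: math.dist(params, self.centroids[cid]))

    def confirm(self, cid: int, params: Params) -> Tuple[str, ...]:
        # Step 5: user accepts; reuse stored content, grow the training set.
        self.samples[cid].append(params)
        return self.content[cid]

    def reject(self, params: Params, crawled: Sequence[str]) -> int:
        # Step 4: user rejects; open a new class holding the fresh crawl.
        cid = self._next_id
        self._next_id += 1
        self.samples[cid].append(params)
        self.content[cid] = tuple(crawled)
        return cid

    def clean(self) -> None:
        # Step 6: merge noise classes that point to the same content.
        first_seen: Dict[Tuple[str, ...], int] = {}
        for cid in sorted(self.content):
            keep = first_seen.setdefault(self.content[cid], cid)
            if keep != cid:
                self.samples[keep].extend(self.samples.pop(cid))
                del self.content[cid]
```

Keeping `train` separate from `classify` mirrors the idea in step 7: classification reads a stored result, so the cost of training is paid only on the periodic schedule.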
In the fifth embodiment, on the basis of the first, second, third or fourth embodiment, users are classified by their tags to realize intelligent recommendation of frame selections. The tags include industry, position and region. Machine learning, supervised learning and reinforcement learning are continuously applied to the user's directed frame selection behavior according to the user's identity characteristics, so that frame selections are automatically and intelligently recommended for the content the user is interested in. A user judgment mechanism is also added to the intelligent recommendation: a Bayesian classifier is trained on user behavior data samples. When the recommended frame selection meets the user's needs, the classifier automatically stores the user behavior and the recommendation result in the data sample library. When it does not, the program automatically jumps to the user's manual frame selection interface and learns from the user's behavior and frame selection result, which improves the accuracy of the frame selections the classifier recommends to the user.
In this embodiment, referring to fig. 5, the specific operation steps are as follows:
1. When the user enters the frame selection page, the system asks whether to enable automatic frame selection recommendation; if not, the user is taken to manual frame selection.
2. If so, the user's tags are obtained as input parameters.
3. A Bayesian classifier is constructed and classifies according to the input parameters; the result is the probability that the user is interested in each class, and the class with the highest probability is output.
4. The user judges whether the automatic recommendation matches the user's actual needs; if so, the result is output, and the user and the result are stored in the database to improve the training samples.
5. If not, the user is taken to the manual frame selection interface to frame-select manually, and the user and the result are stored in the database to improve the training samples.
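Steps 2 to 5 can be sketched with a small categorical naive Bayes classifier over the user tags. The text only says "Bayesian classifier", so the naive independence assumption, the Laplace (add-one) smoothing and the names `TagNaiveBayes`, `add_sample`, `probabilities` and `recommend` are all choices made here for illustration.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Optional, Set, Tuple


class TagNaiveBayes:
    """Categorical naive Bayes over user tags (illustrative sketch)."""

    def __init__(self) -> None:
        self.class_counts: Counter = Counter()
        # (class, tag field) -> how often each tag value was seen
        self.feature_counts: Dict[Tuple[str, str], Counter] = defaultdict(Counter)
        self.values: Dict[str, Set[str]] = defaultdict(set)

    def add_sample(self, tags: Dict[str, str], label: str) -> None:
        # Steps 4 and 5: every confirmed or manual result becomes a sample.
        self.class_counts[label] += 1
        for name, value in tags.items():
            self.feature_counts[(label, name)][value] += 1
            self.values[name].add(value)

    def probabilities(self, tags: Dict[str, str]) -> Dict[str, float]:
        # Step 3: probability of the user's interest in each class.
        total = sum(self.class_counts.values())
        if total == 0:
            return {}
        log_scores = {}
        for label, count in self.class_counts.items():
            score = math.log(count / total)
            for name, value in tags.items():
                seen = self.feature_counts[(label, name)][value]
                vocab = len(self.values[name] | {value})
                # Laplace smoothing keeps unseen tag values from zeroing out.
                score += math.log((seen + 1) / (count + vocab))
            log_scores[label] = score
        peak = max(log_scores.values())
        weights = {k: math.exp(v - peak) for k, v in log_scores.items()}
        norm = sum(weights.values())
        return {k: w / norm for k, w in weights.items()}

    def recommend(self, tags: Dict[str, str]) -> Optional[str]:
        # Output the class with the highest probability, if any exists.
        probs = self.probabilities(tags)
        return max(probs, key=probs.get) if probs else None
```

When `recommend` returns `None`, no samples exist yet; a caller would then fall back to manual frame selection and feed the outcome back through `add_sample`.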
In summary, compared with the prior art, the method for directionally monitoring a webpage provided by this application obtains the related information of the frame-selected content directly by frame-selecting the webpage, so the operation is simple and fast. It can automatically obtain content on the webpage that is of the same kind as the frame-selected content but was not frame-selected, which spares the user from frame-selecting the webpage repeatedly and improves the user's working efficiency. The method can also record historical frame selection operations and judge whether different frame selections correspond to the same content; when they do, it provides the content and related information obtained by the historical frame selection operation, avoiding repeated webpage crawling and waste of resources. A manual supervision mechanism and a manual judgment mechanism are also added to improve the accuracy and reliability of the method.
The invention has been described above with reference to the accompanying drawings. Obviously, the invention is not limited to the specific implementations described above, and applying the inventive concept and solution to other applications without substantial modification falls within the scope of the invention.

Claims (9)

1. A method for directionally monitoring a webpage, characterized in that content on the webpage is frame-selected, each item of frame-selected content is captured, and related information of each item is given, wherein the related information comprises a title, an abstract, a web address and a webpage text; all contents in the frame selection area are collected to form a set B, the first element b1 and the last element bn are taken from the set B, and the parent nodes of b1 and bn are analyzed; if the parent node hierarchies of b1 and bn differ, the two elements are considered not to be of the same type, bn is discarded, element bn-1 is taken instead and compared with b1 in the same way, and so on until an element bm that has a common parent node with b1 is found; the styles of b1 and bm are compared, and if they differ, bm is discarded, element bm-1 is taken and its style is compared with that of b1, and so on until an element bz that shares a style with b1 is found; all parent nodes of b1 and of bz are obtained as list1 and listz respectively, and the deepest node that appears in both list1 and listz is taken as node1, wherein node1 is the nearest common parent node of b1 and bz; and node1 is used to search for the elements that share a style with b1, giving a set Y {b1, ……, bz}, wherein the set Y is the content the user needs to obtain.
2. The method of claim 1, wherein the frame selection is performed by a screen shot positioning method.
3. The method as claimed in claim 1, wherein the positioning information is obtained according to the user selection area, and then the positions of all elements in the web page are compared with the positions of the user selection content, so as to primarily screen out the matching content, which is the content that the user wants to know.
4. The method for monitoring web page orientation according to claim 3, wherein during the frame selection, the coordinates of the initial point of the frame selection are recorded as (X1, Y1), the coordinates of the end point are recorded as (X2, Y2), the initial point and the end point enclose a rectangular frame selection area, the coordinates of the frame selection area are superimposed with the vertical and horizontal displacements caused by the user pulling the web page scroll bar, so as to obtain the absolute coordinate values of the initial point and the end point of the frame selection area, which are respectively (X1+ ScrollLeft, Y1+ ScrollTop) and (X2+ ScrollLeft, Y2+ ScrollTop); acquiring coordinates of each content in a webpage, and recording the coordinates of any content A as (Xa, Ya), wherein the length of the content A is W, and the width of the content A is H; and judging whether the element A on the webpage is contained in the area selected by the user frame by adopting an exclusion method, and judging that the content A is not in the frame selection area when Xa + W < X1+ ScrollLeft, or X2+ ScrollLeft < Xa, or Ya + H < Y1+ ScrollTop, or Y2+ ScrollTop < Ya, or Xa < X1+ ScrollLeft, Ya < Y1+ ScrollTop, and Xa + W > X2+ ScrollLeft, and Ya + H > Y2+ ScrollTop, or judging that the content A is in the frame selection area.
5. The method for directionally monitoring a webpage according to claim 1, wherein the webpage source code is classified and labeled through machine learning; and, for webpage content of interest to the user, the trajectory of the user's frame selection is simulated through machine learning, user operation simulation, intelligent comparison and drill-down crawling, so that content that the user needs but has not frame-selected is crawled.
6. The method for directionally monitoring a webpage according to claim 5, wherein the user's real-time frame selection area is compared with the user's historical frame selection areas or with the historical frame selection areas of other users to judge whether they are the same frame selection area; when the judgment result is that they are the same, real-time related information of the frame-selected content is acquired according to the historical frame selection; and when the judgment result is that they are not the same, the content that the user needs but has not frame-selected, together with its related information, is crawled.
7. The method for directionally monitoring a webpage according to claim 6, wherein whether frame selection areas are the same is determined by taking the coordinates of the starting point and the end point of the user's frame selection (X1, Y1, X2, Y2) as input parameters to construct an SVM classifier, classifying the frame selection area according to the positions, within the whole webpage, of all content between the starting point and the end point, and determining whether the areas are the same according to the classification result.
8. The method for directionally monitoring a webpage according to claim 7, wherein a user supervision mechanism is added to the classifier: the user judges whether the classification result is the content of interest to the user, and the judgment result is added to a training set for the next round of training; the training set is cleaned and retrained regularly, taking into account noise caused by user misoperation, so that correct judgment results are finally stored in the training set; and the stored training result is called when the classifier is used, thereby avoiding the resource waste caused by repeated training.
9. The method for directionally monitoring a webpage according to claim 1 or 5, wherein machine learning, supervised learning and reinforcement learning are continuously performed on the user's directional frame selection behavior according to the user's identity characteristics, so that automatic recommended frame selection is intelligently performed on content of interest to the user; in the automatic recommended frame selection, a Bayesian classifier is used to perform classification training on user behavior data samples; when the recommended frame selection meets the user's needs, the classifier automatically stores the user behavior and the recommendation result in a data sample library; and when the recommended frame selection does not meet the user's needs, the program automatically jumps to the user's manual frame selection interface and learns the user behavior and the frame selection result, thereby improving the accuracy of the classifier's automatic recommended frame selection for the user.
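As a rough illustration of the Bayesian classification in claim 9, the sketch below uses a small categorical naive Bayes model with Laplace smoothing. Naive Bayes is one possible kind of Bayesian classifier; the claim does not say which variant is used. The feature tuples (page section, element kind) and the labels "accept" and "reject" are hypothetical, since the claim does not specify what the user behavior data samples contain.

```python
import math
from collections import Counter, defaultdict
from typing import Hashable, Optional, Sequence


class NaiveBayes:
    """Categorical naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.class_counts: Counter = Counter()
        # (label, feature index) -> counts of the values seen for that feature
        self.feature_counts = defaultdict(Counter)
        # feature index -> set of all values seen for that feature
        self.values = defaultdict(set)

    def learn(self, features: Sequence[Hashable], label: str) -> None:
        """Add one sample to the data sample library."""
        self.class_counts[label] += 1
        for i, value in enumerate(features):
            self.feature_counts[(label, i)][value] += 1
            self.values[i].add(value)

    def predict(self, features: Sequence[Hashable]) -> Optional[str]:
        """Return the most probable label, or None if nothing has been learned."""
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / total)  # log prior
            for i, value in enumerate(features):
                count = self.feature_counts[(label, i)][value]
                # Smoothed log likelihood of this feature value given the label
                score += math.log((count + 1) / (n + len(self.values[i])))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

In this reading, each recommendation the user accepts or rejects would be passed back through `learn`, so later calls to `predict` reflect the accumulated feedback.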

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811604429.6A CN109829092B (en) 2018-12-26 2018-12-26 Method for directionally monitoring webpage

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811604429.6A CN109829092B (en) 2018-12-26 2018-12-26 Method for directionally monitoring webpage

Publications (2)

Publication Number Publication Date
CN109829092A (en) 2019-05-31
CN109829092B (en) 2021-05-28

Family

ID=66861232

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811604429.6A Active CN109829092B (en) 2018-12-26 2018-12-26 Method for directionally monitoring webpage

Country Status (1)

Country Link
CN (1) CN109829092B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110413499B (en) * 2019-07-30 2023-12-19 秒针信息技术有限公司 Service information monitoring method, device, equipment and storage medium
CN112560403A (en) * 2019-09-26 2021-03-26 北京国双科技有限公司 Text processing method and device and electronic equipment
CN112579852B (en) * 2019-09-30 2023-01-10 厦门邑通智能科技集团有限公司 Interactive webpage data accurate acquisition method
CN113722640A (en) * 2021-08-26 2021-11-30 长沙博为软件技术股份有限公司 Method, device and medium for collecting webpage configurable items based on RPA
CN114025210B (en) * 2021-11-01 2023-02-28 深圳小湃科技有限公司 Popup shielding method, equipment, storage medium and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20130106519A (en) * 2012-03-20 2013-09-30 삼성전자주식회사 Method and apparatus for managing history of web-browser
CN106897287A (en) * 2015-12-18 2017-06-27 中国电信股份有限公司 Homepage Publishing decimation in time method and the device for Homepage Publishing decimation in time
CN107943812A (en) * 2017-05-24 2018-04-20 成都明途科技有限公司 Recommend method for the news of user's centralized integration resource

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9245274B2 (en) * 2011-08-30 2016-01-26 Adobe Systems Incorporated Identifying selected dynamic content regions
CN106294482B (en) * 2015-06-04 2019-10-15 阿里巴巴集团控股有限公司 The treating method and apparatus of webpage frame selection operation
CN106326316B (en) * 2015-07-08 2022-11-29 腾讯科技(深圳)有限公司 Webpage advertisement filtering method and device
CN105138605A (en) * 2015-08-07 2015-12-09 苏州博优赞信息科技有限责任公司 User behavior real-time monitoring method based on webpage
CN107609123B (en) * 2017-09-14 2021-12-14 强春娟 Method for aggregating news presentations based on news recommendation system
CN108279966B (en) * 2018-02-13 2021-08-20 Oppo广东移动通信有限公司 Webpage screenshot method, device, terminal and storage medium

Also Published As

Publication number Publication date
CN109829092A (en) 2019-05-31

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: 361000 one of unit 702, No. 1, xishanwei Road, phase III Software Park, Xiamen Torch High tech Zone, Xiamen, Fujian Province

Patentee after: Xiamen Yitong Intelligent Technology Group Co.,Ltd.

Address before: B11, 4th floor, 1036 Xiahe Road, Siming District, Xiamen City, Fujian Province, 361000

Patentee before: XIAMEN ETOM SOFTWARE TECHNOLOGY Co.,Ltd.

PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: A method for directional monitoring of web pages

Effective date of registration: 20220816

Granted publication date: 20210528

Pledgee: Xiamen Branch of PICC

Pledgor: Xiamen Yitong Intelligent Technology Group Co.,Ltd.

Registration number: Y2022980012793

PC01 Cancellation of the registration of the contract for pledge of patent right

Granted publication date: 20210528

Pledgee: Xiamen Branch of PICC

Pledgor: Xiamen Yitong Intelligent Technology Group Co.,Ltd.

Registration number: Y2022980012793

PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: A method for targeted monitoring of web pages

Granted publication date: 20210528

Pledgee: Agricultural Bank of China Limited Xiamen Lianqian Branch

Pledgor: Xiamen Yitong Intelligent Technology Group Co.,Ltd.

Registration number: Y2024980004722
