CN106910195B - Webpage layout monitoring method and device - Google Patents


Info

Publication number
CN106910195B
Authority
CN (China)
Prior art keywords
webpage
normal
page
background color
target webpage
Legal status
Active
Application number
CN201710047524.XA
Other languages
Chinese (zh)
Other versions
CN106910195A
Inventor
刘楠 (Liu Nan)
Current Assignee
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Application CN201710047524.XA filed by Beijing QIYI Century Science and Technology Co Ltd; published as CN106910195A; application granted and published as CN106910195B.


Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a webpage layout monitoring method and device. The method comprises the following steps: extracting the background color of a normal webpage, and segmenting the normal webpage by the background color to generate a webpage template having regions of interest; when the image sizes of a target webpage and the normal webpage are different, performing a forward and reverse bidirectional comparison on the target webpage according to the webpage template, and calculating the difference within the regions of interest between the target webpage and the normal webpage to obtain the state of the target webpage; and when the image sizes of the target webpage and the normal webpage are the same, performing a one-to-one forward comparison on the target webpage according to the webpage template, and calculating the difference within the regions of interest between the target webpage and the normal webpage to obtain the state of the target webpage. The invention also discloses a webpage layout monitoring device. The invention can replace manual inspection, monitor webpages automatically around the clock, and save substantial manpower and material resources.

Description

Webpage layout monitoring method and device
Technical Field
The invention relates to the technical field of image processing, and in particular to a webpage layout monitoring method and device.
Background
In the Internet era, the network has become the most important medium, and every day a large number of users browse webpages to obtain massive amounts of information. The quality of each website's webpages has therefore become an important indicator of user experience. Webpage quality covers not only subjective criteria, such as UI design and ease of finding information, but also objective criteria such as webpage stability: if a defect in the webpage code causes display errors on the user side, the user experience suffers badly. Webpages are currently updated very frequently, and because of human negligence and similar causes, no update can be guaranteed to be free of display errors. Monitoring for such problems manually would require people to watch every webpage continuously, 7x24 hours, which is time-consuming and labor-intensive.
Disclosure of Invention
The main object of the invention is to provide a webpage layout monitoring method and device, so as to solve the technical problem in the prior art that all webpages have to be monitored continuously by hand, which is time-consuming and labor-intensive.
To achieve the above object, the invention provides a webpage layout monitoring method, comprising the following steps:
extracting the background color of a normal webpage, and segmenting the normal webpage by the background color to generate a webpage template having regions of interest;
when the image sizes of a target webpage and the normal webpage are different, performing a forward and reverse bidirectional comparison on the target webpage according to the webpage template, and calculating the difference within the regions of interest between the target webpage and the normal webpage to obtain the state of the target webpage;
and when the image sizes of the target webpage and the normal webpage are the same, performing a one-to-one forward comparison on the target webpage according to the webpage template, and calculating the difference within the regions of interest between the target webpage and the normal webpage to obtain the state of the target webpage.
Preferably, the step of extracting the background color of the normal webpage, segmenting the normal webpage by the background color, and generating the webpage template having regions of interest comprises:
converting the input image of the normal webpage into gray scale or into any luminance-chrominance separated color space;
convolving the converted image with a horizontal edge gradient operator and a vertical edge gradient operator to obtain a horizontal edge image Eh and a vertical edge image Ev;
projecting each row of the horizontal edge image Eh in the horizontal direction to obtain a horizontal histogram Hedge;
computing a color histogram Hcolor of the image pixels P(x, y) selected according to the histogram Hedge;
obtaining the two background colors of the image, colorbg1 and colorbg2;
segmenting the image in a first direction by the background color colorbg1;
segmenting each resulting region in a second direction, and segmenting each region obtained in the second direction again in the first direction by the background color colorbg1, to obtain a plurality of regions of interest; wherein the first direction is horizontal and the second direction is vertical, or the first direction is vertical and the second direction is horizontal;
comparing every pixel P(x, y) in each region of interest with the background color colorbg1, marking the pixel as statistical range if P(x, y) = colorbg1 and as non-statistical range otherwise; and marking every pixel position outside the regions of interest as non-statistical range, to obtain the final webpage template having regions of interest.
Preferably, the step of extracting the background color of the normal webpage, segmenting the normal webpage by the background color, and generating the webpage template having regions of interest further comprises:
judging whether all pixels P(x, y) in the first row of the text portion of the normal webpage image satisfy P(x, y) = colorbg2; if so, checking the next row; otherwise, recording the position as a first start position;
when every pixel P(x, y) in a subsequent row satisfies P(x, y) = colorbg2, recording a first end position;
repeating the above steps to obtain a second start position and a second end position, and stopping the search once they are obtained;
and setting the rows from the first start position to the first end position and the rows from the second start position to the second end position as regions of interest.
Preferably, when the image sizes of the target webpage and the normal webpage are different, the step of performing a forward and reverse bidirectional comparison on the target webpage according to the webpage template and calculating the difference within the regions of interest between the target webpage and the normal webpage to obtain the state of the target webpage comprises:
if the height of the target webpage is smaller than the height of the normal webpage, calculating the difference between the pixel colors of the target webpage and the background color colorbg1 at the position of the first region of interest of the target webpage;
calculating, row by row from top to bottom in units of horizontal rows, the difference between the pixel colors of the target webpage and the background color colorbg1 in each region of interest, and judging the problem type according to preset rules;
and stopping the forward comparison when it finds a problem, and starting the reverse comparison.
Preferably, when the image sizes of the target webpage and the normal webpage are the same, the step of performing a one-to-one forward comparison on the target webpage according to the webpage template and calculating the difference within the regions of interest between the target webpage and the normal webpage to obtain the state of the target webpage comprises:
calculating, in units of horizontal rows, the difference between the pixel colors of the target webpage and the background color colorbg1 in each region of interest of each row, and judging the problem type according to preset rules;
and, if the comparison finds no problem, performing a second comparison on each region of interest in each horizontal row, and recording the difference between the pixel colors of the target webpage and the background color colorbg2 in each region of interest.
The invention also provides a webpage layout monitoring device, comprising:
a normal webpage template generation module, configured to extract the background color of a normal webpage and segment the normal webpage by the background color to generate a webpage template having regions of interest;
an unequal-size page comparison module, configured to, when the image sizes of a target webpage and the normal webpage are different, perform a forward and reverse bidirectional comparison on the target webpage according to the webpage template, and calculate the difference within the regions of interest between the target webpage and the normal webpage to obtain the state of the target webpage;
and an equal-size page comparison module, configured to, when the image sizes of the target webpage and the normal webpage are the same, perform a one-to-one forward comparison on the target webpage according to the webpage template, and calculate the difference within the regions of interest between the target webpage and the normal webpage to obtain the state of the target webpage.
Preferably, the normal webpage template generation module comprises:
a conversion unit, configured to convert the input image of the normal webpage into gray scale or into any luminance-chrominance separated color space;
an operation unit, configured to convolve the converted image with a horizontal edge gradient operator and a vertical edge gradient operator to obtain a horizontal edge image Eh and a vertical edge image Ev;
a histogram acquisition unit, configured to project each row of the horizontal edge image Eh in the horizontal direction to obtain a horizontal histogram Hedge;
a statistics unit, configured to compute a color histogram Hcolor of the image pixels P(x, y) selected according to the histogram Hedge;
a background color acquisition unit, configured to obtain the two background colors of the image, colorbg1 and colorbg2;
a cutting unit, configured to segment the image in a first direction by the background color colorbg1;
a region-of-interest acquisition unit, configured to segment each resulting region in a second direction, and to segment each region obtained in the second direction again in the first direction by the background color colorbg1, to obtain a plurality of regions of interest; wherein the first direction is horizontal and the second direction is vertical, or the first direction is vertical and the second direction is horizontal;
and a template value setting unit, configured to compare every pixel P(x, y) in each region of interest with the background color colorbg1, marking the pixel as statistical range if P(x, y) = colorbg1 and as non-statistical range otherwise, and to mark every pixel position outside the regions of interest as non-statistical range, to obtain the final webpage template having regions of interest.
Preferably, the normal webpage template generation module further comprises:
a text header determination unit, configured to judge whether all pixels P(x, y) in the first row of the text portion of the normal webpage image satisfy P(x, y) = colorbg2; if so, to check the next row; otherwise, to record the position as a first start position;
to record a first end position when every pixel P(x, y) in a subsequent row satisfies P(x, y) = colorbg2;
to repeat the above steps to obtain a second start position and a second end position, stopping the search once they are obtained;
and to set the rows from the first start position to the first end position and the rows from the second start position to the second end position as regions of interest.
Preferably, the unequal-size page comparison module is configured to:
if the height of the target webpage is smaller than the height of the normal webpage, calculate the difference between the pixel colors of the target webpage and the background color colorbg1 at the position of the first region of interest of the target webpage;
calculate, row by row from top to bottom in units of horizontal rows, the difference between the pixel colors of the target webpage and the background color colorbg1 in each region of interest, and judge the problem type according to preset rules;
and stop the forward comparison when it finds a problem, and start the reverse comparison.
Preferably, the equal-size page comparison module is configured to:
calculate, in units of horizontal rows, the difference between the pixel colors of the target webpage and the background color colorbg1 in each region of interest of each row, and judge the problem type according to preset rules;
and, if the comparison finds no problem, perform a second comparison on each region of interest in each horizontal row, and record the difference between the pixel colors of the target webpage and the background color colorbg2 in each region of interest.
The webpage layout monitoring method provided by the invention generates the webpage template automatically from the normal webpage image alone, without any manually supplied information, which improves the generality and degree of automation of the algorithm: the image is analyzed and the template is generated without manual intervention, and various types of webpages can be processed. The method can replace manual inspection, monitor webpages automatically 7x24 hours, and save substantial manpower and material resources.
Drawings
FIG. 1 is a schematic flowchart of a webpage layout monitoring method according to an embodiment of the invention;
FIG. 2 is a schematic flowchart of the step of generating a webpage template having regions of interest in the webpage layout monitoring method of the invention;
FIG. 3 is a schematic diagram of the style and effect of a horizontal histogram according to the invention;
FIG. 4 is a schematic diagram of a webpage template with regions of interest and of a normal webpage according to an embodiment of the invention;
FIG. 5 is a schematic diagram of the comparison process for pages of unequal size in an embodiment of the webpage layout monitoring method of the invention;
FIG. 6 is a schematic diagram of the comparison process for pages of unequal size in an embodiment of the webpage layout monitoring method of the invention;
FIG. 7 is a schematic diagram of the comparison process for pages of equal size in an embodiment of the webpage layout monitoring method of the invention;
FIG. 8 is a schematic block diagram of an embodiment of the webpage layout monitoring device of the invention;
FIG. 9 is a schematic structural diagram of the normal webpage template generation module in an embodiment of the webpage layout monitoring device of the invention.
The realization of the objects, the functional features, and the advantages of the invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention provides a webpage layout monitoring method. Referring to FIG. 1, in an embodiment, the webpage layout monitoring method comprises the following steps:
Step S10: extracting the background color of a normal webpage, and segmenting the normal webpage by the background color to generate a webpage template having regions of interest.
In the embodiment of the invention, the normal webpage that is segmented by the background color can be understood as a screenshot of the normal webpage. A region of interest is a target region to be compared, and may be an image region or a text region.
Step S20: when the image sizes of the target webpage and the normal webpage are different, performing a forward and reverse bidirectional comparison on the target webpage according to the webpage template, and calculating the difference within the regions of interest between the target webpage and the normal webpage to obtain the state of the target webpage.
In the embodiment of the invention, after the webpage template is generated, each region of the target webpage is compared with the corresponding region of interest of the webpage template, and the target webpage is classified into one of seven types: normal, disordered style, whole column missing, whole-column content missing, picture missing, picture not loaded, and text missing. The position of any problem in the target webpage is also specified.
Step S30: when the image sizes of the target webpage and the normal webpage are the same, performing a one-to-one forward comparison on the target webpage according to the webpage template, and calculating the difference within the regions of interest between the target webpage and the normal webpage to obtain the state of the target webpage.
The webpage layout monitoring method provided by the invention generates the webpage template automatically from the normal webpage image alone, without any manually supplied information, which improves the generality and degree of automation of the algorithm: the image is analyzed and the template is generated without manual intervention, and various types of webpages can be processed. The method can replace manual inspection, monitor webpages automatically 7x24 hours, and save substantial manpower and material resources.
Referring to FIG. 2, in a preferred embodiment of the invention, step S10 comprises:
Step S11: converting the input image of the normal webpage into gray scale or into any luminance-chrominance separated color space.
Specifically, the input image may be converted from the RGB color space to gray scale or to any luminance-chrominance separated space (e.g., YUV, HSV, HSL, LAB). The gray-scale conversion formula is $\mathrm{Gray} = 0.299R + 0.587G + 0.114B$; for a luminance-chrominance separated space, taking HSL as an example, the luminance (lightness) L is computed as $L = (\max(R, G, B) + \min(R, G, B))/2$.
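As an illustration only, step S11 can be sketched in Python. The image representation (a list of rows of (R, G, B) tuples) and the function names are hypothetical; only the two formulas come from the text above.

```python
def to_gray(r, g, b):
    """Gray = 0.299*R + 0.587*G + 0.114*B (weights from step S11)."""
    return 0.299 * r + 0.587 * g + 0.114 * b


def hsl_lightness(r, g, b):
    """HSL lightness L = (max(R, G, B) + min(R, G, B)) / 2."""
    return (max(r, g, b) + min(r, g, b)) / 2


def convert_image(rgb_rows, mode="gray"):
    """Convert rows of (R, G, B) tuples into a single-channel image."""
    fn = to_gray if mode == "gray" else hsl_lightness
    return [[fn(*px) for px in row] for row in rgb_rows]
```

Either channel can feed the edge step that follows; gray scale weights green most heavily, while HSL lightness only looks at the extreme channels.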
Step S12: convolving the converted image with a horizontal edge gradient operator and a vertical edge gradient operator to obtain a horizontal edge image Eh and a vertical edge image Ev.
The Sobel operator is taken here as an example of the horizontal and vertical edge gradient operators; other operators are also applicable.
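A minimal pure-Python sketch of step S12 with 3x3 Sobel kernels is given below. We assume that the "horizontal edge" operator is the kernel that differentiates along y (so it responds to horizontal edges), that borders are handled by replicating edge pixels, and that the edge images hold absolute responses; none of these choices is fixed by the text, and the names are hypothetical.

```python
# Assumed kernels: SOBEL_H responds to intensity changes along y
# (horizontal edges); SOBEL_V responds to changes along x (vertical edges).
SOBEL_H = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
SOBEL_V = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]


def convolve_abs(img, kernel):
    """Apply a 3x3 kernel with replicated borders and return |response|.
    Flipping a Sobel kernel only flips the sign, which abs() removes."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp row index
                    xx = min(max(x + dx, 0), w - 1)  # clamp column index
                    acc += kernel[dy + 1][dx + 1] * img[yy][xx]
            out[y][x] = abs(acc)
    return out


def edge_images(gray):
    """Return (Eh, Ev) for a single-channel image given as rows of numbers."""
    return convolve_abs(gray, SOBEL_H), convolve_abs(gray, SOBEL_V)
```

In practice a library filter would replace the nested loops; the loops are kept here so the sketch has no dependencies.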
Step S13: projecting each row of the horizontal edge image Eh in the horizontal direction to obtain a horizontal histogram Hedge.
To exclude the influence of vertical edges, only horizontal edges are counted; that is, for any point (x, y) of the image, an edge is counted only if Eh(x, y) > Th1 && Ev(x, y) < Th2.
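The projection in step S13 can be sketched as follows, assuming Eh and Ev are same-sized lists of rows; th1 and th2 stand for Th1 and Th2, whose values the text does not give, and the function name is hypothetical.

```python
def horizontal_edge_histogram(eh, ev, th1, th2):
    """Hedge[y] = number of pixels in row y counted as horizontal edges,
    i.e. with Eh(x, y) > th1 and Ev(x, y) < th2."""
    return [
        sum(1 for a, b in zip(row_h, row_v) if a > th1 and b < th2)
        for row_h, row_v in zip(eh, ev)
    ]
```

The Ev test is what keeps strong vertical edges, such as column borders, from inflating a row's count.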
Step S14: computing the color histogram Hcolor of the image pixels P(x, y) according to the histogram Hedge.
When computing the color histogram Hcolor of the pixels P(x, y) in the image, a pixel is included in Hcolor if and only if the horizontal edge histogram value of its row satisfies Hedge[y] = 0. In the embodiment of the invention, the style and effect of the histogram are shown in FIG. 3.
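A short sketch of step S14, assuming pixels are hashable values such as (R, G, B) tuples; the function name is hypothetical.

```python
from collections import Counter


def color_histogram(rgb_rows, hedge):
    """Hcolor: histogram of pixel colors, counted only over rows y whose
    horizontal edge count Hedge[y] is 0."""
    hcolor = Counter()
    for y, row in enumerate(rgb_rows):
        if hedge[y] == 0:
            hcolor.update(row)  # add one count per pixel color in this row
    return hcolor
```

Skipping rows that contain horizontal edges keeps text and picture rows from dominating the histogram, so the most frequent colors are likely to be background colors.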
Step S15: obtaining the two background colors of the image, colorbg1 and colorbg2.
The two main background colors are obtained by taking the color at the highest bin of Hcolor as colorbg1 and the color at the second-highest bin as colorbg2. Physically, they are the background color of the whole page and the background color of the frames around pictures, respectively.
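Selecting the two largest bins of step S15 maps directly onto `Counter.most_common`. This sketch assumes Hcolor is a `collections.Counter` holding at least two distinct colors; the function name is hypothetical.

```python
from collections import Counter


def background_colors(hcolor):
    """Return (colorbg1, colorbg2): the most and second-most frequent colors.
    Assumes Hcolor holds at least two distinct colors."""
    (colorbg1, _), (colorbg2, _) = hcolor.most_common(2)
    return colorbg1, colorbg2
```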
Step S16: segmenting the image in the first direction by the background color colorbg1.
In the invention, the first direction may be horizontal or vertical, and the second direction is then vertical or horizontal, respectively; the technical solution is described in detail below taking the horizontal direction as the first direction and the vertical direction as the second. For example, the image is cut horizontally by the main background color colorbg1: if all pixels P(x, y) of the next row equal colorbg1 but the current row does not meet this condition, that position is taken as a division start position; if all pixels P(x, y) of the current row equal colorbg1 but the next row does not meet the condition, that position is taken as a division end position. Step S16 thus yields the horizontal pre-division positions of the image, by which the image is divided into a plurality of horizontally divided regions.
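A simplified sketch of the horizontal cut in step S16 is shown below. Instead of returning the division start and end positions themselves, it returns the content bands that lie between runs of all-colorbg1 rows, which we assume is what the next step consumes. The image is any list of rows of comparable pixels (single characters in the test), and the name is hypothetical.

```python
def split_rows(img, bg):
    """Return (top, bottom) inclusive row ranges of the bands lying between
    rows made entirely of the background color bg."""
    bands, start = [], None
    for y, row in enumerate(img):
        is_bg = all(p == bg for p in row)
        if not is_bg and start is None:
            start = y                      # a content band begins
        elif is_bg and start is not None:
            bands.append((start, y - 1))   # band ended on the previous row
            start = None
    if start is not None:                  # band runs to the bottom edge
        bands.append((start, len(img) - 1))
    return bands
```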
Step S17: segmenting each horizontally divided region in the second direction, and segmenting each region obtained in the second direction again in the first direction by the background color colorbg1, to obtain a plurality of regions of interest; wherein the first direction is horizontal and the second direction is vertical, or the first direction is vertical and the second direction is horizontal.
For example, each horizontally divided region is divided vertically: if all pixels P(x, y) of the next column equal colorbg1 but the current column does not meet this condition, that position is taken as a division start position; if all pixels P(x, y) of the current column equal colorbg1 but the next column does not meet the condition, that position is taken as a division end position. Step S17 thus yields the vertical pre-division positions of each horizontally divided region, by which the region is divided into a plurality of vertically divided regions.
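Step S17 can be sketched as a vertical split of each band followed by a second horizontal split of every piece. The ROI tuple layout (top, bottom, left, right), the helper names, and the band input format (as in the row-splitting sketch for step S16) are assumptions for illustration.

```python
def _runs(is_bg_flags):
    """Inclusive (start, end) index runs of consecutive non-background entries."""
    runs, start = [], None
    for i, is_bg in enumerate(is_bg_flags):
        if not is_bg and start is None:
            start = i
        elif is_bg and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(is_bg_flags) - 1))
    return runs


def regions_of_interest(img, bg, bands):
    """Split each horizontal band (top, bottom) vertically at all-bg columns,
    then split each piece horizontally again at all-bg rows. Each ROI is
    returned as (top, bottom, left, right), inclusive."""
    rois = []
    for top, bottom in bands:
        col_bg = [all(img[y][x] == bg for y in range(top, bottom + 1))
                  for x in range(len(img[0]))]
        for left, right in _runs(col_bg):
            row_bg = [all(img[y][x] == bg for x in range(left, right + 1))
                      for y in range(top, bottom + 1)]
            for r0, r1 in _runs(row_bg):
                rois.append((top + r0, top + r1, left, right))
    return rois
```

The second horizontal pass matters when pieces inside one band end at different heights, as with the left column in the test.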
Step S18: comparing every pixel P(x, y) in each region of interest with the background color colorbg1, marking the pixel as statistical range if P(x, y) = colorbg1 and as non-statistical range otherwise, and marking every pixel position outside the regions of interest as non-statistical range, to obtain the final webpage template having regions of interest.
Specifically, marking a pixel as statistical range can be implemented by setting its template value to 0, and marking it as non-statistical range by setting its template value to 255.
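Using the template values 0 and 255 given above, step S18 might be sketched as follows; the ROI tuple layout (top, bottom, left, right) and the function name are assumptions.

```python
STATISTICAL = 0        # template value for "statistical range" (from the text)
NON_STATISTICAL = 255  # template value for "non-statistical range"


def build_template(img, bg1, rois):
    """Mark pixels inside an ROI that equal bg1 as statistical (0); every
    other pixel, including all pixels outside the ROIs, gets 255."""
    h, w = len(img), len(img[0])
    template = [[NON_STATISTICAL] * w for _ in range(h)]
    for top, bottom, left, right in rois:
        for y in range(top, bottom + 1):
            for x in range(left, right + 1):
                if img[y][x] == bg1:
                    template[y][x] = STATISTICAL
    return template
```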
In an embodiment, step S10 may further comprise:
judging whether all pixels P(x, y) in the first row of the text portion of the normal webpage image satisfy P(x, y) = colorbg2; if so, checking the next row; otherwise, recording the position as a first start position;
when every pixel P(x, y) in a subsequent row satisfies P(x, y) = colorbg2, recording a first end position; repeating the above steps to obtain a second start position and a second end position, and stopping the search once they are obtained; and setting the rows from the first start position to the first end position and the rows from the second start position to the second end position as regions of interest. The positions found are where the text captions appear.
Specifically, within each region of interest obtained in the above steps, the text area is searched from the bottom: for each row of pixels, starting at the bottom, if every pixel P(x, y) in the row satisfies P(x, y) = colorbg2, the next row up is checked; otherwise, the row is recorded as the first start position, and the search continues upward until a row in which every pixel satisfies P(x, y) = colorbg2, which is recorded as the first end position. The second start position and the second end position are found in the same way, after which the search stops; the positions found are where the text captions appear. In the embodiment of the invention, examples of a page template and a normal webpage are shown in FIG. 4.
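The bottom-up caption search can be sketched as below. We read each start/end pair as delimiting one caption and return it as an inclusive (upper_row, lower_row) span of text rows; that output format, the handling of text that reaches the top of the ROI, and the function name are assumptions.

```python
def find_captions(img, bg2, top, bottom, left, right, max_lines=2):
    """Scan rows bottom-up inside an ROI and return up to max_lines text
    spans as inclusive (upper_row, lower_row) pairs. A row counts as
    background when every pixel in columns left..right equals bg2."""
    spans, lower = [], None
    for y in range(bottom, top - 1, -1):
        is_bg = all(img[y][x] == bg2 for x in range(left, right + 1))
        if not is_bg and lower is None:
            lower = y                      # start position: first text row
        elif is_bg and lower is not None:
            spans.append((y + 1, lower))   # end position found at row y
            lower = None
            if len(spans) == max_lines:
                break                      # stop after two captions
    if lower is not None and len(spans) < max_lines:
        spans.append((top, lower))         # text runs up to the ROI top
    return spans
```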
Referring to FIG. 5, in a preferred embodiment, step S20 comprises:
Step S21: if the height of the target webpage is smaller than the height of the normal webpage, calculating the difference between the pixel colors of the target webpage and the background color colorbg1 at the position of the first region of interest of the target webpage.
In the embodiment of the invention, if the height of the input target webpage is greater than that of the normal webpage, or the widths of the two pages differ, "disordered style" is output and the algorithm ends. If the height of the input target webpage is smaller than that of the normal webpage, the difference between the pixel colors of the target webpage and the background color colorbg1 is calculated at the position of the first region of interest of the page (the title bar of the webpage). The difference is defined as the number of pixels P(x, y) of the target webpage within the region of interest that are equal to colorbg1. If the difference is greater than a certain threshold, "disordered style" is output together with the coordinates of the region, and the algorithm ends; otherwise, step S22 is executed.
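A sketch of step S21 under our reading of the difference as a raw count of colorbg1 pixels in the ROI. Whether the count is restricted by the template mask or normalized is not stated clearly, so this sketch does neither; it also assumes the first ROI (the title bar) fits inside the shorter target. Names are hypothetical.

```python
def roi_difference(img, roi, bg1):
    """Number of pixels inside roi = (top, bottom, left, right) equal to bg1."""
    top, bottom, left, right = roi
    return sum(1 for y in range(top, bottom + 1)
               for x in range(left, right + 1) if img[y][x] == bg1)


def check_unequal_size(target, normal, first_roi, bg1, threshold):
    """Return "disordered style" to end the algorithm, or None to go on
    to step S22."""
    if len(target) > len(normal) or len(target[0]) != len(normal[0]):
        return "disordered style"
    if roi_difference(target, first_roi, bg1) > threshold:
        return "disordered style"
    return None
```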
Step S22: calculating, row by row from top to bottom in units of horizontal rows, the difference between the pixel colors of the target webpage and the background color colorbg1 in each region of interest, and judging the problem type according to preset rules.
In the embodiment of the invention, the rules are as follows:
(1) diff > Thhigh: record the region position and record a whole-column-content-missing problem.
(2) diff > Thmedia: record the region position and record a picture-missing problem.
(3) diff > Thlow: record the number of regions in which this problem occurs.
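The three rules can be read as a cascade, checked from the highest threshold down. This sketch assumes Thhigh > Thmedia > Thlow and returns the rule number, with 0 meaning no problem; the function name is hypothetical.

```python
def classify(diff, th_high, th_media, th_low):
    """Return the first rule whose threshold diff exceeds, else 0.
    Rule 1: whole-column content missing; rule 2: picture missing;
    rule 3: a smaller difference that is only counted."""
    if diff > th_high:
        return 1
    if diff > th_media:
        return 2
    if diff > th_low:
        return 3
    return 0
```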
Step S23: stopping the forward comparison when it finds a problem, and starting the reverse comparison.
After the regions of a row have been compared, if every region of the row shows problem (1), the whole column is determined to be missing; the forward comparison is terminated and the reverse comparison starts, to continue searching for other problems. Otherwise, the method returns to step S22 until all rows have been compared. The reverse comparison starts from the regions of interest in the last row of the template and proceeds upward: the difference between the pixel colors of the target webpage and the background color colorbg1 in each region of interest of each row is calculated in the same way as in step S22, and problems are judged by the same rules. After the regions of a row have been compared, if every region shows problem (1), the whole column is determined to be missing, the reverse comparison is terminated, and all errors are output; otherwise, the method returns to step S23 to continue comparing row by row. If problem (3) appears in every row after the comparison, "disordered style" is output. Finally, all error conditions are output. In the embodiment of the invention, the comparison process and the output of error conditions are shown in FIG. 6.
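The forward/reverse logic might be sketched as below. Several points are our assumptions, not the patent's: the forward pass aligns the pages at the top, the reverse pass aligns them at the bottom (offset = target height minus normal height), ROIs that do not fit in the shorter target are skipped, and the difference is the count of colorbg1 pixels. The rule-(3) "disordered style" check is omitted for brevity. All names are hypothetical.

```python
def roi_diff(img, roi, bg1, offset):
    """Count target pixels equal to bg1 in roi shifted by `offset` rows.
    Returns None when the shifted ROI does not fit inside the target."""
    top, bottom, left, right = roi
    if top + offset < 0 or bottom + offset >= len(img):
        return None
    return sum(1 for y in range(top + offset, bottom + offset + 1)
               for x in range(left, right + 1) if img[y][x] == bg1)


def classify(diff, th_high, th_media, th_low):
    """Rule 1, 2 or 3 for the first threshold exceeded, else 0."""
    for rule, th in ((1, th_high), (2, th_media), (3, th_low)):
        if diff > th:
            return rule
    return 0


def compare_pass(target, roi_rows, bg1, ths, offset):
    """Compare ROI rows in the given order. Stops as soon as every ROI of a
    row hits rule 1, which is read as a missing whole column."""
    problems = []
    for row in roi_rows:
        rules = []
        for roi in row:
            d = roi_diff(target, roi, bg1, offset)
            if d is None:
                continue
            rule = classify(d, *ths)
            rules.append(rule)
            if rule:
                problems.append((rule, roi))
        if rules and all(r == 1 for r in rules):
            return problems, True
    return problems, False


def compare_unequal(target, normal_height, roi_rows, bg1, ths):
    """Forward pass aligned at the top; if it stops on a missing column,
    a reverse pass from the last ROI row, aligned at the bottom."""
    problems, column_missing = compare_pass(target, roi_rows, bg1, ths, 0)
    if column_missing:
        offset = len(target) - normal_height
        reverse, _ = compare_pass(target, roi_rows[::-1], bg1, ths, offset)
        problems += reverse
    return problems, column_missing
```

Comparing from both ends lets rows below a removed column still be matched against their counterparts once the vertical shift is undone.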
Referring to FIG. 7, in an embodiment, step S30 may comprise:
Step S31: calculating, in units of horizontal rows, the difference between the pixel colors of the target webpage and the background color colorbg1 in each region of interest of each row, and judging the problem type according to preset rules.
For example, in units of horizontal rows, the difference between the pixel colors of the target webpage and the background color colorbg1 in each region of interest of each row is calculated, and the following rules determine which problem has occurred:
(1) diff > Thhigh: record the region position and record a whole-column-content-missing problem.
(2) diff > Thmedia: record the region position and record a picture-missing problem.
(3) diff > Thlow: record the number of regions in which this problem occurs.
If problem (1) occurs in every region of a row, the error is recorded as a missing column. If not all regions of interest have been compared, step S31 continues until all comparisons are complete. If, after the comparison, problems (1) to (3) occur in every region of interest of every row, "disordered style" is output together with each error position; otherwise, the method proceeds to step S32.
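The aggregation at the end of step S31 could look like the following, assuming each ROI's rule number (0 to 3, 0 = no problem) has already been computed with the rules above. Reading "occurs in all the rows" as "in every region of a row" is our interpretation; the function name is hypothetical.

```python
def summarize_equal_size(rule_rows):
    """rule_rows[r][i] is the rule (0-3) hit by ROI i in row r, 0 = no problem.
    Returns (rows read as a missing column, disordered style?, any problem?)."""
    all_rules = [rule for row in rule_rows for rule in row]
    missing_rows = [r for r, row in enumerate(rule_rows)
                    if row and all(rule == 1 for rule in row)]
    disordered = bool(all_rules) and all(rule != 0 for rule in all_rules)
    any_problem = any(rule != 0 for rule in all_rules)
    return missing_rows, disordered, any_problem
```

Only when `any_problem` is false would the method move on to the second comparison of step S32.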
Step S32: if no problem is found in the comparison, perform a second comparison on each region of interest in the row, recording the difference between the pixel color of the target webpage and the background color colorbg2 in each region of interest.
That is, if the first comparison finds no problem, each region of interest in the row is compared a second time: the difference diff between the pixel color of the target webpage and the background color colorbg2 is recorded for each region of interest, and if diff > Thhigh, the region position is recorded together with a picture-not-loaded problem. The position of each text line is also recorded; for each text line, the difference diff between the pixel color of the target webpage and the background color colorbg2 is recorded, and if diff > Thhigh, the region position is recorded together with a missing-text problem. If no problem is found, the webpage is output as normal and the algorithm ends.
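Steps S31 and S32 can be sketched as below. Several assumptions are not fixed by the text: the per-region diff against colorbg1 is taken as already computed and expressed as a fraction in [0, 1]; the thresholds `TH_HIGH > TH_MEDIA > TH_LOW` use made-up values and are checked from largest to smallest, so each region receives a single label; and for the second comparison, the "difference against colorbg2" is interpreted as the fraction of the normal page's non-background pixels (those not equal to colorbg2) that have turned into colorbg2 in the target page. Images are lists of rows of gray values, and regions are `(top, bottom, left, right)` tuples with exclusive ends. All function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Region = Tuple[int, int, int, int]  # (top, bottom, left, right), end-exclusive
Image = List[List[int]]

# Hypothetical values; the text names Thhigh, Thmedia and Thlow but gives no numbers.
TH_HIGH, TH_MEDIA, TH_LOW = 0.8, 0.4, 0.1


def classify(diff: float) -> Optional[str]:
    """Step S31 rules (1)-(3), checked from the largest threshold down."""
    if diff > TH_HIGH:
        return "column_content_missing"  # rule (1)
    if diff > TH_MEDIA:
        return "picture_missing"         # rule (2)
    if diff > TH_LOW:
        return "low_difference"          # rule (3): only counted
    return None


def first_pass(rows_of_diffs: Sequence[Sequence[float]]):
    """rows_of_diffs[r][i] is the colorbg1 diff of region i in row r."""
    errors = []
    for r, diffs in enumerate(rows_of_diffs):
        labels = [classify(d) for d in diffs]
        if labels and all(l == "column_content_missing" for l in labels):
            errors.append((r, None, "column_missing"))  # rule (1) in every region
        else:
            errors.extend((r, i, l) for i, l in enumerate(labels) if l)
    flags = [classify(d) for diffs in rows_of_diffs for d in diffs]
    if flags and all(flags):
        return "style_disorder", errors  # a problem in every region of every row
    return ("errors_found" if errors else "second_pass"), errors


def lost_foreground(normal: Image, target: Image, region: Region, bg2: int) -> float:
    """Fraction of the normal page's non-colorbg2 pixels that are colorbg2 in the target."""
    top, bottom, left, right = region
    fg = lost = 0
    for y in range(top, bottom):
        for x in range(left, right):
            if normal[y][x] != bg2:
                fg += 1
                if target[y][x] == bg2:
                    lost += 1
    return lost / fg if fg else 0.0


def second_pass(normal: Image, target: Image, rois: Sequence[Region],
                title_lines: Sequence[Tuple[int, int]], bg2: int):
    """Step S32: unloaded pictures in regions of interest, missing text in text lines."""
    width = len(normal[0])
    errors = [(roi, "picture_not_loaded") for roi in rois
              if lost_foreground(normal, target, roi, bg2) > TH_HIGH]
    for start, end in title_lines:
        line = (start, end, 0, width)
        if lost_foreground(normal, target, line, bg2) > TH_HIGH:
            errors.append((line, "text_missing"))
    return ("normal" if not errors else "errors_found"), errors
```

In this sketch `second_pass` would only be called when `first_pass` returns `"second_pass"`, mirroring the "if no problem is found" condition of step S32.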
The invention also provides a webpage layout monitoring device for implementing the above method. The device is implemented by a computer program; for its implementation, refer to the embodiments shown in figs. 1 to 7. The function and principle of each module correspond to the steps of those embodiments and are not described again here. Referring to fig. 8, in an embodiment, the webpage layout monitoring device includes:
a normal webpage template generating module 10, configured to extract the background color of a normal webpage and segment the normal webpage by using the background color to generate a webpage template having regions of interest;
an unequal-size page comparison module 20, configured to, when the image sizes of a target webpage and the normal webpage are different, perform a forward and reverse bidirectional comparison of the target webpage according to the webpage template, and calculate the difference within the regions of interest between the target webpage and the normal webpage to obtain the state of the target webpage; and
an equal-size page comparison module 30, configured to, when the image sizes of the target webpage and the normal webpage are the same, perform a one-to-one forward comparison of the target webpage according to the webpage template, and calculate the difference within the regions of interest between the target webpage and the normal webpage to obtain the state of the target webpage.
Referring to fig. 9, in an embodiment, the normal web page template generating module 10 includes:
a conversion unit 11, configured to convert an input image of the normal webpage into a gray scale space or an arbitrary luminance and color separation space;
an arithmetic unit 12, configured to convolve the image in the grayscale space or luminance and color separation space with a horizontal edge gradient operator and a vertical edge gradient operator to obtain a horizontal edge map Eh and a vertical edge map Ev;
a histogram obtaining unit 13, configured to perform horizontal projection on each line of the horizontal edge map Eh to obtain a horizontal histogram Hedge;
a counting unit 14, configured to count a color histogram Hcolor of an image pixel P (x, y) in the histogram Hedge;
a background color obtaining unit 15 for obtaining two background colors colorbg1, colorbg2 of the image;
a cutting unit 16, configured to segment the image in a first direction by using the background color colorbg1;
a region-of-interest obtaining unit 17, configured to segment each segmented region in a second direction, and then segment each second-direction region again in the first direction by using the background color colorbg1, so as to obtain a plurality of regions of interest; wherein the first direction is the horizontal direction and the second direction is the vertical direction, or the first direction is the vertical direction and the second direction is the horizontal direction; and
a template value setting unit 18, configured to compare each pixel P(x, y) in each region of interest with the background color colorbg1, set the pixel as a statistical range if P(x, y) = colorbg1 and otherwise as a non-statistical range, and set all pixel positions outside the regions of interest as a non-statistical range, so as to obtain the final webpage template having regions of interest.
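Units 11 to 18 can be sketched end to end as below, as a minimal pure-Python illustration rather than the patented implementation. Assumptions not stated in the text: the edge gradient operators are 3x3 Sobel kernels; Hcolor is built from the pixels of rows whose Hedge value is zero (rows with no horizontal edges), and colorbg1 and colorbg2 are taken as its two most frequent gray values; and the first direction is horizontal. Segmentation cuts at rows or columns made entirely of colorbg1 (an XY-cut style split), and the template marks as statistical range exactly the colorbg1 pixels inside each region of interest. All function names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Image = List[List[int]]
Region = Tuple[int, int, int, int]  # (top, bottom, left, right), end-exclusive

# Assumed operators: 3x3 Sobel kernels for horizontal and vertical edges.
SOBEL_H = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
SOBEL_V = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]


def to_gray(rgb: List[List[Tuple[int, int, int]]]) -> Image:
    """Conversion unit 11: ITU-R BT.601 luma used as the grayscale space."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row] for row in rgb]


def edge_map(gray: Image, kernel: List[List[int]]) -> Image:
    """Arithmetic unit 12: absolute 3x3 filter response; the 1-pixel border stays 0.

    For these antisymmetric kernels, correlation and convolution differ only in
    sign, so taking the absolute value makes the distinction irrelevant.
    """
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = abs(sum(kernel[j][i] * gray[y + j - 1][x + i - 1]
                                for j in range(3) for i in range(3)))
    return out


def background_colors(gray: Image, eh: Image) -> Tuple[int, int]:
    """Units 13-15: Hedge = row sums of Eh; Hcolor counts pixels of edge-free rows."""
    hedge = [sum(row) for row in eh]
    hcolor = Counter()
    for y, energy in enumerate(hedge):
        if energy == 0:
            hcolor.update(gray[y])
    ranked = [color for color, _ in hcolor.most_common(2)]
    if not ranked:
        raise ValueError("no edge-free rows to estimate the background from")
    return ranked[0], ranked[-1]  # colorbg1, colorbg2 (equal if only one color seen)


def split_runs(is_bg: List[bool]) -> List[Tuple[int, int]]:
    """[start, end) ranges of consecutive non-background lines."""
    runs, start = [], None
    for i, bg in enumerate(is_bg):
        if not bg and start is None:
            start = i
        elif bg and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(is_bg)))
    return runs


def segment_rois(img: Image, bg1: int) -> List[Region]:
    """Units 16-17: cut rows, then columns inside each band, then rows again."""
    rois = []
    row_bg = [all(p == bg1 for p in row) for row in img]
    for top, bottom in split_runs(row_bg):
        col_bg = [all(img[y][x] == bg1 for y in range(top, bottom)) for x in range(len(img[0]))]
        for left, right in split_runs(col_bg):
            sub_bg = [all(img[y][x] == bg1 for x in range(left, right)) for y in range(top, bottom)]
            for t, b in split_runs(sub_bg):
                rois.append((top + t, top + b, left, right))
    return rois


def build_template(img: Image, rois: List[Region], bg1: int) -> List[List[bool]]:
    """Unit 18: True = statistical range (a colorbg1 pixel inside a region of interest)."""
    mask = [[False] * len(img[0]) for _ in range(len(img))]
    for top, bottom, left, right in rois:
        for y in range(top, bottom):
            for x in range(left, right):
                mask[y][x] = img[y][x] == bg1
    return mask
```

Because `edge_map` leaves the border at zero, the first and last rows always count as edge-free in this sketch; on real screenshots these rows are usually page background, so they simply add to the colorbg1 count.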
In an embodiment, the normal webpage template generating module 10 may further include:
a title determining unit 19, configured to determine whether all pixels P(x, y) in the first row of the text part of the normal webpage image satisfy P(x, y) = colorbg2; if so, check the next row; otherwise, record the position of P(x, y) as a first start position;
when every pixel P(x, y) in a subsequent row satisfies P(x, y) = colorbg2, the first end position is recorded;
repeating the steps to obtain a second starting position and a second ending position, and stopping searching after obtaining the second starting position and the second ending position;
and setting the line from the first starting position to the first ending position and the line from the second starting position to the second ending position as the region of interest.
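The title determining unit 19 can be sketched as a simple row scan. Assumptions: the image passed in is only the text part of the normal page, given as rows of gray values; "any pixel satisfies P(x, y) = colorbg2" is read in the universal sense, so a row counts as background only when all of its pixels equal colorbg2; and end positions are exclusive. The function name and the `max_lines` parameter are hypothetical.

```python
from typing import List, Tuple


def find_title_lines(text_img: List[List[int]], bg2: int,
                     max_lines: int = 2) -> List[Tuple[int, int]]:
    """Return up to max_lines (start_row, end_row) text lines, end exclusive."""
    lines: List[Tuple[int, int]] = []
    start = None
    for y, row in enumerate(text_img):
        blank = all(p == bg2 for p in row)  # every pixel equals colorbg2
        if start is None and not blank:
            start = y                       # first row containing text
        elif start is not None and blank:
            lines.append((start, y))        # first all-background row after the text
            start = None
            if len(lines) == max_lines:
                break                       # stop after the second start/end pair
    if start is not None and len(lines) < max_lines:
        lines.append((start, len(text_img)))  # text runs to the bottom edge
    return lines
```

Each returned `(start, end)` pair corresponds to one band of rows that the unit sets as a region of interest.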
In one embodiment, the unequal-size page comparison module 20 is configured for:
if the height of the target webpage is smaller than the height of the normal webpage, calculating the difference between the pixel color of the target webpage and the background color colorbg1 at the position of the first region of interest of the target webpage;
calculating, in units of horizontal lines from top to bottom, the difference between the pixel color of the target webpage and the background color colorbg1 within each region of interest of each row, and judging the problem type according to a preset rule;
and stopping the forward comparison when the problem is found in the forward comparison, and starting the reverse comparison.
In one embodiment, the equal-size page comparison module 30 is configured for:
calculating, in units of horizontal lines, the difference between the pixel color of the target webpage and the background color colorbg1 within each region of interest of each row, and judging the problem type according to a preset rule; and
if no problem is found in the comparison, performing a second comparison on each region of interest in the row, recording the difference between the pixel color of the target webpage and the background color colorbg2 in each region of interest.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (8)

1. A method for monitoring the layout of a webpage is characterized by comprising the following steps:
extracting the background color of a normal webpage, and segmenting the normal webpage by using the background color to generate a webpage template having regions of interest;
when the image sizes of a target webpage and the normal webpage are different, performing a forward and reverse bidirectional comparison on the target webpage according to the webpage template, and calculating the difference within the regions of interest between the target webpage and the normal webpage so as to obtain the state of the target webpage, wherein the state of the target webpage comprises at least one of: normal, disordered style, missing column, missing column content, missing picture, unloaded picture, and missing text;
when the image sizes of the target webpage and the normal webpage are the same, performing a one-to-one forward comparison on the target webpage according to the webpage template, and calculating the difference within the regions of interest between the target webpage and the normal webpage to obtain the state of the target webpage;
wherein the step of extracting the background color of the normal webpage, segmenting the normal webpage by using the background color and generating the webpage template having regions of interest comprises:
converting the input image of the normal webpage into a grayscale space or any luminance and color separation space;
convolving the converted image with a horizontal edge gradient operator and a vertical edge gradient operator to obtain a horizontal edge map Eh and a vertical edge map Ev;
projecting each row of the horizontal edge map Eh in the horizontal direction to obtain a horizontal histogram Hedge;
counting a color histogram Hcolor of the image pixels P(x, y) in the histogram Hedge;
obtaining two background colors, colorbg1 and colorbg2, of the image; and
segmenting the image in a first direction by using the background color colorbg1;
wherein the step of, when the image sizes of the target webpage and the normal webpage are different, performing the forward and reverse bidirectional comparison on the target webpage according to the webpage template and calculating the difference within the regions of interest between the target webpage and the normal webpage to obtain the state of the target webpage comprises:
if the height of the target webpage is smaller than the height of the normal webpage, calculating the difference between the pixel color of the target webpage and the background color colorbg1 at the position of the first region of interest of the target webpage;
calculating, in units of horizontal lines from top to bottom, the difference between the pixel color of the target webpage and the background color colorbg1 within each region of interest of each row, and judging the problem type according to a preset rule; and
and stopping the forward comparison when the problem is found in the forward comparison, and starting the reverse comparison.
2. The method for monitoring the layout of a web page as claimed in claim 1, wherein the step of extracting the background color of the normal web page, segmenting the normal web page by using the background color, and generating the web page template having the region of interest further comprises:
performing second-direction segmentation on each segmented region, and segmenting each second-direction region again in the first direction by using the background color colorbg1 to obtain a plurality of regions of interest; wherein the first direction is the horizontal direction and the second direction is the vertical direction, or the first direction is the vertical direction and the second direction is the horizontal direction;
comparing each pixel P(x, y) in each region of interest with the background color colorbg1, setting the pixel as a statistical range if P(x, y) = colorbg1, and otherwise setting it as a non-statistical range; and setting the pixel positions outside the regions of interest as a non-statistical range to obtain the final webpage template having regions of interest.
3. The method for monitoring the layout of a web page as claimed in claim 2, wherein the step of extracting the background color of the normal web page, segmenting the normal web page by using the background color, and generating the web page template having the region of interest further comprises:
judging whether all pixels P(x, y) in the first row of the text part of the normal webpage image satisfy P(x, y) = colorbg2; if so, checking the next row; otherwise, recording the position of P(x, y) as a first start position;
when every pixel P(x, y) in a subsequent row satisfies P(x, y) = colorbg2, recording the first end position;
repeating the steps to obtain a second starting position and a second ending position, and stopping searching after the second starting position and the second ending position are obtained;
and setting the line from the first starting position to the first ending position and the line from the second starting position to the second ending position as the region of interest.
4. The method for monitoring the layout of a web page according to claim 2, wherein the step of, when the image size of the target webpage is the same as that of the normal webpage, performing the one-to-one forward comparison on the target webpage according to the webpage template and calculating the difference within the regions of interest between the target webpage and the normal webpage to obtain the state of the target webpage comprises:
calculating, in units of horizontal lines, the difference between the pixel color of the target webpage and the background color colorbg1 within each region of interest of each row, and judging the problem type according to a preset rule; and
if no problem is found in the comparison, performing a second comparison on each region of interest in the row, and recording the difference between the pixel color of the target webpage and the background color colorbg2 in each region of interest.
5. A web page layout monitoring device, the web page layout monitoring device comprising:
the normal webpage template generating module, configured to extract the background color of a normal webpage and segment the normal webpage by using the background color to generate a webpage template having regions of interest;
the unequal-size page comparison module, configured to, when the image sizes of a target webpage and the normal webpage are different, perform a forward and reverse bidirectional comparison on the target webpage according to the webpage template, and calculate the difference within the regions of interest between the target webpage and the normal webpage so as to obtain the state of the target webpage, wherein the state of the target webpage comprises at least one of: normal, disordered style, missing column, missing column content, missing picture, unloaded picture, and missing text;
the equal-size page comparison module, configured to, when the image sizes of the target webpage and the normal webpage are the same, perform a one-to-one forward comparison on the target webpage according to the webpage template, and calculate the difference within the regions of interest between the target webpage and the normal webpage so as to obtain the state of the target webpage;
the normal webpage template generating module comprises:
the conversion unit, configured to convert the input image of the normal webpage into a grayscale space or any luminance and color separation space;
the operation unit, configured to convolve the converted image with a horizontal edge gradient operator and a vertical edge gradient operator to obtain a horizontal edge map Eh and a vertical edge map Ev;
the histogram acquisition unit, configured to project each row of the horizontal edge map Eh in the horizontal direction to obtain a horizontal histogram Hedge;
the statistical unit, configured to count a color histogram Hcolor of the image pixels P(x, y) in the histogram Hedge;
a background color obtaining unit, configured to obtain two background colors, colorbg1 and colorbg2, of the image; and
a cutting unit, configured to segment the image in a first direction by using the background color colorbg1;
the unequal-size page comparison module is used for: if the height of the target webpage is smaller than the height of the normal webpage, calculating the difference between the pixel color of the target webpage and the background color colorbg1 at the position of the first region of interest of the target webpage;
calculating, in units of horizontal lines from top to bottom, the difference between the pixel color of the target webpage and the background color colorbg1 within each region of interest of each row, and judging the problem type according to a preset rule; and
and stopping the forward comparison when the problem is found in the forward comparison, and starting the reverse comparison.
6. The apparatus for monitoring the layout of web pages as claimed in claim 5, wherein said normal web page template generating module further comprises:
the region-of-interest acquiring unit, configured to segment each segmented region in the second direction, and then segment each second-direction region again in the first direction by using the background color colorbg1, to obtain a plurality of regions of interest; wherein the first direction is the horizontal direction and the second direction is the vertical direction, or the first direction is the vertical direction and the second direction is the horizontal direction;
a template value setting unit, configured to compare each pixel P(x, y) in each region of interest with the background color colorbg1, set the pixel as a statistical range if P(x, y) = colorbg1, and otherwise set it as a non-statistical range; and set the pixel positions outside the regions of interest as a non-statistical range to obtain the final webpage template having regions of interest.
7. The apparatus for monitoring the layout of web pages as claimed in claim 6, wherein said normal web page template generating module further comprises:
a text header determining unit, configured to determine whether all pixels P(x, y) in the first row of the text part of the normal webpage image satisfy P(x, y) = colorbg2; if so, check the next row; otherwise, record the position of P(x, y) as a first start position;
when every pixel P(x, y) in a subsequent row satisfies P(x, y) = colorbg2, the first end position is recorded;
repeating the steps to obtain a second starting position and a second ending position, and stopping searching after obtaining the second starting position and the second ending position;
and setting the line from the first starting position to the first ending position and the line from the second starting position to the second ending position as the region of interest.
8. The apparatus for monitoring the layout of a web page of claim 6, wherein the equal-size page comparison module is configured for:
calculating, in units of horizontal lines, the difference between the pixel color of the target webpage and the background color colorbg1 within each region of interest of each row, and judging the problem type according to a preset rule; and
if no problem is found in the comparison, performing a second comparison on each region of interest in the row, and recording the difference between the pixel color of the target webpage and the background color colorbg2 in each region of interest.
CN201710047524.XA 2017-01-22 2017-01-22 Webpage layout monitoring method and device Active CN106910195B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710047524.XA CN106910195B (en) 2017-01-22 2017-01-22 Webpage layout monitoring method and device

Publications (2)

Publication Number Publication Date
CN106910195A CN106910195A (en) 2017-06-30
CN106910195B true CN106910195B (en) 2020-06-16

Family

ID=59206823

Country Status (1)

Country Link
CN (1) CN106910195B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107368690B (en) * 2017-08-09 2022-01-18 贵阳朗玛信息技术股份有限公司 Medical image picture preprocessing method and device
CN111124721B (en) * 2018-10-31 2023-05-05 阿里巴巴集团控股有限公司 Webpage processing method and device and electronic equipment
CN110955369B (en) * 2019-11-19 2022-04-01 广东智媒云图科技股份有限公司 Focus judgment method, device and equipment based on click position and storage medium
CN112036147B (en) * 2020-08-28 2024-01-30 平安科技(深圳)有限公司 Method, device, computer equipment and storage medium for converting picture into webpage
CN112651942B (en) * 2020-12-28 2023-04-07 三星电子(中国)研发中心 Layout detection method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1752979A (en) * 2004-09-23 2006-03-29 捷讯研究有限公司 Web browser graphical user interface and method for implementing same
CN101331473A (en) * 2005-12-07 2008-12-24 三维实验室公司 Methods for manipulating web pages
CN101433075A (en) * 2006-04-28 2009-05-13 伊斯曼柯达公司 Generating a bitonal image from a scanned colour image
CN104036262A (en) * 2014-06-30 2014-09-10 南京富士通南大软件技术有限公司 Method and system for screening and recognizing LPR license plate
CN106227823A (en) * 2016-07-21 2016-12-14 知几科技(深圳)有限公司 A kind of webpage update detection method, info web capture and rendering method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant